Handbook of Applied Economic Statistics
edited by
Aman Ullah
University of California
Riverside, California
David E. A. Giles
University of Victoria
Victoria, British Columbia, Canada
Marcel Dekker, Inc.  New York, Basel, Hong Kong
Library of Congress Cataloging-in-Publication Data
Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540
The publisher offers discounts on this book when ordered in bulk quantities. For
more information, write to Special Sales/Professional Marketing at the address
below.
Neither this book nor any part may be reproduced or transmitted in any form or
by any means, electronic or mechanical, including photocopying, microfilming,
and recording, or by any information storage and retrieval system, without
permission in writing from the publisher.
Many applied subjects, including economic statistics, deal with the collection of
data, measurement of variables, and the statistical analysis of key relationships and
hypotheses. The attempts to analyze economic data go back to the late eighteenth
century, when the first examinations of the wages of the poor were done in the United
Kingdom, followed by the mid-nineteenth-century research by Engel on food
expenditure and income (or total expenditure). These investigations led to the early
twentieth-century growth of empirical studies on demand, production, and cost func-
tions, price determination, and macroeconomic models. During this period the sta-
tistical theory was developed through the seminal works of Legendre, Gauss, and
Pearson. Finally, the works of Fisher and Neyman and Pearson laid the foundations
of modern statistical inference in the form of classical estimation theory and hypoth-
esis testing. These developments in statistical theory, along with the growth of data
collections and economic theory, generated a demand for more rigorous research in
the methodology of economic data analysis and the establishment of the International
Statistical Institute and the Econometric Society.
The post-World War II period saw significant advances in statistical science,
and the transformation of economic statistics into a broader subject: econometrics,
which is the application of mathematical and statistical methods to the analysis of
economic data. During the last four decades, significant works have appeared on
econometric techniques of estimation and hypothesis testing, leading to the appli-
cation of econometrics not only in economics but also in sociology, psychology, his-
tory, political science, and medicine, among others. We also witnessed major de-
velopments in the literature associated with the research at the interface between
econometrics and statistics, especially in the areas of censored models, panel (lon-
gitudinal) data models, the analysis of nonstationary time series, cointegration and
volatility, and finite sample and asymptotic theories, among others. These common
grounds are of considerable importance for researchers, practitioners, and students
of both of these disciplines and are of direct interest to those working in other areas
of applied statistics.
...
ways converge to the same ordering of states. The relationship between this ordering
and the partial ordering given by Lorenz dominance is shown.
Chapters 6-10 and 15-17 deal with econometric methodologies related to dif-
ferent kinds of data used in empirical research. Chapter 6 by Russell, Breunig and
Chiu is perhaps the first comprehensive treatment of the problem of aggregation as
it relates to empirical estimation of aggregate relationships. It is well known that the
analysis of individual behavior based on aggregate data is justified if the estimated
aggregate relationships can be consistently disaggregated to the individual relation-
ships and vice versa. Most empirical studies have ignored this problem; those that
have not are reviewed in this chapter. Anselin and Bera’s chapter details another
data problem ignored in the econometric analysis of regression models: the problem
of spatial autocorrelation, that is, correlation across observations in cross-sectional data. This chapter
reviews the methodological issues related to the treatment of spatial dependence in
linear models. Another data issue often ignored in empirical development economics
and labor economics is related to the fact that most of the survey data is based on
complex sampling from a finite population, such as stratified, cluster, and systematic
sampling. However, the econometric analysis is carried out under the assumption of
random sampling from an infinite population. The chapter by Ullah and Breunig
reviews the literature on complex sampling and indicates that the effect of misspec-
ifying or ignoring true sampling schemes on the econometric inference can be quite
serious.
Panel data are multiple time series observations on the same set of cross-
sectional survey units (e.g., households). Baltagi’s chapter reviews the extensive ex-
isting literature on econometric inference in linear and nonlinear parametric panel
data models. In a related chapter, Ullah and Roy develop the nonparametric kernel
estimation of panel data models without assuming their functional forms. The chapter
by Golan, Judge, and Miller proposes a maximum-entropy approach to the estimation
of simultaneous equations models when the economic data is partially incomplete.
In Chapter 15 Terasvirta looks into the modeling of time series data that exhibit non-
linear relationships due to discrete or smooth transitions and to regime switching.
He proposes and develops a smooth transition regression analysis for such situations.
Finally, the chapter by Franses surveys econometric issues concerning seasonality in
economic time series data due to weather or other institutional factors. He discusses
the statistical models that can describe and forecast economic time series with sea-
sonal variations encountered in macroeconomics, marketing and finance.
Chapters 12 and 18 are related to simulation procedures, and Chapters 11, 13, and
14 to model specification and selection procedures in econometrics. The chapter by DeBene-
dictis and Giles surveys the diagnostic tests for the model misspecifications that can
have serious consequences on the sampling properties of both estimators and tests.
In a related chapter, Hadi and Son look into diagnostic procedures for revealing
outliers (influential observations) in the data which, if present, could also affect the
estimators and tests. They also propose a methodology of estimating linear models
Aman Ullah
David E. A. Giles
Contents
...
Preface iii
Contributors ix
Index 619
Contributors
Luc Anselin, Ph.D. Research Professor, Regional Research Institute and Depart-
ment of Economics, West Virginia University, Morgantown, West Virginia
Linda F. DeBenedictis, M.A. Senior Policy Analyst, Policy and Research Division,
Ministry of Human Resources, Victoria, British Columbia, Canada
Adrian Rodney Pagan, Ph.D. Professor, Economics Program, The Australian Na-
tional University, Canberra, Australia
James B. Davies
University of Western Ontario, London, Ontario, Canada
David A. Green
University of British Columbia, Vancouver, British Columbia, Canada
Harry J. Paarsch
University of Iowa, Iowa City, Iowa
I. INTRODUCTION
We present a selective survey of the literature concerned with using economic statis-
tics to make social welfare comparisons. First, we define a number of different sum-
mary income inequality measures and social welfare indices as well as functional
summary measures associated with disaggregated dominance criteria. Next, we de-
scribe the theoretical basis for the use of such measures. Finally, we outline how data
from conventional sample surveys can be used to estimate the functionals that can
be interpreted in terms of social welfare and compared in decision-theoretic terms
with other functionals.
While we define and discuss some of the properties of popular summary in-
equality indices, our main focus is on functional summary measures associated with
disaggregated dominance criteria. In particular, we examine the estimation and com-
parison of Lorenz and generalized Lorenz curves as well as indicators of third-degree
stochastic dominance.* In line with a growing body of opinion, we believe that the
*The estimation of summary indices is discussed, for example, by Cowell (1989); Cowell and Mehta
(1982); and Cowell and Victoria-Feser (1996).
In this section we set out notation and define several scalar summary indices of eco-
nomic inequality and social welfare as well as two functional summary measures
associated with disaggregated dominance criteria. We confine ourselves to the posi-
tive evaluation of the behavior of the various measures. The normative properties
of the measures are discussed in Section III. Nonetheless, it is useful from the out-
set to note some of the motivation for the various indices. In this section we pro-
vide a heuristic discussion, an approach that mimics how the field has developed
historically.
Economists have always agreed that increases in everyone’s income raise wel-
fare, so there is natural interest in measures of central location (the mean, the me-
dian), which would reflect such changes. But there has also always been a view that
increases in relative inequality make society worse off. The latter view has some-
times been based on, but is logically separate from, utilitarianism. Given the interest
in inequality, it was natural that economists would like to measure it. Historically,
economists have proposed a number of essentially ad hoc methods of measuring in-
equality, and found that the proposed indices were not always consistent in their
rankings. This gave rise to an interest in and the systematic study of the normative
foundations of inequality measurement. Some of the results of that study are surveyed
in Section III.
One central concern in empirical studies of inequality has been the ability to
allocate overall inequality for a population to inequality between and within spe-
cific subpopulations. Thus, for example, one might like to know how much of the
overall income inequality in a country is due to inequality for females, how much is
due to inequality for males, and how much is due to inequality between males and fe-
males. One reason for the popularity of particular scalar measures of inequality (e.g.,
the variance of the logarithm of income) is that they are additively decomposable
into these various between- and within-group effects. The decompositions are often
created by dividing the population into subpopulations and applying the inequality
measure to each of the subpopulations (to get within-group inequality measures) and
to the “sample” consisting of the means of all the subpopulations (to get a between-
group measure). The more subpopulations one wants to examine, the more unwieldy
this becomes. Furthermore, in some instances, one is interested in answering the
question, how would inequality change if the proportion of the population who are
unionized increases, holding constant all other worker characteristics? The decom-
positions allowed in the standard inequality measures do not provide a clear answer
to this question. This point will be raised again in Section IV. In the following sec-
tion, the reader should keep in mind that indices like the generalized entropy family
of indices possess the decomposability property. For a more detailed discussion of
the decompositions of various indices, see Shorrocks (1980, 1982, 1984).
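For instance, the Theil index, a member of the generalized entropy family, splits exactly into a within-group and a between-group term. The following Python sketch illustrates this on invented data; the group labels, incomes, and function names are ours, purely for illustration, and the weighting used is the standard income-share-weighted Theil decomposition.

```python
import numpy as np

def theil(y):
    """Theil index T = (1/n) * sum( (y_i/mu) * ln(y_i/mu) )."""
    y = np.asarray(y, dtype=float)
    r = y / y.mean()
    return np.mean(r * np.log(r))

def theil_decomposition(y, groups):
    """Split the Theil index into within-group and between-group components."""
    y, groups = np.asarray(y, float), np.asarray(groups)
    n, mu = y.size, y.mean()
    within = between = 0.0
    for g in np.unique(groups):
        yg = y[groups == g]
        share = (yg.size * yg.mean()) / (n * mu)     # group's share of total income
        within += share * theil(yg)
        between += share * np.log(yg.mean() / mu)
    return within, between

rng = np.random.default_rng(6)
y = np.concatenate([rng.lognormal(9.8, 0.5, 1_000), rng.lognormal(10.2, 0.5, 1_000)])
groups = np.repeat(["female", "male"], 1_000)
w, b = theil_decomposition(y, groups)
print(round(theil(y), 4), round(w + b, 4))   # overall index equals within + between
```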
We consider a population each member of which has nonnegative income Y
distributed according to the probability density (or mass) function f(y) with corre-
sponding cumulative distribution function F(y).
A. Location Measures
Under some conditions, economists and others attempt comparisons of economic
welfare which neglect how income is distributed. It has been argued, for example
by Harberger (1971), that gains and losses should generally be summed in an un-
weighted fashion in applied welfare economics. This procedure may identify poten-
tial Pareto improvements; i.e., situations where gainers could hypothetically com-
pensate losers. Furthermore, it is possible to conceive of changes that would affect
all individuals' incomes uniformly, so that distributional changes would be absent.
For these reasons, measures of central location are a natural starting point in any
discussion of economic measurement of social welfare.
1. Mean Income

The mean of the distribution is

   ℰ[Y] = μ = ∫₀^∞ y f(y) dy

in the continuous case or

   μ = Σ_y y f(y)

in the discrete case.* Such a measure is easy to calculate and has considerable intu-
itive appeal, but it is sensitive to outliers in the tails of the distributions. For example,
an allocation in which 99 people each have an income of $1 per annum, while one
person has an income of $999,901 would be considered to be equivalent in welfare
terms to an allocation in which each person has $10,000 per annum. Researchers
are frequently attracted to alternative measures that are relatively insensitive to be-
havior in the tails of the distribution. One measure of location that is robust to tail
behavior is the median.
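The contrast can be made concrete with a small Python calculation that reproduces the allocation described above (the dollar figures are those used in the text; the code itself is our illustration):

```python
import numpy as np

# 99 people with $1 each and one person with $999,901, versus
# 100 people with $10,000 each: both allocations have the same mean.
unequal = np.array([1.0] * 99 + [999_901.0])
equal = np.full(100, 10_000.0)

print(np.mean(unequal), np.mean(equal))      # 10000.0  10000.0
print(np.median(unequal), np.median(equal))  # 1.0      10000.0
```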
2. Median Income
The median is defined as that point at which half of the population is above and the
other half is below. In terms of the probability density and cumulative distribution
functions, the median m solves the following:

   ∫₀^m f(y) dy = F(m) = 1/2

Alternatively,

   m = F⁻¹(1/2)
*Hereafter, without loss of generality, we shall focus almost exclusively on the continuous case.
where F⁻¹(·) is the inverse function of the cumulative distribution function. Clearly,
other quantiles could also be entertained: for example, the lower and upper quartiles,
which solve F(y) = 1/4 and F(y) = 3/4, respectively.
3. Variance of Income

A standard measure of scale is the variance of Y, defined by

   σ² = ℰ[(Y − μ)²] = ∫₀^∞ (y − μ)² f(y) dy

4. Coefficient of Variation

A related scale-free measure is the coefficient of variation, τ = σ/μ. Like the variance and the standard deviation, in comparison to other popular mea-
sures, τ is especially sensitive to changes in the tails of the distribution.
5. Variance of the Logarithm of Income

Another popular measure is the variance of Z = log Y,

   σ_Z² = ∫₀^∞ (log y − μ_Z)² f(y) dy

where

   μ_Z = ∫₀^∞ log y f(y) dy

is the average value of log Y.† Note that the variance of the logarithm of Y is inde-
pendent of scale, as is the standard deviation of the logarithm of income σ_Z, with
which it may be used interchangeably.‡
*In some cases, the transformation can actually go too far, as is discussed in Section III.
†Here, we have used the fact that the probability density function of Z is

   f_Z(z) = f_Y(y) dy/dz = f(exp(z)) exp(z)

since y = exp(z) and dy/dz = exp(z).
‡The coefficient of variation of the logarithm of income, τ_Z, is not used since it depends on scale and would
fall, for example, with an equiproportionate increase in all incomes. Thus, changing from measurement
in dollars to cents would cause τ_Z to fall.
6. Gini Coefficient
Perhaps the most popular summary inequality measure is the Gini coefficient K.
Several quite intuitive alternative interpretations of this index exist. We highlight
two of them. The Gini coefficient has a well-known geometric interpretation related
to the functional summary measure of relative inequality, the Lorenz curve, which is
discussed in the next subsection. Here, we note another interpretation of the index,
which may be defined as
   K = (1/(2μ)) ∫₀^∞ ∫₀^∞ |u − v| f(u) f(v) du dv
In words, the Gini coefficient is one half the expected absolute difference between the in-
comes of two individuals drawn independently from the distribution, divided by the
mean μ.
One virtue of writing K this way is that it draws attention to the contrasting
weights that are placed on income differences in different portions of the distribution.
The weight placed on the difference |u − v| is relatively small in the tails of the
distribution, where f(u)f(v) is small, but relatively large near the mode. This means
that, in practice, K is dramatically more sensitive to changes in the middle of the
income distribution than it is to changes in the tails. This contrasts sharply with the
behavior of many other popular inequality indices, which are most sensitive to either
or both tails of the distribution.
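The definition translates directly into a sample analogue. The Python sketch below is our illustration, not code from the chapter; it estimates K by averaging absolute differences over all pairs of observations and dividing by twice the mean:

```python
import numpy as np

def gini(y):
    """Sample Gini coefficient: half the mean absolute difference
    between all pairs of observations, divided by the mean."""
    y = np.asarray(y, dtype=float)
    mean_abs_diff = np.abs(y[:, None] - y[None, :]).mean()
    return mean_abs_diff / (2.0 * y.mean())

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=10.0, sigma=0.8, size=2_000)
print(gini(sample))   # roughly 0.43 for a lognormal sample with sigma = 0.8
```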
7. Atkinson’s Index
Atkinson (1970) defined a useful and popular inequality index which is based on the
family of additive social welfare functions (SWFs) of the form

   W = ∫₀^∞ U(y) f(y) dy,   U(y) = y^(1−ε)/(1 − ε) for ε ≠ 1,   U(y) = log y for ε = 1

The parameter ε governs the concavity, and therefore the degree of inequality aver-
sion, shown by the function. Note also that for ε equal to zero the function merely
aggregates all incomes, and therefore ranks the same as the mean, given a constant
population.
Atkinson's index A_ε is defined as

   A_ε = 1 − (1/μ) [ ∫₀^∞ y^(1−ε) f(y) dy ]^(1/(1−ε))
The parameter ε plays a dual role. As it rises, inequality aversion increases, but,
in addition, the degree of sensitivity to inequality at lower income levels also rises
with ε. In the limit, as ε goes to infinity, the index is overwhelmingly concerned with
inequality at the bottom of the distribution. While the sensitivity of this index to
inequality at different levels can be varied by changing ε, it is always more sensitive
to inequality that occurs lower in the distribution.
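To illustrate how the index responds to ε, the following Python sketch computes a sample analogue of Atkinson's index for several values of ε. The function name and the simulated incomes are ours, for illustration only:

```python
import numpy as np

def atkinson(y, eps):
    """Sample Atkinson inequality index with aversion parameter eps >= 0."""
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    if eps == 1.0:
        # Limiting case: one minus the ratio of the geometric to the arithmetic mean.
        return 1.0 - np.exp(np.log(y).mean()) / mu
    ede = np.mean(y ** (1.0 - eps)) ** (1.0 / (1.0 - eps))   # equally distributed equivalent income
    return 1.0 - ede / mu

rng = np.random.default_rng(1)
incomes = rng.lognormal(mean=10.0, sigma=0.6, size=10_000)
for eps in (0.5, 1.0, 2.0, 5.0):
    print(eps, round(atkinson(incomes, eps), 3))   # the index rises with eps
```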
   A_ε = 1 − [c(c − 1) I_c + 1]^(1/c),   c < 1, c ≠ 0
   A_ε = 1 − exp(−I_0),                  c = 0

where c = 1 − ε and I_c denotes the generalized entropy measure with parameter c.
Hence, for the particular form of Atkinson's index given by some value of ε, there is
always a corresponding, ordinally equivalent, generalized entropy measure.
All members of the generalized entropy family of inequality indices are based
on some notion of the average distance between relative incomes. These indices do
not take into account rank in the income distribution in performing this averaging,
which makes them fundamentally different from the Gini coefficient, for which rank
is very important.
The attraction of the generalized entropy family is enhanced by the fact that
it comprises all of the scale-independent inequality indices satisfying anonymity
and the strong principle of transfers which are also additively decomposable into in-
equality within and between population subgroups; for more on this, see Shorrocks
(1980). Furthermore, all indices which are decomposable (i.e., not necessarily addi-
tively) must be some positive transformation of a member of the generalized entropy
family; for more on this, see Shorrocks (1984).
1. Lorenz Curve

The Lorenz curve (LC) is the plot of the cumulative distribution function q on the
abscissa (x-axis) versus the proportion of aggregate income held by the quantile ξ(q)
and below on the ordinate (y-axis). The qth ordinate of the LC is defined as

   L(q) = (1/μ) ∫₀^ξ(q) y f(y) dy

Note that L(0) is zero, while L(1) is one. A graph of a representative LC for income
is provided in Figure 1. The 45° line denotes the LC of perfect income equality. The
further the LC is bowed from this 45° line, the more unequal is the distribution of
income. The Gini coefficient K can also be defined as twice the area between the 45°
line and the LC L(q). Thus,

   K = 2 ∫₀^1 [q − L(q)] dq
*Howes (forthcoming) provides an up-to-date and more technical discussion of other measures related to
these.
Figure 1  Example of Lorenz curve.
valid social welfare measure. This indicator is the generalized Lorenz curve (GLC),
which has ordinate

   G(q) = μ L(q) = ∫₀^ξ(q) y f(y) dy
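The sample analogues of the two curves are easily computed. The Python sketch below (our notation; the grid of abscissae is arbitrary) returns the empirical LC ordinate, the share of total income held by the poorest fraction q, and the GLC ordinate obtained by scaling that share by the sample mean:

```python
import numpy as np

def lorenz_ordinates(y, q_grid):
    """Empirical Lorenz curve L(q) and generalized Lorenz curve G(q) = mean * L(q)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    cum_share = np.concatenate(([0.0], np.cumsum(y) / y.sum()))
    # L(q): income share of the poorest floor(q * n) observations.
    idx = np.floor(np.asarray(q_grid) * n).astype(int)
    L = cum_share[idx]
    G = y.mean() * L
    return L, G

rng = np.random.default_rng(2)
y = rng.lognormal(mean=10.0, sigma=0.7, size=4_000)
q = np.linspace(0.1, 0.9, 9)
L, G = lorenz_ordinates(y, q)
print(np.round(L, 3))
```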
D. Stochastic Dominance
In some of the discussion we shall use stochastic dominance concepts. These were
first defined in the risk-measurement literature, but it was soon found that they par-
alleled concepts in inequality and social welfare measurement. We introduce here
Figure 2  Example of generalized Lorenz curve.
the notions of first-, second-, and third-degree stochastic dominance for two random
variables Y₁ and Y₂, each having the respective cumulative distribution functions
F₁(y) and F₂(y). First-degree stochastic dominance holds in situations where one
distribution provides a Pareto improvement compared to another. As discussed in
the next section, second-degree stochastic dominance corresponds to GLC domi-
nance. Finally, third-degree stochastic dominance may be important in the ranking
of distributions whose LCs or GLCs intersect.
Formally, Y₁ dominates Y₂ by FSD if

   F₁(y) ≤ F₂(y) for all y

and

   F₁(y) < F₂(y) for some y

In words, strict FSD means that the cumulative distribution function of Y₁ is every-
where to the right of that for Y₂.
Second-degree stochastic dominance of Y₁ over Y₂ requires that

   ∫₀^y F₁(t) dt ≤ ∫₀^y F₂(t) dt for all y

and

   ∫₀^y F₁(t) dt < ∫₀^y F₂(t) dt for some y

Note that FSD implies SSD. If the means of Y₁ and Y₂ are equal (i.e., ℰ[Y₁] = ℰ[Y₂] =
μ), then Y₁ SSD Y₂ implies that Y₁ is more concentrated about μ than is Y₂.
Third-degree stochastic dominance of Y₁ over Y₂ requires that

   ∫₀^y ∫₀^x [F₂(t) − F₁(t)] dt dx ≥ 0 for all y

and

   ℰ[Y₁] ≥ ℰ[Y₂]*

It is possible that everyone may be better off in one distribution than in another. For
example, this may happen as a result of rapid economic growth raising all incomes.
In this case, there is an actual Pareto improvement. As a result, there will be fewer
individuals with incomes less than any given real income level Y as time goes on.
*Corresponding endpoint conditions can be stated for FSD and SSD, but they are satisfied trivially.
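These definitions can be checked directly on two samples by comparing empirical CDFs (for FSD) and their running integrals (for SSD) on a common grid. The rough Python sketch below is ours and ignores sampling error; it simply verifies the inequalities numerically:

```python
import numpy as np

def ecdf(sample, grid):
    """Empirical CDF of `sample` evaluated at the points in `grid`."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, grid, side="right") / sample.size

def dominance_check(y1, y2, n_grid=500):
    """Return (fsd, ssd): does sample y1 weakly dominate y2 at first/second degree?"""
    grid = np.linspace(0.0, max(y1.max(), y2.max()), n_grid)
    F1, F2 = ecdf(y1, grid), ecdf(y2, grid)
    fsd = bool(np.all(F1 <= F2))                    # F1 everywhere to the right of F2
    step = grid[1] - grid[0]
    ssd = bool(np.all(np.cumsum(F1) * step <= np.cumsum(F2) * step))   # integrated CDFs
    return fsd, ssd

rng = np.random.default_rng(3)
y_low = rng.lognormal(10.0, 0.6, 5_000)
y_high = y_low * 1.2            # a uniform 20% increase in every income
print(dominance_check(y_high, y_low))   # (True, True): an actual Pareto improvement
```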
Atkinson (1970) provided the first investigation of the SWF approach, using the ad-
ditive class of SWFs

   W(Y) = ∫₀^∞ U(y) f(y) dy

The simplest member of this class, with U(y) = y, is

   W(Y) = ℰ[Y] = μ    (3)
This SWF corresponds to the use of per capita income to evaluate income distribu-
tion. Note that by using the mean one shows indifference to income inequality.
Atkinson (1970) pointed out that strictly concave SWFs obeyed what has come
to be known as the Pigou-Dalton principle of transfers. This principle states that in-
come transfers from poorer to richer individuals (i.e., regressive transfers) reduce so-
cial welfare. Atkinson showed that this principle corresponds formally to that of risk
aversion and that a regressive transfer is the analogue of a mean-preserving spread
(MPS) introduced into the risk-measurement literature by Rothschild and Stiglitz
(1970).
Atkinson also noted that a distribution F₁(y) is preferred to another F₂(y) ac-
cording to all additive utilitarian SWFs if and only if the criterion for second-degree
stochastic dominance (SSD) is satisfied. Finally, he showed that a distribution F₁(y)
would have W(Y₁) which weakly exceeds W(Y₂) for all U(y) with U′(y) positive
and U″(y) nonpositive, if and only if L₁(q), the LC of F₁(y), lies weakly above
L₂(q), the LC of F₂(y), for all q. In summary, Atkinson (1970) showed the following
theorem.
Theorem 3.1. The following conditions are equivalent (where F₁(y) and F₂(y)
have the same mean):
(i) W(Y₁) ≥ W(Y₂) for all U(y) with U′(y) > 0 and U″(y) ≤ 0.
(ii) F₂(y) can be obtained from F₁(y) by a series of MPSs (regressive trans-
fers).
(iii) F₁(y) dominates F₂(y) by SSD.
(iv) L₁(q) ≥ L₂(q) for all q.
Dasgupta, Sen, and Starrett (1973) generalized this result. First, they showed
that it is unnecessary for W(Y) to be additive. A parallel result is true for any concave
W(Y). Dasgupta, Sen, and Starrett also showed that the concavity of W could be
weakened to Schur or “S” concavity. These generalizations are important since they
indicate that Lorenz dominance is equivalent to unanimous ranking by a very broad
class of inequality measures.
To a large extent, in the remainder of this chapter we shall be concerned with
stochastic dominance relations and their empirical implementation. The standard
definitions of stochastic dominance, which were set out in the last section, are stated
with reference to an additive objective function, reflecting their origin in the risk-
measurement literature. Therefore, for the sake of exposition, it is convenient to con-
tinue to refer to the additive class of SWFs in the treatment that follows. This does not
involve any loss of generality since we are studying dominance relations rather than
the properties of individual inequality measures.* Dominance requires the agree-
ment of all SWFs in a particular class. As the results of Dasgupta, Sen, and Starrett
show for the case of SSD, in situations where all additive SWFs agree on an inequal-
ity ranking (i.e., there is dominance), all members of a much broader class of SWFs
also may agree on the ranking.
*In the study of individual inequality indices, it would be a serious restriction to confine one's attention
to those which are associated with additive SWFs. This would eliminate the coefficient of variation, the
Gini coefficient, and many members of the generalized entropy family from consideration. Nonadditive
SWFs may be thought of as allowing interdependence of social preferences toward individual incomes.
For additional discussion of these topics, see Maasoumi (1997) and the references cited therein.
Shorrocks (1983) went beyond this sufficient condition and established the equiva-
lence of GLC dominance and SSD. Using his results, we can state
Theorem 3.3. The following conditions are equivalent (where F₁(y) and F₂(y)
may have different means):
(i) W(Y₁) ≥ W(Y₂) for all U(y) with U′(y) > 0 and U″(y) ≤ 0.
(ii) F₁(y) dominates F₂(y) by SSD.
(iii) G₁(q) ≥ G₂(q) for all q.
Theorem 3.3 is of great practical importance in making welfare comparisons be-
cause the means of real-world distributions being compared are seldom equal. Also,
Shorrocks (1983) and others have found that, in many cases when LCs cross, GLCs
do not. Thus, using GLCs greatly increases the number of cases where unambiguous
welfare comparisons can be made in practice.
Another approach to extending the range of cases where unambiguous welfare com-
parisons can be made, beyond those which could be managed with the techniques
of Atkinson (1970), has been to continue to restrict attention to relative inequality,
but to adopt a normative axiom stronger than the Pigou-Dalton principle of transfers.
This axiom has been given at least two different names. Shorrocks and Foster (1987)
referred to it as transfer sensitivity. Here, we follow Davies and Hoy (1994, 1995),
who referred to it as aversion to downside inequality (ADI).
ADI implies that a regressive transfer should be considered to reduce welfare
more if it occurs lower in the income distribution. In the context of additive SWFs,
this clearly restricts U(y) to have U‴(y) nonnegative, in addition to U′(y) being
positive and U″(y) being nonpositive.
The ordering induced by the requirement that W(Y₁) be greater than or equal
to W(Y₂) for all U(y) such that U′(y) is positive, U″(y) is nonpositive, and U‴(y)
is nonnegative corresponds to the notion of third-degree stochastic dominance (TSD)
introduced in the risk-measurement literature by Whitmore (1970). TSD initially
received far less attention in inequality and welfare measurement than SSD, despite
its embodiment of the attractive ADI axiom. This was, in part, due to the lack of
a readily available indicator of when TSD held in practice. Shorrocks and Foster
(1987) worked to fill this gap.
When LCs do not intersect, unambiguous rankings of relative inequality can
be made under SSD, provided the means are equal. When GLCs do not intersect,
unambiguous welfare rankings can also be made under SSD. When either LCs or
GLCs intersect, difficulties arise.
Suppose that ℰ[Y₁] equals ℰ[Y₂] and that the LCs L₁(q) and L₂(q) intersect
once. A necessary condition for F₁(y) to dominate F₂(y) by TSD is that L₁(q) is
greater than or equal to L₂(q) at lower incomes and that L₁(q) is less than or equal
to L₂(q) at higher incomes. In other words, given that the LCs cross once, the only
possible candidate for the “more equal” label is the distribution which is better for
the poor. This is because the strength of aversion to downside inequality may be so
high that no amount of greater equality at high income levels can repair the damage
done by greater inequality at low incomes.
Shorrocks and Foster (1987) added the sufficient condition in the case of singly
intersecting LCs, proving the equivalent of the following theorem for discrete distri-
butions:
Theorem 3.4. If ℰ[Y₁] = ℰ[Y₂] and the LCs for F₁(y) and F₂(y) have a single
intersection, then F₁(y) is preferred to F₂(y) by TSD if and only if σ₁², the variance
of F₁(y), does not exceed σ₂², the variance of F₂(y).
Since multiple intersections of LCs are far from rare in applied work, this result
has considerable practical value. Its implementation has been studied by Beach,
Davidson, and Slotsve (1994) and is discussed in the next section.
Typically, it is far too expensive and impractical to sample the entire population to
construct F ( y ) . Thus, researchers usually take random samples from the population,
and then attempt to estimate F ( y ) as well as functionals that can be derived from
F ( y ) , such as LCs and GLCs. Often, researchers are interested in comparing LCs
(or GLCs) across countries or, for a given country, across time. To carry out this sort
of analysis, one needs estimators of LCs and GLCs as well as a distribution theory
for these estimators.
In this section, we show how to estimate LCs and GLCs using the kinds of
microdata that are typically available from sample surveys. We consider the case
where the researcher has a sample {Y₁, Y₂, . . . , Y_N} of N observations, each of
which represents an independent and identical random draw from the distribu-
tion F(y).*
A. Parametric Methods
Because parametric methods of estimation and inference are the most well known to
researchers, we begin with them.† Using the parametric approach, the researcher as-
sumes that F(y) comes from a particular family of distributions (exponential, Pareto,
lognormal, etc.) which is known up to some unknown parameter or vector of param-
eters. For example, the researcher may assume that Y is exponentially distributed,
with density f(y; μ) = (1/μ) exp(−y/μ), y ≥ 0, μ > 0.
*Many large, cross-sectional surveys have weighted observations. For expositional reasons, we avoid this
complication, but direct the interested reader to, among others, the work of Beach and Kaliski (1986)
for extensions developed to handle this sort of complication. We also avoid complications introduced
by dependence in the data, but direct the interested reader to, among others, the work of Davidson and
Duclos (1995).
†A useful reference for the epistemology of distribution functions is Chipman (1985).
while the maximum likelihood estimators of θ₀ and θ₁ in the Pareto case are

   T₀ = min{Y₁, Y₂, . . . , Y_N}   and   T₁ = N / Σᵢ₌₁^N log(Yᵢ/T₀)
In each of these cases (except that of T₀), the maximum likelihood estimators
are consistent and distributed normally, asymptotically.* One can also derive the
pointwise asymptotic distribution of both the LC and the GLC. This is easiest to do
in the exponential case where the GLC is linear in the parameter to be estimated.
Note that

   M = (1/N) Σᵢ₌₁^N Yᵢ

so

   √N (M − μ⁰) →d N(0, (μ⁰)²)

by a central limit theorem, where μ⁰ is the true value of μ, so one can show that for
each q, a pointwise characterization of the asymptotic distribution is

   √N [G(q; M) − G(q; μ⁰)] →d N(0, (μ⁰)² [q + (1 − q) log(1 − q)]²)
*The estimator T₀ of θ₀ is superconsistent and converges at rate N rather than √N; its asymptotic dis-
tribution is exponential. This is a property of extreme order statistics when the density is positive at the
boundary of the support; see Galambos (1987). Because T₀ converges at a rate N, which is faster than
the rate √N for the other estimator T₁, it can be treated as if it actually equals θ₀, and thus ignored in
an asymptotic expansion of T₁.
Note that in the exponential case, the LC is invariant to the parameter μ since

   ℒ(q; μ) = G(q)/ℰ[Y] = μ[q + (1 − q) log(1 − q)]/μ = q + (1 − q) log(1 − q)
so no estimation is required.
In the other two examples, the LCs and GLCs are nonlinear functions of the
parameters. We find it useful to consider these examples further, since they illustrate
well the class of technical problems faced by researchers. In the Pareto case, where

   f(y; θ₀, θ₁) = θ₁ θ₀^θ₁ y^−(θ₁+1),   y ≥ θ₀ > 0, θ₁ > 1

Now

   ξ(q) = θ₀ (1 − q)^(−1/θ₁)

so the LC is

   ℒ(q; θ₁) = 1 − (1 − q)^((θ₁−1)/θ₁)
Because G(q; θ₀, θ₁) and ℒ(q; θ₁) are continuous functions of θ₀ and θ₁, the maxi-
mum likelihood estimators of them are

   G(q; T₀, T₁) = [T₀T₁/(T₁ − 1)] [1 − (1 − q)^((T₁−1)/T₁)]

and

   ℒ(q; T₁) = 1 − (1 − q)^((T₁−1)/T₁)

respectively. These estimators are also consistent.*
To find the asymptotic distributions of G(q; T₀, T₁) and ℒ(q; T₁), one needs
first to know the asymptotic distribution of T₁. Now

   Zᵢ = log(Yᵢ/θ₀)

is distributed exponentially with unknown parameter θ₁, so its probability density
function is

   f_Z(z) = θ₁ exp(−θ₁ z),   θ₁ > 0, z > 0
Thus, T₁ is a function of the sample mean Z̄_N of N independently and identically
distributed exponential random variables {Z₁, Z₂, . . . , Z_N} where, by a central limit
theorem (such as Lindeberg-Levy), Z̄_N has the following asymptotic distribution:

   √N (Z̄_N − 1/θ₁⁰) →d N(0, 1/(θ₁⁰)²)

with θ₁⁰ being the true value of θ₁. Because T₁ is a continuous and differentiable
function of the sample mean Z̄_N, we can use the delta method (see, for example, Rao
1965) to derive T₁'s asymptotic distribution. We proceed by expanding the function
T₁(Z̄_N) in a Taylor's series expansion about ℰ[Z̄_N], which equals 1/θ₁⁰. Thus,

   plim_{N→∞} T₁ = θ₁⁰
Thus, to find the asymptotic distribution of ℒ(q; T₁), for example, expand ℒ(q; T₁)
in a first-order Taylor's series about the point θ₁⁰, with R_N being the remainder,
to get

Similar calculations can be performed for the estimator G(q; T₀, T₁) of G(q; θ₀, θ₁).*
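The Pareto calculations above are easily put into code. The following Python sketch is our illustration, assuming the MLEs T₀ = min Yᵢ and T₁ = N/Σ log(Yᵢ/T₀) given earlier and the standard delta-method approximation Avar(T₁) ≈ θ₁²/N:

```python
import numpy as np

def pareto_mle(y):
    """MLEs for the Pareto(theta0, theta1) model: T0 = min Y, T1 = N / sum(log(Y/T0))."""
    y = np.asarray(y, dtype=float)
    t0 = y.min()
    t1 = y.size / np.log(y / t0).sum()
    return t0, t1

def pareto_lc(q, t1):
    """Pareto Lorenz-curve ordinate L(q; theta1) = 1 - (1 - q)**((theta1 - 1)/theta1)."""
    return 1.0 - (1.0 - q) ** ((t1 - 1.0) / t1)

def pareto_lc_se(q, t1, n):
    """Delta-method standard error, using Avar(T1) ~ theta1**2 / N."""
    a = (t1 - 1.0) / t1
    dL_dt1 = -((1.0 - q) ** a) * np.log(1.0 - q) / t1 ** 2
    return np.abs(dL_dt1) * t1 / np.sqrt(n)

rng = np.random.default_rng(4)
theta0, theta1 = 10_000.0, 2.5
y = theta0 * (1.0 - rng.uniform(size=5_000)) ** (-1.0 / theta1)  # Pareto draws by inversion
t0, t1 = pareto_mle(y)
for q in (0.25, 0.5, 0.75):
    print(q, round(pareto_lc(q, t1), 4), round(pareto_lc_se(q, t1, y.size), 5))
```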
In this case, both ℒ(q; μ, σ²) and G(q; μ, σ²) are only defined numerically. For
a specific q, conditional on some estimates m̂ and ŝ², one can solve the quantile
equation (5) numerically and then the GLC equation (4). To apply the delta method,
one would have to use Leibniz's rule to find the effect of changes in M and S² on
the asymptotic distribution of G(q; μ, σ²) at q. A similar analysis could also be per-
formed to find the asymptotic distribution of ℒ(q; μ, σ²). As one can see, except in a
few simple cases, the technical demands can increase when parametric methods are
used because the quantiles are often only defined implicitly and the LCs and GLCs
can typically only be calculated numerically, in the continuous case. Moreover, the
*Note that one could perform a similar large-sample analysis to find the asymptotic distribution for
estimators of such summary measures as the Gini coefficient, but this is beyond the focus of this
chapter.
where

with Yᵢ = Y(qᵢ). To carry out this sort of analysis, one must first order the sam-
ple {Y₁, Y₂, . . . , Y_N} so that Y₍₁₎ ≤ Y₍₂₎ ≤ · · · ≤ Y₍N₎. Beach and Davidson de-
fined ξ̂(q), an estimator of the qth population quantile ξ(q), to be the rth-order
statistic Y₍r₎, where r denotes the greatest integer less than or equal to qN. Thus,
*McFadden (1989) and Klecan, McFadden, and McFadden (1991) have developed nonparametric pro-
cedures for examining SSD, while Anderson (1996) has applied nonparametric procedures and FSD,
SSD, and TSD principles to income distributions. Xu, Fisher, and Wilson (1995) have also developed
similar work. Although these procedures are related to estimation and inference concerning LCs and
GLCs, for space reasons we do not discuss this research here.
where

with rᵢ being the greatest integer less than or equal to qᵢN. Note that G is, in fact,
the vector of GLC ordinates and that Ĝ is an estimator of it. Beach and Davidson showed that the
distribution of √N(Ĝ − G) is asymptotically normal, centered about the (J + 1)-dimensional
zero vector, and has variance-covariance matrix Ω, where

with λᵢ² being λ²(qᵢ), the variance of Y given that Y is less than ξ(qᵢ).
Now, the LC estimator L̂ is a nonlinear function of Ĝ. By the delta method, Beach and Davidson
showed that it has the asymptotic distribution
where

where ∇ denotes the gradient vector of the function to follow with respect to the
vector of GLC ordinates.
Beach and Davidson (1983) appear to have ignored the implications of their
results for GLCs. These notions were first applied by Bishop, Chakraborti, and Thistle
(1989).
with

evaluated at a countable number of points J, so the null hypothesis can also be written as Π ≥
0. Testing this hypothesis involves deciding whether the random vector L̂₁ − L̂₂
(or Ĝ₁ − Ĝ₂ for GLC dominance) is in the nonnegative orthant of ℝᴶ (or ℝᴶ⁺¹). A
number of authors have built on the work of Perlman (1969) to provide solutions
to this problem; see, for example, Kodde and Palm (1986) and Wolak (1987, 1989,
1991). To calculate the test statistic LR for realizations l̂₁ and l̂₂ of L̂₁ and L̂₂ and
estimates V̂₁ and V̂₂ of V₁ and V₂ requires one to solve the following constrained
quadratic programming problem:
which we rewrite as

   min_{p ≥ 0} (p̂ − p)′ V̂⁻¹ (p̂ − p)

Letting p̃ denote the solution to this problem, the test statistic is

   LR = (p̂ − p̃)′ V̂⁻¹ (p̂ − p̃)
Under the null hypothesis, the distribution of LR is a mixture of χ² distributions:

   Pr[LR ≥ c] = Σ_{j=0}^{J} w(J, J − j, V) Pr[χ²(j) ≥ c]

where Pr[χ²(j) ≥ c] denotes the probability that a χ² random variable with j de-
grees of freedom exceeds some constant c, and w(J, J − j, V) is a weighting function.
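The constrained quadratic programming step can be carried out with nonnegative least squares. The Python sketch below is our illustration (the toy ordinate differences and covariance matrix are invented); it projects p̂ onto the nonnegative orthant in the metric of V̂⁻¹ and forms the distance statistic:

```python
import numpy as np
from scipy.optimize import nnls

def distance_statistic(p_hat, V_hat):
    """Solve min_{p >= 0} (p_hat - p)' inv(V_hat) (p_hat - p) and return (LR, p_tilde)."""
    V_inv = np.linalg.inv(V_hat)
    # Write the quadratic form as || A (p_hat - p) ||^2 with A'A = inv(V_hat).
    A = np.linalg.cholesky(V_inv).T
    p_tilde, _ = nnls(A, A @ p_hat)            # nonnegative least squares
    diff = p_hat - p_tilde
    return diff @ V_inv @ diff, p_tilde

# Example: differences of Lorenz ordinates at five abscissae and a toy covariance matrix.
p_hat = np.array([0.010, 0.004, -0.003, 0.006, 0.002])
V_hat = 0.0004 * np.eye(5)
LR, p_tilde = distance_statistic(p_hat, V_hat)
print(round(LR, 3), p_tilde)
```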
   T = (T(q₁), T(q₂), . . . , T(q_{J+1}))′

as well as the asymptotic theory required to test for dominance of the cumulative
coefficients of variation. In particular, they derived the asymptotic distribution of
the estimator

with rᵢ again being the largest integer less than or equal to qᵢN. As with the test of
LC (or GLC) dominance, a test of the null hypothesis of CCV dominance for two
populations involves deciding whether T̂₁ − T̂₂ is in the nonnegative orthant of
ℝᴶ⁺¹.
   ℒ(q|x) = ∫₀^ξ(q|x) y f(y|x) dy / ∫₀^∞ y f(y|x) dy = (1/ℰ[Y|x]) ∫₀^ξ(q|x) y f(y|x) dy

A LC of Y conditional on X equaling x is the plot of (q, ℒ(q|x)). The conditional
GLC is the plot of (q, ℰ[Y|x] ℒ(q|x)), which we denote by (q, G(q|x)).
A. Parametric Methods

As in Section IV.A, one could assume that f(y|x) comes from a particular parametric
family. For example, suppose

   log Y = xβ + V

where V is distributed independently and identically normal having mean zero and
variance σ². One could then use the method of maximum likelihood to get estimates
of β and σ², and these estimates in turn could be used to estimate G(q|x) or ℒ(q|x).
Since the maximum likelihood estimators of β and σ² are asymptotically normal
and since both the LC and the GLC are functionals of f(y|x), one can then use the
delta method to find the pointwise asymptotic distribution of G(q|x) and ℒ(q|x). Of
course, as in the univariate case, the major criticism of the parametric approach is
that it imposes too much structure on the data.
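Under this lognormal specification the conditional curves have closed forms obtained from the standard lognormal partial-expectation result: ℒ(q|x) = Φ(Φ⁻¹(q) − σ) and G(q|x) = exp(xβ + σ²/2) Φ(Φ⁻¹(q) − σ). The Python sketch below is our illustration; the regressors, coefficients, and sample are invented, and OLS on log incomes serves as the (here equivalent) maximum likelihood step:

```python
import numpy as np
from scipy.stats import norm

def lognormal_conditional_curves(x, beta, sigma, q_grid):
    """Conditional LC and GLC ordinates implied by log Y = x'beta + V, V ~ N(0, sigma^2)."""
    z_q = norm.ppf(q_grid)
    L = norm.cdf(z_q - sigma)                       # LC does not depend on x
    G = np.exp(x @ beta + 0.5 * sigma ** 2) * L     # GLC scales by E[Y | x]
    return L, G

# Step 1: estimate beta and sigma^2 by OLS on log incomes.
rng = np.random.default_rng(5)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
beta_true, sigma_true = np.array([10.0, 0.3]), 0.5
log_y = X @ beta_true + sigma_true * rng.normal(size=n)
beta_hat, *_ = np.linalg.lstsq(X, log_y, rcond=None)
sigma_hat = np.sqrt(np.mean((log_y - X @ beta_hat) ** 2))

# Step 2: conditional curves for a person with covariate value 1.0.
q = np.array([0.25, 0.5, 0.75])
L, G = lognormal_conditional_curves(np.array([1.0, 1.0]), beta_hat, sigma_hat, q)
print(np.round(L, 3), np.round(G, 1))
```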
C. Semiparametric Methods
The idea behind any semiparametric method is to put enough structure on the distri-
bution of Y given X equals x so as to reduce the curse of dimensionality that arises in
nonparametric methods. In this section, we focus on one particularly useful approach
(that of Donald, Green, and Paarsch 1995) to estimating GLCs semiparametrically.
These methods can be adapted to recover estimates of LCs, but in the interest of
space we omit the explicit parallel development of methods for LCs.
Donald et al. (1995) translated techniques developed for estimating spell-
duration distributions (Kalbfleisch and Prentice 1980) to the estimation of income
distributions. The main building block in their approach is the hazard function. For
a nonnegative random variable Y with associated probability density function f(y)
and cumulative distribution function F(y), the hazard function h(y) is defined by
the conditional probability

   h(y) = f(y)/[1 − F(y)] = f(y)/S(y)    (6)
where S(y) is the survivor function. From (6) one can see that the hazard function
is simply a transformation of the probability density (or mass) function. One key
result from the literature on spell-duration estimation is that the conditional nature
of h ( y ) makes it easy to introduce flexible functions of the covariates and to entertain
complex shapes for the hazard function.
Donald et al. introduced covariates using an extension of a proportional haz-
ard model in which the range of Y is partitioned into P subintervals 𝒴_p = [y_p, ȳ_p),
where 𝒴_p ∩ 𝒴_q = ∅ for all p ≠ q with ∪_{p=1}^P 𝒴_p = [0, ∞), and they allow the
covariate effects to vary over these subintervals.* They referred to these subinter-
vals as covariate segments. In particular, following Gritz and MaCurdy (1992), they
replaced xᵢβ in Cox's model with a segment-specific index, xᵢβ_p for y ∈ 𝒴_p.
*This class of models was first introduced by Cox (1972, 1975). Specifically, the hazard rate for person i
conditional on xᵢ, a particular realization of the covariate vector X, is

   h(y|xᵢ) = exp(xᵢβ) h₀(y)

where h₀(y) is the baseline hazard function common to all individuals and β is a vector of unknown
parameters. An important shortcoming of this specification is the restriction that individuals with very
different covariate vectors have hazard functions with the same basic shape, and that any particular
covariate shifts the entire hazard function up or down relative to the baseline specification. It can be
shown that if a particular element of the vector β is negative, then F(y|x) for a person with a positive value
for the associated covariate and zero values for all other covariates first-degree stochastically dominates
F(y|0), the cumulative distribution function for a person with zero values for all covariates. This is quite
a strong restriction.
hazard model with this form of baseline specification are discussed in Meyer (1990).
The major advantage of this approach is that, with a sufficiently large value for J ,
it can capture complicated shapes for the hazard, while allowing for very straight-
forward transformations from hazard to density estimates and then to estimates of
GLC ordinates. This latter occurs because sums rather than integrals are involved.
The main disadvantage is that density estimates are very “spikey,” including spikes
such as those induced by focal-point income reporting, e.g., integer multiples of
$1000. The latter spikes may prove distracting when they are not the main focus of
the analysis.
For a sample of size N, the logarithm of the likelihood function specifica-
tion is a sum of contributions over the individuals i = 1, . . . , N,
where the dependent variable for individual i falls in the jᵢ*th baseline segment, 𝒴_j
is the set of Y values corresponding to the jth baseline segment, 𝒴_{p(j)} is the set of
Y values corresponding to the pth covariate segment which itself is associated with
the jth baseline segment, and Yᵢ is less than y_j if Yᵢ is not right-censored. The vector
α contains the J baseline parameters and the notation α_j indicates the element of
α corresponding to the jth baseline segment. Note that
For consistent estimates, Donald et al. required that the covariate values not change
within the baseline segments; they may, however, vary across baseline segments.
For the jth baseline segment, an estimate of the hazard rate is

where α̂_j is the estimate of the jth element of the baseline parameter vector and β̂_p
is an estimate of β_p. An estimate of the survivor function is
The discrete form of the baseline hazard makes estimating the survivor function very
simple. An estimate of the probability mass function is then

   f̂(y_j|x) = Ŝ(y_j|x) − Ŝ(y_{j+1}|x)

Thus, the standard errors of the density estimate at each
baseline segment are easy to construct given the parameter estimates and an estimate of Σ. The discrete
form of the baseline hazard is helpful in this construction since one requires only
summations rather than integration.
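A minimal Python sketch of these transformations, taking estimated per-segment hazard rates as given; the numerical hazards are invented, since the fitted model itself is not reproduced here, and the usual discrete-hazard convention Ŝ(y_j) = Π_{k<j}(1 − ĥ_k) is assumed:

```python
import numpy as np

def survivor_and_mass(hazards):
    """Convert discrete per-segment hazard rates h_j into survivor values S(y_j)
    and probability masses f(y_j) = S(y_j) - S(y_{j+1})."""
    hazards = np.asarray(hazards, dtype=float)
    S = np.concatenate(([1.0], np.cumprod(1.0 - hazards)))   # S(y_1) = 1
    f = S[:-1] - S[1:]
    return S, f

h = np.array([0.05, 0.10, 0.20, 0.35, 1.0])   # last segment exhausts the distribution
S, f = survivor_and_mass(h)
print(np.round(S, 4))
print(np.round(f, 4), f.sum())   # masses sum to one when the last hazard equals 1
```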
One can also recover estimates of quantiles conditional on the covariates. The
qth quantile of Y conditional on x, ξ(q|x), is defined by

   S(ξ(q|x)|x) = 1 − q
*In practice, Donald et al. use the inverse of the Hessian matrix of the logarithm of the likelihood function
(divided by the sample size) to estimate Σ.
In the discrete case, however, estimates of the quantiles are easily found by ξ̂(q|x) =
y_j if

   Ŝ(y_j|x) ≥ (1 − q) > Ŝ(y_{j+1}|x)

and ξ̂(q|x) = y_J if

   Ŝ(y_J|x) ≥ (1 − q)

Donald et al. characterized the limiting distribution of ξ̂(q|x) for different values
of q.
The main advantage of the approach of Donald et al. is its combination of flex-
ibility and tractability. Convergence for their likelihood function can be obtained
relatively quickly and easily. The transformations from the parameter estimates to estimates of the hazard
and then to the density functions described above are straightforward. At the same
time, the specification imposes few restrictions on the shape of the density function
and permits quite different shapes for different covariate vectors. In addition, the ap-
proach admits the examination of both decomposability of subgroups and marginal
effects of covariates. Finally, this approach provides a consistent means of address-
ing top-coded data since top-coded values are just right-censored "spells."*
Once a consistent estimate of f(y|x) is obtained, one can then estimate the
GLC ordinates conditional on the covariates. When the Y's are discrete, the ordinate
of the GLC at q is

   G(q|x) = Σ_{j=1}^{J} 1[y_j ≤ ξ(q|x)] y_j f(y_j|x)

where 1[·] denotes the indicator function, equal to one when its argument is true and
zero otherwise.
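Continuing the previous sketch, the quantile rule and the GLC ordinate reduce to simple array operations. The income grid and probability masses below are invented for illustration:

```python
import numpy as np

def conditional_quantile(y_grid, S, q):
    """Largest y_j whose survivor value still satisfies S(y_j) >= 1 - q."""
    ok = S[:-1] >= (1.0 - q)          # S has one more entry than y_grid here
    return y_grid[np.nonzero(ok)[0].max()]

def glc_ordinate(y_grid, f, quantile):
    """G(q|x) = sum over y_j <= quantile of y_j * f(y_j|x)."""
    keep = y_grid <= quantile
    return np.sum(y_grid[keep] * f[keep])

y_grid = np.array([5_000.0, 15_000.0, 30_000.0, 60_000.0, 120_000.0])
f = np.array([0.05, 0.095, 0.171, 0.2394, 0.4446])        # masses from the previous sketch
S = np.concatenate(([1.0], 1.0 - np.cumsum(f)))           # implied survivor values
xi = conditional_quantile(y_grid, S, q=0.5)
print(xi, round(glc_ordinate(y_grid, f, xi), 1))
```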
*In sample surveys, individuals with high incomes can often be identified by answers to other questions
on the survey instrument. For example, the richest man in a particular geographical region might have
a wife and six children. Thus, an interested individual might infer the man’s income from information
concerning where he lived and his household characteristics. To provide confidentiality, those who con-
duct surveys often list high incomes as greater than some value, say $100,000 and higher. This practice
is often referred to as “top-coding.”
semiparametric methods. The latter can be made sufficiently flexible to reflect com-
plex effects of covariates. A further advantage of the approach is that it provides a
consistent means of addressing top-coded data.
As the review conducted in this chapter shows, recent literature has provided
powerful theoretical and statistical tools which can be applied in social welfare com-
parisons. At this point there is an important challenge to workers on income distri-
bution topics to make greater use of these techniques in their analysis of the data.
VII. APPENDIX
In this appendix, we describe how to access several programs designed to carry out
the analysis described above. These programs reside in the Econometrics Laboratory
Software Archive (ELSA) at the University of California, Berkeley; this archive can
be accessed easily using a variety of different browsers via the Internet. To contact
the staff at ELSA, simply send e-mail to
elsa@econ.berkeley.edu
For those with access to an Internet browser like Netscape Navigator, simply click
on the Open icon and then enter
http://econ.berkeley.edu
This will put you into the HomePage of the Department of Economics at the Univer-
sity of California, Berkeley. A number of options exist. Click the option
[Research Facilities]
exists. Click on either the icon denoted Code or the icon denoted Doc and scroll
through until you find the entries you desire.
The following describes the entries:
a. A Nonparametric Estimator of Lorenz Curves
FORTRAN code for implementing Beach and Davidson (1983)
b. A Nonparametric Estimator of Generalized Lorenz
Curves
FORTRAN code for implementing Bishop, Chakraborti, and Thistle (1989)
c. A SemiParametric Estimator of Distribution
Functions with Covariates
Gauss code for implementing Donald, Green, and Paarsch (1995)
ACKNOWLEDGMENTS
Green and Paarsch wish to thank the SSHRC of Canada for research support and
their co-author, Stephen G. Donald, for allowing them to borrow from their unpub-
lished work. Most of this paper was written while Paarsch was the Arch W. Shaw
National Fellow at the Hoover Institution, Stanford, California. Paarsch would like
to thank the Hoover Institution for its hospitality and support. The authors are also
grateful to Gordon Anderson and Charles M. Beach as well as two anonymous refer-
ees for useful comments and helpful suggestions.
REFERENCES
Cox, D. (1972), Regression Models and Life-Tables, Journal of the Royal Statistical Society,
Series B, 34, 187-202.
Cox, D. (1975), Partial Likelihood, Biometrika, 62, 269-276.
Dasgupta, P., A. Sen, and D. Starrett (1973), Notes on the Measurement of Inequality, Journal
of Economic Theory, 6, 180-187.
Davidson, R. and J.-Y. Duclos (1995), Statistical Inference for the Measurement of the Inci-
dence of Taxes and Transfers, typescript, Department of Economics, Queen’s Univer-
sity, Kingston, Canada.
Davies, J. and M. Hoy (1994), The Normative Significance of Using Third-Degree Stochastic
Dominance in Comparing Income Distributions, Journal of Economic Theory, 64, 520-
530.
Davies, J. and M. Hoy (1995), Making Inequality Comparisons when Lorenz Curves Intersect,
American Economic Review, 85, 980-986.
Donald, S., D. Green, and H. Paarsch (1995), Differences in Earnings and Wage Distributions
between Canada and the United States: An Application of a Semi-Parametric Estima-
tor of Distribution Functions with Covariates, typescript, Department of Economics,
University of Western Ontario, London, Canada.
Engle, R. (1984), Wald, Likelihood Ratio, and Lagrange Multiplier Tests in Econometrics, in Z.
Griliches and M. Intriligator (eds.), Handbook of Econometrics, Vol. II, North-Holland,
Amsterdam.
Galambos, J. (1987), The Asymptotic Theory of Extreme Order Statistics, 2nd ed., Krieger,
Malabar, FL.
Gritz, M. and T. MaCurdy (1992), Unemployment Compensation and Episodes of Nonemploy-
ment, Empirical Economics, 17, 183-204.
Harberger, A. (1971), Three Basic Postulates of Applied Welfare Economics, Journal of Eco-
nomic Literature, 9, 785-797.
Harrison, A. (1982), A Tail of Two Distributions, Review of Economic Studies, 48, 621-631.
Howes, S. (forthcoming), Asymptotic Properties of Four Fundamental Curves of Distributional
Analysis, Journal of Econometrics.
Kalbfleisch, J. and R. Prentice (1980), The Statistical Analysis of Failure Time Data, Wiley,
New York.
Klecan, L., R. McFadden, and D. McFadden (1991), A Robust Test for Stochastic Dominance,
typescript, Department of Economics, MIT, Cambridge, MA.
Kodde, D. and F. Palm (1986), Wald Criteria for Jointly Testing Equality and Inequality Re-
strictions, Econometrica, 54, 1243-1248.
Lambert, P. (1989), The Distribution and Redistribution of Income, A Mathematical Analysis,
Basil Blackwell, Cambridge, MA.
Maasoumi, E. (1997), Empirical Analysis of Inequality and Welfare, in M. Pesaran and P.
Schmidt (eds.), Handbook of Applied Microeconometrics, Basil Blackwell, London.
McDonald, J. (1984), Some Generalized Functions for the Size Distribution of Income, Econo-
metrica, 52, 647-663.
McFadden, D. (1989), Testing for Stochastic Dominance, in T. Fomby and T. Seo (eds.), Stud-
ies in the Economics of Uncertainty: In Honor of Josef Hadar, Springer-Verlag, New
York.
Meyer, B. (1990), Unemployment Insurance and Unemployment Spells, Econometrica, 58,
757-782.
Measurement of Inequality

Walter Krämer
University of Dortmund, Dortmund, Germany

I. THE PROBLEM

   D := ∪_{n=2}^∞ ℝⁿ₊
not completely available, and Section VI concludes with some recent developments
in multidimensional inequality.
As the focus here is on mathematical aspects of the ordering of given vectors x
and y, it is only fair to point out that in most applications these problems are dwarfed
by the complications and ambiguities involved in the definition and measurement of
the basic vectors x and y themselves. This is perhaps most obvious with income in-
equality, where the outcome of any analysis depends much more on how income is
defined, on the unit (individuals versus households), and on the accounting period
than on the statistical procedures applied afterward. However, such problems of ba-
sic definition and measurement differ so much across applications that there is no
hope of an adequate treatment in a general survey such as this.
There is wide agreement across value systems and applications that if some vector
is subjected to a progressive transfer, then inequality should be reduced.* Formally,
let y = xT, where both y and x are of dimension n and where T is an n × n matrix
of the form

   T = λ Iₙ + (1 − λ) Q_{ij},   0 ≤ λ < 1    (2)

where Iₙ is the identity matrix and Q_{ij} is the permutation matrix that interchanges
the ith and jth coordinates.
*Or at least: should not increase. In what follows, “reduction” excludes the limiting case of “no change.”
If “no change” is included, I say “weak reduction” (and similarly “increase” and “weak increase”).
   Σᵢ₌₁ᵏ x₍ᵢ₎ ≥ Σᵢ₌₁ᵏ y₍ᵢ₎,   k = 1, . . . , n

(with components arranged in decreasing order),
and a central result in majorization theory, due to Hardy, Littlewood, and Polya
(1934), states that x majorizes y if and only if y = xT₁T₂ · · · T_L with finitely many
matrices of the form (2).
The obvious drawback of this (the only) universally accepted preordering is
that only very few pairs of elements x and y from D can be compared. Both vectors
must have the same dimension and the same sum of elements, which renders this
ordering almost useless in empirical applications.
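The definition can be checked mechanically for concrete vectors. The small Python helper below is ours, purely for illustration; it tests equal totals and dominance of all partial sums of the decreasingly ordered components:

```python
import numpy as np

def majorizes(x, y, tol=1e-12):
    """Does x majorize y? Requires equal dimension, equal totals, and
    cumulative sums of the decreasingly ordered components of x >= those of y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if x.size != y.size or abs(x.sum() - y.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(np.sort(x)[::-1]) >= np.cumsum(np.sort(y)[::-1]) - tol))

x = np.array([10.0, 5.0, 1.0])
y = np.array([8.0, 5.0, 3.0])     # y is obtained from x by a progressive transfer of 2
print(majorizes(x, y), majorizes(y, x))   # True False
```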
One way out of this dilemma is to base a comparison on one of the scalar-valued
measures of inequality to be discussed in Section III. As these measures all involve
value judgments of one sort or the other, one might compromise here and declare
one vector y more equal than some other vector x if I(x) ≥ I(y) for all inequality
measures I(·) in some conveniently chosen class. For instance, almost by definition,
the majorization order on D is the one implied by the set of all Schur-convex functions
I : D → ℝ (a function I(·) is Schur convex if it respects the majorization ordering),
and by restricting this set of functions, one might hope to obtain an order which is
richer (i.e., allows more pairs of vectors to be ranked).
Quite surprisingly, however, if we restrict I(·) to be symmetric and convex in
the ordinary sense (i.e., I(λx + (1 − λ)y) ≤ λI(x) + (1 − λ)I(y) for all 0 ≤ λ ≤ 1;
this implies Schur convexity), or even further to be of the form

   I(x) = Σᵢ₌₁ⁿ g(xᵢ)    (4)

with a convex function g : ℝ → ℝ, we still end up with the very restrictive majoriza-
tion ordering ≥_M (Mosler 1994, Proposition 2.2).
A major breakthrough in this respect was the discovery by Atkinson (1970)
and Dasgupta, Sen, and Starrett (1973) that the preordering on D becomes much
richer, and in fact identical to the familiar Lorenz order, if we require in addition to
(4) that the measure be population invariant:

   I(x, x, . . . , x) = I(x)

Figure 1  A typical Lorenz curve: convex, monotone, passing through (0, 0) and (1, 1).

Then y is more equal than x with respect to all inequality indices in that class if and
only if the Lorenz curve of y lies everywhere above the Lorenz curve of x, where the
Lorenz curve of x (and similarly for y) is obtained in the usual way by joining the
points (0, 0) and (k/n, Σᵢ₌₁ᵏ x₍ᵢ₎ / Σᵢ₌₁ⁿ x₍ᵢ₎), k = 1, . . . , n, with the components
x₍₁₎ ≤ x₍₂₎ ≤ · · · ≤ x₍ₙ₎ arranged in increasing order.
and x is more unequal than y in the absolute Lorenz sense (in symbols: x ≥_AL y) if
LA_x(p) ≤ LA_y(p) for all p. This gives another preordering on D which reduces to majoriza-
tion whenever the latter applies, but it incorporates a more leftist ideal of equality, as
first pronounced by Kolm (1976), for pairs of income vectors which cannot be ranked
by majorization.
There is an infinity of additional preorderings on D which are induced by fam-
ilies of “compromise inequality indices” (Eichhorn 1988, Ebert 1988, Bossert and
Pfingsten 1989), i.e., indices with the property
*Note that progressiveness by itself does not guarantee a reduction in inequality. It is easy to find examples
where income inequality in the Lorenz sense increases after a progressive income tax: take two incomes
of 5 and 10 and a progressive tax rate of 0% up to 6 and 100% afterwards.
†A generalized Lorenz curve is an ordinary Lorenz curve multiplied by x̄. It is most useful in welfare
rankings of income distributions (Shorrocks 1983, Thistle 1989).
(the concentration area). It perfectly captures Lorenz's original intuition that, as the
bow is bent, so inequality increases, taking a minimum value 0 when all xᵢ are
equal (i.e., when the Lorenz curve is equal to the 45° line), and a maximum value of
(n − 1)/n whenever all xᵢ except one are zero. It depends on the underlying vectors
x only via the Lorenz curve of x, so it is obviously symmetric, homogeneous of degree
zero, and, by construction, compatible with the Lorenz ordering on D. The history of
this "mother of all indices of inequality" is nicely summarized in Giorgi (1990).
There are various equivalent expressions for the Gini coefficient, like Gini's
own expression

   G(x) = Δ(x)/(2x̄),   where   Δ(x) = (1/n²) Σᵢ,ⱼ₌₁ⁿ |xᵢ − xⱼ|

is the mean absolute difference of the data, or the rank-based expression

   G(x) = [2 Σᵢ₌₁ⁿ i x₍ᵢ₎] / (n² x̄) − (n + 1)/n

where the latter expression is most useful when discussing sensitivity issues (see
below), or for showing that Δ(x)/(2x̄) is indeed equal to twice the concentration area.
Yet another algebraic identity was unearthed by Lerman and Yitzhaki (1984),
and independently by Berrebi and Silber (1987), who show that twice the Gini index
equals the covariance between the "rank gaps" rᵢ := i − (n − i + 1) and the "share
gaps" sᵢ := (x₍ᵢ₎ − x₍ₙ₋ᵢ₊₁₎)/Σⱼ x₍ⱼ₎ of the data. This identity, for instance, implies
that the Gini index is never larger than 1/2 for symmetric distributions and that
a necessary condition for G(x) > 1/2 is that x is skewed to the right. It can also
be used to facilitate the empirical computation of the Gini index from large sets of
individual data, where competing measures were often preferred to the Gini index
only because of computational convenience.
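The identity is easy to verify numerically. The Python sketch below is our illustration; it compares twice the Gini coefficient, computed from the mean absolute difference, with the population covariance of the rank gaps and share gaps:

```python
import numpy as np

def gini_from_mean_difference(x):
    """G(x) = Delta(x) / (2 * xbar) with Delta the mean absolute difference over all pairs."""
    x = np.asarray(x, dtype=float)
    delta = np.abs(x[:, None] - x[None, :]).mean()
    return delta / (2.0 * x.mean())

def rank_share_gap_covariance(x):
    """Population covariance of r_i = i - (n - i + 1) and s_i = (x_(i) - x_(n-i+1)) / sum(x)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    r = i - (n - i + 1)
    s = (x - x[::-1]) / x.sum()
    return np.mean(r * s)          # both r and s have mean zero

x = np.array([3.0, 7.0, 12.0, 18.0, 40.0])
print(2.0 * gini_from_mean_difference(x), rank_share_gap_covariance(x))  # equal up to rounding
```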
In the context of income inequality, an interesting intuitive foundation of the Gini index based on (13) which does not rely on the geometry of the Lorenz curve was suggested by Pyatt (1976): choose randomly some unit, say i, with income $x_i$, and choose independently and randomly another unit, say j, with income $x_j$. Then, if i keeps his income when $x_j \le x_i$ and receives the difference $x_j - x_i$ when $x_j > x_i$, the Gini index is the expected profit from this game, divided by average income.
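Pyatt's interpretation is easy to verify by simulation; the short sketch below (same hypothetical data as above) draws the two units at random many times and compares the expected profit per unit of mean income with the Gini index.

```python
import numpy as np

rng = np.random.default_rng(0)
incomes = np.array([3.0, 5.0, 8.0, 10.0, 24.0])    # hypothetical data, Gini about 0.376

# Pyatt's game: unit i keeps x_i, and additionally receives x_j - x_i whenever
# an independently drawn unit j turns out to be richer.
i = rng.integers(0, incomes.size, size=200_000)
j = rng.integers(0, incomes.size, size=200_000)
expected_profit = np.maximum(incomes[j] - incomes[i], 0.0).mean()

print(expected_profit / incomes.mean())            # converges to the Gini index
```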
with one coefficient for each value of a parameter $\nu > 1$. For $\nu = 2$, this is the standard Gini index $G(x)$, but as $\nu$ increases, higher weights are attached to small incomes; the limit of $G_\nu(x)$ as $\nu \to \infty$ is $1 - x_{(1)}/\bar{x}$, so in the limit inequality depends only on the lowest income (given $\bar{x}$), expressing the familiar value judgment introduced by Rawls, that social welfare (viewed as a function of inequality) depends only on the poorest member of society.
Mehran’s (1976) linear measures of inequality are defined as
A further classical index is known as the "maximum equalization percentage" (the relative amount of income transfers necessary for total equalization), and it is also algebraically identical to one half of the relative mean deviation:

$$P(x) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{2 n \bar{x}}.$$
This measure was suggested, more or less independently, by Bresciani-Turroni (1910), Ricci (1916), von Bortkiewicz (1931), Schutz (1951), Kuznets (1959), and Elteto and Frigyes (1968), to name a few of the authors who have given it their attention. For simplicity, I will refer to it as the Pietra index $P(x)$, in honor of Pietra (1914/15).*
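For completeness, a one-line computation of the Pietra index as half the relative mean deviation (again with hypothetical data):

```python
import numpy as np

def pietra(x):
    """Pietra index: half the relative mean deviation, sum(|x_i - mean|) / (2 * n * mean)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()).sum() / (2.0 * x.size * x.mean())

print(pietra([3.0, 5.0, 8.0, 10.0, 24.0]))   # 0.28 for these hypothetical incomes
```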
In the context of income inequality, it has long been viewed as a major drawback of both the Gini and Pietra indices and of related indices derived from them that they are not explicitly linked to some social welfare function $W(x)$.† This uneasiness was first expressed in Dalton's (1920) proposal to use the "ratio of the total economic welfare attainable under an equal distribution to the total welfare attained under the given distribution" as a measure of income inequality; it has led Atkinson (1970) to define inequality as $A(x) = 1 - y/\bar{x}$, where the scalar y is the income that, if possessed by everybody, would induce the same social welfare as the actual income vector x.
Given an additively separable welfare function $W(x) = \sum_{i=1}^{n} U(x_i)/n$, where $U(\cdot)$ is utility, and a utility function of the constant-elasticity form $U(x) = x^{1-\varepsilon}/(1-\varepsilon)$ for $\varepsilon \ge 0$, $\varepsilon \ne 1$ (with $U(x) = \ln x$ for $\varepsilon = 1$), this yields the Atkinson family

$$A_\varepsilon(x) = \begin{cases} 1 - \left[\dfrac{1}{n}\sum_{i=1}^{n}\left(\dfrac{x_i}{\bar{x}}\right)^{1-\varepsilon}\right]^{1/(1-\varepsilon)} & (\varepsilon \ge 0,\ \varepsilon \ne 1), \\[2ex] 1 - \dfrac{\text{geom. mean of } x}{\text{arith. mean of } x} & (\varepsilon = 1). \end{cases}$$
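The Atkinson family is straightforward to compute once $\varepsilon$ is fixed; the sketch below (hypothetical data, helper name illustrative) follows the two-case definition above, handling $\varepsilon = 1$ through the geometric mean.

```python
import numpy as np

def atkinson(x, eps):
    """Atkinson index: one minus the equally distributed equivalent income over the mean."""
    x = np.asarray(x, dtype=float)
    if eps == 1.0:
        ede = np.exp(np.mean(np.log(x)))                        # geometric mean
    else:
        ede = np.mean(x ** (1.0 - eps)) ** (1.0 / (1.0 - eps))
    return 1.0 - ede / x.mean()

incomes = [3.0, 5.0, 8.0, 10.0, 24.0]                           # hypothetical data
for eps in (0.5, 1.0, 2.0):
    print(eps, atkinson(incomes, eps))                          # rises with inequality aversion
```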
*See Kondor (1971) or Chipman (1985, pp. 142-143) for a brief sketch of its rather long and winding genesis.
†Of course, an implicit link can in most cases easily be constructed, as shown by Sheshinski (1972) for the Gini index. In view of Newbery (1970), who shows that the Gini index is incompatible with additively separable welfare functions, these implied welfare functions are sometimes rather odd.
(A5) $x \le_{LM} y \Rightarrow I(x) \le I(y)$.
These axioms, though disputable, are widely accepted. The next two constitute a
watershed, separating leftist from rightist measures of inequality:
*As this can happen only for very high incomes, where the very rich give to the not so very rich, the significance of this aberration is in practice much disputed, and this measure continues to be widely used.
(A7) $I(x, 0) = I(x)$ (i.e., appending zeros does not affect inequality).
(A8) $I(x, x, \ldots, x) = I(x)$ ("population invariance": replicating a population m times does not affect inequality).
A prominent index satisfying (A7) but not (A8) is the Herfindahl coefficient

$$H(x) = \sum_{i=1}^{n}\left(\frac{x_i}{\sum_{j=1}^{n} x_j}\right)^2.$$

It is often used in legal disputes about merger activities (see, e.g., Finkelstein and Friedberg 1967), where (relatively) small firms do not matter for the degree of concentration in the market, and where the axiom (A8) would in fact constitute a liability.
Unlike (A4) and (A5), (A7) and (A8) are nontrivially compatible with each other and (A1)-(A4): consider the well-known Theil index

$$T(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{x_i}{\bar{x}}\,\ln\frac{x_i}{\bar{x}}. \qquad (23)$$
*At least not in the sense defined by (A9). However, as shown by Kakwani (1980, pp. 178-181), the Gini
index can be computed from information about factor incomes. This issue of decomposition by factors
versus decomposition by subgroups has engendered its own literature, which we do not have space to
cover here.
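A quick numerical check (hypothetical firm sizes) illustrates why the Herfindahl coefficient sits where it does relative to these axioms: appending zero-size firms leaves it unchanged, as (A7) requires, while replicating the population m times divides it by m, so (A8) fails.

```python
import numpy as np

def herfindahl(x):
    """Herfindahl coefficient: the sum of squared shares of the total."""
    x = np.asarray(x, dtype=float)
    s = x / x.sum()
    return np.sum(s ** 2)

firms = np.array([3.0, 5.0, 8.0, 10.0, 24.0])            # hypothetical firm sizes

print(herfindahl(firms))                                  # baseline value
print(herfindahl(np.append(firms, [0.0, 0.0])))           # (A7): unchanged by appended zeros
print(herfindahl(np.tile(firms, 3)))                      # (A8) fails: divided by 3 on replication
```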
where $e^{(i)}$ is an $n_i$-vector of ones, $w_i \ge 0$, and $\sum_{i=1}^{m} w_i = 1$. This means that
overall inequality can be expressed as a weighted sum of within-group inequalities,
plus between-group inequality, defined as the overall inequality that would obtain if
there were no inequality within the groups.
Still stronger is what Foster (1983) calls Theil decomposability, which requires the weights to be the income shares of the groups. Not surprisingly, the Theil coefficient $T(x)$ from (23) is decomposable in that way.
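To make the decomposition concrete, here is a minimal sketch (hypothetical two-group data) of the Theil index split into a within-group part, weighted by income shares as just described, and a between-group part computed from the smoothed distribution in which every member receives his group's mean.

```python
import numpy as np

def theil(x):
    """Theil index T(x) = (1/n) * sum (x_i/mean) * ln(x_i/mean)."""
    x = np.asarray(x, dtype=float)
    r = x / x.mean()
    return np.mean(r * np.log(r))

groups = [np.array([3.0, 5.0, 8.0]), np.array([10.0, 24.0, 40.0])]   # hypothetical groups
pooled = np.concatenate(groups)

income_shares = np.array([g.sum() for g in groups]) / pooled.sum()
within = np.sum(income_shares * np.array([theil(g) for g in groups]))
between = theil(np.concatenate([np.full(g.size, g.mean()) for g in groups]))

print(theil(pooled), within + between)    # the decomposition is exact
```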
A final set of axioms refers to "transfer sensitivity" (Shorrocks and Foster 1987): these principles strengthen the Pigou-Dalton axiom (A3) by requiring that the reduction in inequality resulting from a Robin Hood transfer should ceteris paribus be larger, the poorer the recipient. Relying solely on (A3), one could have "a situation in which a millionaire made a small (regressive) transfer to a more affluent millionaire and a simultaneous large (progressive) transfer to the poorest person in society" (Shorrocks and Foster 1987, p. 485), but where the combined effect of these transfers is that inequality increases: while the regressive transfer increases inequality and the progressive transfer reduces inequality, the axiom (A3) by itself puts no restraint on the relative magnitudes of these effects, so one needs an additional axiom to prohibit such eccentric behavior.
(A12) I (y’) 5 I (y) whenever y’ and y differ from x by a Robin Hood transfer
of the same size, with spender and recipient equal amounts apart, but
where the recipient in y’ is poorer.
This notion can also be formalized by defining the sensitivity of an inequality index $I(\cdot)$, evaluated at the components $x_{(i)}$ and $x_{(j)}$ of some vector x, as

$$S_I(x, i, j) = \lim_{\delta \to 0} \frac{I(x) - I(x^{\delta, i, j})}{\delta}$$

whenever this limit exists, where $x^{\delta, i, j}$ denotes the vector obtained from x by a Robin Hood transfer of size $\delta$ from $x_{(j)}$ to $x_{(i)}$.
For the Gini index, for instance, one obtains from the rank-based expression

$$S_G(x, i, j) = \frac{2(j - i)}{n^2 \bar{x}}, \qquad (25)$$
i.e., the sensitivity depends only on the rank of the units involved in the transfer, not
upon their incomes. In particular, this implies that equal transfers of incomes have
most effect where the population is densest, which is usually in the center.
On the other hand, for the Atkinson family, sensitivity is easily seen to be proportional to $x_{(i)}^{-\varepsilon} - x_{(j)}^{-\varepsilon}$, so sensitivity increases both with $\varepsilon$ and with decreasing $x_{(i)}$, given $x_{(j)} - x_{(i)}$, and with $x_{(j)} - x_{(i)}$, given $x_{(j)}$: ceteris paribus the decrease in inequality is larger, the poorer the recipient and the larger the income gap between spender and recipient.
Given some set of axioms (often augmented by normalization restrictions on the range, or requirements concerning continuity and differentiability), the following questions arise: (1) Are the axioms consistent with each other? (2) Are all the axioms really necessary? (3) What do indices which satisfy these axioms look like?
The first question is usually answered by exhibiting some specific measure that satisfies all requirements. The second question is trickier. While some axioms are easily seen to be implied by others ((A3) by (A1) and (A2), (A9) by (A10), (A8) by (A11)), others are not: as Russell (1985) demonstrates, at least two of the requirements that Cowell (1980) imposes to characterize the CES class of inequality indices are already implied by the others, and this implication is anything but trivial to see. Usually, such questions of minimality are settled by exhibiting, for every axiom, at least one index that fails this test but satisfies the others (Eichhorn and Gehrig 1982).
The third question has generated a minor industry, producing results of the type: any function $I: D \to \mathbb{R}$ with continuous first-order derivatives satisfying (A2), (A4), (A8), and (A10), plus $I(e) = 0$, can be expressed as a positive scalar multiple of some function

$$I_c(x) = \frac{1}{n\,c(c-1)}\sum_{i=1}^{n}\left[\left(\frac{x_i}{\bar{x}}\right)^{c} - 1\right]$$

with some $c \in \mathbb{R}$ (Shorrocks 1980). This class of indices has become known as the generalized entropy family; as special cases corresponding to $c = 1$ and $c = 0$ (defined as the limits of $I_c(x)$ as $c \to 1$ or $c \to 0$), it includes the Theil index $T(x)$ defined in (23) and another index proposed by Theil (1967):

$$L(x) = \frac{1}{n}\sum_{i=1}^{n}\ln\frac{\bar{x}}{x_i}.$$
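The following sketch (hypothetical data) implements the generalized entropy family and checks numerically that its $c \to 1$ and $c \to 0$ limits reproduce the two Theil measures.

```python
import numpy as np

def gen_entropy(x, c):
    """Generalized entropy index I_c(x); c = 1 and c = 0 are treated as the limiting cases."""
    x = np.asarray(x, dtype=float)
    r = x / x.mean()
    if c == 1.0:
        return np.mean(r * np.log(r))        # Theil index T(x)
    if c == 0.0:
        return np.mean(np.log(1.0 / r))      # Theil's second measure (mean log deviation)
    return np.mean(r ** c - 1.0) / (c * (c - 1.0))

incomes = np.array([3.0, 5.0, 8.0, 10.0, 24.0])                       # hypothetical data
print(gen_entropy(incomes, 1.0), gen_entropy(incomes, 1.0 + 1e-6))    # nearly identical
print(gen_entropy(incomes, 0.0), gen_entropy(incomes, 0.0 + 1e-6))    # nearly identical
```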
In the same vein, Foster (1983) shows that an inequality index satisfies (A3), (A4), and (A11) if and only if it is a positive multiple of the Theil coefficient $T(x)$, and
Eichhorn (1988) shows that the only functions $I: D \to \mathbb{R}$ satisfying (A3), (A6), and $I(e) = 0$ are functions of the standardized vector

$$\frac{\lambda x + (1 - \lambda)e}{\lambda \bar{x} + (1 - \lambda)}.$$
V. EMPIRICAL IMPLEMENTATION
The empirical application of measures of inequality raises various issues which set
this branch of statistics apart from others. A first and minor problem is the proper
inference from a sample to a larger population. As samples are typically large, or
nonrandom, or populations rather small as in the context of industrial concentration,
relatively little work has been done on this.*
Much more important, in particular in the context of income inequality, is the
incompleteness of the data: Typically, income figures are available only for certain
quantiles of the population, and the rather voluminous literature on inequality mea-
surement with incomplete data can be classified according to the amount of addi-
tional information available.
For the case where only selected points of the Lorenz curve are given (i.e., frac-
tions of total income received by fractions of the total (ordered) population), Mehran
(1975) gives the most extreme Lorenz curves that are compatible with these points,
in the sense that the resulting concentration areas are the smallest and the largest
possible.
Obviously, the upper bound is attained by joining the observed points by
straight lines, as in Figure 2. This is at the same time an upper bound for the true
underlying Lorenz curve. Likewise, an obvious lower bound to the true underlying
Lorenz curve is given by extending these lines, as again in Figure 2. However, as
these line segments do not form a Lorenz curve, Mehran (1975) proposes tangents
to the true Lorenz curve at the observed points such that the concentration area and
thus the Gini index are maximized, and he gives a recursive algorithm to compute
the tangents' slopes. Although this curve does not necessarily bound from below the
true Lorenz curve, it gives the Lorenz curve which represents (in the sense of the
Gini index) the most unequal distribution compatible with the data.
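Mehran's tangent construction for the maximal Gini needs the recursive algorithm referred to above, but the other half, the smallest Gini compatible with given Lorenz points, follows directly from the piecewise-linear interpolation. The sketch below (hypothetical grouped data) computes that lower bound by the trapezoidal rule.

```python
import numpy as np

def gini_lower_bound(p, L):
    """Smallest Gini compatible with Lorenz points (p_k, L_k), including (0,0) and (1,1):
    one minus twice the area under the piecewise-linear interpolation of the points."""
    p = np.asarray(p, dtype=float)
    L = np.asarray(L, dtype=float)
    area = np.sum(0.5 * (L[1:] + L[:-1]) * np.diff(p))   # trapezoidal rule
    return 1.0 - 2.0 * area

# hypothetical grouped data: the bottom 40% receive 15% of income, the bottom 80% receive 55%
p = [0.0, 0.4, 0.8, 1.0]
L = [0.0, 0.15, 0.55, 1.0]
print(gini_lower_bound(p, L))   # 0.35
```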
If, in addition to selected points of the Lorenz curve, we are given the interval means $\bar{x}^{(i)}$, the interval endpoints $a_{i-1}$ and $a_i$, and the fraction of the population in
*See, for instance, Sendler (1979) on the asymptotic distribution of the Gini index and the Lorenz curve, or McDonald and Ransom (1981) for the effects of sampling variability on the bounds below.
Figure 2 Given 2 nontrivial points P1 and P2, the true underlying Lorenz curve must pass
through the shaded area.
this decomposition provides immediate bounds for the Gini index, with the lower one being attained for $\Delta^{(i)} = 0$ $(i = 1, \ldots, m)$, i.e., when there is no inequality within the groups, and the upper bound being attained when the observations in a given group are placed at both ends of the interval in a proportion such that the group mean is $\bar{x}^{(i)}$:
*In part already in Pizzetti (1955); see Giorgi and Pallini (1987).
those of the true underlying Lorenz curve* (Hermite interpolation). The resulting function need not be convex, i.e., need not be a Lorenz curve itself, but it is easily seen (see, e.g., Schrag and Kramer 1993) that the Hermite interpolator is convex and monotone if, in each interval, the group means stay clear of the endpoints of the intervals.
As to fitting parametric Lorenz curves, various families have been proposed, whose usefulness, however, seems very much in doubt, as the implied indices of inequality often violate the Gastwirth bounds. One such popular family, suggested by Rasche et al. (1980), is
There are various additional research areas which impinge on inequality, such as
the welfare ranking of income distributions (Shorrocks 1983, Thistle 1989), the mea-
surement of interdistributional inequality (Butler and McDonald 1987, Dagum 1987),
or the parametric modeling of income distributions and the inequality orderings that
are implied by the parameters (Chipman 1985, Wilfling and Kramer 1993), which
I do not cover here. Rather, I pick the issues of multidimensional inequality and
poverty to show how the concepts introduced above can be extended.
poverty. Take poverty. Although among certain sociologists this is taken to be almost synonymous with inequality ("relative deprivation"), the majority consensus is that
*Remember that the Lorenz curve has the slope $x_{(i)}/\bar{x}$ in the interval $((i-1)/n, i/n)$, so its (left-hand) slope at $p_i$ is $x_{(i)}/\bar{x}$.
poverty is something different, and needs different measures for quantification. Fol-
lowing Sen (1976), this is usually done by first identifying a poverty line z below
which an income unit is considered poor, and then combining the poverty character-
istics of different units into an overall measure of poverty. Two obvious candidates are the "head-count ratio"

$$H(x) = \frac{n^*}{n},$$

where $n^*$ is the number of units below the poverty line, and the "income gap ratio"

$$I(x) = \frac{1}{n^*}\sum_{i=1}^{n^*}\frac{z - x_{(i)}}{z},$$
i.e., the normalized per unit percentage shortfall of the poor. Individually, both mea-
sures have serious shortcomings-the head-count ratio is completely insensitive to
the extent of poverty, and might even fall when income is transferred from a poor
person to somebody not so poor, who thereby moves above the poverty line, and the
income gap ratio takes no account of the numbers of the poor-but they can be com-
bined into a measure with various desirable properties (Sen 1976):
$$S(x) = H(x)\left[I(x) + (1 - I(x))\,\bar{G}(x)\right] \qquad (38)$$

where $\bar{G}(x) = G(x_{(1)}, \ldots, x_{(n^*)})$ is the Gini index of the incomes of the poor. This
measure reduces to $H(x)\cdot I(x)$ if all the poor have the same income, and it increases
whenever an income of a poor person is reduced (while incomes above the poverty
line always receive uniform weights, entering only via the head-count ratio H ( x ) ) .
For large $n^*$, the coefficient $S(x)$ can be shown to be almost identical to a weighted sum of the income gaps of the poor, with weights increasing as the income gaps increase, which at the same time points to a serious shortcoming of this measure of poverty: it is based solely on the ranks, not on the distances, of the incomes of the poor. There has consequently been an enormous literature in the wake of the seminal paper by Sen (1976), surveyed in Seidl (1988), which goes on from here.
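As a worked illustration (hypothetical incomes and an arbitrary poverty line z), the head-count ratio, the income gap ratio, the Gini index of the poor, and Sen's composite measure (38) can all be computed directly from their definitions.

```python
import numpy as np

def gini(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    return 2.0 * np.sum(i * x) / (n * n * x.mean()) - (n + 1.0) / n

def sen_index(x, z):
    """Sen's poverty measure S = H * [I + (1 - I) * G_poor], cf. (38)."""
    x = np.asarray(x, dtype=float)
    poor = x[x < z]
    H = poor.size / x.size                    # head-count ratio
    I = np.mean((z - poor) / z)               # income gap ratio of the poor
    G_poor = gini(poor)                       # Gini index among the poor
    return H * (I + (1.0 - I) * G_poor)

incomes = [2.0, 3.0, 4.0, 7.0, 9.0, 12.0, 20.0]   # hypothetical data
print(sen_index(incomes, z=5.0))                  # about 0.21
```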
Comparatively little work has been done on the second important issue of multidimensional inequality. While it is widely recognized that a single attribute such as income is often not sufficient to capture the phenomenon whose inequality is to be determined, the statistician's toolbox is almost empty here.
There is a short chapter on multivariate majorization in Marshall and Olkin (1979), where majorization among n-vectors x and y is extended to majorization among $m \times n$ matrices X and Y: by definition, $X >_M Y$ if and only if $Y = XP$ with some doubly stochastic matrix P. When $m = 1$, this is equivalent to $Y = X T_1 T_2 \cdots T_L$ with finitely many matrices of the form (2), but when $m \ge 2$ and $n \ge 3$, the latter condition is more restrictive, as simple examples show (Marshall and Olkin 1979, p. 431). Therefore I denote the latter preordering by "$>_T$."
As shown by Rinott (1973), a differentiable function $f: \mathbb{R}^{m \times n} \to \mathbb{R}$ respects the preordering $>_T$ if and only if $f(X) = f(XP)$ for all permutation matrices P, and if, for all $j, k = 1, \ldots, n$, its partial derivatives satisfy a monotonicity condition analogous to the classical Schur condition.
and to define the multivariate Lorenz ordering via set inclusion of the associated Lorenz zonoids $LZ(X)$ (this is the ordering referred to as (42) below). For $m = 1$, this boils down to the univariate Lorenz order, as $LZ(X)$ is then the subset of $\mathbb{R}^2$ enclosed by the Lorenz curve $L_X(p)$ and the "dual Lorenz curve" $\bar{L}_X(p) := 1 - L_X(1 - p)$ (the area of which is equal to the Gini index $G(x)$). For $m > 1$, the Lorenz zonoid is a convex subset of the unit cube in $\mathbb{R}^{m+1}$.
The multivariate Lorenz ordering defined by (42) allows matrices of different means and row dimensions to be compared. It is further related to univariate Lorenz dominance by the fact that $X \ge_L Y$ if and only if the generalized Lorenz curve of $d'X$ lies below the generalized Lorenz curve of $d'Y$ for all coefficient vectors d such that $d_i \ge 0$ and $\sum_i d_i = 1$ (Koshevoy and Mosler 1996, Theorem 3.1).
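Theorem 3.1 of Koshevoy and Mosler suggests a simple numerical screen: sample nonnegative direction vectors d summing to one and compare the generalized Lorenz curves of the projected data. The sketch below (hypothetical matrices whose rows are attributes and columns are individuals, equal sizes for simplicity) can refute dominance for the sampled directions; with finitely many directions it cannot, of course, establish it.

```python
import numpy as np

def gen_lorenz(v):
    """Generalized Lorenz ordinates: (1/n) times the cumulated sorted values."""
    v = np.sort(np.asarray(v, dtype=float))
    return np.cumsum(v) / v.size

def refuted(X, Y, n_directions=2000, seed=0):
    """Screen for X >=_L Y: look for a direction d >= 0, sum(d) = 1, with the
    generalized Lorenz curve of d'X above that of d'Y at some population share."""
    rng = np.random.default_rng(seed)
    for _ in range(n_directions):
        d = rng.dirichlet(np.ones(X.shape[0]))
        if np.any(gen_lorenz(d @ X) > gen_lorenz(d @ Y) + 1e-12):
            return True
    return False

# hypothetical data: two attributes, four individuals; Y averages X's columns completely
X = np.array([[1.0, 2.0, 6.0, 11.0],
              [2.0, 3.0, 5.0, 10.0]])
Y = X @ np.full((4, 4), 0.25)       # doubly stochastic smoothing
print(refuted(X, Y))                # False: no sampled direction contradicts X >=_L Y
```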
Other generalizations from univariate to multivariate concepts of inequality exploit the parallel between inequality and choice under uncertainty (Kolm 1977, Atkinson and Bourguignon 1982), or the obvious relationship between multivariate inequality and the decomposition of inequality by factor components (Shorrocks 1988, Maasoumi 1986, Rietveld 1990), but as this field has not yet reached the maturity needed for a useful survey, I had better close my survey here.
ACKNOWLEDGMENTS
Mosler, K. (1994), Majorization in Economic Disparity Measures, Linear Algebra and Its Applications, 199, 91-114.
Moyes, P. (1987), A New Concept of Lorenz Domination, Economics Letters, 23, 203-207.
Newbery, D. A. (1970), A Theorem on the Measurement of Inequality, Journal of Economic Theory, 2, 264-266.
Pfingsten, A. (1988a), New Concepts of Lorenz Domination and Risk Aversion, Methods of Operations Research, 59, 75-85.
Pfingsten, A. (1988b), Progressive Taxation and Redistributive Taxation: Different Labels for the Same Product?, Social Choice and Welfare, 5, 235-246.
Pietra, G. (1914/1915), Delle Relazioni tra gli Indici di Variabilità, Atti del Reale Istituto Veneto di Scienze, 775-792; 793-804.
Pigou, A. C. (1912), Wealth and Welfare, Macmillan, London.
Pizzetti, E. (1955), Osservazioni sul Calcolo Aritmetico del Rapporto di Concentrazione, Studi in Onore di Gaetano Pietra, Cappelli, Bologna.
Pyatt, G. (1976), On the Interpretation and Disaggregation of Gini Coefficients, Economic Journal, 86, 243-255.
Rasche, R. H., J. Gaffney, A. Y. C. Koo, and N. Obst (1980), Functional Forms for Estimating the Lorenz Curve, Econometrica, 48, 1061-1062.
Ricci, U. (1916), L'Indice di Variabilità e la Curva dei Redditi, Giornale degli Economisti e Rivista di Statistica, 3, 177-228.
Rietveld, P. (1990), Multidimensional Inequality Comparisons, Economics Letters, 32, 187-192.
Rinott, Y. (1973), Multivariate Majorization and Rearrangement Inequalities with Some Applications to Probability and Statistics, Israel Journal of Mathematics, 15, 60-77.
Russell, R. (1985), A Note on Decomposable Inequality Measures, Review of Economic Studies, 52, 347-352.
Schader, M. and F. Schmid (1994), Fitting Parametric Lorenz Curves to Grouped Income Distributions - A Critical Note, Empirical Economics, 19, 361-370.
Schrag, H. and W. Kramer (1993), A Simple Necessary and Sufficient Condition for the Convexity of Interpolated Lorenz Curves, Statistica, 53, 167-170.
Schutz, R. R. (1951), On the Measurement of Income Inequality, American Economic Review, 41, 107-122.
Seidl, Ch. (1988), Poverty Measurement: A Survey, in D. Bös, M. Rose, and Ch. Seidl (eds.), Welfare and Efficiency in Public Economics, Springer, Berlin.
Sen, A. (1976), Poverty: An Ordinal Approach to Measurement, Econometrica, 44, 219-231.
Sendler, W. (1979), On Statistical Inference in Concentration Measurement, Metrika, 26, 109-122.
Sheshinski, E. (1972), Relation between a Social Welfare Function and the Gini Index of Income Inequality, Journal of Economic Theory, 4, 98-100.
Shorrocks, A. F. (1980), The Class of Additively Decomposable Inequality Measures, Econometrica, 48, 613-625.
Shorrocks, A. F. (1983), Ranking Income Distributions, Economica, 50, 3-17.
Shorrocks, A. F. (1984), Inequality Decomposition by Population Subgroups, Econometrica, 52, 1369-1385.
Shorrocks, A. F. (1988), Aggregation Issues in Inequality Measurement, in W. Eichhorn (ed.), Measurement in Economics, Physica, Heidelberg.
Poor Areas

Martin Ravallion
World Bank, Washington, D.C.
As China’s economic miracle continues to leave millions behind, more and more
Chinese are expressing anger over the economic disparities between the flour-
ishing provinces of China’s coastal plain and the impoverished inland, where 70
million to 80 million people cannot feed or clothe themselves and hundreds of
millions of others are only spectators to China’s economic transformation. The
New York Times, December 27, 1995, p. 1.
China is not unusual; almost all countries have their well-recognized “poor areas,” in
which the incidence of absolute poverty is unusually high by national standards. In
China, there is high poverty incidence in rural areas of the southwest and northwest
(the “inland” areas referred to in the quotation). Similar examples in other countries
include some of the eastern Outer Islands of Indonesia, parts of northeastern India,
northwestern and southern rural areas of Bangladesh, much of northern Nigeria, the
rural Savannah in Ghana, the northeast of Brazil, and many other places.
We would hope, and under certain conditions expect, that the growth process
will help these poor areas catch up. But that does not appear to be happening in some
countries. Figure 1 illustrates the divergence over time between the relatively well
off and more rapidly growing coastal areas of China and the lagging inland areas.
The figure plots the aggregate rate of consumption growth at county level in southern
China 1985-1990 against the initial county mean wealth. The data cover 119 coun-
ties spanning a region from the booming coastal province of Guangdong through to
the poor inland areas of Guizhou.* There is a positive regression coefficient, sug-
*The figure is reproduced from Ravallion and Jalan (1996). It is based on a panel of farm-household level
data for rural areas in four provinces of southern China. The data cover 4700 households living in 119
counties. The consumption measure is comprehensive, in that it includes imputed values (at local market
Figure 1 County-level consumption growth, 1985-1990, plotted against the log of county wealth per capita, 1985 (southern China).
gesting divergence, and it is significant (at the 1% level). Initially wealthier counties
tended to have higher subsequent rates of consumption growth.
Nor is China the only country in which poor areas appear to persist in spite
of robust economic growth; for example, the eastern Outer Islands of Indonesia ap-
pear to have shared rather little in that country’s sustained (and generally pro-poor)
economic growth since 1970. It seems that there is a degree of persistence in the
economic geography of poverty; indeed, a generation or more ago, the above list of
“poor areas” by country would probably have looked pretty similar.
As the opening extract suggests, there are widespread concerns about poor
areas, particularly when they persist amidst robust aggregate economic growth. In
assessing the social impact of economic growth or growth-oriented economic reform,
economists have traditionally focused on the impact on one or more measures of
prices) of consumption from own production plus the current service flows from housing and consumer durables. The data also include a seemingly complete accounting of all wealth including valuations of all fixed productive assets, cash, deposits, housing, grain stock, and consumer durables. The data are discussed at length in Chen and Ravallion (1996).
social welfare, including various measures of aggregate poverty. Yet, it appears that
impacts are typically diverse among the poor, and in the society as a whole; some lose
and some gain from economy-wide changes. This can be important to know, if only to
better understand the political economy of growth and reform, though policymakers
may well also make the (normative) judgment that a premium should be attached to
more “balanced” growth. Studying these diverse impacts may also hold important
keys to our understanding of the growth process itself and to a variety of questions
often asked by policymakers, such as what the best policy response is when some subgroups are lagging.
So why do some people, and in particular some regions, do so much better than
others in a growing economy? It turns out that most of the standard tools of analy-
sis used in studying poverty, distribution, and growth are ill-equipped to answer this
question. After reviewing those tools, this chapter suggests some new tools of empiri-
cal analysis that may offer a better chance of answering it. We look at the dynamics of
the geography of poverty from a microlevel to help understand the way various initial
conditions and exogenous shocks impinge on household-level prospects of escaping
poverty over time. While we note the links to various strands of theoretical and em-
pirical economics, the chapter is not a survey. Rather it tries to be forward-looking
on a set of seemingly important research questions, to explore how future research
might better address them.
This is also an issue of considerable relevance, as the chapter will emphasize.
The empirical approach outlined here would appear to entail a substantial expan-
sion in the number of policy-relevant variables which are included in microempiri-
cal models of poverty. Past interventions in poor areas are amongst those variables.
Faced with lagging regions amidst overall growth, governments and donors are regu-
larly called upon to do something about these lagging poor areas. Area-based inter-
ventions are now found in most countries.* How much impact do such interventions
have on living standards? To answer this we must be able to assess what would have
happened to living standards in the absence of the interventions. It should not be as-
*For example, on recognizing the problem of lagging rural areas, China introduced a large antipoverty program in 1986 which declared that 272 (rural) counties were "national-poor counties," and targeted substantial aid to those counties. The extra aid took the form of subsidized credit for village-level projects (provided at well below market rates of interest), funding for public works projects (under "food-for-work" programs), and direct budgetary support to the county government. This national poor area program is the main direct intervention in the government of China's current poverty reduction policy (Leading Group 1988, World Bank 1992, Riskin 1994). Again China is not unusual. The World Bank has assisted over 300 area development projects since the early 1950s spread over all regions; most of these projects were designed to develop a selected rural area for the benefit of poor people. Other agencies, such as the International Fund for Agricultural Development, also provide substantial support for such programs (Jazairy et al. 1992). There has been a recent resurgence of interest in such programs in the World Bank and elsewhere.
sumed that such schemes will even entail net gains to poor people; by acting against
the flow of factor mobility from low to high productivity areas, it could be argued that
such interventions actually make matters worse in the longer term. Depending on
how the economy works in the absence of intervention-the nature of the technol-
ogy, preferences, and any constraints on factor mobility-a poor area program could
entail either a net benefit or net cost to poor people.
The paper argues that the geographic variation in both initial conditions and
the evolution of living standards over time offers scope for disentangling the effects of
poor-area programs from other factors. Even within poor countries, geographic areas
differ widely in their endowments of various aspects of “geographic capital,” includ-
ing locally provided public services and access to area-specific subsidies. These dif-
ferences are both geoclimatic and the outcomes of past policies and projects. There
is typically also a spatial variance in poverty indicators. The spatial variation in both
the incidence of poverty and in area characteristics offers hope of better understand-
ing why we see poor areas, what can be done to help them, and how well past efforts
have performed. By exploiting this spatial variation, we should be in a better position
to understand what role the lack of geographic capital plays in creating poor areas,
versus other factors including residential differentiation, whereby people who lack
“personal capital” end up being spatially concentrated.
The following section explains the motivation for studying the problem of “poor
areas.” Section I1 discusses the “standard” empirical tools found in practice, rely-
ing on either static micromodels or aggregate dynamic models. Section I11 discusses
a micromodeling approach and its links with recent work in economics on the de-
terminants of economic growth. Some potential lessons for policy are described in
Section IV.
I. MOTIVATION
A. Why Do We See Unusually Poor Areas?
The starting point for assessing the pros and cons of poor-area policies is an under-
standing of why we see poor areas in the first place. Among economists and poli-
cymakers, a common explanation of poverty is based on an individualistic model in
which poverty arises from low household-level endowments of privately held produc-
tive resources, including human capital, albeit with important links to the regional
and macroeconomy, notably through wages and prices. This view is epitomized in
the familiar human-capital earnings functions. The dynamic version is the stan-
dard neoclassical growth model, the microfoundation of which assumes atomistic
agents linked only through trade at common prices. If one believes this model, then
poor areas presumably arise because people with poor endowments tend to live to-
gether. Area differences in access to (for example) local public goods might still be
allowed in such a model, but as long as there is free mobility they will not mat-
ter to the welfare of an individual household, which will depend on its a-spatial
exogenous attributes. The types of antipoverty policies influenced by this model
emphasize raising the endowments of poor people, such as by enhanced access to
schooling.
Regional divergence is still possible in such a model. If there are increasing re-
turns to scale in private production inputs, then initially better-off areas, with better
endowments of private capital, will tend to see subsequently higher rates of growth.
This is the essence of the view of regional growth that one finds in the writings of
Myrdal (1957), Hirschman (1958), and others since (see the review in Richardson
and Townroe 1986). By this interpretation, persistently poor areas, and divergence
from wealthier areas, reflect the nature of the technology and the geography of nat-
ural resource endowments. There may still be a case for targeting poor areas, but it
would be a redistributive case, and it would imply a trade-off with the overall rate of
economic growth.
The individualistic model does not attach any causal significance to man-made
spatial inequalities in geographic capital-the set of physical and social infrastruc-
ture endowments held by specific areas. Indeed, with free mobility, the individu-
alistic model predicts that household welfare will only depend on private, mobile,
endowments, and other exogenous attributes of the household. Against this view,
one can postulate a geographic model in which individual poverty depends heavily
on geographic capital and mobility is limited. By this view, the marginal returns to a
given level of schooling, or a loan, depend substantially on where one lives, and lim-
ited factor mobility entails that these differences persist. Relevant geographic factors
might include local agroclimatic conditions, local physical infrastructure, access to
social services, and the stock of shared local knowledge about agroclimatic condi-
tions and about the technologies appropriate to those conditions. It is not implausible
that some or all of these geographic factors alter the returns to investments in private
capital. As I argue later, it is likely that they will also entail increasing returns to ge-
ographic capital when there are nonincreasing returns to private production inputs.
Thus it might well be that people are being left behind by China’s growth process
precisely because they live in poor areas; given their private endowments, they would
do better in China’s coastal areas.
If this model is right, then the policies called for will entail either public invest-
ment in geographic capital or (under certain conditions, discussed below) proactive
efforts to encourage migration, and such policies need not entail a trade-off with the
overall rate of growth. That will depend on the precise way in which differences in
geographic capital impact on the marginal products of private capital and, hence,
the rate of growth. That is an empirical question.
Neither model provides a complete explanation for poor areas. The individ-
ualistic model begs the questions of why individual endowments differ persistently
and why residential differentiation occurs. The geographic model begs the questions
of why community endowments differ and why mobility is restricted. But, as I will
argue, knowing which model is right, or what the right hybrid model looks like, can
provide valuable information for policy.
B. Past Work
What do we already know about poor areas that might throw light on which of these
models is most relevant? There has been a vast amount of empirical research testing
the individualistic model, including human-capital earnings functions and similarly
motivated income and consumption determination models estimated on microdata
(Section II.B). This research has often assumed that the individual, "private capi-
tal,” model holds. While sometimes spatial variables are added, this is done in an
ad hoc way. At the same time, there is also a large, but mostly independent, liter-
ature on economic geography and regional science which has emphasized the im-
portance of spatial effects on the growth process (for a survey see Richardson and
Townroe 1986). The individualistic model has not been tested rigorously against the
geographic model, in an encompassing framework which would allow the two models
to fight it out.
But there is evidence of spatial effects in the processes relevant to creating
and perpetuating poverty. The evidence of spatial effects comes from a variety of
sources, including the following.
1. “Poverty profiles” (decompositions of aggregate poverty measures by sub-
groups of a population, including area of residence) typically contain evidence of
seemingly significant spatial differences in poverty incidence or severity. However,
typical poverty profiles do not allow one to say whether it is the individualistic model
or the geographic model that is producing these spatial effects. In the (far fewer)
cases in which suitable controls were used, spatial effects did appear to persist (van
de Walle 1995, Jalan and Ravallion 1996, Ravallion and Wodon 1997).
2. In some of the settings in which there are persistently poor rural areas
there does not appear to be much mobility among rural areas (though more so from
rural to urban areas). In some cases (such as China) mobility has been deliberately
restricted, but intrarural mobility seems uncommon elsewhere (such as in much of
South Asia, though exceptions exist, such as seasonal migration of agricultural la-
bor). Then the individualistic model immediately seems implausible; for how did the
residential differentiation come about with rather little mobility?
3. The literature on the diffusion process for new farm technologies has em-
phasized local community factors, including the demonstration effect of the presence
of early adopters in an area, and there is some supportive evidence for India in Foster
and Rosenzweig (1995).
4. There is also evidence for India that areas with better rural infrastructure
grow faster and that infrastructure investments tend to flow to areas with good agro-
climatic conditions (Binswanger et al. 1993). The type of data used has not, however, allowed identification of external effects (discussed further in Section III).
C. Poor-Area Policies
What are the policy options in assisting poor areas? Two broad types of area-based
policy intervention can be identified which are aimed (explicitly or implicitly) at
poverty reduction: one is geographic targeting of subsidies, taxes, or public invest-
ments, and the other is migration policy.
Geographic targeting of antipoverty schemes has been popular, though so far
the assessments of poverty impacts have largely ignored dynamic effects. The at-
traction of this policy option for targeting stems from the existence of seemingly
*For India, Datt and Ravallion (1993) consider the effects on poverty of pure (nondistortionary) transfers among states, and between rural and urban areas. They find that the qualitative effect of reducing regional/sectoral disparities in average living standards generally favors the poor. However, the quantitative gains are small. For example, the elimination of regional disparities in the means, while holding intraregional inequalities constant, would yield only a small reduction in the proportion of persons below the poverty line, from an initial 33% to 32%. Also see Ravallion (1993) for Indonesia.
nature and extent of the interaction effects among area characteristics as they affect
living standards and their evolution over time.
There are also potentially important implications for economic evaluations
of the dynamic gains from area-specific interventions to reduce poverty. With lit-
tle mobility, living in a designated poor area can be taken as exogenous to house-
hold choices. However, the existence of spatial externalities may well entail that the
growth path of future household living standards is dependent on the same area char-
acteristics which influence the public decision to declare the community poor. The
problem is essentially one of omitted-variable bias when there is state dependence in
the growth process. For example, a low endowment of local public goods may simul-
taneously induce a lower rate of growth and a higher probability of the community
being declared poor. Unless this is accounted for, the value to households of living in
an area which is targeted under a poor-area program will be underestimated (Jalan
and Ravallion 1996).
Assessing the case for all such interventions requires a deeper understanding
of how poor areas came to exist. That understanding can also inform other areas of
policy. The case for proactive migration policies may be strengthened if one finds that
there are strong geographic factors in the creation and perpetuation of poor areas.
That will depend in part on the precise nature of those factors. If the geographic effect
is largely explicable in terms of physical infrastructure endowments (as at least the
proximate cause), then the case for migration policies will be strengthened; migrants
who go to better-endowed areas will gain, and those left behind will also gain if there
is less crowding of the existing infrastructure in the poor area. But if the geographic
factors are largely social (to do with social capital, or the spillover effects of local
endowments of human capital), then the migration policy may make matters even
worse for those left behind. (This is often said about the effects of suburbanization
on inner-city areas in the United States.)
Motivated by the above discussion, the following sections will discuss various
approaches to empirical modeling which might prove fruitful in understanding the
economic geography of poverty so as to inform these difficult policy issues.
where $\alpha$ and the $\beta_j$ are parameters to be estimated and $u_{it}$ is an unobserved error term.
If we were to repeat this for other years, then we could also see how the geographic
poverty profile changes over time.
This type of geographic poverty profile is found in (for example) almost every
poverty profile found in the World Bank’s Country Poverty Assessments (the regres-
sion may not be run, but in this case a bivariate cross-tab of the poverty measure by
region is just another way of running the regression). While useful for some purposes
(including geographic targeting), this type of poverty profile tells us nothing about
why there is more poverty in one place than another. It may be, for example, that
people with little education tend to live in certain places. Then if one controlled for
education, the regional dummy variables would become insignificant.
Extending this logic, it is becoming common practice to estimate more com-
plex multivariate models in which a set of household characteristics are added, rep-
resented by the vector xi, giving the augmented regression
The specification in (2) and (3) imposes additive separability between the
regional effects and household-level effects. However, this can be readily relaxed
by adding interaction effects (so, for example, returns to education may depend on
where one lives). All parameters can be allowed to vary locationally by estimating a
separate regression on household characteristics for each region.
If one has access to repeated cross-section samples representing the same pop-
ulation then these static models can be repeated to see how the identifiable regional
effects (both conditional and unconditional) have evolved over time. Does one find,
for example, that poor regions are catching up over time, or are they diverging? This
can be addressed by studying how the (conditional and unconditional) date-specific
coefficients on the area dummies evolve.
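As a minimal sketch (entirely synthetic data; the variable names are placeholders rather than those of any actual survey), the contrast between an unconditional and a conditional geographic profile can be generated with two least-squares regressions: log consumption on region dummies alone, and then with a household characteristic such as schooling added.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
region = rng.integers(0, 3, size=n)                                   # three hypothetical regions
schooling = rng.normal(6.0 + 2.0 * (region == 2), 2.0, size=n)        # schooling differs by region
log_c = 0.5 + 0.08 * schooling + rng.normal(0.0, 0.3, size=n)         # consumption depends on schooling only

D = np.column_stack([np.ones(n), region == 1, region == 2])           # intercept plus region dummies

b_uncond = np.linalg.lstsq(D, log_c, rcond=None)[0]                   # unconditional profile
b_cond = np.linalg.lstsq(np.column_stack([D, schooling]), log_c, rcond=None)[0]   # with the control

print("unconditional region effects:", b_uncond[1:])                  # region 2 looks better off
print("conditional region effects:  ", b_cond[1:3])                   # dummies shrink toward zero
```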
So far the discussion has focused on a single welfare indicator. But this is
almost surely too restrictive. More generally one can postulate a set of indicators
aiming to capture both “income” and “non-income” dimensions of well-being. In
addition to consumption of market goods and services one could include indicators
of attainments in terms of basic capabilities, such as being healthy and well nour-
ished. The aim here is not to make a long and overlapping list of such indicators but to
capture the aspects of welfare that may not be convincingly captured by consumption
or income as conventionally defined (Ravallion 1996). So indicators of child nutri-
tional status or morbidity would be compelling since conventional household-level
aggregates may be weak indicators of distribution within households.
Static "poverty regressions" have become a standard tool in poverty analysis. The most common approach assumes that the poverty measure is the headcount index, given by the probability of living below the poverty line. One postulates that real consumption or income $C_i$ is a function of a (column) vector of observed household characteristics $x_i$, namely $C_i = \beta x_i + \varepsilon_i$, where $\beta$ is a (row) vector of parameters and $\varepsilon_i$ is an error term; this can be termed the "levels regression." A now common method in poverty analysis is not to estimate the levels regression but to define the binary variable $h_i = 1$ if $C_i \le z$, and $h_i = 0$ otherwise. The method then pretends not to observe the $C_i$'s, acting as if only $h_i$ and the vector of characteristics $x_i$ are observed. The probability that a household will be poor is $P = \text{Prob}[C < z \mid x] = \text{Prob}[\varepsilon < z - \beta x] = F(z - \beta x)$, where F is the cumulative distribution function specified for the residuals in the levels regression. A probit or logit is then usually estimated, depending on the assumption one makes about the distribution of the error term $\varepsilon_i$.*
(One could also use a semiparametric estimator which allows the distribution of the
error to be data determined.) One can also generalize this procedure to other (“higher
order”) poverty measures and estimate censored regression models.
However, this common practice is difficult to defend since-unlike the usual
binary response model-here the “latent” variable is fully observed. So there is
no need for a binary response estimator if one wants to test impacts on poverty of
household characteristics. The parameters of interest can be estimated directly by
regressing Ci on xi. The relevant information is already contained in the levels re-
gression which is consistently estimable under weaker assumptions about the errors.
Measurement errors at extreme Cs may prompt the use of probits, though there are
almost certainly better ways of dealing with such problems, which do not entail the
same loss of information, such as by using more robust estimation methods for the
levels regression.
Nor is the “poverty regression” method necessary if one is interested in cal-
culating poverty measures conditional on certain household characteristics. Subject
to data availability, $\text{Prob}[C < z \mid x]$ can be estimated directly from sample data. When the number of sampled households with a specific vector of characteristics of interest, $x^1$ say, is too small to reliably estimate $\text{Prob}[C < z \mid x^1]$ from a subsample, one can also turn to regression methods for out-of-sample predictions. But these predictions can also be retrieved from the levels regression, though one must then know the distribution of the errors. (For example, if the errors are normally distributed with zero mean and variance $\sigma^2$, then the probability of being poor is $\Phi[(z - \beta x)/\sigma]$, where $\Phi$ is the standard normal distribution function.)
where F is standard normal.) There is nothing gained from using a binary-response
estimator, so the econometric sophistication of “probits” and so on buys us very little
in this case.
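The point can be checked numerically: fit the levels regression by least squares on synthetic data, convert the fit into a poverty probability Φ((z − βx)/σ), and compare with the empirical frequency of poverty near a given x. Under normal errors this is essentially what a probit would deliver, so the binary-response machinery adds little.

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(t):
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(10.0, 2.0, size=n)                   # a single household characteristic
c = 1.0 + 0.4 * x + rng.normal(0.0, 1.0, size=n)    # the "levels" model for consumption
z = 5.0                                             # poverty line

X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, c, rcond=None)[0]         # levels regression
sigma = np.std(c - X @ beta)                        # residual standard deviation

x0 = 9.0                                            # characteristic value of interest
prob_from_levels = norm_cdf((z - (beta[0] + beta[1] * x0)) / sigma)
prob_empirical = np.mean(c[np.abs(x - x0) < 0.25] < z)   # crude local frequency near x0

print(prob_from_levels, prob_empirical)             # both close to the true probability
```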
Poverty regressions may make more sense if one wants to test the stability of
the model for poverty across a range of potential poverty lines. Suppose, for example,
that one of the regressors is the price of food, and that very poor people tend to be net
*The earliest example that I know of is Bardhan (1984), who used a logit regression of the probability of a household being poor against a range of household and community characteristics using sample survey data for rural West Bengal. Other examples include Gaiha (1988), Grootaert (1994), Foley (1995), and World Bank (1995).
consumers of food, while those who are somewhat better off tend to be net producers.
Then the distributional shift with a change in the price of food will not entail first-
order dominance, as assumed by the standard levels regression. Instead one might
want to specify a set of regression functions the parameters of which vary according
to the segment of the distribution one is considering. One way of estimating such
a model is by assuming that the segment-specific error terms are of the logit form,
entailing a multinomial logit model (Diamond et al. 1990).
All the above models are essentially static; some welfare indicator at a single date is modeled as a function of a range of individual and geographic data. Such models cannot distinguish effects on the growth rate of consumption (or another welfare metric) from effects on its level. The true model could be $C_{it} = (\beta + \gamma t)x_i + \varepsilon_{it}$, implying that x influences the growth rate of consumption as well as its level. However, static data cannot distinguish $\beta$ from $\gamma$.
D. Aggregate Models
*See Barro and Sala-i-Martin (1995) for an overview of these models, with developed-country applications; examples for developing countries include Cashin and Sahay (1995) and Datt and Ravallion (1996).
†Datt and Ravallion (1996) use a long time series of repeated cross-sectional surveys (for India) to relax this attribute of standard growth models.
cision. However, bias can still arise from either migration (discussed above) or omit-
ted variables which simultaneously influence household-level welfare outcomes and
program placement.
Another disadvantage of aggregate models is that they do not allow one to dis-
tinguish internal and external effects on production and welfare. This can matter to
policy. By using the household as the unit of observation one can identify external
effects of geographic capital, including local public goods, on production processes
at household level (Ravallion and Jalan 1996). Consider Figure 1, based on aggre-
gate (county-level) data for China. Such evidence tells us nothing per se about the
spatial effects in a growth process. The highly aggregated form of such data does not
allow one to distinguish two possible ways in which initial conditions may influence
the growth process at the microlevel. One way is through effects of individual condi-
tions on the individual growth process, and this is a common interpretation given to
nonzero values of the regression coefficient on initial income in a growth regression;
declining marginal product of capital would suggest a tendency for convergence; by
this interpretation, the type of divergence depicted in Figure 1 suggests increasing
returns to private capital. If this is right, then regional divergence, and the existence
of persistently poor areas, is to be expected when the rate of growth is at its maximum.
Conversely, under these conditions, governmental attempts to shift the allocation of
investment in favor of poor areas will entail a growth cost, though a policymaker
may still be willing to pay that cost to achieve a more balanced (and possibly more
sustainable) growth path.
But there is another way in which the divergence in Figure 1 can arise, even
with declining marginal products with respect to own capital at the microlevel and
constant returns to scale in private inputs. The microgrowth process might be driven
by intraregional externalities; individual growth prospects may be better in an ini-
tially better-off region through positive local spillover effects. Quite generally, the
marginal product of capital will depend on area characteristics. (Only with rather
special separability assumptions will this not be true.) There may well be declining
marginal products with respect to “own capital” but increasing marginal products to
geographic capital. Indeed, if there is constant returns to scale in the private inputs,
and geographic capital is productive, then there must be increasing returns overall.*
That may well be why we see the aggregate divergence in Figure 1, with the external
effect dominating. But the aggregation hides the difference. If in fact the regional
divergence is really due to the external effect of differences in geographic capital,
*Let output be $F(K, G)$, where K is a vector of private inputs ("own capital") and G is a vector of public inputs ("geographic capital"), and consider any $\lambda > 1$. By constant returns to K, $F(\lambda K, \lambda G) = \lambda F(K, \lambda G) > \lambda F(K, G)$, since geographic capital has a positive marginal product. Thus F exhibits increasing returns to scale overall.
then successful interventions to reduce the inequality in geographic capital need not
entail any cost to the overall rate of growth.
This second “external” channel through which area characteristics can alter a
growth process has received relatively little attention in empirical work on the deter-
minants of economic growth, though the possibility has been recognized in some of
the theoretical literature (notably Romer 1986 and Lucas 1988). The reason why this
external channel has been relatively neglected in growth empirics is undoubtedly
that the level of aggregation in past work has meant that-even if one was aware of
the possibility-the genuinely spatial effects of intraregional spillover effects could
not possibly be identified empirically.
To encompass both the “internal” (individualistic) and “external” (geographic)
channels through which initial conditions can affect a growth process one needs
to model that process at the microlevel. The growth rate for each household will
be a function of both its own initial conditions, characteristics of the area in which
the household lives, and external shocks during the period. The areawide growth
relationships (such as depicted in Figure 1) can then be interpreted as (approximate)
averages formed over the underlying microgrowth processes; but in the averaging one
loses the ability to distinguish the internal from the external effects (Ravallion and
Jalan 1996).
The recurrent problem in aggregate models is that the economic theory which
motivates them is typically a microeconomic model. So tests using aggregate data
always beg the question of whether one is testing the micromodel or the aggregation
assumptions.
*A number of versions of the classic Ramsey model-in which an intertemporal utility integral is maximized subject to flow constraints and production functions-have been proposed which can yield a nonzero solution for the rate of consumption growth which will be a function of initial human and physical assets as well as preference and production parameters. For surveys of the theories of endogenous growth see Grossman and Helpman (1991), Hammond and Rodriguez-Clare (1993), and Barro and Sala-i-Martin (1995).
of various privately provided inputs, but that output also depends positively on the
level of geographic capital. On fully accounting for all private inputs (all profits being
reckoned as payments for those inputs), there will then be constant returns to scale to
the privately provided inputs, but increasing returns to scale over all inputs, includ-
ing geographic capital. With the farm-household maximizing an intertemporal utility
sum-with instantaneous utility depending on current consumption, which must be
partly forgone to ensure future output-one can derive an endogenous consump-
tion growth rate which depends on the initial endowments of both private capital
and geographic capital. With this reinterpretation, the results on existence and wel-
fare properties of equilibrium in Romer's (1986) model can be applied to the present
problem.
The key intertemporal equilibrium condition from such a model equates the
intertemporal marginal rate of substitution with the marginal product of “own capi-
tal,” which is a decreasing function of the initial endowment of own capital and in-
creasing in the amount of geographic capital, taken as exogenous at the microlevel.
With appropriate functional forms, the farm-household’s consumption growth rate
over any period is then a decreasing function of its endowment of private capital and
an increasing function of the level of geographic capital.
Past growth empirics have relied on country or regional aggregates. The trans-
lation of this approach to the microlevel is straightforward; one is simply undoing
the aggregation conditions used to go from the microgrowth theory to the aggregate
regional or country data. The translation is even more straightforward when it is
noted that many of the households in the world’s poor areas are farm households
who jointly produce and consume, rather than economies in which separate con-
sumers and producers interact through trade. But this is largely a matter of inter-
pretation; the separation of an economy into households (which consume) and firms
(which produce) is not essential in theoretical growth models.* In the present setting,
the farm-household can be thought of as a small open economy, trading with those
around it.
It should also be recognized that a poor area may have become poor due to a
location-specific transient shock (a local drought, for example). There may also be
lags in the growth process of consumption. By explicitly modeling these features of
the data generation process, it should be possible to identify longer-term impacts in
panels of sufficient length. (Averaging prior to estimation is not an efficient way of
dealing with these data features.)
Motivated by the above considerations, an empirical approach can be sug-
gested which entails consistently estimating a dynamic model of consumption growth
*Standard endogenous growth models postulate separate households and firms, but an equivalent formu-
lation is possible in which households both consume and produce (Barro and Sala-i-Martin 1995).
at the household level using panel data. The model allows one to test the dynamic im-
pact over the length of the panel of a wide range of initial conditions at both household
and community levels. The proposed approach differs from the usual “fixed-effects”
method. While it is common to model the variables of interest in first-difference form,
or as deviations from their time means, this is typically done in the context of a static
model in levels, for which the time slope is a constant and there are unobserved fixed
effects. Clearly such a formulation is of little interest here since it does not allow initial conditions, including area-specific policies and projects, to affect the growth path of the variable of interest.
Nor should the dynamics be restricted to those implied by a standard growth-theoretic model. One would want to allow for deviations from the
underlying steady-state solution, due to shocks and/or adjustment costs. It is thus
better to postulate an autoregressive distributed lag structure for the growth process,
augmented by exogenous shocks and unobserved effects. This will permit a more
powerful test of the impact of initial conditions on the evolution of living standards
than is possible by only modeling the long-run average growth rates.
Thus one can postulate an econometric model of the growth rate in living stan-
dards at the household level as a function of (1) initial conditions at the household
level, (2) initial conditions at the local community level, and (3) exogenous time-
varying factors (“shocks”) at both levels. The variance in household-level growth
rates due to the second set of variables could in principle be “explained” by a com-
plete set of area-dummy variables. However, by collating the micro (household-level)
data with geographic data bases on agroclimatic variables, and stocks of physical
and social infrastructure, it will be possible to obtain a far more illuminating speci-
fication in which specific attributes of the local area enter explicitly. As a check for
omitted-variable bias, one can then compare the results with a model in which the
geographic variation is picked up entirely by dummy variables. This may also sug-
gest idiosyncratic regional effects, such as those due to local political factors, that might best be studied on an ad hoc (case study) basis.
On introducing dynamics and both time-invariant and time-varying unobserved
effects, a suitable dynamic model could take the form
*One might also hypothesize that area-mean consumption enters (4), but this effect is generally not identifiable; see Manski (1993).
Thus consistent estimation of the above model does present a more difficult
problem than either the static micro- or dynamic macromodels. But a solution is
available (Jalan and Ravallion 1996). First notice that the error term in (4) has two
components: an unobserved individual-specific time-invariant fixed effect, $\eta_i$, and the standard innovation error term, $u_{it}$. Let us assume that the unobserved individual-specific effect $\eta_i$ is correlated with the regressors, i.e., $E(\eta_i z_i)$, $E(\eta_i x_{it})$, and $E(\eta_i C_{i,t-1})$ are nonzero.* The error $u_{it}$ is, however, serially uncorrelated and thus satisfies the orthogonality conditions:

These conditions ensure that suitably lagged values of $C_{it}$ and $x_{it}$ can be used as instruments. In order to get consistent estimators, the unobserved fixed effects $\eta_i$ need to be eliminated. This can be done by taking the first differences of (4) to obtain the transformed “growth model”:†
There are various options for estimating such a model. GMM methods appear
to offer the best approach (Arellano and Bond 1991). Given that the $u_{it}$ are serially uncorrelated, the GMM estimator is the most efficient one within the class of instrumental variable (IV) estimators. In estimating (6), $C_{i,t-2}$ or higher-order lagged values (wherever feasible) are valid instrumental variables. Heteroscedasticity-consistent standard errors can be computed using the residuals from a first-stage regression to correct for any kind of general heteroscedasticity. Inferences on the estimated parameter vector are appropriate provided the moment conditions used are valid. Tests of overidentifying restrictions can be implemented to test the null hypothesis that the instruments are valid (i.e., the instruments and the error term are orthogonal); see Sargan (1958, 1988) and Hansen (1982). In addition, a second-order serial correlation test (the test statistic will be normally distributed) can be constructed, given that the consistency of the GMM estimators for the first-differenced model depends on the assumption that $E(\Delta u_{it}\,\Delta u_{i,t-2}) = 0$.‡ Tests for spatial correlation in the errors (arising from omitted geographic effects) can also be performed (following Frees 1995), though they will need to be adapted to the present problem.
*Bhargava and Sargan (1983) offer a dynamic random-effects model in which it is assumed that some of the regressors are uncorrelated with the unobserved individual-specific effect.
†Various transformations can be used to eliminate the nuisance parameters, though the estimation procedures used are similar to the one proposed here.
‡There may be some first-order serial correlation; i.e., $E(\Delta u_{it}\,\Delta u_{i,t-1})$ may not be equal to zero, since the $\Delta u_{it}$ are first differences of serially uncorrelated errors. Alternatively, if $u_{it}$ is a random walk, then there should not be any serial correlation in the first-differenced $\Delta u_{it}$.
If corrective action is called for, then one can try introducing more geographic data,
or more geographic structure to the error process.*
*Potential approaches include Froot (1989) and Conley (1996); also see the special issue of Regional Science and Urban Economics, September 1992, on “Space and Applied Econometrics.”
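To make the estimation strategy concrete, the following sketch illustrates the core idea in the simplest just-identified case: first-difference the dynamic consumption equation to sweep out the household fixed effect and instrument the differenced lagged consumption term with the twice-lagged level, in the spirit of the IV/GMM estimators discussed above. The data, variable names, and coefficient values are illustrative assumptions only; a full application would stack all available lags as instruments in a GMM criterion (Arellano and Bond 1991) and compute the overidentification and second-order serial correlation tests.

```python
# A minimal sketch (not the chapter's own code) of IV estimation of a
# first-differenced dynamic consumption model, instrumenting the differenced
# lagged term with the twice-lagged level. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
N, T = 500, 6                        # households, survey rounds (hypothetical)
eta = rng.normal(size=N)             # unobserved household fixed effects
x = rng.normal(size=(N, T))          # a time-varying regressor (e.g., a shock)
C = np.zeros((N, T))                 # log consumption
C[:, 0] = eta + rng.normal(size=N)
for t in range(1, T):
    C[:, t] = 0.6 * C[:, t - 1] + 0.3 * x[:, t] + eta + rng.normal(size=N)

# First differencing removes eta:  dC_t = b*dC_{t-1} + g*dx_t + du_t
dC  = C[:, 2:] - C[:, 1:-1]          # dependent variable
dC1 = C[:, 1:-1] - C[:, :-2]         # differenced lagged consumption
dx  = x[:, 2:] - x[:, 1:-1]
Z1  = C[:, :-2]                      # C_{i,t-2}: instrument for dC1

y = dC.ravel()
X = np.column_stack([dC1.ravel(), dx.ravel()])
Z = np.column_stack([Z1.ravel(), dx.ravel()])   # dx serves as its own instrument

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)     # just-identified IV estimator
print("IV estimates (lagged consumption, shock):", beta_iv)
```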
*This approach has shown promise in research on other topics, such as intertemporal consumption behavior and inequality (Deaton and Paxson 1994).
With cross sections we do not know consumption at t + 1. But if we know the future values of one or more predictors of consumption, then these can be used as instruments. One first models time t + 1 consumption as a function of variables observed in time t + 1 but also at time t. Then one uses that model to predict the consumption at time t + 1 of each household in the time t sample and, hence, to estimate its rate of consumption growth from t to t + 1. This can then be regressed on the individual and area characteristics at time t.
Many cross-sectional surveys do obtain information about likely future char-
acteristics which can be used as instruments. For example, the Rural Household
Surveys for China collect both beginning and end-of-year data on financial and phys-
ical wealth in each round. So the end-of-year data can be used to predict the next
period’s consumption along with other time-invariant variables. There are other po-
tential instruments; the next period’s demographic composition (number of persons
by age groups) of the household can be predicted from the current period’s composi-
tion. The R² will be lower, but consistent estimates should still be possible under regularity
conditions. Estimators are available for dynamic models of this sort using repeated
cross sections (see Moffitt 1993 and references therein). The performance of these
methods could be studied using the panel data, but treating it as repeated cross sec-
tions; it would also be of interest to try the method out on the original samples (prior
to panel construction) so as to assess effects of panel attrition.
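A rough sketch of this two-step procedure is given below. It assumes two hypothetical cross sections, a set of predictors of consumption (for example, end-of-year wealth and demographic composition) observed in the later round, and initial household and area characteristics observed at time t; the variable names and simulated data are placeholders rather than the survey data discussed above.

```python
# Illustrative sketch of estimating consumption growth from repeated cross
# sections: fit a consumption model on the t+1 sample, impute t+1 consumption
# for the time-t sample, and regress the implied growth on initial conditions.
import numpy as np


def ols(y, X):
    """Least-squares coefficients with an intercept added."""
    Xc = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    return b


def predict(b, X):
    return np.column_stack([np.ones(len(X)), X]) @ b


rng = np.random.default_rng(1)
n0, n1, k = 800, 900, 4
W1 = rng.normal(size=(n1, k))                      # predictors in the t+1 sample
logc1 = W1 @ np.array([0.4, 0.2, 0.1, 0.05]) + rng.normal(size=n1)
W0 = rng.normal(size=(n0, k))                      # the same predictors for the t sample
logc0 = W0 @ np.array([0.4, 0.2, 0.1, 0.05]) + rng.normal(size=n0)
Z0 = rng.normal(size=(n0, 3))                      # initial household/area conditions

b1 = ols(logc1, W1)                  # step 1: model t+1 consumption
growth = predict(b1, W0) - logc0     # step 2: imputed consumption growth, t to t+1
b_growth = ols(growth, Z0)           # step 3: regress growth on initial conditions
print("growth-regression coefficients:", b_growth)
```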
When confronted with the reality of extreme poverty in remote rural areas with poor
natural resources, observers often ask: “Why don’t these miserably poor people just
move out?” Those who claim that outmigration is the answer often also argue against
public investment in these areas. “This will just be at the expense of more profitable
investments elsewhere,” the argument goes.
This chapter has questioned this reasoning, but it certainly has not refuted
it. That will be a matter of future empirical research. But some points can be made
now. One is that we should understand the nature of the incentives and constraints on
outmigration from these areas. You need some money to start up elsewhere, you need
some basic skills, and you need information. All are generally lacking, but more so
for some people than others. The reasons they are lacking can be traced to market
imperfections of one sort or another and how these interact with poverty. The outside
non-farm-labor-market options are typically thin or nonexistent for someone who
is illiterate, reflecting a lack of substitution possibilities with moderately educated
labor in even quite labor-intensive manufacturing. Credit market failures mean that
there is little chance of borrowing to finance the move. There is highly imperfect
information about prospects elsewhere, and sizable uninsured risk.
The process of outmigration may be a mixed blessing for a poor area at least
initially. Those who have the money, skills, and information will naturally tend to
be the relatively better off. Their departure is likely to put upward pressure on the
incidence of poverty in the poor area. This comes about in various ways. As a purely
statistical proposition, most measures of poverty will rise when the nonpoor leave.
But there are more subtle dynamic effects through “ghettoization.” The local skill
base is likely to have external effects on the local growth process. It follows that
the outmigration of better-educated workers entails an erosion of the local resource base, with adverse longer-term growth consequences. Results from research on poor
areas of southwest China have suggested that there exist strong external effects of
physical and human infrastructure on the returns to private investment and (hence)
the prospects of escaping poverty there (Jalan and Ravallion 1997). These effects
will be mitigated to some extent by remittances and reduced pressure on the land.
All this suggests that one of the best ways that government can help is by investing in the schooling, health, and nutrition of the children of the poor in these areas. Public assistance with credit (to cover search costs for poor outmigrants) and with information would complete the package.
Should we also be investing in the land and physical capital of these areas?
What should be the balance between those investments and human resource de-
velopment? Here one could proceed on an ad hoc basis; if the investment passes a
standard (distribution-unweighted) cost-benefit test, then it should be done. But the
“anti-investment” argument would maintain that private capital flows would already
have found such opportunities.
One response is that, unless there is perfect factor mobility (which nobody
seems to consider plausible), there may still be an equity case for such investments
up to some point. Then they are part of a redistributive policy, exploiting the possi-
bilities for geographic targeting (Lipton and Ravallion 1995). That is fine. However,
for the same reasons that there may be too little outmigration, there may also be too
little investment in these areas from an efficiency point of view as well. Credit mar-
ket imperfections can entail that there are unexploited opportunities for investing in
the land and physical capital of these areas. The liquidity constraints that make it
hard to finance outmigration will also make it hard to finance otherwise profitable
local investments. And asymmetric information and supervision costs deter outside
investors.
The argument that investing in poor areas would entail lower overall growth in
the economy also breaks down as soon as one introduces local public goods and other
forms of “geographic capital” into the analysis, i.e., goods which cannot be supplied
efficiently by markets and which alter the rate of return to private investment. Poor
rural infrastructure in these areas could then be the underlying reason for low private
investment; better infrastructure would then encourage private capital inflow.
However, much of this is conjecture, based on little more than casual obser-
vations and common sense. The chapter has suggested some econometric methods
which might be used to address these issues more rigorously. But it is an ambitious
research agenda, both in terms of the types of data needed, and the level of economet-
ric sophistication needed to convincingly disentangle these effects. So it is important
to ask: What will we have learned from such research that can reliably inform the
above policy choices about poor areas? Three types of potential lessons for policy
can be identified.
The first set of policy lessons concern the economic case for area-based in-
terventions. Should such interventions only be viewed as a specific kind of redis-
tributive policy with probable costs to the overall rate of growth? This view derives
from a growth model in which the existence of persistently poor areas, and regional
divergence more generally, are traced to the natural resource endowments and tech-
nologies, notably the (claimed) existence of increasing returns to private production
inputs. Private investment flows to the areas with the highest returns which are also
(according to this model) the initially richer areas. This still begs a number of policy
questions. For example, we need to know more about what the best indicators are for
this type of redistributive policy when the aim is to help individuals escape poverty
in the future.
But maybe we will find that the empirical results from the type of research
proposed here will reject this model at a fundamental level in favor of one which
says, in effect, that poor areas and divergence reflect spatial inequalities in access
to credit, and publicly provided social and physical infrastructure, and have rather
little to do with increasing returns to private capital, residential differentiation, and
so on. That conclusion could well dramatically alter the policy dialogue on poor-
area interventions and shift the emphasis to the task of redressing these preventable
spatial inequalities. If that conclusion is borne out by the data, then such policies will
be good for growth and good for equity. Or the results may point to a more complex
and mixed picture, possibly with a degree of country, and even regional, specificity.
A second set of broad policy lessons stem from the fact that the approach pro-
posed here allows one to measure spatial externalities. This can throw light on, for
example, how much of the welfare gain from schooling is transmitted through the
internal effects on earnings and so on, and how much is external, arising from the
(presumably positive) neighborhood effects of better education. This will have impli-
cations for the priority one attaches to efforts at finely targeting education subsidies
and for the policy arguments often made about how much basic education needs to
be subsidized on the grounds of its external benefits.
A third set of policy implications will be more specific to the types of projects
that should be recommended for dealing with persistently poor areas. In the process
of addressing these broad questions, empirical models can include explanatory vari-
ables of more or less direct policy relevance. One set of such variables is the very
existence of poor-area interventions. Is the subsequent rate of growth in living stan-
dards of poor people higher when a poor-area program is in place than would other-
wise have been the case, controlling for both household and community-level initial
conditions and time-varying exogenous shocks? What were the longer-term welfare
gains? How do they compare to the budgetary outlays on such programs? There will
be other explanatory variables of policy relevance, such as the initial stocks of var-
ious components of publicly provided social and physical infrastructure, for which
all of the same questions apply, though here of course it may not always be easy to
account fully for their historical costs (though costs of new facilities will often be
known). What priority should be attached to social services versus physical infra-
structure or credit, and how does this vary with other factors? This should allow a
deeper understanding of what the complementarities are among these various types
of publicly provided inputs; we may learn, for example, how much access to one type
of infrastructure alters returns to another, or how much poor agroclimatic conditions
affect returns to different types of publicly provided inputs.
This chapter has argued that the long-standing problem of lagging poor areas
in growing economies, and more generally the diversity in prospects of escaping
poverty that one finds, are explicable with the right empirical tools and data. This
offers hope for better informing a number of difficult public choices on appropriate
responses to poor areas.
ACKNOWLEDGMENTS
I have had many useful discussions on this topic with Jyotsna Jalan. For their help-
ful comments on an earlier draft, I am also grateful to the Handbook’s referee, and
to Hans Binswanger, Ken Chomitz, Klaus Deininger, Lionel Demery, Bill Easterly,
Paul Glewwe, Emmanuel Jimenez, Aart Kraay, Valerie Kozel, Peter Lanjouw, Andy
Mason, Branko Milanovic, Lant Pritchett, Martin Rama, Zmarak Shalizi, Lyn Squire,
Dominique van de Walle, Mike Walton, and Quentin Wodon.
REFERENCES
Arellano, M. and S. Bond (1991), Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations, Review of Economic Studies, 58, 277-298.
Bardhan, P. K. (1984), Land, Labor and Rural Poverty: Essays in Development Economics,
Columbia University Press, New York.
Barro, R. and X. Sala-i-Martin (1995), Economic Growth, McGraw-Hill, New York.
Bhargava, A. and J. D. Sargan (1983), Estimating Dynamic Random Effects Models from Panel
Data Covering Short Time Periods, Econometrica, 51, 1635-1659.
Binswanger, H., S. R. Khandker, and M. Rosenzweig (1993), How Infrastructure and Financial Institutions Affect Agricultural Output and Investment in India, Journal of Development Economics, 41, 337-366.
Hirschman, A. O. (1958), The Strategy of Economic Development, Yale University Press, New Haven, CT.
Hsiao, C. (1986), Analysis of Panel Data, Cambridge University Press, New York.
Jalan, J. and M. Ravallion (1996), Are There Dynamic Gains from a Poor Area Development
Program?, Journal of Public Economics, forthcoming.
Jalan, J. and M. Ravallion (1997), Spatial Poverty Traps?, Policy Research Working Paper,
World Bank, Washington, D.C.
Jazairy, I., M. Alamgir, and T. Panuccio (1992), The State of World Rural Poverty: An Inquiry
into its Causes and Consequences, New York University Press for the International Fund
for Agricultural Development, New York.
Krugman, P. (1991), Geography and Trade, MIT Press, Cambridge MA.
Leading Group (1988), Outlines of Economic Development in China’s Poor Areas, Office of
the Leading Group of Economic Development in Poor Areas under the State Council,
Agricultural Publishing House, Beijing.
Lipton, M. and M. Ravallion (1995), Poverty and Policy, in J. Behrman and T. N. Srinivasan (eds.), Handbook of Development Economics, vol. 3, North-Holland, Amsterdam.
Lucas, R. E. (1988), On the Mechanics of Economic Development, Journal of Monetary Economics, 22, 3-42.
Maddala, G. S. (1986), Disequilibrium, Self-Selection, and Switching Models, in Z. Griliches,
and M. D. Intriligator (eds.), Handbook of Econometrics, North-Holland, Amsterdam.
Manski, C. F. (1993), Identification of Endogenous Social Effects: The Reflection Problem, Review of Economic Studies, 60, 531-542.
Moffitt, R. (1993), Identification and Estimation of Dynamic Models with a Time Series of
Repeated Cross-Sections, Journal of Econometrics, 59, 99-123.
Myrdal, G. (1957), Economic Theory and Underdeveloped Regions, Duckworth, London.
Pitt, M., M. Rosenzweig, and D. Gibbons (1995), The Determinants and Consequences of the Placement of Government Programs in Indonesia, in D. van de Walle and K. Nead (eds.), Public Spending and the Poor: Theory and Evidence, Johns Hopkins University Press, Baltimore.
Ravallion, M. (1982), The Welfare Economics of Local Public Spending: An Empirical Approach, Economica, 49, 49-61.
Ravallion, M. (1984), The Social Appraisal of Local Public Spending Using Revealed Fiscal
Preferences, Journal of Urban Economics, 16,46-64.
Ravallion, M. (1993), Poverty Alleviation through Regional Targeting: A Case Study for In-
donesia, in K. Hoff, A. Braverman, and J. Stiglitz (eds.) The Economics of Rural Orga-
nization, Oxford University Press, Oxford.
Ravallion, M. (1996), Issues in Measuring and Modelling Poverty, Economic Journal, 106,
1328-44.
Ravallion, M. and S. Chaudhuri (1997), Risk and Insurance in Village India: Comment, Econo-
metrica, 65, 171-184.
Ravallion, M. and J. Jalan (1996), Growth Divergence due to Spatial Externalities, Economics Letters, 53, 227-232.
Ravallion, M. and Q. Wodon (1997), Poor Areas, or Only Poor People? Policy Research Work-
ing Paper 1363, World Bank, Washington, D.C.
Richardson, H. W. and P. M. Townroe (1986), Regional Policies in Developing Countries, in
P. Nijkamp (ed.), Handbook of Regional and Urban Economics, North-Holland, Am-
sterdam.
Anil B. Deolalikar
University of Washington, Seattle, Washington
I. INTRODUCTION
As a result of the debt crises of the 1980s and the ensuing structural adjustment
and stabilization programs, many less-developed countries (LDCs) have had to cut
back social spending, including spending on government health programs (Cornea,
Jolly, and Stewart 1987). As a result, these countries have been forced to explore
alternative means of financing health services, including greater recovery of (re-
current) costs in the government health sector via user fees. Proponents of greater
cost recovery base their recommendations on the findings of several empirical stud-
ies that suggest that the demand for health care in LDCs is price inelastic (Akin
et al. 1987, Jimenez 1987, World Bank 1987). On the other hand, opponents of
the cost recovery argument contend that raising fees will reduce access to care,
especially by the poor, and consequently adversely affect health status (Cornea,
Jolly, and Stewart 1987, Gilson 1989).
Unfortunately, the empirical bases on which both arguments are made are
weak. The relatively few empirical studies of health-care demand for LDCs are flawed,
largely because of their failure to recognize (1) the role of quality of health services
in influencing demand and (2) the effect of health-care prices on utilization of health
services via their effect on the reporting of illnesses by individuals. The most obvious reason for the lack of control for quality is that observable and quantifiable
data on quality are rarely available. But, since the price charged for medical care
often reflects the quality of care provided, the lack of control for quality confounds
quality with price effects and biases estimated price effects toward zero (as price
and quality influence demand in opposite directions). In addition, health-care de-
mand functions that are conditioned on reported morbidity can greatly understate
the total effect of health-care prices on the utilization of health services, since they
ignore the potentially adverse effect that these prices can have on the reporting of
morbidity.
A number of studies have previously attempted to estimate the demand for health
services in LDCs. Unfortunately, the existing literature in this area offers confusing
evidence regarding the price response of health-services utilization to user fees. One
strand of literature suggests that prices are not important determinants of health-care
utilization. Heller (1981), Akin et al. (1984, 1986), Birdsall and Chuhan (1986),
and Schwartz et al. (1988) all report very small and sometimes positive price ef-
fects, most of which are statistically insignificant. Another strand of work by Mwabu
(1986), Gertler et al. (1987), Alderman and Gertler (1988), and Gertler and van der
Gaag (1990) concludes that prices are important. The results of the first group of stud-
ies contrast sharply with most studies on the demand for medical care in developed
countries which report price elasticities ranging from -0.2 to as high as -2.1 (Ros-
set and Huang 1973, Goldman and Grossman 1978, Newhouse and Phelps 1974,
Manning et al. 1987). This divergence between the literature on developed and de-
veloping countries is paradoxical, since one would expect prices to be more impor-
tant in determining utilization in developing than in developed countries for two
reasons: first, income levels are substantially lower in the developing countries; sec-
ond, medical insurance, which is almost universal in developed countries, is virtually
nonexistent in most developing countries.
The paradox may be explained by the fact that most previous studies on health-
services utilization in developing countries are flawed in three respects. First, the
treatment of the price of health services in much of the previous work has been far
from satisfactory. While some studies have used expenditures per medical visit re-
ported by consumers as the relevant price, other studies have used standard fee
schedules, as reported by providers. Both methods are incorrect and can cause mis-
leading results. The amount paid by a consumer per provider visit (namely, the “unit
value”) depends not only on the price charged by that provider for a standard treat-
ment but also on the type of treatment and quality of service chosen by the patient.
For example, a visit for a common cold will necessarily cost less than a visit for a
more serious problem. In addition, health providers, like other suppliers of goods and
services, can typically provide a range of treatments of varying quality (and price) for
the same ailment. To calculate the true price of health services, the disease-specific
technological effect and the consumer-chosen quality effect need to be purged from
observed unit values. Much of the previous work on health-care demand has con-
founded these price, quality, and disease-specific variations.*
The use of established or official fees as price constructs does not solve the
problem either. Indeed, this procedure introduces another set of biases in the esti-
mates of demand functions. For example, in estimating the demand for health ser-
vices, Gertler and van der Gaag (1990) assume that the price of obtaining health services from government medical establishments in Peru and the Ivory Coast is zero, since such establishments do not have user fees in principle.† However, a number of
recent surveys in developing countries suggest that there may be a wide discrepancy
between officially established fees for medical visits and payments actually made by
patients (World Bank 1992a, Deolalikar and Vashishta 1992). Individuals may be
able to obtain speedier service and higher-quality treatments by paying for services,
even when such services are officially free of charge. Imposing the assumption that
prices do not vary in the sample (when, in fact, they do) can reduce the efficiency of
price elasticity estimates and incorrectly lead to the result that the price elasticity
of demand for health services is not significantly different from zero.
The second major problem with previous studies is that they estimate the de-
mand for health services, conditional on an illness episode being reported by an indi-
vidual or household. To the extent that health-care prices can affect morbidity (i.e.,
the probability of an individual experiencing an illness episode) and the reporting of
morbidity by individual respondents, the price effects obtained from a conditional
health-care demand model are partial. To complicate matters, health-care prices are
likely to have opposing effects on morbidity (positive), reporting of morbidity (nega-
tive), and health-care utilization (negative), so it is impossible to infer the total effect
of prices on health-services utilization from the conditional (and partial) demand es-
timates.
The third major problem in studies of health-care demand is the omission of
food prices. Within a general behavioral model of health determination, the demand
for food and medical care are jointly determined, since nutrition and medical care
are (possibly substitutable) inputs in the “production” of health status. This means
that the demand for health care is influenced not only by the price of health services
but also by food prices, in much the same way as the demand for different foods is
determined by food and health-care prices.‡ Of course, the omission of food prices
from health-care demand functions will not necessarily bias the estimated effects of
health-care prices on health-care demand unless food prices are correlated with the
price of health services.
*See Deaton (1988) for a discussion of a somewhat similar problem in the analysis of food demand in LDCs.
†They assume that all of the price variation occurs in the form of variation in distance traveled to providers.
‡See Behrman and Deolalikar (1988).
III. THE MODEL
Since the theory of demand for medical care is well-developed,* there is no need
here to develop an elaborate model of individual health determination. If it is as-
sumed that individuals maximize a utility function having health status and other
consumption as its argument, subject to a budget constraint and a health production
function that includes food and medical care as inputs, the resulting reduced-form
derived demand functions for food and medical care will include as their arguments
the prices of food and medical care, household income, and socio-demographic indi-
vidual and household characteristics. As noted earlier, the major empirical problem
in estimating such a reduced-form demand system is that health-care prices are not
directly observed; what are observed instead are the (endogenous) unit values. The
latter need to be purged of disease-specific technological and household-specific
quality variations before they can be treated as health-care prices. This is a problem
that Deaton (1988) has dealt with in the context of food prices.
We assume that (1) for a given type of health provider (e.g., private physician
versus a public health clinic), interhousehold variation in expenditure per illness
episode that is explained by individual and household characteristics, such as sex,
age, marital status, household income, household size and composition, and traits
(e.g., age, schooling and occupation) of the household head, reflects variation in the
quality of health services, and (2) that the spatial (intercluster) component of the
unexplained variation in unit values reflects true (quality-constant) price variation.
In other words, it is assumed that when individuals with high income and better
schooling spend a larger amount on treating the same ailment from the same type
of health provider, they are in effect buying higher quality of health care. However,
*See Grossman (1972). Behrman and Deolalikar (1988) also develop a generic model of health determi-
nation for an LDC.
even after controlling for income, education, and other characteristics, if individuals
in one location spend more than consumers in another location for treating the same
ailment from the same type of provider, that difference reflects true price variation in
the cost of health services across the two locations. While this is a strong assumption,
it does not appear to be unreasonable.* Given that the quality of health care selected
by households is unobserved, an assumption of this type is required to identify prices
from observed unit values.
Controlling for disease-specific effects in the estimation of health-care prices
is much more straightforward, since the ailments for which individuals obtain med-
ical care are observed in the data.
An individual’s decision not to seek treatment for an illness or to seek treat-
ment from a traditional healer or modern provider is modeled as a multinomial logit
problem. The probability of seeking no care, traditional care, or modern care is
\[
P(M_i = k) \;=\; \frac{\exp\!\big(a_k \ln p^M_j + b_k \ln p^F_j + d_k \ln Y_i + e_k Z_i + \mu_{ij}\big)}{\sum_{l} \exp\!\big(a_l \ln p^M_j + b_l \ln p^F_j + d_l \ln Y_i + e_l Z_i + \mu_{ij}\big)}, \qquad k = 0, 1, 2 \tag{1}
\]
where i = index of the individual
j = index of the location or cluster of residence
M = choice of medical care
k = 0 for no treatment, 1 for a traditional healer, and 2 for a “modern” provider (private physician, health clinic, or hospital)
$p^M$ = vector of health-care prices (i.e., the price of services obtained from traditional healers and modern health providers), derived below
$p^F$ = vector of food prices, derived below
$Y$ = household income
$Z$ = vector of individual and household characteristics, including age, education, family size and composition, etc.
$\mu$ = disturbance term
*This assumption is similar to that made by Deaton (1988) to separate the effect of true prices from the effect of quality variations on consumer food demand.
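The following is a minimal sketch of how a treatment-choice model of the form (1) might be estimated and how the implied price elasticities can be computed. The data are simulated, the regressor names are placeholders for the log prices, log income, and characteristics defined above, and statsmodels' multinomial logit stands in for whatever software was actually used.

```python
# Illustrative sketch (simulated data) of the three-way treatment choice in (1):
# 0 = no care, 1 = traditional healer, 2 = modern provider.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2000
ln_pM = rng.normal(size=n)        # log price of modern care (cluster-level in practice)
ln_pF = rng.normal(size=n)        # log food price
ln_Y  = rng.normal(size=n)        # log household income
Z     = rng.normal(size=(n, 2))   # individual/household characteristics

X = sm.add_constant(np.column_stack([ln_pM, ln_pF, ln_Y, Z]))

# Simulate choices from an assumed "true" random-utility model
u0 = rng.gumbel(size=n)
u1 = -0.5 * ln_pM + 0.2 * ln_Y + rng.gumbel(size=n)
u2 = -1.0 * ln_pM - 0.3 * ln_pF + 0.4 * ln_Y + rng.gumbel(size=n)
choice = np.argmax(np.column_stack([u0, u1, u2]), axis=1)

res = sm.MNLogit(choice, X).fit(disp=False)
P = res.predict(X).mean(axis=0)                   # average predicted choice probabilities

# Elasticity of P(modern care) w.r.t. the modern-care price when the regressor is
# a log price: a_2 - sum_k P_k a_k, with the base-category coefficient set to zero.
params = np.asarray(res.params)                   # shape (k_exog, 2): alternatives 1 and 2
a = np.r_[0.0, params[1, :]]                      # coefficients on ln_pM
elast_modern = a[2] - P @ a
print("elasticity of P(modern) w.r.t. the modern-care price:", round(elast_modern, 3))
```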
where $q^M$ is the unit value and $Q^M$ is the quality of care obtained from a particular provider.* Note that the true price charged by that provider, $p^M_j$, does not have an i subscript, since it is assumed to vary only spatially. Taking the logs of both sides of (2), we have
*The true price and the quality of service are easier to interpret if the unit value (or observed price) is modeled as a product (as opposed to some other function) of the true price and quality. In this case, the true price, $p^M_j$, can be thought of as the amount (in Indonesian Rupiahs) paid per provider visit of standardized quality, while the quality variable, $Q^M_{ij}$, can be regarded as the ratio of a visit of standardized quality to an actual visit.
†In the absence of any priors on the functional form for the quality-of-service function, I have chosen the log-linear form for reasons of convenience and tractability.
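The identification strategy can be sketched as follows: regress log unit values on household characteristics and illness-category dummies together with a full set of cluster dummies, and treat the estimated cluster effects as the quality-constant (log) prices. The code below is a sketch only, with simulated data and placeholder names.

```python
# Sketch: recover quality-constant log prices of care from observed unit values
# by regressing ln(unit value) on household characteristics, disease dummies,
# and cluster dummies; the cluster effects are taken as spatial log price variation.
import numpy as np

rng = np.random.default_rng(3)
n_clusters, n = 50, 3000
cluster = rng.integers(0, n_clusters, size=n)
ln_price = rng.normal(size=n_clusters)           # "true" cluster-level log prices
ln_income = rng.normal(size=n)
disease = rng.integers(0, 4, size=n)             # four illness categories

# Observed log unit value = cluster log price + quality component + noise
ln_uv = (ln_price[cluster] + 0.8 * ln_income
         + 0.2 * (disease == 1) + rng.normal(scale=0.3, size=n))

D_cluster = np.eye(n_clusters)[cluster]          # cluster dummies (no global intercept)
D_disease = np.eye(4)[disease][:, 1:]            # disease dummies, base category dropped
X = np.column_stack([D_cluster, ln_income[:, None], D_disease])

b, *_ = np.linalg.lstsq(X, ln_uv, rcond=None)
ln_price_hat = b[:n_clusters]                    # estimated quality-constant log prices
print("corr(true, estimated cluster log prices):",
      round(np.corrcoef(ln_price, ln_price_hat)[0, 1], 3))
```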
*Of course, most household health surveys are plagued by this problem, since the collection of objective
indicators of ill health is difficult and expensive.
†Indeed, no information is available on how many visits an individual made to the same or different health
providers in treating an illness episode.
interview. It is much less likely that individuals would have made multiple visits to
a health provider within a one-week period than in a three-month period. Therefore,
the one-week recall data are likely to provide more meaningful and reliable results.
In addition, the one-week recall data are more dependable because three months is
generally too long a period to recall an illness episode with much accuracy. There-
fore, only the one-week recall data are used in the analysis that follows.
The 1987 SUSENAS reports eight treatment alternatives for illnesses reported
during the past week and the past three months: no treatment, self-treatment, tra-
ditional healer, private physician, hospital, public health center, polyclinic, and
paramedic. Since each treatment alternative adds a total of 34 parameters to be es-
timated in the multinomial logit model, the eight treatment choices have been col-
lapsed into three broad alternatives: no treatment (with the dependent variable as-
suming a value of zero), self-treatment, or treatment from a traditional healer (value
of one), and treatment from a modern health provider (which includes all the remain-
ing choices) (value of two). With the one-week recall data, the percentages of adults reporting an illness episode who sought these three treatment options were 31.3%, 5.4%, and 63.3%, respectively (Table 1).
sponding figures are 27.6%, 4.3%, and 68.1%. The illness episodes of children aged
5-17 years appear to receive the least attention. For this age group, nearly 37.2%
of reported illness episodes were not treated, and fewer than 60% were treated by a
modern health provider.
The household characteristics (Z) included in the unit value equations and the treatment choice model are the size and demographic composition of the household, the age and schooling years of the household head, whether the household head is a
salaried employee, and whether the household is covered by any health insurance.
The presence of health insurance can often dramatically affect both the choice of
quality of health care as well as the choice of health providers for curative services.
In Indonesia, the only households typically covered by health insurance are those
with a member working in the public sector; in the current sample, only 1.5% of the
households belong to this category. Two cluster characteristics are included in the
unit value and treatment choice models: the proportion of villages in the kabupaten
having organized garbage collection and those having piped (and hence presumably
clean) drinking water. Since both of these environmental hygiene variables affect the
production of health status from food and medical care inputs, they are expected to
influence the demand for medical care. Finally, three individual characteristics are
included in the treatment choice model; these are sex, age, and schooling years.*
There are two econometric problems that warrant discussion. First, since
household income includes labor earnings, which are likely to be affected adversely
*Since the unit values are observed only at the household level, the unit-value equations do not include
individual characteristics as explanatory variables.
V. EMPIRICAL RESULTS
A. Unit Value Regressions
Cluster fixed-effects estimates of the unit value regressions are reported in Table 2.†
Since the unit value regression for traditional providers is estimated with very few
observations, only the empirical results relating to modern-provider unit values are discussed here. The estimates indicate that these unit values depend
*Usually, the problem in using nonlabor income in most developing-country data sets is that it provides very little additional information, since few households report any asset income. However, this is not the case with the 1987 SUSENAS survey, in which over 94% of households reported nonzero values of annual nonlabor income. Transfers comprise a large fraction of nonlabor income.
†Since the true prices of medical care are derived from the estimated cluster effects, it is not possible to control additionally for household fixed effects in the unit value regressions. If individuals within a household have correlated errors, not taking into account the error structure may lead to incorrect standard errors. However, this is not a major problem in the current sample. Fewer than 10% of the sample observations for the unit value regressions are accounted for by multiple individuals residing in the same household.
Table 1. Descriptive Statistics, Indonesia, 1987 (means and standard deviations by group: household statistics; children under 5 years; children 5-17 years; adults over 17 years).
significantly on the type of illness and symptoms for which treatment is sought. For
example, treatment for liver-related diseases costs 19.7% more, and that for tuber-
culosis costs 52.4% more, than treatment for the excluded category of “other ill-
nesses.” In contrast, malaria treatment costs 20.8% less than the excluded category.*
Controlling for the disease-specific and the spatial (or cluster) effects, household in-
come has a strong positive impact on both modern- and traditional-provider unit
values, with the income elasticity for the former being 0.80 and the latter, 0.55. As
would be expected, individuals with health insurance coverage spend significantly
(43.4%) more per illness episode than individuals without insurance coverage. The
unit value regressions also imply that individuals from large households buy lower
levels of health-care quality (i.e., spend less per illness episode), while those from
households with better-schooled heads purchase higher quality. Finally, the house-
hold composition effects indicate that additional numbers of children aged 0-4 years
and boys aged 5-14 years significantly reduce household expenditure per illness
episode.
*The large coefficient on the tuberculosis dummy variable in the unit-value regression for traditional providers is not reliable, since only a couple of individuals in the sample obtained treatment for tuberculosis from a traditional provider.
†These are conditioned on an individual reporting an illness during the one-week reference period. Unless otherwise stated, all elasticities reported are evaluated at sample means.
Table 3. Multinomial Logit Estimates of the Probability of Seeking Treatment from a Traditional or Modern Health Provider for a Reported Illness Episode Experienced during the Preceding Week, Children under 5 Years of Age, Indonesia, 1987 (parameter estimates, t ratios, and elasticities for the traditional- and modern-provider alternatives).
respondents, with the result that health-care demand models estimated only on the
sample of individuals self-reporting an illness episode are likely to significantly un-
derstate the adverse effect of health-care prices on the demand for care in the modern
health sector.
The fact that the unconditional own-price elasticity of demand for curative
treatment is small and insignificant for the 0-5 years age group suggests that very
young children in households are relatively more protected than other members
against price changes. On the other hand, children aged 5-17 years are most vulner-
able to price increases, with their modern provider visits falling by 1.6% for a 1%
increase in the price of modern care.
2. The estimated own-price elasticities of demand for traditional health ser-
vices also differ significantly between the conditional and the unconditional model.
However, all elasticities for traditional health services tend to be unusually large be-
cause the sample mean of the unconditional probability of using traditional health
services, which is the point at which the elasticities are evaluated, is extremely small.
3. Household size has a strong negative effect on both the conditional and unconditional demand for modern care among children (0-17 years) but not among
adults. Surprisingly, household income is generally not significant in determining
the demand for either traditional or modern care. The only case where income is
significant is in influencing the demand for modern care among children aged 5-17
years (with an income elasticity of 0.42). However, two facts are relevant in putting
the nonsignificance of income in perspective: (1) the significant negative effect of household size implies that the demand for modern health providers does increase with household income per capita, especially for children aged 0-4 and 5-17 years; and (2) as noted earlier, household income has strong positive effects on the
demand for quality (i.e., unit values) of both modern and traditional health services.
4. The cross-price elasticities of demand for health care (estimated by few previous studies for developing countries) are positive for all three age groups (al-
though not significantly so for children under 5), indicating that traditional and mod-
ern health services are (gross) substitutes for each other. The estimated elasticity
of demand for modern care with respect to the price charged by traditional health providers is 0.63 for children aged 5-17 years and 0.51 for adults. The correspond-
ing cross-price elasticities for traditional health services are extremely large (25.43
and 10.02 for persons aged 5-17 and 18 and over, respectively), again reflecting the
extremely low levels of usage of traditional healers. These estimates indicate that an
increase in the price of modern health services can cause a significant shift away
from the modern to the traditional health sector.
5. The effects of schooling of the individual* and that of the household head
on health-care demand also differ between the conditional and unconditional mod-
*An individual's own schooling was included in the demand equations only for adults over 17 years of age, since it is likely to be endogenous for children.
els. While the former suggest that the demand for modern health care is strongly
(positively) influenced by an individual's schooling or the schooling of the household
head, the unconditional estimates show neither of these effects to be significant.*
The latter result reflects the fact that the unconditional estimates confound the effects
of schooling on morbidity (which are likely to be negative), the reporting of morbid-
ity (positive), and the treatment of illness episodes from modern health providers
(positive).
6. Food prices, which previous studies have not included in health-care demand functions, are observed to generally have very strong effects on the de-
mand for both traditional and modern health care. Most of the estimated food price
effects on the demand for traditional care are positive, implying that food and tra-
ditional health care inputs are viewed as substitutes. However, the probability of
choosing a modern provider declines with most food prices, implying that food and
modern health-care inputs are complementary. Thus, the effect of rising food prices,
especially the price of rice and sugar, appears to be to shift people from modern to
traditional health providers.
7. Among all three age groups, age is associated with an increased (uncon-
ditional) probability of using modern providers. Since age has generally no effect
on the conditional probability of choosing a modern provider,† this suggests that ei-
ther true morbidity or the reporting of morbidity increases with age. Among adults,
morbidity is likely to increase with age; however, it is likely that, among children under 5, it is the perception and reporting of morbidity (not morbidity itself) that
increases with age.
8. Health insurance coverage and employee/self-employed status of the
household head have strong influences on the demand for modern medical care. The
magnitudes of the estimated health insurance coverage effects are especially large
(with insured individuals aged 0-4, 5-17, and 18 and over having 315.7%, 1751.7%, and 114.7%, respectively, greater demand for modern care than uninsured individ-
uals). However, only the estimate for adults over 17 is significant. A possible reason
for the nonsignificance of the insurance variable is that it is highly correlated with
the employee status variable, since all government employees in Indonesia as well
as most salaried individuals employed in the formal business sector are covered by
health insurance schemes. Likelihood-ratio tests showed that the employee status
and insurance coverage variables were jointly significant in determining the choice
of modern health providers for all three age groups.
*For adults, both own schooling and the household head's schooling do significantly reduce the conditional and unconditional probability of using a traditional health provider. For children aged 5-17 years, schooling of the household head is associated with a significantly lower unconditional probability of choosing a traditional healer.
†For children aged 5-17 years, age does have a significant, although numerically small, effect on the conditional probability of using a modern provider.
The empirical results in this chapter clearly show that the demand for both tradi-
tional and modern health services in Indonesia is highly responsive to own price
when it is not conditioned on the reporting of morbidity and when quality variations
are purged from the price of health services. Indeed, unlike most previous studies for
LDCs that show very small price effects on health-care demand, the price elasticities
estimated here are large in magnitude (-0.13, -1.64, and -0.82 for modern care
and -2.2, -9.7, and -6.2 for traditional care for children under 5, children aged
5-17, and adults over 17, respectively).† The fact that the estimated own-price elas-
ticities for modern health providers are large and significant in the unconditioned
probability model but small and insignificant in the conditioned probability model
suggests that the price of care in the modern sector strongly reduces the reporting
of morbidity by respondents, so that health-care demand models estimated only on
the sample of individuals self-reporting an illness episode are likely to significantly
understate the total effect of health-care prices on the demand for care in the modern
health sector.
Thus, these findings cast doubt on the assumption, commonly maintained in
the literature, that increasing user fees at government health facilities will have lim-
ited effects on the utilization of health services and thus enable governments to raise
health-sector revenues. The empirical results clearly indicate that the utilization of
modern health services, especially by children aged 5-17 and adults, will decline
*The single exception is the effect of garbage collection on the demand for modern providers among adults, which is not statistically significant.
†The demand elasticities for traditional healer services are particularly large because the sample mean level of usage of traditional healers is very low.
sharply if prices in the modern health sector are raised without simultaneously im-
proving the quality of health services.
The empirical results also indicate that a number of other prices-in partic-
ular, the price for health services charged by traditional healers and the price of
food staples (namely rice and sugar)-have significant effects on the demand for
modern health services. An increase in user fees charged by traditional healers
is associated positively, while an increase in the price of rice and sugar is asso-
ciated inversely, with the utilization of modern health services. Interestingly, the
availability of piped drinking water and organized trash collection in the commu-
nity, both of which are health-improving public interventions, serves to increase the
use of modern health providers. These results thus imply that individuals view tra-
ditional and modern health services as substitutes, modern medical care and food
inputs as complements, and modern health care and environmental hygiene inter-
ventions as complements, in the production and maintenance of health status. There
are obvious implications here for food price policy. Likewise, the empirical find-
ings indicate that there may be important positive externalities (in the form of in-
creased use of modern health services) to environmental hygiene and sanitation
interventions.
The empirical estimates also imply a massive increase in the demand for mod-
ern health services with an expansion in health insurance coverage in the Indone-
sian population. Currently, a very small fraction of households (only those with a member in government or formal-sector wage employment) are covered by health
insurance schemes. However, like many other LDCs, Indonesia has been experi-
menting with expanded health insurance coverage for a much larger proportion of
the population.
Finally, the empirical results suggest that income growth alone is unlikely to
increase the utilization of modern health services in Indonesia. With increasing in-
comes, however, individuals are likely to purchase higher quality of health services
(i.e., spend more per illness episode).
ACKNOWLEDGMENTS
I would like to thank Jeffrey Hammer and Martin Ravallion, with whom I have had
numerous discussions on this topic. This chapter is a product of a World Bank re-
search project, “Determinants of Nutritional and Health Outcomes in Indonesia and
Implications for Health Policy Reforms,” in which both of them participated. I am
grateful to the World Bank Research Committee for funding the research. I would also
like to acknowledge the useful comments of an anonymous referee and of seminar
participants at Harvard, Yale, and the University of Pennsylvania, where an earlier
version of this work was presented. The responsibility for all errors rests entirely
with me.
REFERENCES
Akin, J. S., N. Birdsall, and D. de Ferranti (1987), Financing Health Services in Developing
Countries, The World Bank, Washington, D.C.
Akin, J. S., C. Griffin, D. K. Guilkey, and B. M. Popkin (1984), The Demand for Primary
Health Care in the Third World, Littlefield, Adams, Totowa, NJ.
Akin, J. S., C. Griffin, D. K. Guilkey, and B. M. Popkin (1986), The Demand for Primary
Health Care in the Bicol Region of the Philippines, Economic Development and Cul-
tural Change, 34, 755-782.
Alderman, H. and P. Gertler (1988), The Substitutability of Public and Private Medical Care Providers for the Treatment of Children's Illnesses in Urban Pakistan, Living Standards Measurement Study Working Paper, World Bank, Washington, D.C.
Behrman, J. R. and A. B. Deolalikar (1988), Health and Nutrition, in Hollis Chenery and T. N. Srinivasan (eds.), Handbook of Development Economics, vol. 1, North-Holland, Amsterdam.
Birdsall, N. and P. Chuhan (1986), Client Choice of Health Treatment in Rural Mali, mimeo,
Population, Health and Nutrition Department, World Bank, Washington, D.C.
Cornea, A. P., R. Jolly, and F. Stewart (eds.) (1987), Adjustment with a Human Face, vol. I,
Clarendon Press for UNICEF, Oxford.
Deaton, A. (1988), Quality, Quantity and Spatial Variation of Price, American Economic Re-
view, 78, 418-430.
Deolalikar, A. B. and P. Vashishta (1992), The Utilization of Government and Private Health
Services in India, mimeo, University of Washington, Seattle.
Gertler, P., L. Locay, and W. Sanderson (1987), Are User Fees Regressive? The Welfare Implications of Health Care Financing Proposals in Peru, Journal of Econometrics, 33, 67-88.
Gertler, P. and J. van der Gaag (1990), The Willingness to Pay for Medical Care: Evidence from Two Developing Countries, Johns Hopkins University Press, Baltimore.
Gilson, L. (1988), Government Health Care Charges: Is Equity Being Abandoned? EPC Pub-
lication No. 15, London School of Hygiene and Tropical Medicine.
Goldman, F. and M. Grossman (1978), The Demand for Paediatric Care: A Hedonic Appraisal, Journal of Political Economy, 86, 259-280.
Government of Indonesia and UNICEF (1989), Situation Analysis of Children and Women in
Indonesia, Jakarta.
Grossman, M. (1972), On the Concept of Health Capital and the Demand for Health, Journal of Political Economy, 80, 223-255.
Heller, P. (1981), A Model of the Demand for Medical and Health Services in Peninsular
Malaysia, Social Science and Medicine, 16, 267-284.
Jimenez, E. (1987), Pricing Policy in the Social Sectors: Cost Recovery for Education and Health in Developing Countries, Johns Hopkins University Press for The World Bank, Baltimore.
Manning, W. G., J. P. Newhouse, N. Duan, E. Keeler, B. Benjamin, A. Leibowitz, M. S. Marquis, and J. Zwanziger (1987), Health Insurance and the Demand for Medical Care, mimeo, The Rand Corporation, Santa Monica.
Mwabu, G. (1986), Health Care Decisions at the Household Level: Results of Health Survey in Kenya, Social Science and Medicine, 22, 313-319.
Newhouse, J. P. and C. E. Phelps (1974), Price and Income Elasticities for Medical Services, in M. Perlman (ed.), The Economics of Health and Medical Care, Wiley, New York.
Rosset, R. N. and L. F. Huang (1973), The Effect of Health Insurance on the Demand for
Medical Care, Journal of Political Economy, 81,281-305.
Schwartz, J. G., J. S. Akin, and B. M. Popkin (1988), Price and Income Elasticities of Demand
for Modern Health Care: The Case of Infant Delivery in Indonesia, The World Bank
Economic Review, 2, 49-76.
World Bank (1987), Health Center Expenditure in Indonesia, Background Paper III for Health Planning and Budgeting, Washington, D.C.
World Bank (1992a), Viet Nam: Population, Health and Nutrition Sector Review, Report No.
10289-VN, Country Department I, East Asia Regional Office, The World Bank.
World Bank (1992b), World Development Report 1992, Oxford University Press for The World
Bank, New York.
On Mobility
Esfandiar Maasoumi
Southern Methodist University, Dallas, Texas
I. INTRODUCTION
The reliance on social welfare functions (SWFs) for evaluating and comparing welfare situations has
provided a scientifically useful tool that provides for economy of thought, as well as
discipline, since it forces a declaration of principles that are too often implicit. In the
instant case, the desirability of both types of mobility suggests functionals that are
increasing in incomes (for the growth component), and ultimately equality preferring
(for the exchange component). While changes in earnings and incomes have been
and should be studied with the aim of identifying significant “causal” factors, the
evaluation of an existing degree of mobility requires welfare comparisons.
There are currently at least two complementary lines of analyzing mobility. The older approach requires specification of transition probabilities between social states, and a welfare evaluation of the transition matrices that are estimated from the existing data. This clearly requires detailed data, with large panel data sets being the best source of the cell repetitions that are necessary for reliable statistical inference. There are numerous mobility indices, which are mappings from the transition matrices to scalars. Shorrocks (1978a) and Geweke et al. (1986) provide systematic discussions of criteria for sensible mobility indices.
In line with a general direction toward unanimous partial ordering in the literature on inequality, useful welfare ranking relations have been developed for transition matrices which rekindle the essential role played by the Lorenz and generalized Lorenz criteria. While the SWF evaluation has begun to be emphasized in this approach, the task of devising consensus SWFs over matrices remains a challenge reminiscent of that faced in the multidimensional analysis of inequality. I provide an account of this line of inquiry in Section III.
The second approach was initiated by Shorrocks (1978b) and generalized by
recasting mobility as a multidimensionality question as in Maasoumi (1986); e.g.,
see Maasoumi and Zandvakili (1986, 1989, 1990). In this approach inequality in-
dices are computed for multiperiod incomes, that is, a type of “permanent income”
measured over more than a single period, and mobility indices with a profile of
equalization over time are obtained. The latter are directly related to and inter-
pretable by the familiar classes of increasing and equality preferring (Schur con-
cave) SWFs. It will be seen that the more recent welfare theoretic development of
the transition matrices approach is converging to the same welfare comparisons,
similar notion of long-run or “permanent” income, and thus similar preferred mo-
bility indices! Quite general mobility indices are proposed and empirically imple-
mented by Maasoumi and Zandvakili (1989, 1990). This approach is also demand-
ing of data and, like the first approach, it is ideally implemented with plentiful
micro data. But the Maasoumi-Shorrocks-Zandvakili (MSZ) measures may be ad-
equately estimated with data grouped by age, income, education, etc. This is use-
ful since much data is made available in this aggregated form, and the approach is
particularly focused on “equalization” between income groups in a way that allows
controlling for sources of heterogeneity among individuals and households. Such
controls with aggregate data appeal to some who may wish to tolerate some diversity due to such things as age or education. I present an account of this approach in Section II.
Our survey makes clear that both approaches share a concern for the distri-
bution of a welfare attribute as well as its evolution. And the convergence in both
approaches to rankings by Lorenz-type curves is of econometric significance. There
is now a well developed asymptotic inference theory for empirically testing for or-
der relations such as stochastic dominance, Lorenz dominance, and concentration
curves. This type of testing is a crucial first step since comparing mobility indices
(statistically or otherwise) is of questionable value when, for instance, generalized
Lorenz curves cross. Section IV contains a brief account of the statistical tools that
are available for inference on both the indices of mobility and order relations.
A loosely related strand of econometric research seeks to specify statistical
models of earnings or income “mobility.” While it is true that such models are fo-
cused on explaining “change” and “variation” which are not as meaningful as “mo-
bility,” they can shed light on significant explanations of earnings changes, as well as
account for heterogeneity. This is consequential for policy analysis. Further, there are
econometric models that seek to fit transition probabilities. Such studies are directly
useful for not only estimating transition matrices, but for explaining the estimated
probabilities. We do not delve into this empirically substantive area in this survey.
Section V concludes with several empirical applications of the “inequality reduc-
ing” mobility measures. An insightful survey of income mobility concepts is given
in Fields and Ok (1995). Absolute measurement of income mobility and partial or-
dering of absolute mobility states is treated in Mitra and Ok (1995).
In the case of mobility indices, Shorrocks (1978b) and Maasoumi and Zandvakili (1986) argued that a more reliable measure of individual welfare and the distribution of incomes is to be obtained by considering the individual or household's
“long-run” income. They argued that such a measure of income computed over in-
creasingly longer periods of time would remove transitory and some other life cycle
related movements which are picked up by year-to-year comparisons of income dis-
tributions. The annual “snapshots” are incapable of accounting for mobility and re-
turns to investments and/or human capital. The effects of seniority alone may make
the notion of income inequality meaningless.
These authors are therefore concerned with dynamics of income distribution,
and are thus accounting for, and challenged by, a fundamental lack of homogeneity
among households. The natural labelling of individuals at different points in their
life cycle is an essential form of heterogeneity which contradicts the common as-
sumption of symmetry (anonymity) which plays a crucial role in much of the welfare
theory that underlies inequality analysis. Shorrocks (1978b) proposed the simple
sum of incomes over T periods as the aggregate income. Maasoumi and Zandvakili (1986, 1989, 1990) proposed more general measures of "permanent income" encompassing the simple average and sum. This author recognized the essential multidimensionality of the mobility analysis and proposed its treatment on the basis of the techniques and concepts developed in Maasoumi (1986). Consequently, general functions for "permanent incomes" were developed which are maximum entropy (ME) aggregators.
The next step in this development is to analyze the inequality in the perma-
nent incomes and compare with single period inequalities. A weighted average of the
latter was used by these authors to represent “short-run” inequality over any desired
number of periods. Clearly, a distribution of permanent incomes is being compared
and ranked against a reference short-run income distribution. In view of this, all of the rich welfare theory supporting Lorenz and generalized Lorenz (second-order stochastic dominance) relations, as well as the convex inequality measures consistent with such relations, is at the disposal of the analyst for evaluating mobility profiles and the dynamic evolution of the income distribution. At first sight, this
appears an unnecessarily restrictive setting for defining the welfare value of mobil-
ity. Further reflection suggests that this is not so. Indeed, we will see that the recent
development of a welfare theory basis for the alternative of transition matrices has
made clear a certain inevitability for the role of the same welfare criteria and, there-
fore, the same types of mobility measures as the Maasoumi-Shorrocks-Zandvakili
indices!
Let $X_{it}$ be the income of the $i$th individual in the $t$th state (period, say), $i \in [1, n]$ and $t \in [1, T]$. Let $S_i(X_i; \alpha, \beta)$ denote the $i$th individual's permanent income (living standard?) over a number of periods $k = 1, 2, \ldots, T$, and let $S = (S_1, S_2, \ldots, S_n)$ denote the vector of such incomes for the $n$ households or individuals. The inequality measures which are consistent with a set of axioms to be described below are represented by $I_\gamma(\cdot)$. Let $X$ denote the welfare matrix with typical element $X_{it}$, and denote its $i$th row by $X_i$ and its $t$th column by $X^t$. The latter is the income vector in the $t$th period/state.

The $k$-period long-run inequality is given by $I_\gamma(S)$, and short-run inequality may be represented by $I_k = \sum_{t=1}^{k} a_t I_\gamma(X^t)$, for $k = 1, 2, \ldots, T$. The vector $a = (a_1, a_2, \ldots, a_T)$ represents the weights given to the income distribution in different periods, such as the ratio of the mean income in the period, $\mu_t$, to the overall mean of incomes in all the $k$ periods under analysis.
Shorrocks (1978b) proposed $S_i = \sum_{t=1}^{T} X_{it}$, with $\mu_t/\mu$, the ratio of means just described, as weights, and Maasoumi and Zandvakili (1986) generalized $S_i$ and the weight functions, suggesting the following index of mobility:

$$ M_\gamma = 1 - R_\gamma, \qquad R_\gamma = \frac{I_\gamma(S)}{\sum_{t=1}^{k} a_t I_\gamma(X^t)} $$

The generalized entropy measures decompose into between- and within-group components,

$$ I_\gamma = I_B + \sum_{r=1}^{R} w_r I^r $$

where $I_B$ is the between-group inequality computed over the group mean incomes, $I^r$ is the inequality within the $r$th group, $r = 1, 2, \ldots, R$, $n_r$ is the number of units in group $r$, $X_r$ is the total income of the $r$th group, and the weights $w_r$ depend on the groups' population and income shares. The aggregate income $S$ is itself chosen by minimizing a generalized entropy divergence $D_\beta(S, X; \alpha)$ between the distribution of $S$ and the distributions of the period incomes $X^t$, where the $\alpha_t$'s are the weights attached to each period. Minimizing $D_\beta$ with respect to $S_i$, such that $\sum_i S_i = 1$, produces the following "optimal" aggregate income functions:

$$ S_i \propto \Big( \sum_t \alpha_t X_{it}^{-\beta} \Big)^{-1/\beta}, \qquad S_i \propto \prod_t X_{it}^{\alpha_t}, \qquad S_i \propto \sum_t \alpha_t X_{it} $$
These are, respectively, the hyperbolic, the generalized geometric, and the weighted means of the incomes over time. Noting that the "constant elasticity of substitution" is $\sigma = 1/(1 + \beta)$, these functional solutions include many of the well-known utility functions in economics, as well as some arbitrarily proposed aggregates in empirical applications. For instance, the weighted arithmetic mean subsumes the simple total income discussed earlier, and a popular "composite welfare indicator" based on the principal components of $X$, when the $\alpha_t$'s are the elements of the first eigenvector of the $X'X$ matrix (Maasoumi 1989a). The "divergence measure" $D_\beta(\cdot)$ forces a choice of an aggregate income vector $S = (S_1, S_2, \ldots, S_n)$ with a distribution that is closest to the distributions of its constituent variables. This is desirable when the goal of our analysis is the assessment of the income distribution and its dynamic evolution. The entropy principle establishes that any other $S$ would be extra distorting of the objective information in the data matrix $X$. The distribution of the data reflects the outcome of all optimal allocative decisions of all agents in the economy (Maasoumi 1986).
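To make the aggregation step concrete, the following Python sketch computes the three families of aggregators just named: the hyperbolic (CES-type), the generalized geometric, and the weighted arithmetic means of incomes over time. The function name, default weights, and example data are illustrative assumptions of mine, and the normalization may differ from Maasoumi's exact formulation.

```python
import numpy as np

def permanent_income(X, alpha=None, beta=0.0):
    """Aggregate an n x T income matrix X into 'permanent incomes' S (a sketch).

    beta = -1 : weighted arithmetic mean of the periods
    beta -> 0 : generalized geometric mean
    other beta: hyperbolic (CES-type) mean, S_i = (sum_t alpha_t X_it**(-beta))**(-1/beta)
    alpha     : period weights summing to one (default: equal weights)
    """
    X = np.asarray(X, dtype=float)
    n, T = X.shape
    alpha = np.full(T, 1.0 / T) if alpha is None else np.asarray(alpha, dtype=float)
    alpha = alpha / alpha.sum()
    if np.isclose(beta, -1.0):            # weighted arithmetic mean
        return X @ alpha
    if np.isclose(beta, 0.0):             # generalized geometric mean
        return np.exp(np.log(X) @ alpha)
    return (X ** (-beta) @ alpha) ** (-1.0 / beta)   # hyperbolic / CES aggregator

# Example: 5 households over 3 periods, with mean-of-income period weights
X = np.array([[10, 12, 11], [20, 18, 25], [5, 7, 6], [40, 42, 39], [15, 30, 10]], float)
w = X.mean(axis=0) / X.mean(axis=0).sum()
for b in (-1.0, 0.0, 1.0):
    print(b, np.round(permanent_income(X, w, beta=b), 2))
```

Households with volatile income streams receive lower aggregates as $\beta$ rises, which is the sense in which the aggregator rewards stability.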
The next step in constructing general mobility indices as proposed by Maa-
soumi and Zandvakili (1986) is the selection of a measure of inequality. The GE
index described above was computed for the $S_i$ functions just obtained. It is instructive to analyze this measure in the discrete case:

$$ I_\gamma(S) = \frac{1}{\gamma(1+\gamma)} \sum_{i=1}^{n} S_i^* \left[ \Big(\frac{S_i^*}{p_i}\Big)^{\gamma} - 1 \right] $$

where $p_i$ is the $i$th unit's population share (typically $1/n$), and $S_i^*$ is $S_i$ divided by the total $\sum_i S_i$.
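A small numerical sketch of the resulting profiles follows. It is illustrative only: the generalized entropy parameterization, the mean-of-income weights, and the data are assumptions of mine and may differ in detail from the chapter's definitions. It computes a GE index, the weighted short-run inequality, and the ratio $R_k$ of long-run to short-run inequality whose decline signals equalization; $M_k = 1 - R_k$ is the corresponding mobility index.

```python
import numpy as np

def ge_index(y, gamma=1.0):
    """A standard generalized entropy inequality index of a positive income vector."""
    r = np.asarray(y, float) / np.mean(y)
    if np.isclose(gamma, 0.0):                  # mean logarithmic deviation
        return float(np.mean(-np.log(r)))
    if np.isclose(gamma, 1.0):                  # Theil index
        return float(np.mean(r * np.log(r)))
    return float(np.mean(r ** gamma - 1.0) / (gamma * (gamma - 1.0)))

def stability_profile(X, gamma=1.0):
    """R_k = I(S_k) / sum_{t<=k} a_t I(X^t), with mean-weighted aggregate income S_k."""
    X = np.asarray(X, float)
    R = []
    for k in range(1, X.shape[1] + 1):
        mu = X[:, :k].mean(axis=0)
        a = mu / mu.sum()                        # a_t = mu_t / sum of period means
        S = X[:, :k] @ a                         # weighted-average 'permanent' income
        short_run = float(np.dot(a, [ge_index(X[:, t], gamma) for t in range(k)]))
        R.append(ge_index(S, gamma) / short_run)
    return np.array(R)

X = np.array([[10, 12, 11], [20, 18, 25], [5, 7, 6], [40, 42, 39], [15, 30, 10]], float)
print(np.round(stability_profile(X, gamma=1.0), 3))   # starts at 1 and falls with equalization
```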
$$ I_{-1}(S) = \sum_{t=1}^{T} \delta_t I_{-1}(X^t) - D_0(S^*, X; \delta), \qquad D_0(S^*, X; \delta) = \sum_{t} \delta_t \sum_{i} S_i^* \log \frac{S_i^*}{X_{it}^*} $$
However, there are some types of income structures that are eliminated from
this set. An example is situations in which, in every period t , all individuals receive
the same income $\mu_t$.
The MSZ indices satisfy this requirement. Other indices may not; see below.
(A2) Continuity. The degree of mobility varies continuously with the incomes
in X.
The MSZ indices satisfy this property.
(A3) Population symmetry. If $X' = \Pi X$ for some permutation matrix $\Pi$, then $X$ and $X'$ are equally mobile. This requires "rank invariance" to be acceptable, since $X$ and $X'$ may not give the same ranking of individuals.
The term “population symmetry” is used because in the analysis of mobility,
unlike inequality, we can consider permutation of the time period distributions (the
columns of X ) as well as permutations of the individual income profiles (the rows of
X ) . One may thus define time symmetry separately:
the above definition that relative mobility is measured since one is only looking at
changes in relative incomes. This rules out pure exchange mobility. It is a type of
rank invariance property that goes beyond requiring that a structure in which all
individuals' incomes are constant over time is completely immobile.
The MSZ indices satisfy this property. They also satisfy the following stronger
normalization property:
Then the sample covariance of the observed income profile can be shown to
be zero.
The associated property is defined as follows:
Then
(A12) Atkinson-Bourguignon condition (for two periods). The income structure
X’ is more mobile than X whenever X’ is obtained from X by a simple switch.
This condition implies that if income profiles of the two persons i and j are
initially rank correlated, then a switch of incomes in either period enhances mobility.
This condition has not been generalized beyond two periods ($T > 2$). The MSZ indices satisfy this
condition when income aggregates are weighted averages.
Shorrocks (1992) studied a particular member of the MSZ family which satisfies all of the above properties except (A8), strong perfect mobility. This member is obtained from the measures defined by Maasoumi and Zandvakili (1986, 1989) where aggregate income is $\sum_t a_t X_{it}$, $a_t = 1/\mu_t$, the short-run inequality is represented by $\sum_t a_t \mu_t I_\gamma(X^t)$, and $I_\gamma(\cdot)$ is the coefficient of variation, $\sigma/\mu$. He refers
to this as an “ideal” index. Shorrocks (1992) also looked at another mobility index
attributed to Hart. The latter index is more conveniently described in relation to
indices defined over transition matrices to which we now turn.
As was argued before, income distributions change over time under the effect of dif-
ferent transition mechanisms. Transition mechanisms affect social welfare by chang-
ing the income distribution. Two societies with the same income distribution at a
point in time may have different levels of social welfare depending on the mobility
of the populations. This requires welfare functions defined over an expanding time
dimension.
In a Markov chain model of income generation, Dardanoni (1993) consid-
ers how economic mobility influences social welfare by following the approaches
of Atkinson (1970), Markandya (1984), and Kanbur and Stiglitz (1986). He consid-
ers the welfare prospects of individuals in society by deriving the discounted stream
of income distributions which obtain under different mobility structures. He pro-
poses a class of SWFs over the aggregates of these welfare prospects, and derives some
necessary and sufficient conditions for unambiguous welfare rankings. Since these
aggregates are the discounted stream of incomes, a special case of the aggregates
proposed by this author and described in the previous section, the two approaches
of this section and the previous one converge when the same welfare functions and
the same measures of “permanent” income are used.
The fundamental inequality theorem states that the Lorenz curve gives the
normatively significant ordering of equal mean income distributions. Inequality in-
dices are difficult to interpret when Lorenz curves cross. In a similar vein, Dardanoni
(1993) derives a partial order of social mobility matrices which can be considered
as the natural extension of the Lorenz ordering to mobility measurement. The de-
rived ordering may provide conditions for an unambiguous welfare recommendation
without employing a specific mobility measure.
Summary mobility measures induce a complete order on the set of mobility
matrices and have the advantage of providing intuitive measurements and firm rank-
ings. However, it is clear that there are substantial problems in trying to reduce a ma-
trix of transition probabilities into a single number. This is very much the problem
of multidimensional inequality measurement addressed by Maasoumi (1986), Ebert
(1995a), and Shorrocks (1995). Dardanoni (1993) offers the following example of
three mobility matrices
$$ P_1 = \begin{bmatrix} .6 & .4 & 0 \\ .35 & .3 & .35 \\ .05 & .25 & .7 \end{bmatrix} \qquad P_2 = \begin{bmatrix} .6 & .3 & .1 \\ .3 & .5 & .2 \\ .1 & .2 & .7 \end{bmatrix} \qquad P_3 = \begin{bmatrix} .6 & .4 & 0 \\ .3 & .4 & .3 \\ .1 & .2 & .7 \end{bmatrix} $$
The rows denote current state and columns denote future state. Suppose we
use some common summary immobility measures as proposed and discussed by,
for instance, Bartholomew (1982) and Conlisk (1990). Consider the second largest
eigenvalue modulus, the trace, the determinant, the mean first passage time, and
Bartholomew’s measure. These indices are defined below. Any of the three matrices
may be considered the most mobile depending on which immobility index is chosen.
This is illustrated in the following table, which shows the most mobile mobility matrix
according to the different indices.
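As a rough illustration of why the ranking is ambiguous, the sketch below evaluates several of the summary measures just listed for the three matrices as reconstructed above. The definitions used are the standard textbook ones (Bartholomew's index, mean first passage times via the fundamental matrix); the normalizations, and in particular the single-number summary of passage times, are my own choices and need not coincide with those behind the chapter's table.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of a transition matrix P (solves pi' P = pi')."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def immobility_measures(P):
    n = P.shape[0]
    pi = stationary(P)
    eigs = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    second_eig = float(eigs[1])               # modulus of the second-largest eigenvalue
    trace = float(np.trace(P))
    det = float(np.linalg.det(P))
    i, j = np.indices(P.shape)
    bartholomew = float(np.sum(pi[:, None] * P * np.abs(i - j)))   # a mobility measure
    # Mean first passage times via the fundamental matrix (Kemeny-Snell)
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    M = (np.outer(np.ones(n), np.diag(Z)) - Z) / pi[None, :]
    np.fill_diagonal(M, 1.0 / pi)
    mfpt = float(pi @ M @ pi)                 # pi-weighted average passage time (a simplification)
    return {"second eigenvalue": second_eig, "trace": trace, "determinant": det,
            "Bartholomew": bartholomew, "mean first passage": mfpt}

P1 = np.array([[.60, .40, .00], [.35, .30, .35], [.05, .25, .70]])
P2 = np.array([[.60, .30, .10], [.30, .50, .20], [.10, .20, .70]])
P3 = np.array([[.60, .40, .00], [.30, .40, .30], [.10, .20, .70]])
for name, P in (("P1", P1), ("P2", P2), ("P3", P3)):
    print(name, {k: round(v, 3) for k, v in immobility_measures(P).items()})
```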
Therefore, given a vector $u$ and a discount factor $\rho$, any two transition matrices with equal steady-state income distributions will be ranked as indifferent. This result is given by Atkinson (1983) and Kanbur and Stiglitz (1986) and indicates that we are not ranking mobility
as such, but the social welfare implications of each mobility matrix. The symmetric
additive social welfare functional implies that movement between income states is
irrelevant. What is important is the spot distribution at each period since additive
separable lifetime welfares remove any influence that exchange mobility may have
on intertemporal social welfare. Thus additive SWFs take inadequate account of fair-
ness considerations. Under the stated assumptions, the equilibrium Lorenz curve of
the distribution of income will look identical each period under any transition matrix
with equal steady-state distribution, so that any (additive or otherwise) symmetric ex
post SWF defined on the vector of realized utilities will rank the matrices as indiffer-
ent. Yet, under different transition matrices the composition of people in each income
state will be different in each time period. For example, under the identity transition
matrix each individual in the population remains in the same income group as in
the initial situation; on the other hand, if transition is governed by a matrix in which
each entry is equal to l / n , each individual will have the same probability of belong-
ing to any of the n income groups regardless of the initial state. Therefore, though the
equilibrium ex post Lorenz curves associated with each of these matrices could look
identical for each period, social welfare may well be considered different if we take
account of several periods in terms of the position of each individual in the past.
Clearly the natural labelling of welfare units in the context of mobility requires a relaxation of the "symmetry" assumption, such as (A3), which is replaced by additional assumptions on "comparability." These assumptions are discussed in, for example, Sen (1970) and Atkinson and Bourguignon (1987). Here the natural "label" for each individual is his/her starting position in the income ranking. Thus one restricted SWF would be the weighted sum of the expected welfares of the individuals, with greater weights given to the individuals who start with a lower position in the society. That is, $W(V^P, \lambda) = \sum_i \pi_i \lambda_i V_i^P$, where $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)'$ denotes a nonincreasing nonnegative vector of weights. This is a step toward cardinalization
is a disadvantage to start at a lower position. With no restriction on the mobility
matrices, this is not necessarily a disadvantage. There could be a transition matrix
such that the lower states are the preferred starting point in terms of lifetime expected
utility. Therefore the additional assumption:
(i) $P(\rho) \succeq_M Q(\rho)$.
(ii) The Lorenz curve of permanent income for $P$ lies nowhere below that of $Q$ for all nondecreasing income vectors.
(iii) The covariance between initial status and lifetime status is greater under $Q$ for any nondecreasing score (rank) vectors.
(iv) $P(\cdot)$ can be derived from $Q(\cdot)$ by a finite sequence of DPD exchanges.
Formby et al. (1995) extend this result by relaxing the assumption of identical steady states. They note that $T'\Pi P(\rho)y$ is the generalized Lorenz vector of "permanent incomes," and show the following result:

Theorem 4. Let $P$ and $Q$ be two monotone transition matrices with a given discount factor $\rho$. Then the following conditions are equivalent:
(i) $W(V^P, \lambda) \geq W(V^Q, \lambda)$.
(ii) $T'\Pi[P(\rho) - Q(\rho)]T \leq 0$.
(iii) The generalized Lorenz curve of permanent income for $P$ lies nowhere below that for $Q$ for all nondecreasing income vectors $y$.
(iv) $P(\rho)$ can be derived from $Q(\rho)$ by a finite sequence of DPD exchanges and simple increments.
Proof. By noting that the assumption of a steady-state income distribution is not crucial in proving Dardanoni's results as well as in Theorem 1, Formby et al. (1995) prove the equivalence among conditions (i), (ii), and (iii). Also, (iv) implies (ii). The converse can be similarly proved. Note that each DPD or simple increment leaves all elements other than the $(i, j)$th of $T'\Pi[P(\rho) - Q(\rho)]T$ unchanged.
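A minimal numerical sketch of condition (ii) is given below. Because the relevant definitions fall outside this excerpt, it assumes that $P(\rho)$ denotes the discounted lifetime aggregate $(1-\rho)(I - \rho P)^{-1}$, that $\Pi$ is the diagonal matrix of a common steady-state distribution, and that $T$ is the summation matrix whose cumulations generate generalized Lorenz vectors; these are conventional choices in this literature rather than a reproduction of the chapter's own equations.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of a transition matrix P (solves pi' P = pi')."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return pi / pi.sum()

def discounted(P, rho):
    """Assumed form of the discounted lifetime aggregate: P(rho) = (1-rho)(I - rho P)^(-1)."""
    n = P.shape[0]
    return (1.0 - rho) * np.linalg.inv(np.eye(n) - rho * P)

def gl_dominates(P, Q, rho=0.95, tol=1e-10):
    """Check T' Pi [P(rho) - Q(rho)] T <= 0 elementwise (condition (ii) of Theorem 4),
    assuming P and Q share the steady-state distribution used to form Pi."""
    n = P.shape[0]
    Pi = np.diag(stationary(P))
    T = np.triu(np.ones((n, n)))        # T'x gives cumulative sums over the ranked states
    D = T.T @ Pi @ (discounted(P, rho) - discounted(Q, rho)) @ T
    return bool(np.all(D <= tol))

# Two monotone matrices with a common (uniform) steady state; P mixes faster than Q
P = np.array([[.6, .2, .2], [.2, .6, .2], [.2, .2, .6]])
Q = np.array([[.8, .1, .1], [.1, .8, .1], [.1, .1, .8]])
print(gl_dominates(P, Q), gl_dominates(Q, P))   # True in one direction indicates a ranking
```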
Formby et al. (1995) demonstrate a further result which can be useful in em-
pirical testing based on the generalized concentration curves:
Theorem 5. Let P and Q be two monotone transition matrices. Then the fol-
lowing conditions are equivalent:
Two main points emerge. One is that both the inequality-reducing and transition matrix measures are supported by the same type of welfare functions in terms of "lifetime incomes." Second, the empirical techniques for testing Lorenz-type dominance, or stochastic dominance, are all made available. For instance, Bishop, Chow, and Formby (1994) show that matched pairs of estimates of generalized Lorenz and
concentration curve ordinates have a joint normal distribution and this sampling
property is distribution free. In general they have to be applied to the same statistics
which are, however, defined over the types of aggregate income functions proposed
by Maasoumi and Zandvakili (1986, 1989, 1990). Further description of some sta-
tistical techniques is given in the next section.
This model can be used to analyze both income movements over time and the effect of mobility on the distribution of income. $\beta_t$ measures the extent to which incomes regress toward the geometric mean. The case of a "unit root" corresponds to
Gibrat’s law of proportional effect: changes in relative incomes are independent of
current income. This simple model of mobility may be extended by further modeling
the $\epsilon_{it}$ in terms of individual-specific characteristics and/or time-varying effects. An
example is Lillard and Willis (1978) where panel data are used. Of course, using
the techniques of limited dependent variable models, transition probabilities can
be similarly modeled in terms of individual specific and time varying components.
A survey of several applications is given in Creedy (1985). Alternative models of
diffusion describing the evolution of income have been proposed which derive its
steady-state distribution forms. An example is Sargan (1957). Interestingly, the focus
seems to have shifted to analyzing the properties of the equilibrium distribution and
instantaneous inequality and poverty, rather than the dynamic welfare implications
of the evolution mechanism. Mobility analysis is thus a return to first principles.
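The Galtonian mechanism just described is easy to illustrate by simulation. Everything in the sketch below (the AR(1) form for demeaned log incomes, the parameter values, the sample size) is an illustrative assumption: a slope near one mimics Gibrat's law of proportional effect, while a slope below one produces regression toward the geometric mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate demeaned log incomes:  z_it = beta * z_i,t-1 + e_it
n, T, beta_true = 2000, 6, 0.8
z = rng.normal(0.0, 1.0, n)
logX = [z]
for t in range(1, T):
    z = beta_true * z + rng.normal(0.0, 0.4, n)
    logX.append(z)
logX = np.column_stack(logX)

# Estimate the regression-toward-the-mean coefficient period by period (OLS slope)
for t in range(1, T):
    y = logX[:, t] - logX[:, t].mean()
    x = logX[:, t - 1] - logX[:, t - 1].mean()
    print(f"period {t}: beta_hat = {x @ y / (x @ x):.3f}")   # values near 1 mean little regression
```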
Shorrocks defined a function of Hart's mobility index (Hart 1976a, 1976b, 1981, 1983), which is derived from the Galtonian model above.
There have been several significant advances in the development of statistical infer-
ence tools in the area of income inequality. These are generally applicable to infer-
ence on mobility indices and on ranking distributions.
For mobility indices such as the MSZ the connection is rather immediate. In-
equality indices are estimated by the method of moments (MM) estimators since they
are functions of population moments. Explicit formulae are derived for derivatives
that are required in the delta method which extends the well-known theory of MM
asymptotic distributions to that of inequality indices. This is surveyed in Maasoumi
(1996a) which contains an extensive citation to original sources. The extension to
mobility indices requires thinking in terms of long-run incomes and the inequality
in their distributions. Trede (1995) gives the details for the asymptotic distribution of
some of the MSZ measures, such as those based on the Atkinson family and Theil’s
inequality indices, but where aggregate income is the simple sum of incomes ana-
lyzed by Shorrocks (1978b). Extension to weighted-sum functions is immediate, but
the statistical theory for the more complicated aggregate functions is developed in
Maasoumi and Trede (1997). Trede (1995) also gives the asymptotic distributions of
the mobility indices that are based on transition matrices. Some of these measures
were discussed earlier. An application to German data is reported in Trede (1995)
which analyzes earnings mobility for different sexes. A program written in GAUSS
code is made available by Trede.
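The delta-method formulas themselves are in the sources cited; as a minimal alternative in the spirit of the bootstrap methods mentioned later in this section, the sketch below resamples units to attach a standard error to a Theil-type index of (single-period or long-run) income. The statistic, data, and replication count are illustrative assumptions.

```python
import numpy as np

def theil(y):
    """Theil inequality index (generalized entropy with parameter 1)."""
    r = np.asarray(y, float) / np.mean(y)
    return float(np.mean(r * np.log(r)))

def bootstrap_se(y, stat=theil, B=999, seed=0):
    """Nonparametric bootstrap standard error of an inequality statistic."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, float)
    reps = [stat(y[rng.integers(0, y.size, y.size)]) for _ in range(B)]
    return float(np.std(reps, ddof=1))

y = np.random.default_rng(1).lognormal(mean=0.0, sigma=0.7, size=500)  # simulated incomes
print(round(theil(y), 4), "+/-", round(bootstrap_se(y), 4))
```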
But inference about indices derived in either of the two approaches described
in this paper may be inconclusive, if not bewildering, when Lorenz-type curves cross.
Therefore it is desirable to test for rank relations of the type described above and in
Maasoumi (1996a, 1996b). This type of testing is now possible and is inspired by
testing for inequality restrictions in econometrics and statistics. A brief account of
some of the available techniques for stochastic dominance would be helpful and
follows.
dinates, see Bishop et al. (1994) and Anderson (1994). These alternatives are some-
times problematic when their implicit null and alternative hypotheses are not a sat-
isfactory representation of the inequality (order) relations that need to be tested.
Xu et al. (1995) and Xu (1995) take proper account of the inequality nature of
such hypotheses and adapt econometric tests for inequality restrictions to testing
for FSD and SSD, and to GL dominance, respectively. Their tests follow the work in
econometrics of Gourieroux et al. (1982), Kodde and Palm (1986), and Wolak (1988,
1989), which complements the work in statistics exemplified by Perlman (1969),
Robertson and Wright (1981), and Shapiro (1988). The asymptotic distributions of
these $\chi^2$ tests are mixtures of chi-squared variates with probability weights which
are generally difficult to compute. This leads to bounds tests involving inconclusive
regions and conservative inferences. In addition, the computation of the $\chi^2$ statis-
tic requires Monte Carlo or bootstrap estimates of covariance matrices, as well as
inequality restricted estimation which requires optimization with quadratic linear
programming.
In contrast, Maasoumi et al. (1996) propose a direct bootstrap approach that
bypasses many of these complexities while making less restrictive assumptions about
the underlying processes. They offer an empirical application for ranking U.S. in-
come distributions from the CPS and the PSID data. Their chosen statistic is the
Kolmogorov-Smirnov (KS) as characterized by McFadden (1989), Klecan et al. (1991),
and Kaur et al. (1994).
McFadden (1989) and Klecan, McFadden, and McFadden (1991) have proposed tests of first- and second-order "maximality" for stochastic dominance which are extensions of the Kolmogorov-Smirnov statistic. McFadden (1989) assumes i.i.d.
observations and independent variates, allowing him to derive the asymptotic distri-
bution of his test, in general, and its exact distribution in some cases (Durbin 1973,
1985). Klecan et al. generalize this earlier test by allowing for weak dependence
in the processes both across variables and observations. They demonstrate with an
application for ranking investment portfolios. The asymptotic distribution of these
tests cannot be fully characterized, however, prompting Monte Carlo and bootstrap
methods for evaluating critical levels. In the following section some definitions and
results are summarized which help to describe these tests.
Quantiles $q_x(p)$ and $q_y(p)$ are implicitly defined by, for example, $F[X \leq q_x(p)] = p$.
Definition (FSD). $X$ first-order stochastically dominates $Y$, denoted $X$ FSD $Y$, if
and only if any one of the following equivalent conditions holds:
The tests of FSD and SSD are based on empirical evaluations of conditions
(2) or (3). Mounting tests on conditions (3) typically relies on the fact that quantiles
are consistently estimated by the corresponding order statistics at a finite number
of sample points. Mounting tests on conditions (2) requires empirical cdfs and com-
parisons at a finite number of observed ordinates. Also, from Shorrocks (1983) or Xu
(1995) it is clear that condition (3) of SSD is equivalent to the requirement of GL
dominance. FSD implies SSD.
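A rough sketch of what the empirical evaluation of conditions (2) looks like with sample data follows: it compares empirical cdfs (for FSD) and their crude running integrals (for SSD) of two simulated income vectors. The grid, the integration rule, and the data are illustrative assumptions; in applications, critical values come from the bootstrap or the asymptotic results cited above.

```python
import numpy as np

def dominance_gaps(x, y):
    """Largest FSD and (crude) SSD violations of 'X dominates Y'; values <= 0 support dominance."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    grid = np.union1d(x, y)
    Fx = np.searchsorted(x, grid, side="right") / x.size    # empirical cdf of X
    Fy = np.searchsorted(y, grid, side="right") / y.size    # empirical cdf of Y
    d1 = Fx - Fy                                            # FSD requires d1 <= 0 everywhere
    d2 = np.cumsum(d1 * np.diff(grid, prepend=grid[0]))     # integrated gap; SSD requires <= 0
    return float(d1.max()), float(d2.max())

rng = np.random.default_rng(0)
x = rng.lognormal(0.2, 0.6, 1000)    # the 'richer' sample
y = rng.lognormal(0.0, 0.6, 1000)
print([round(v, 3) for v in dominance_gaps(x, y)])
```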
Noting the usual definition of the Lorenz curve of, for instance, $X$ as $L_x(x) = (1/\mu_x)\int_0^{x} t\, dF(t)$, and its generalized Lorenz curve $GL_x(x) = \mu_x L_x(x)$, some authors have developed
tests for Lorenz and GL dominance on the basis of the sample estimates of conditional
interval means and cumulative moments of income distributions; e.g., see Bishop
et al. (1989), Bishop et al. (1991), Beach et al. (1995), and Maasoumi (1996a) for a
general survey of the same. The asymptotic distributions given by Beach et al. (1995)
are particularly relevant for testing for third-order stochastic dominance (TSD). The
latter is a useful criterion when Lorenz or GL curves cross at several points and the
investigator is willing to adopt the "transfer sensitivity" requirement of Shorrocks and Foster (1987), that is, a relative preference for progressive transfers to poorer individuals. When either the Lorenz or the generalized Lorenz curves of two distributions cross, unambiguous ranking by FSD and SSD is not possible. Whitmore (1970) introduced the concept of TSD in finance. Shorrocks and Foster (1987) showed that the addition of the "transfer sensitivity" requirement leads to TSD ranking of income distributions. This requirement is stronger than the Pigou-Dalton principle of transfers and is based on the class of welfare functions $U_3$, which is a subset of $U_2$ with $u''' \geq 0$. TSD is defined
as follows:
(1) $E[u(X)] \geq E[u(Y)]$ for all $u \in U_3$, with strict inequality for some $u$.
(2) $\int_{-\infty}^{x} \int_{-\infty}^{v} [F(t) - G(t)]\, dt\, dv \leq 0$ for all $x$ in the support, with strict inequality for some $x$, and with the endpoint condition $\int_{-\infty}^{+\infty} [F(t) - G(t)]\, dt \leq 0$.
(3) When $E[X] = E[Y]$, $X$ TSD $Y$ iff $h_X(q_i) \leq h_Y(q_i)$ for all Lorenz curve crossing points $i = 1, 2, \ldots, (n + 1)$, where $h_X(q_i)$ denotes the "cumulative variance" for incomes up to the $i$th crossing point (Davies and Hoy 1995).
When n = 1, Shorrocks and Foster (1987) showed that X TSD Y if (a) the
Lorenz curve of $X$ cuts that of $Y$ from above, and (b) $\mathrm{Var}(X) \leq \mathrm{Var}(Y)$. This situation
revives the coefficient of variation as a useful statistical index for ranking distri-
butions.
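The sample counterparts of the Lorenz and generalized Lorenz ordinates used in these comparisons can be computed directly. The sketch below evaluates decile ordinates for two simulated samples; the data and the use of deciles are illustrative assumptions, and no interpolation within groups is attempted.

```python
import numpy as np

def lorenz_ordinates(y, p):
    """Lorenz L(p) and generalized Lorenz GL(p) = mean * L(p) ordinates at shares p."""
    y = np.sort(np.asarray(y, float))
    cum = np.concatenate(([0.0], np.cumsum(y)))     # cumulative incomes of the poorest units
    k = np.floor(p * y.size).astype(int)
    return cum[k] / cum[-1], cum[k] / y.size        # (L(p), GL(p))

rng = np.random.default_rng(2)
yA = rng.lognormal(0.00, 0.5, 2000)
yB = rng.lognormal(0.05, 0.8, 2000)
p = np.linspace(0.1, 1.0, 10)                        # decile ordinates
LA, GLA = lorenz_ordinates(yA, p)
LB, GLB = lorenz_ordinates(yB, p)
print("GL(A) >= GL(B) at every decile:", bool(np.all(GLA >= GLB)))
```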
Kaur et al. (1994) assume i.i.d. observations for independent prospects $X$ and $Y$. Their null hypothesis is condition (2) of SSD for each $x$ against the alternative of strict violation of the same condition for all $x$. The test of SSD then requires an appeal to the union intersection technique, which results in a test procedure with maximum asymptotic size of $\alpha$ if the test statistic at each $x$ is compared with the critical value $Z_\alpha$ of the standard normal distribution.
McFadden offers a definition of "maximal" sets.
The MSZ index method has been implemented by Shorrocks, Maasoumi, Zand-
vakili, Trede, and others. Trede’s work is based on German panel data and, as men-
tioned earlier, reports statistical tests of significant change in mobility. The first three
authors use the Michigan Panel data. We now exemplify some of these latter studies:
(i) There is a tendency for the profiles to fall and then level off as the num-
ber of aggregated years increases from one to 13.
(ii) The profiles for households headed by men fall faster and further than
those of women headed households.
(iii) These patterns are robust with respect to the choice of aggregation func-
tion, family size adjustment, and inequality measure.
The fact that the profiles are becoming flatter is an indication that, although
there have been some transitory movements in the size distribution of income, there
is a lack of any permanent equalization. Further, while some equalization has taken
place within each group of households, inequality between men and women headed
households has increased in absolute value.
It is anticipated that if the major causes of variation in incomes are transitory in nature,
the length of time spent in any income class will be short. “Permanent” income in-
equality changes will be very revealing in this context.
The total sample is divided into seven income groups (G1-G7). The assignment to groups is on a one-time basis and according to the simple arithmetic mean income of the individual household over the 13-year period. These real income levels begin with mean incomes of less than $4,999 and increase in increments of $5,000. The last group contains mean incomes of $35,000 or more.
Short-run inequalities and their decompositions based on income level are
given in Table 4. All the tables and figures in this section are taken from Maasoumi
and Zandvakili (1990). There are several recognizable patterns. The between-group
inequality has increased steadily over this period. The within-group component of
inequality fluctuates around a relatively constant mean value. The observed pat-
terns suggest that the nonincome differences do contribute to the increase in between
group inequality. Over 70% of women-headed households earn less than $15,000. Of
course, this is confounded by the differential impact of inflation on different income
groups (we use real incomes).
In Table 5 long-run inequality levels have risen after an initial decline. De-
composition by income level shows that the between-group component of I,(S) has
increased uniformly. At the same time the within-group inequality has decreased
steadily. This change has been dramatic so that in the later years the between-group
component is larger than the within component. These changes include the well-
known life-cycle and human capital effects, and are not inconsistent with the cumulative effects predicted by discrimination theories.
The long-run within-group inequalities reveal a falling trend for each of the
seven income groups. This is anticipated since transitory components are smoothed
out and individual incomes have approached group mean incomes in the long run.
These long-run grouping observations are somewhat sensitive to the family size.
Within-group aggregate income inequalities are noticeably smaller when income is
not adjusted for family size, and there is generally less inequality within the higher-
income groups.
Table 6 reports the stability profiles which reveal much higher degrees of per-
manent equalization within income groups than was observed for the gender groups
of the last section. Note that as the stability profiles of the whole sample flatten, the
corresponding within group profiles continue to fall. In our view some equalization
has occurred, but this is mostly confined to within income groups.
On the basis of the approximately 2300 households which remained in the Michigan panel over the period 1969-1981, Maasoumi and Zandvakili (1990) con-
clude that (i) there is not a great deal of inequality between the men and women-
headed households; (ii) the dominant within-group component of inequality is either
increasing over this period or, when incomes are smoothed by time aggregation,
Table 4  Short-run inequalities and their decompositions by income level
(columns: year, total inequality, between-group, within-group, and within-group inequality for income groups G1-G7)

γ = 2.0
1969 1.109 0.151 0.958 0.879 1.429 0.668 0.608 0.282 0.293 0.209
1971 1.049 0.224 0.824 0.672 0.560 0.626 0.725 0.281 0.265 0.450
1973 1.002 0.310 0.692 0.524 0.415 0.430 0.479 0.217 0.220 0.212
1975 1.636 0.412 1.225 0.950 0.539 0.459 0.449 0.274 0.236 0.242
1977 1.569 0.524 1.045 0.633 0.487 0.479 0.365 0.290 0.178 0.242
1979 1.895 0.624 1.271 0.639 0.683 0.544 0.460 0.227 0.232 0.211
1981 2.441 0.725 1.716 0.722 0.985 0.679 0.525 0.570 0.310 0.299

γ = 1.0
1969 0.430 0.114 0.316 0.429 0.413 0.421 0.385 0.236 0.240 0.179
1971 0.466 0.161 0.306 0.413 0.334 0.356 0.411 0.231 0.227 0.224
1973 0.464 0.205 0.259 0.352 0.284 0.302 0.336 0.190 0.199 0.190
1975 0.531 0.259 0.272 0.354 0.324 0.312 0.314 0.224 0.204 0.211
1977 0.578 0.316 0.262 0.364 0.310 0.322 0.273 0.208 0.165 0.218
1979 0.613 0.346 0.267 0.353 0.374 0.335 0.286 0.194 0.193 0.183
1981 0.706 0.384 0.322 0.393 0.490 0.394 0.342 0.252 0.227 0.225

γ = 0.5
1969 0.375 0.103 0.272 0.396 0.356 0.389 0.353 0.233 0.243 0.179
1971 0.407 0.144 0.264 0.388 0.299 0.325 0.373 0.228 0.230 0.219
1973 0.404 0.180 0.224 0.336 0.263 0.279 0.314 0.188 0.202 0.192
1975 0.456 0.225 0.231 0.326 0.292 0.291 0.294 0.224 0.205 0.212
1977 0.494 0.275 0.219 0.337 0.282 0.303 0.260 0.205 0.167 0.224
1979 0.505 0.293 0.212 0.332 0.331 0.310 0.269 0.191 0.192 0.183
1981 0.571 0.322 0.249 0.364 0.432 0.361 0.315 0.234 0.218 0.221

γ = 0.0
1969 0.367 0.096 0.270 0.402 0.339 0.391 0.349 0.243 0.263 0.187
1971 0.402 0.134 0.268 0.398 0.290 0.323 0.367 0.235 0.246 0.228
1973 0.395 0.165 0.230 0.348 0.261 0.272 0.313 0.192 0.214 0.202
1975 0.448 0.207 0.242 0.333 0.284 0.290 0.293 0.237 0.215 0.223
1977 0.492 0.254 0.238 0.343 0.274 0.306 0.263 0.214 0.176 0.245
1979 0.483 0.266 0.218 0.345 0.320 0.311 0.271 0.196 0.200 0.191
1981 0.549 0.293 0.256 0.382 0.424 0.362 0.311 0.236 0.222 0.233
Table 5  Long-run (aggregated income) inequalities and their decompositions by income level
(columns: aggregation period, total inequality, between-group, within-group, and within-group inequality for income groups G1-G7)

γ = 2.0
1969    1.109 0.151 0.958 0.879 1.429 0.668 0.608 0.282 0.293 0.209
1969-71 0.949 0.208 0.741 0.610 0.708 0.544 0.601 0.230 0.240 0.261
1969-73 0.889 0.255 0.634 0.486 0.504 0.425 0.506 0.208 0.222 0.222
1969-75 0.951 0.311 0.640 0.490 0.404 0.344 0.426 0.188 0.208 0.200
1969-77 0.974 0.365 0.609 0.444 0.341 0.303 0.322 0.166 0.188 0.186
1969-79 1.035 0.425 0.610 0.422 0.313 0.270 0.274 0.142 0.160 0.172
1969-81 1.100 0.481 0.619 0.401 0.308 0.244 0.247 0.147 0.146 0.155

γ = 1.0
1969    0.430 0.114 0.316 0.429 0.413 0.421 0.385 0.236 0.240 0.179
1969-71 0.414 0.144 0.270 0.354 0.311 0.346 0.374 0.198 0.204 0.176
1969-73 0.408 0.170 0.238 0.309 0.261 0.282 0.328 0.181 0.193 0.166
1969-75 0.414 0.200 0.213 0.275 0.225 0.242 0.290 0.166 0.176 0.159
1969-77 0.426 0.233 0.192 0.259 0.200 0.210 0.233 0.150 0.158 0.154
1969-79 0.434 0.264 0.171 0.241 0.181 0.180 0.192 0.129 0.135 0.143
1969-81 0.455 0.294 0.161 0.232 0.179 0.167 0.180 0.122 0.122 0.130

γ = 0.5
1969    0.375 0.103 0.272 0.396 0.356 0.389 0.353 0.233 0.243 0.179
1969-71 0.364 0.128 0.236 0.334 0.274 0.320 0.342 0.199 0.208 0.177
1969-73 0.360 0.150 0.210 0.294 0.236 0.262 0.304 0.182 0.198 0.169
1969-75 0.363 0.175 0.188 0.260 0.205 0.225 0.269 0.168 0.179 0.163
1969-77 0.373 0.204 0.169 0.245 0.182 0.196 0.219 0.157 0.161 0.160
1969-79 0.381 0.231 0.150 0.229 0.164 0.172 0.185 0.135 0.139 0.150
1969-81 0.396 0.257 0.139 0.219 0.162 0.163 0.175 0.128 0.127 0.138

γ = 0.0
1969    0.367 0.096 0.270 0.402 0.339 0.391 0.349 0.243 0.263 0.187
1969-71 0.355 0.117 0.238 0.339 0.261 0.318 0.335 0.207 0.223 0.187
1969-73 0.351 0.137 0.214 0.298 0.226 0.256 0.296 0.188 0.212 0.179
1969-75 0.354 0.159 0.195 0.263 0.197 0.219 0.262 0.176 0.190 0.173
1969-77 0.366 0.186 0.180 0.246 0.174 0.191 0.214 0.168 0.169 0.173
1969-79 0.371 0.210 0.161 0.230 0.153 0.171 0.182 0.144 0.147 0.161
1969-81 0.386 0.235 0.152 0.220 0.153 0.165 0.173 0.136 0.135 0.152
Table 6  Stability profiles and their decompositions by income level
(columns: aggregation period, total, between-group, within-group, and within-group profiles for income groups G1-G7)

γ = 2.0
1969    1.000 0.136 0.864 1.000 1.000 1.000 1.000 1.000 1.000 1.000
1969-71 0.916 0.201 0.715 0.812 0.857 0.872 0.907 0.857 0.878 0.967
1969-73 0.885 0.254 0.631 0.754 0.775 0.766 0.857 0.815 0.853 0.825
1969-75 0.828 0.271 0.557 0.712 0.669 0.680 0.794 0.726 0.793 0.780
1969-77 0.786 0.295 0.492 0.676 0.605 0.592 0.670 0.631 0.756 0.744
1969-79 0.744 0.305 0.439 0.644 0.538 0.507 0.578 0.562 0.675 0.709
1969-81 0.692 0.303 0.390 0.614 0.480 0.437 0.523 0.496 0.577 0.617

γ = 1.0
1969    1.000 0.265 0.735 1.000 1.000 1.000 1.000 1.000 1.000 1.000
1969-71 0.928 0.323 0.605 0.852 0.861 0.901 0.930 0.880 0.892 0.871
1969-73 0.903 0.376 0.527 0.799 0.785 0.805 0.880 0.844 0.870 0.833
1969-75 0.877 0.425 0.452 0.745 0.689 0.726 0.821 0.768 0.819 0.791
1969-77 0.855 0.468 0.386 0.713 0.627 0.634 0.713 0.693 0.773 0.755
1969-79 0.832 0.505 0.327 0.670 0.554 0.545 0.611 0.617 0.684 0.716
1969-81 0.813 0.526 0.287 0.637 0.510 0.490 0.572 0.567 0.607 0.639

γ = 0.5
1969    1.000 0.275 0.725 1.000 1.000 1.000 1.000 1.000 1.000 1.000
1969-71 0.932 0.328 0.604 0.864 0.860 0.908 0.936 0.890 0.903 0.885
1969-73 0.911 0.379 0.532 0.812 0.789 0.815 0.887 0.853 0.885 0.853
1969-75 0.885 0.427 0.458 0.758 0.694 0.732 0.825 0.785 0.833 0.812
1969-77 0.867 0.474 0.393 0.722 0.629 0.638 0.721 0.727 0.783 0.781
1969-79 0.855 0.518 0.337 0.685 0.553 0.564 0.629 0.654 0.705 0.750
1969-81 0.840 0.545 0.296 0.647 0.513 0.520 0.595 0.609 0.637 0.675

γ = 0.0
1969    1.000 0.262 0.738 1.000 1.000 1.000 1.000 1.000 1.000 1.000
1969-71 0.927 0.306 0.621 0.861 0.848 0.904 0.932 0.891 0.906 0.894
1969-73 0.904 0.352 0.553 0.806 0.776 0.806 0.879 0.851 0.886 0.860
1969-75 0.879 0.395 0.484 0.752 0.682 0.718 0.811 0.790 0.834 0.822
1969-77 0.864 0.440 0.424 0.712 0.614 0.627 0.708 0.745 0.780 0.800
1969-79 0.852 0.483 0.369 0.673 0.530 0.563 0.623 0.667 0.705 0.767
1969-81 0.839 0.510 0.330 0.633 0.496 0.528 0.592 0.628 0.649 0.700
relatively stable; (iii) this larger within-group component of inequality is due to high
levels of inequality within lower income groups (such as women headed households);
(iv) grouping by real income brackets leads predictably to very large between-group inequality values; (v) some equalization of real incomes has occurred over time within most income groups, but this is very hard to judge by a comparison of annual inequality measures and is most clearly revealed by using our "permanent income" distributions; (vi) modest levels of mobility are recorded as the aggregation
interval is expanded, but the corresponding profiles flatten out after about eight
or nine years.
We close this subsection with Fig. 1, which summarizes the evolution of the income distribution for this sample with the graph of the stability profile $R$.
Maasoumi and Zandvakili (1989) is based on the same data as the previous section, but the role of years of schooling of the head of household, his/her age, and race were examined through decomposition of the inequality/mobility measures. Tables 7-15 are from that source.
Table 7 is a summary of short run, long run, and the stability values for all the
13 years. Tables 8-10 provide decompositions by educational attainment which was
indicated by the years of completed schooling by the head. The increase in overall
short-run inequality is primarily due to increases in within group inequalities. Long-
run inequality is quite stable. The R measure declines over longer periods. This
indicates that while there is much short-run mobility (change), this does not change
permanent income inequality. Note that this phenomenon may be partly due to the
anonymity of our measures which are invariant to short-run switching of positions by
individuals.
It should be noted that education is both a capital good and a provider of
a stream of consumption. It has different values for different individuals. This
heterogeneity effect is here controlled for leading to conditional inferences. For a
discussion of these issues and a multidimensional treatment in which education
is regarded as a distinct attribute (with income and wealth) see Maasoumi and
Nickelsburg (1988).
Tables 8-10 indicate that the greatest inequality is within the group with fewest
schooling years. Indeed, within-group inequality declines steadily with educational
attainment: education is an equalizer (some might argue it is a restraint over un-
usual earnings!). Between-group inequality is rising somewhat over these years but
is about one quarter of total inequality, and declining proportionately.
Long-run inequality is much more stable over time and is smaller than short-
run inequalities. Looking at these figures a policymaker is less likely to be drawn to quick reactions to transitory phenomena, and more likely to focus on stable features, for
[Figure 1 (plot): measure of income stability plotted against year, 1969-1981, for γ = 2.0; vertical axis ranges from about 0.70 to 1.00.]
Figure 1  Measure of income stability, per capita family income (PCFI), 1969-1981, based on mean of income weights.
Table 7  Per Capita Family Income 1969-1981, Based on Mean of Income Weights

Short-run inequalities
          1969  1970  1971  1972  1973  1974  1975  1976  1977  1978  1979  1980  1981
I_2.0(Y)  1.109 0.961 1.049 0.940 1.002 1.190 1.636 1.357 1.569 1.840 1.895 2.072 2.441
I_1.0(Y)  0.430 0.443 0.466 0.460 0.464 0.488 0.531 0.548 0.573 0.594 0.613 0.669 0.706
I_0.5(Y)  0.375 0.388 0.407 0.404 0.404 0.423 0.456 0.468 0.494 0.496 0.505 0.551 0.571
I_0.0(Y)  0.367 0.381 0.402 0.399 0.395 0.415 0.448 0.458 0.492 0.481 0.483 0.541 0.549

Long-run inequalities (incomes aggregated from 1969 through the year shown)
γ = 2.0   1.109 0.979 0.949 0.906 0.889 0.896 0.951 0.955 0.974 1.007 1.035 1.060 1.100
γ = 1.0   0.430 0.417 0.414 0.412 0.408 0.409 0.414 0.418 0.426 0.429 0.434 0.445 0.455
γ = 0.5   0.375 0.364 0.364 0.362 0.360 0.360 0.363 0.368 0.373 0.377 0.381 0.387 0.396
γ = 0.0   0.367 0.355 0.355 0.353 0.351 0.350 0.354 0.359 0.366 0.370 0.371 0.378 0.386

Income stability
γ         R1    R2    R3    R4    R5    R6    R7    R8    R9    R10   R11   R12   R13
2.0       1.000 0.948 0.916 0.897 0.885 0.860 0.828 0.809 0.786 0.764 0.744 0.717 0.692
1.0       1.000 0.955 0.928 0.915 0.903 0.891 0.877 0.865 0.855 0.843 0.832 0.822 0.813
0.5       1.000 0.954 0.932 0.917 0.911 0.897 0.885 0.877 0.867 0.861 0.855 0.843 0.840
0.0       1.000 0.950 0.927 0.912 0.904 0.890 0.879 0.872 0.864 0.859 0.852 0.841 0.839
instance, the fact that some mobility is experienced in the early part of this period,
but has ceased in the latter part of the sample. Similarly, we note that between-group
long-run inequality has risen consistently, suggesting that the expected returns to schooling have materialized. In fact the inequality gap between the educational groups
here has widened by 50% over time; see John et al. (1993).
Tables 11-13 have the same structure as before but focus on the impact of
age. These tables suggest that between-group inequality, both short and long run,
has increased dramatically over time. Seniority matters! Within group inequality is
larger the older the group. This is also due to accumulation of returns to different
investments, opportunities, and attainments. Short-run inequalities increase within
groups, while long-run inequalities are stable with a moderate increase toward the
end of this period. Maasoumi et al. (1996) find that these trends have continued. Finally
we note that these figures are based on per capita incomes. Since family size and
composition changes over time these figures show greater volatility than the authors
found with total family incomes unadjusted for family size (Maasoumi and Zandvakili
1989, Appendix).
Tables 14-15 provide decompositions by race, noting that the “non-white”
group includes all heads not classified as “white.” This explains the large within-
group inequality. The number of households in each group is given in the last col-
umn. Inequality among non-whites has increased faster than among whites. Short
run inequality has increased within both groups, and somewhat increased in the ag-
gregated incomes. Between-group inequality in the short run distributions declined
somewhat in the first half of the period and increased again in the last 4-5 years
of the sample. For the long-run incomes, between-group inequalities are rather sta-
ble with a slight decline over time. It would appear that within-group characteristics largely determine the degree of inequality in this sample and for this decomposition.
Other grouping criteria that are more race specific than “non-white” are known to
indicate greater between group inequality. See John et al. (1991).
Experimentation over the members of the GE family, as well as with different sets of weights given to incomes at different points in time, represents an attempt to robustify summary findings. This is an important element of empirical work in this area since unanimity with respect to weights and the degree of aversion to inequality is not likely. Of course, an interpretation of this "robustification" technique is that it is an empirical substitute for unanimous ranking by Lorenz-type comparisons over plausible ranges of parameter values. This is useful when such curves cannot be statistically ordered, or when they cross only at extreme parameter values.
Several other applications to U.S. and U.K. data are reported in Shorrocks
(1978b, 1981). A good deal more is now possible given the dominance testing tech-
niques of Section IV, and the asymptotic distribution theory summarized in Maa-
soumi (1996a). The bootstrap alternative appears very promising, as demonstrated
by Mills and Zandvakili (1996).
ACKNOWLEDGMENTS
REFERENCES
Maasoumi, E. and S. Zandvakili (1990), Generalized Entropy Measures of Mobility for Differ-
ent Sexes and Income Levels, Journal of Econometrics, 43, 121-133.
Markandya, A. (1984), The Welfare Measurement of Changes in Economic Mobility, Economica, 51, 457-471.
McFadden, D. (1989), Testing for Stochastic Dominance, in T. Formby and T. K. Seo (eds.), Studies in the Economics of Uncertainty, Part II (in Honor of J. Hadar), Springer-Verlag, New York.
Mills, J. and S. Zandvakili (1996), Bootstrapping Inequality Measures, Working Paper, Department of Economics, University of Cincinnati.
Mitra, T. and E. A. Ok (1995), The Measurement of Income Mobility: A Partial Ordering
Approach, mimeo, Department of Economics, Cornell University.
Perlman, M. D. (1969), One-sided Testing Problems in Multivariate Analysis, Annals of Mathematical Statistics, 40, 549-562.
Prais, S. J. (1955), Measuring Social Mobility, Journal of the Royal Statistical Society A, 56-66.
Robertson, T. and F. T. Wright (1981), Likelihood Ratio Tests for and Against Stochastic Ordering Between Multinomial Populations, Annals of Statistics, 9, 1248-1257.
Robertson, T., F. T. Wright, and R. Dykstra (1982), Order Restricted Statistical Inference, Wiley, New York.
Sargan, J. D. (1957), The Distribution of Wealth, Econometrica, 25, 568-590. Also reprinted as Chapter 3 in E. Maasoumi (ed.), Contributions to Econometrics: J. D. Sargan, Vol. 1, Cambridge University Press, Cambridge.
Sen, A. (1970), Collective Choice and Social Welfare, Holden-Day, San Francisco; reprinted, North-Holland, Amsterdam, 1979.
Shapiro, A. (1988), Towards a Unified Theory of Inequality Constrained Testing in Multivariate Analysis, International Statistical Review, 56, 49-62.
Shorrocks, A. F. (1976), Income Mobility and the Markov Assumption, Economic Journal, 86,
566-577.
Shorrocks, A. F. (1978a), The Measurement of Mobility, Econometrica, 46, 1013-1024.
Shorrocks, A. F. (1978b), Income Inequality and Income Mobility, Journal of Economic The-
ory, 19,376-393.
Shorrocks, A. F. (1980), The Class of Additively Decomposable Inequality Measures, Econometrica, 48, 613-625.
Shorrocks, A. F. (1981), Income Stability in the United States, Chapter 9 in N. A. Klevmarken and J. A. Lybeck (eds.), The Statics and Dynamics of Income, Tieto, Clevedon.
Shorrocks, A. F. (1983), Ranking Income Distributions, Economica, 50, 3-17.
Shorrocks, A. F. (1984), Inequality Decomposition by Population Subgroups, Econometrica, 52, 1369-1385.
Shorrocks, A. F. and J. Foster (1987), Transfer Sensitive Inequality Measures, Review of Eco-
nomic Studies, 54,485-497.
Shorrocks, A. F. and J. Foster (1992), On the Hart Measure of Income Mobility, in Industrial Concentration and Income Inequality: Festschrift for Peter Hart, forthcoming.
Shorrocks, A. F. and J. Foster (1995), Inequality and Welfare Evaluation of Heterogeneous Income Distributions, Unpublished Paper, Department of Economics, University of Essex.
Theil, H. (1972), Statistical Decomposition Analysis, North-Holland, Amsterdam.
I. INTRODUCTION AND SUMMARY
A. Overview
The neoclassical development of producer and consumer theory, culminating in the
use of duality theory and the introduction of flexible functional forms in the 1970s,
focused on the restrictions on demand and supply functions implied by optimizing
behavior of producers and consumers. These restrictions are completely character-
ized by the symmetry and negative semidefiniteness of the (Slutsky) matrix of sub-
stitution effects in consumer theory and the symmetry and positive semidefiniteness
of the Jacobian of the (net) supply functions in the case of producer theory.* They
are important for econometric demand and supply analysis in part because they re-
duce the number of independent parameters to be estimated. A classic example is
the development of the linear expenditure system, first estimated by Stone (1954). Be-
ginning with a system of equations in which optimal expenditure on each commodity
is a linear function of income and n prices, Klein and Rubin (1947-1948) showed
that requiring the system to be generated by income-constrained utility maximization
reduces the number of parameters to be estimated from $(n + 1)^2$ to $2n - 1$.†
The problem with this nexus between theory and empirical application is that
the estimation of demand and supply systems typically uses aggregate, or per capita,
*In addition, these matrices must have reduced rank because of homogeneity properties. The conditions
are most easily derived in the dual from the (easily proved) concavity and homogeneity of expenditure
functions and convexity and homogeneity of profit functions in prices.
data. In the case of aggregate supply functions of producers, this poses no problem so
long as all inputs are efficiently allocated, as is the case in competitive equilibrium
with no fixed inputs and no production externalities. In this case, profit maximiza-
tion by all producers on individual technology sets yields the same aggregate net
supply as profit maximization on the aggregate technology set, which is obtained
by simple summation of the individual technology sets. The equivalence of these
two optimization problems is salient in general-equilibrium theory and welfare eco-
nomics (Debreu 1954,1959) and has been elegantly illustrated and aptly referred to
by Koopmans (1957) as the “interchangeability of set summation and optimization.”
The essence of Koopmans' interchangeability principle is that boundary points of
the aggregate technology set are obtained as the sum of boundary points of the in-
dividual technology sets where the supports are equal; but this is equivalent to an
efficient allocation of net outputs, where the aggregate net output vector and the op-
timal vectors of each producer are supported by the same price vector. (The vector
summation of boundary points of individual technology sets with unequal supports
yields interior points of the aggregate technology set.)
From the standpoint of econometric applications (and other applications as
well, especially in macroeconomic theory and international trade), the beauty of the
aggregation result for profit-maximizing, price-taking producers is that no restric-
tions (other than those needed for the existence of an optimum) are required. (In
particular, convexity of technology sets is not required.) This means that there is no
loss of generality in positing the existence of a “representative producer,” which gen-
erates aggregate net-supply functions by maximizing aggregate profit subject to the
constraint that the aggregate net-supply vector be contained in the aggregate tech-
nology set. As a result, the Jacobian of the system of aggregate net supply functions
has the same properties as those of individual producers.
But aggregation on the consumer side is not so simple; the symmetry and neg-
ative semidefiniteness of the substitution matrix does not carry over to aggregate de-
mand systems. In fact, as shown by Debreu (1974), Mantel (1974), and Sonnenschein
(1973),* the only restrictions imposed on aggregate demand functions by individual
optimization are Walras’ law (simply the aggregate budget identity) and homogeneity
of degree zero (in prices and income for income-constrained consumers and in prices
for endowment-constrained consumers). This, in turn, implies that, without further
restrictions, the use of a representative agent in consumer theory is, a fortiori, un-
justified. Essentially, the reason for this discouraging result in the aggregation of
consumer demand systems is that aggregate demand depends on the arbitrary dis-
tribution of incomes or endowments. In fact, if consumer incomes were determined
*See also Mas-Colell and Neuefeind (1977). An excellent survey of these results can be found in Shafer
and Sonnenschein (1982).
Section II closes with a brief discussion of the tests for the existence of a representative agent, including the parametric tests of Christensen, Jorgenson, and Lau (1975), and others (using flexible functional forms like the translog) and the nonparametric, nonstochastic (mathematical programming) tests of Varian (1982) and Diewert and Parkan (1985). The power of these nonparametric tests has been assessed by Bronars (1987) and Russell and Tengesdal (1996). Lewbel (1991) has provided some evidence in favor of Muellbauer's PIGLOG specification if we exclude incomes in the tails of the distribution, but evidence against it if we include the tails.
The existence of a representative consumer, while necessary for many pur-
poses, is not necessary for the existence of aggregate demand functions that re-
quire less-than-complete information about the distribution of incomes. Lau (1977a,
1977b, 1982) and Gorman (1981) spelled out restrictions on individual demand
functions that are necessary and sufficient for the existence of an aggregate demand
function that depends on prices and summary statistics of the income distribution.
This weaker aggregation condition, discussed in Section III.A, is referred to as “exact
aggregation” and is a natural generalization of the (rank 2) Muellbauer conditions.
In his remarkable theorem, Gorman (1981) also showed that the requisite indi-
vidual demand systems are consistent with income-constrained utility maximization
(i.e., satisfy the integrability conditions) if and only if the Engel curves can be con-
tained in a three-dimensional subspace (for given prices)-that is, that the demand
systems have rank no greater than 3. His theorem, presented in Section III.B, also
completely characterizes the class of such functions, which encompasses virtually
all demand systems that have been estimated econometrically. Gorman’s theorem
has been extended and clarified in a series of papers by Heineke (1979, 1993) and
Heineke and Shefrin (1982, 1986, 1987, 1988).
Consumer attributes (like household size, geographical region, age of head,
etc.) have been incorporated into the exact aggregation framework by Lau (1977a,
1977b, 1982) and implemented, using the translog specification, by Jorgenson, Lau, and Stoker (1980). A related issue is the recovery of the parameters of individual demand systems from the estimation of the aggregate demand system. The necessary and sufficient conditions for this identification property have been developed by Heineke and Shefrin (1990). This literature is briefly discussed in Section III.C.
Additional research by, for example, Jorgenson, Lau, and Stoker (1981), Pollak and Wales (1978, 1980), Stoker (1984), Russell (1983), Jorgenson and Slesnick (1984), Stoker (1986a, 1986b), Buse (1992), Blundell, Pashardes, and Weber (1993), and Nicol (1994) has further developed and applied these ideas. A summary of this literature is presented in Section III.D.*
*The notion of exact aggregation can also be applied to firms with individual characteristics pertaining
to the technology or to fixed inputs or outputs (see Appelbaum 1982, Borooah and van der Ploeg 1986, Gouriéroux 1990, Fortin 1991, and Chavas 1993).
Gorman’s results on the rank of demand systems have been extended through
a series of papers that have characterized well-known-and some lesser-known-
specifications. The key contributions have been made by Lewbel (1987, 1989b, 1990, 1991). In a related study, Hausman, Newey, and Powell (1995) provide a cross-sectional test of the Gorman rank-3 condition.
The rank-2 phenomenon shows up in other studies of aggregation (e.g., Jerison's (1984a) results on pairwise aggregation with a fixed income distribution) and as one of the necessary and sufficient conditions for the weak axiom of revealed preference to hold in the aggregate (Freixas and Mas-Colell 1987). Curiously, the rank-2
condition also emerges in a study of a (proportional) budgeting procedure for an or-
ganization (Blackorby and Russell 1993). These disparate results, which seem to be
crying out for a unifying general theorem, are discussed in Section IV.
The second approach to dealing with the paucity of implications for aggre-
gate consumer demand systems of individual optimization is to restrict the distribu-
tion of incomes or preferences of the population. Hildenbrand (1983, 1993), Härdle,
Hildenbrand, and Jerison (1991), and Grandmont (1992) seek additional restrictions
on aggregate demand implied by reasonable restrictions on the distribution of income
or preferences. An early precursor of these results is Becker (1962), who showed
that aggregate demand curves will be downward sloping even if individuals are “ir-
rationally” nonoptimizing, in the sense that they distribute their demand “randomly”
across the budget hyperplane (in particular, according to a rectangular distribution).
The intuition behind this result is fairly obvious: the Giffen paradox will occur only
if a sufficient number of individuals is concentrated in the Giffen portions of their
demand functions, but this will not happen if the distribution of consumers across
the budget plane is rectangular (more generally, “sufficiently” dispersed). This is
the motif for the results of Hildenbrand and others, who show that the Jacobian of
the aggregate demand system will be negative semidefinite (equivalently, that the
weak axiom of revealed preference holds in the aggregate), implying that demand
curves are downward sloping if the distribution of income is nonincreasing. Sim-
ilarly, Grandmont derives the same restriction on aggregate demands by assuming
that (neatly parameterized) preferences are sufficiently heterogeneous. These results
are surveyed in Section V.
As noted above, aggregation on the production side of the economy is unproblematic only if all inputs and outputs are efficiently allocated. If some inputs are
not efficiently allocated, aggregation is not so straightforward. Inefficient allocation
would occur, for example, if some inputs (e.g., capital) are fixed in the short run
(and perfect capital rental markets do not exist). If we take as given the distribution
of fixed inputs among firms, aggregation over the variable inputs is straightforward.
There has been, however, a persistent interest in the question of the existence of an
aggregate amount of fixed inputs, such as an aggregate capital stock that is fixed in
the short run. Interest in this area, sparked in part by the “Cambridge controversy,”
has centered not only on aggregation of fixed inputs across firms, but also across dif-
ferent fixed inputs. At the individual level, the existence of such aggregates is known
to be equivalent to certain separability conditions. (See the classic contributions by
Gorman 1959, 1968a and the subsequent expositions by Blackorby, Primont, and
Russell 1978, 1997.) Requiring the existence of such commodity aggregates at the
macro level, however, requires stronger restrictions on individual technologies.
This aggregation problem was, in fact, first posed by Klein (1946a, 1946b),
solved by Nataf (1948), and extended by Gorman (1953b). The Klein-Nataf aggrega-
tion problem assumes that no inputs are efficiently allocated and leads to a very un-
realistic (linear) structure for individual technologies. It was pointed out early, how-
ever, by May (1946) and Pu (1946), and emphasized later by Solow (1964), that the
efficient allocation of some inputs could be used to restrict the admissible allocations
and hence weaken the aggregation conditions for the fixed inputs. These conjectures
turned out to be correct, as rigorously shown by Gorman (1968b).* Blackorby and
Schworm (1984, 1988a) provide comprehensive treatments of the problem of obtain-
ing aggregate inputs in aggregate technologies under different assumptions about the
existence of efficiently allocated and fixed inputs. The problem of the existence of ag-
gregate commodities also is relevant to the empirical analysis of consumer demand,
since most studies employ such aggregates; Blackorby and Schworm (1988b) pro-
vide necessary and sufficient conditions for the existence of commodity aggregates
in market demand functions.
These results on aggregation across both agents and commodities are surveyed
in Section VI. Section VII concludes.
B. Caveats
The aggregation literature surveyed in this chapter overlaps with an extensive liter-
ature on the specification of functional form, cross-section (Engel curve) estimation,
and other areas of interest to applied econometricians. While we unavoidably touch
on these subjects, we limit our discussion to the issue of aggregation over agents in econometric estimation. For excellent surveys of the literature
on econometric demand analysis, specification of functional form, and econometric
modeling of producer behavior see, respectively, Deaton (1986), Lau (1986), and
Jorgenson (1986).
Another topic that we do not cover is the aggregation of individual preferences
to obtain a social welfare function. There is, of course, a huge (social choice) litera-
ture on this topic, emanating from the classic impossibility theorem of Arrow (1951).
As noted above, aggregate data will be consistent with the existence of a represen-
tative consumer if commodities are allocated by maximizing a Bergson-Samuelson
*See also the series of papers by Fisher (1965, 1968a, 1968b, 1982, 1983).
social welfare function. This fact, of course, is of little use in the econometric esti-
mation of demand systems in a market economy.
It is important to distinguish the notion of a representative agent that is useful
in econometric studies from the notion of a “representative consumer” commonly
employed in the macro/finance literature regarding the replication of a competi-
tive equilibrium (Constantinides 1982, Aiyagari 1985, Huang 1987, Eichenbaum,
Hansen, and Singleton 1988, and Vilks 1988a, 1988b). This concept requires no re-
striction on preferences, which is not surprising, since it generates a representative
consumer only at equilibrium points, but not even in a neighborhood of prices and
incomes; as such, it is not a useful construct for econometric applications.*
Finally, despite our efforts to limit the scope of this survey, we discovered
that the literature on aggregation across agents is even more extensive than we had
expected (and is still burgeoning). Hence, our main focus has been to try to integrate
this large body of research (something that has not been done since the publication
of Green 1964);† we have made no serious attempt to critique it. Even with this
limited objective, we have not been as successful as we would have liked; it seems
to us that there is a need for a monograph on aggregation over agents-one that would
be accessible to econometricians as well as theorists.
*This characterization is not intended to demean these results: while they cannot be used in economet-
ric modeling, they are relevant to calibration exercises that attempt to simulate a single history of an
economy (cf. the large body of calibration studies stimulated by Kydland and Prescott 1982).
†See, however, van Daal and Merkies (1984), who cover some of the topics addressed in this survey.
‡Notation: $A =: B$ or $B := A$ means $B$ is defined by $A$.
Theorem. Assume that $d^h(p, y^h) > 0$ for all $h$.* The necessary and sufficient restriction on the individual demand functions, $d^h$, $h = 1, \ldots, H$, for additive aggregation, (3), to hold is that they take the form
$$d^h(p, y^h) = \alpha^h(p) + \beta(p)\, y^h \quad \forall h \qquad (4)$$
*If $d^h(p, y^h)$ is restricted to be nonnegative, but corner solutions are allowed, it is necessary that $\alpha^h(p) = 0$ for all $h$ in the following restriction.
†Sketch of proof of necessity (assuming differentiability): Substitute (2) into (3) and differentiate with respect to each $y^h$ to obtain
This shows that, for all $i$, the derivative is identical for all $h$. Moreover, as the left-hand side (LHS) is independent of $y^{\ell}$ for $\ell \neq h$, the right-hand side (RHS) must be independent of $y$, which in turn implies that the LHS is, in fact, independent of $y^h$ as well. Integration then yields (4). More generally (eschewing differentiability), (3) is a system of Pexider equations, whose solution is (4); see Corollary 10 on page 43 of Aczél and Dhombres 1989.
‡If one does not require that demands aggregate exactly, as in (3), but only that the expected value of $x^h$ be independent of the distribution of $y^h$, then it is sufficient that the $\beta^h$ coefficients be distributed independently of $y^h$.
The result in the above theorem has been known for a long time; see, for ex-
ample, the papers culminating in Theil (1954) and reviewed by Green (1964). Gor-
man's (1953a) contribution is to characterize this aggregation problem in terms of
the restrictions on consumer preferences.* Formally, the problem is posed as fol-
lows. Roy's identity, combined with (4), yields a system of (nonlinear) partial differential equations for consumer $h$ in the indirect utility function $V^h$. Integration of this system yields the required structure of preferences:†
Theorem (Gorman 1953a). Assume interior solutions to the individual optimization problems.‡ The aggregation condition (3) holds if and only if consumer preferences can be represented by expenditure functions with the structure
$$E^h(u^h, p) = \Pi(p)\, u^h + \Lambda^h(p) \quad \forall h \qquad (8)$$
or, equivalently, by indirect utility functions with the structures
*Antonelli (1886) seems to have been the first to notice that homothetic and identical preferences are
necessary and sufficient for consistent aggregation if we require that the conditions hold globally. Re-
quiring that they hold only in a neighborhood, as in Gorman (1953a, 1961), yields a richer structure of
preferences, allowing preference heterogeneity.
†This integration problem can be simplified, since affinity of the ordinary (Marshallian) demand functions in income implies affinity of the constant-utility (income-compensated, or Hicksian) demand functions in (a particular normalization of) the utility variable:
$$s^h(u^h, p) = \gamma^h(p) + \beta(p)\, u^h \quad \forall h$$
Integration of this system of differential equations yields the consumer expenditure functions given by (8) below.
‡This restriction is inconsequential in most empirical studies. If the restriction is eschewed, the necessary and sufficient condition for aggregation is homotheticity of preferences (see Samuelson 1956), in which case $\Lambda^h(p) = 0$ in (8) and (9) and, as noted earlier, $\alpha^h(p) = 0$ for all $h$ in (4) or (6).
§This structure presumes a particular normalization of the utility representation; see the discussion to
come.
where $\Pi$ and $\Lambda^h$ are concave and homogeneous of degree 1 (hence, $\Gamma^h$ is homogeneous of degree 0).
The salient feature of (8) or (9) is that they completely characterize, in the dual, the preferences that generate demands with the affine structure in the one idiosyncratic variable $y^h$. The marginal propensity to consume the $i$th commodity is
Thus, the income consumption curves (ICCs) are linear but not necessarily parallel for different prices (although, of course, they cannot intersect). Consumption bundles on the base (zero-utility) indifference surface are given by
Note that preferences are well defined for all consumption bundles on or above the base indifference surface, but may not be well defined below this surface. This is so because (8) represents consumer preferences only if it is concave in prices. But when $u^h < 0$, the first term in (8) is convex in prices, and for sufficiently small values of $u^h$ the convexity of this term will dominate the concavity of the second term, violating
the fundamental regularity condition for consumer expenditure functions. Of course,
if the base indifference surface does not intersect the positive orthant, preferences
are well defined globally.
The structure (8) has an evocative interpretation. The "intercept" term $\Lambda^h(p)$ can be interpreted as the "fixed cost" of obtaining the base indifference surface, or the base utility level (normalized to be zero), and $\Pi(p)$ is the marginal price of the composite commodity "utility."
The structure of individual consumer preferences that is necessary and sufficient for Gorman aggregation is often referred to as the "Gorman polar form" (GPF), following Blackorby, Primont, and Russell (1978), but Gorman refers to it as "quasi-homotheticity," since it is a generalization of homotheticity. By Shephard's (1970) decomposition theorem, homotheticity is characterized in the dual by
$$E^h(u^h, p) = \Pi(p)\, u^h$$
in which case the base indifference surface degenerates to a single point-the ori-
gin-and all income consumption curves are rays. An intermediate special case is
"affine homotheticity," generated by the expenditure-function structure (16), in which the base indifference surface degenerates to a single point, not necessarily the origin. A prominent example of affine homotheticity is the Stone-Geary structure, generated by (16) and a Cobb-Douglas specification (17) for $\Pi(p)$.
The direct utility function dual to this structure is a Cobb-Douglas function in affine
transformations of the consumption quantities,
*The Stone-Geary structure evolved as the solution to a classic demand-system integrability problem. Klein and Rubin (1947-48) showed that the unconstrained linear expenditure system
$$p_i\, d_i(y, p) = \beta_i\, y + \sum_j a_{ij}\, p_j \quad \forall i$$
is integrable if and only if the following parameter restrictions hold: $\beta_i > 0$ $\forall i$; $\sum_i \beta_i = 1$; $a_{ij} = -\gamma_j \beta_i$ for $j \neq i$; and $a_{ii} = \gamma_i (1 - \beta_i)$ $\forall i$, in which case the demand system simplifies to (19). Later, Geary (1950), using Roy's identity, solved the integration problem to obtain the form of the utility function, and Stone (1954) implemented the system by estimating the parameters using British consumption data.
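For concreteness, a small numerical sketch of the Stone-Geary (linear expenditure) system, with made-up values for the $\gamma_i$ and $\beta_i$ parameters, is the following; it also verifies the adding-up property.

import numpy as np

gamma = np.array([1.0, 2.0, 0.5])          # "subsistence" (translation) quantities, hypothetical
beta  = np.array([0.2, 0.5, 0.3])          # marginal budget shares, summing to one
p     = np.array([1.0, 1.5, 2.0])
y     = 20.0

supernumerary = y - p @ gamma              # income left after "subsistence" spending
x = gamma + beta * supernumerary / p       # linear expenditure system demands

print(x)
print(np.isclose(p @ x, y))                # True: the demands exhaust the budget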
Summing the individual expenditure functions (8) over households yields a function of aggregate utility $u := \sum_h u^h$ alone,
$$E(u, p) := \Pi(p)\, u + \Lambda(p), \qquad \Lambda(p) := \sum_h \Lambda^h(p) \qquad (21)$$
which has the structure of an expenditure function in aggregate utility (since $\Pi$ and $\Lambda$ have
the requisite homogeneity, monotonicity, and curvature properties); indeed, (21) has
the same (GPF) structure as the individual expenditure functions in (8). Correspond-
ing to (21) is the aggregate indirect utility function
which, of course, has the same structure as the individual utility functions (9). Fi-
nally, the application of Roy’s Identity to (22) yields the aggregate demand system
*The Stone-Geary system has been generalized in other directions. Howe, Pollak, and Wales (1979), with some amendments by van Daal and Merkies (1989), constrained a system that is quadratic in income to satisfy integrability conditions; their structure is a generalization of the Gorman polar form (and hence does not satisfy his aggregation condition).
which, with $y = \sum_h y^h$, is identical to the aggregate demand system (11), obtained by direct aggregation of the individual demand systems. Thus, the aggregate demand system is rationalized by a utility function defined on aggregate commodity quantities, $x = \sum_h d^h$. In other words, the aggregate demand system is generated by an optimizing "representative agent," with utility $u = \sum_h u^h$ and aggregate income $y = \sum_h y^h$. Thus, the econometric estimation of any aggregate demand system with the structure (23) is consistent with individual optimization and aggregation of the individual demand systems.
The insight in Gorman's theorem can be further elucidated by a comparison of the representative-agent phenomenon in production theory and in consumer theory.* Producers, labeled $f = 1, \ldots, F$, are assumed to maximize profit, $p \cdot z^f$, on a technology set, $T^f \subset \mathbb{R}^n$, defined as the set of technologically feasible net-output vectors. The aggregate technology set is $T := \sum_f T^f$.† The net-supply functions, $\phi^f$, $f = 1, \ldots, F$, are defined by
$$\phi^f(p) = \operatorname*{argmax}_{z^f} \,\{\, p \cdot z^f \mid z^f \in T^f \,\} \quad \forall f \qquad (24)$$
*Although the discussion is couched in the simplest possible terms, the producer case could be elabo-
rated upon to encompass intertemporal production technologies (see, however, Blackorby and Schworm
1982), and the consumer case can be adapted to the study of labor supply, asset demand, and other
problems characterized by optimization subject to a preference ordering and a budget constraint (with
fixed endowments).
?This construction, of course, assumes that there are no externalities in production.
$See Section VI.
$$\hat{x}^h = \operatorname*{argmin}_{x^h} \,\{\, p \cdot x^h \mid x^h \in N^h(\hat{x}^h) \,\} \quad \forall h$$
where
$$N^h(\hat{x}^h) = \{\, x^h \in \mathbb{R}^n_+ \mid u^h(x^h) \geq u^h(\hat{x}^h) \,\}$$
is the no-worse-than-$\hat{x}^h$ set of consumer $h$. The community level set (the Scitovsky 1942 set), the set of aggregate consumption bundles that can be distributed to the $H$ consumers in ways that place each consumer in the individual no-worse-than-$\hat{x}^h$ set, is given by†
$$N^S(\hat{x}^1, \ldots, \hat{x}^H) = \sum_h N^h(\hat{x}^h)$$
(The boundary of $N^S(\hat{x}^1, \ldots, \hat{x}^H)$ is a community (Scitovsky) indifference surface.)
Note that, by the interchangeability of set summation and optimization,
or, equivalently,
B. Muellbauer Aggregation
The necessary conditions for Gorman aggregation are quite stringent. This fact in-
spired Muellbauer's (1975, 1976) formulation of a weaker form of aggregation-and,
concomitantly, a weaker representative-agent concept. The two fundamental differ-
ences between Gorman aggregation and Muellbauer aggregation are that (i) Muell-
bauer requires consistent aggregation of expenditure shares rather than commodity
demands and (ii) the aggregate income of Muellbauer's representative agent is a pos-
sibly nonlinear function of individual incomes, rather than a simple sum, and can
depend on prices as well as the distribution of income.
Individual commodity shares are defined by
Thus, aggregate shares, like the aggregate demands in (2), depend in general on
the distribution of income and prices. Muellbauer aggregation holds if there exists a
function Y such that
This condition is nontrivial because the function Y is identical for all n aggregate
demand shares. Thus, if Muellbauer aggregation holds, aggregate commodity shares
'
depend on a common scalar Y (y , . . . , y H ,p ) , interpreted as "aggregate income,"
and the commodity prices.
It is easy to see that Muellbauer aggregation subsumes Gorman aggregation as
a special case. The (GPF) demand system (10) yields the following system of share equations:
Muellbauer showed that his aggregation condition is satisfied if the individual expenditure functions have the structure
$$E^h(u^h, p) = \theta^h\big(u^h, \Gamma(p)\big)\,\Pi(p) \quad \forall h \qquad (40)$$
in which case the indirect utility functions have the corresponding structure (41).
*This structure is necessary as well as sufficient if we add to (40) a function of prices, $A^h(p)$, that vanishes when we sum over households: $\sum_h A^h(p) = 0$. While this possibility is formally required for the
structure to be necessary as well as sufficient, it is uninteresting-a nuisance term. It does not make a
lot of sense for preferences to satisfy a condition like this. For one thing, it means that individual pref-
erences would have to depend on the number of households H ; otherwise, the sum of these nuisance
terms would not vanish when we changed H . For these reasons, we carry out the analysis of Muellbauer
aggregation ignoring this term (as has been the case in subsequent studies building on Muellbauer's
ideas). (See, however, Blackorby, Davidson, and Schworm 1993 for a different approach.) Note also that
the functions in these representations must satisfy certain homogeneity conditions, but because of the
ways in which they enter the expenditure and indirect utility functions, there are degrees of freedom in
choosing the degrees of homogeneity.
The subscripts on $\theta$ indicate differentiation with respect to the first or second argument, and the subscripts on $\Gamma$ and $\Pi$ indicate differentiation with respect to the indicated price variable. Summation over the $H$ households yields aggregate shares of the form $W\big(Y(y^1, \ldots, y^H, p),\, p\big)$, where $Y(y^1, \ldots, y^H, p)$ is implicitly defined by (47). Setting
$$\theta^h\big(u^h, \Gamma(p)\big) = \psi^h(u^h) + \Gamma(p) \qquad (48)$$
in (40) yields the Gorman polar form. Because Muellbauer’s structure is a straight-
forward generalization of Gorman’s “linear” (actually affine) structure, he refers to
(40) or (41) as "generalized linearity."
The system of aggregate share equations, along with the implicit definition of aggregate income in (47), is rationalized by an aggregate utility function with the dual (expenditure-function) structure
$$E(u, p) = \theta\big(u, \Gamma(p)\big)\,\Pi(p) \qquad (50)$$
That is, ratios of marginal effects of income changes on shares are independent of
income levels for both individuals and the representative consumer.
Muellbauer’s generalized linearity is further explicated by rewriting the share
equations in matrix notation:
U = U ( u ’ ,. . . , U h , p ) (54)
Gorman first noted that Muellbauer aggregation is trivially satisfied if there are only
two commodities, since determining one aggregate share determines the other and
U therefore is implicitly defined by, e.g.,
This fact is not surprising, since in the two-commodity case there is no integrability
problem. More importantly, it suggests that the solution to Muellbauer’s aggregation
problem might be characterized by the existence of two aggregate pseudocommodi-
ties, and in fact this is an evocative interpretation of the solution given by (40), in
which $\Gamma(p)$ and $\Pi(p)$ are interpreted as the “prices,” or “unit costs,” of two “in-
termediate commodities” in the production of utility. A complete explication of this
interpretation requires quite a bit of duality theory, so we refer the reader to Gorman’s
paper for further study. (Compare this interpretation, however, to the interpretation
of the Gorman polar form expenditure function (8).)
Gorman also pointed out a problem with the Muellbauer representative agent:
there is no requirement that Y in (36) or (47) be increasing (or even nondecreasing)
in individual incomes-equivalently, that U defined by (53) be increasing in indi-
vidual utility levels. Thus, the income or utility of the representative agent could be
increasing when the incomes or utilities of the consumers it represents are declining.
This seems to be a consequence of requiring only that the representative consumer
replicate the aggregate shares, but not aggregate demands.
Muellbauer, as well as most follow-up studies, focused on the special case where $Y(y^1, \ldots, y^H, p)$ is independent of $p$. In this case, $u^h(y^h, p)$ in (47) must be independent of $p$ for all $h$ (for arbitrary distributions of incomes, $y^1, \ldots, y^H$):
*To avoid confusion, where the income variable is raised to a power, we move the household indicator
index h to the subscript position.
*See, however, Pollak and Wales (1979) and Shapiro and Braithwait (1979) for direct tests of the GPF
structure, which does not fare at all well.
the approach is to estimate the derived demand system with and without symme-
try of these coefficients and use a likelihood ratio or Wald test to test for
symmetry.* There is by now a large literature on this approach, inspired in part by
the competition between proponents of alternative flexible-form specifications, and
we make no attempt to survey it. Suffice it to say that the symmetry conditions are
commonly rejected.
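Mechanically, such a test amounts to comparing the maximized log-likelihoods of the restricted (symmetric) and unrestricted systems. A schematic sketch, with placeholder values rather than estimates from any particular study, is:

from scipy.stats import chi2

loglik_unrestricted = -512.4     # hypothetical maximized log-likelihood, no symmetry imposed
loglik_symmetric    = -520.9     # hypothetical maximized log-likelihood, symmetry imposed
n_restrictions      = 6          # e.g., n(n-1)/2 symmetry restrictions for n = 4 goods

lr_stat = 2.0 * (loglik_unrestricted - loglik_symmetric)
p_value = chi2.sf(lr_stat, df=n_restrictions)
print(lr_stat, p_value)          # symmetry is rejected when the p-value is small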
Of course, the weakness of these parametric tests is that rejection can be attributable not to the failure of consistent aggregation to hold but rather to misspecification of the preferences of the representative agent. Although, by definition
and design, flexible functional forms can provide a second-order approximation to
an arbitrary “true” specification at a point, they are in fact employed as global spec-
ifications in tests of symmetry (and other properties). The second-order parameters
are estimated from data over the entire sample space.?
The alternative approach, which eschews specification of functional form, is
based on the revealed preference approach to testing the consumer-optimization hy-
pothesis, formulated by Samuelson (1938, 1946-47) and Houthakker (1950) and
first implemented empirically by Houthakker (1963) and Koo (1963). But the mod-
ern formulation has its roots in a remarkable theorem of Afriat (1967, 1972) (which
was nicely elucidated by Diewert 1973). Varian (1982) applied Afriat’s method to
annual U.S. data on nine consumption categories from 1947 to 1978. Remarkably,
he found no violations of the revealed preference axioms. As the data are aggre-
gate U.S. consumption quantities and prices, this result apparently provides strong
support for the existence of a Gorman representative consumer.
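A compact re-implementation of the GARP screen, written directly from the textbook definitions (build the direct revealed-preference relation, take its transitive closure, and look for strict reversals), is sketched below; P and X are T x n arrays of observed prices and quantities.

import numpy as np

def garp_violations(P, X):
    E = P @ X.T                                  # E[t, s] = cost of bundle s at prices of period t
    own = np.diag(E)                             # own expenditures p_t . x_t
    R = own[:, None] >= E                        # direct revealed preference: x_t R0 x_s iff p_t.x_t >= p_t.x_s
    for k in range(len(own)):                    # Warshall transitive closure of R0
        R = R | (R[:, [k]] & R[[k], :])
    strict = own[:, None] > E                    # strict[s, t]: x_s strictly revealed preferred to x_t
    return np.argwhere(R & strict.T)             # pairs (t, s) violating GARP

P = np.array([[1.0, 2.0], [2.0, 1.0]])           # tiny two-observation example
X = np.array([[3.0, 1.0], [1.0, 3.0]])
print(garp_violations(P, X))                     # empty array: these data satisfy GARP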
Varian raised the possibility, however, that these tests may be plagued by low
power, given the data with which economists have to work. In particular, they will
have low power if the variation in total expenditure over time is large relative to the
variation in relative prices. (If, for example, the budget hyperplanes do not intersect,
the tests have zero power.)
Varian’s concerns were apparently allayed by Bronars (1987): in a Monte Carlo
assessment of the power of Varian’s test against the alternative hypothesis of Becker-
type irrational behavior, he found that these tests have a considerable amount of
power, especially when per capita data are used. Roughly speaking, Becker’s no-
tion of irrational behavior entails a (uniformly) random distribution of consumption
quantities across the budget hyperplane. To implement this notion as an alternative
hypothesis, Bronars took Varian’s price and total expenditure data as given, replaced
the consumption quantities with randomly constructed values (using three different
*Asymmetry of these second-order coefficients is equivalent to violation of Young’s Theorem on the sym-
metry of second-order cross derivatives.
†See White (1980) for a penetrating analysis of the problems with interpreting tests using flexible forms
as tests at “the” point of approximation.
algorithms), and applied Varian’s GARP test, alternatively using per capita and ag-
gregate data. He repeated this exercise a large number of times to obtain a measure
of the power of Varian’s test against the Becker alternative hypothesis. The upshot of
his results is that Varian’s test has a substantial amount of power against the Becker
irrationality hypothesis.
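One simple way to mimic the Becker alternative is to draw budget shares from a flat Dirichlet distribution, which spreads expenditure uniformly over the budget hyperplane. The sketch below, with arbitrary parameter values, also shows why mean demand is nevertheless downward sloping under this kind of "irrationality."

import numpy as np

rng = np.random.default_rng(0)

def mean_demand(p, incomes, n_goods=3):
    shares = rng.dirichlet(np.ones(n_goods), size=len(incomes))  # uniform random budget shares
    return (shares * incomes[:, None] / p).mean(axis=0)          # quantities x_i = w_i y / p_i

incomes = rng.uniform(10, 50, size=10_000)
p_low  = np.array([1.0, 1.0, 1.0])
p_high = np.array([2.0, 1.0, 1.0])                # raise the price of good 1 only

print(mean_demand(p_low, incomes)[0], mean_demand(p_high, incomes)[0])
# mean demand for good 1 falls when its price rises: the aggregate "law of demand"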
Bronars noted that his alternative hypothesis is “rather naive” (page 697). It
is not clear what type of individual behavior, if any, together with aggregation across
consumers, would yield a (uniformly) random allocation of aggregate consumption
quantities across the budget plane. Recently, Russell and Tengesdal (1996) raise
the question of whether Varian’s tests have substantial power against a less naive
alternative hypothesis, predicated on the fact that the null hypothesis of Varian’s
test, given the aggregate data employed, is a compound one, entailing both individual
rationality and aggregation consistency.
Using Monte Carlo methods analogous to Bronars’, Russell and Tengesdal
use actual price and aggregate total expenditure data, but generate aggregate con-
sumption quantity data by (1) specifying heterogeneous individual utility functions
(that do not satisfy Gorman aggregation)* and a distribution of total expenditures
among individuals, (2) generating optimal individual demands, given the prices and
the preferences and total expenditures of individuals, and (3) aggregating demands
across individuals. Using Varian’s algorithm to calculate power indices in terms of
the percentage of the simulations with GARP violations, they find few violations,
despite a large number of sensitivity tests. These results suggest that Varian’s tests
using aggregate U.S. consumption data lack power and hence provide little support
for the existence of a Gorman representative consumer.
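A stylized version of this experiment, reusing the garp_violations sketch given earlier and invented price and expenditure processes in place of the actual data, looks as follows; the fraction of replications with GARP violations is the Monte Carlo power estimate.

import numpy as np

rng = np.random.default_rng(1)
T, H, n, reps = 30, 50, 3, 100                    # periods, households, goods, replications

def stone_geary_demand(p, y, beta, gamma):
    return gamma + beta * (y - p @ gamma) / p

violations = 0
for _ in range(reps):
    beta = rng.dirichlet(np.ones(n), size=H)      # heterogeneous marginal budget shares (no Gorman aggregation)
    gamma = np.full(n, 0.5)
    P = rng.uniform(0.5, 1.5, size=(T, n))        # hypothetical price history
    Y = rng.uniform(5.0, 15.0, size=(T, H))       # hypothetical household total expenditures
    X = np.array([sum(stone_geary_demand(P[t], Y[t, h], beta[h], gamma)
                      for h in range(H)) for t in range(T)])
    if len(garp_violations(P, X)) > 0:            # GARP check from the earlier sketch
        violations += 1

print(violations / reps)                          # estimated power against this alternative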
A recent paper by Fleissig, Hall, and Seater (1994) reinforces this point. They find that Varian's GARP tests are violated using both monthly and quarterly U.S. consumption data. Another suggestion that these nonparametric (nonstochastic) tests lack power is the surprising result of Manser and McDonald (1988), in which it is shown that the Afriat-Varian tests fail to reject the hypothesis of a homothetic utility function rationalizing aggregate U.S. consumption data. But the concomitant impli-
cation of constant budget shares-assuming strictly convex indifference surfaces-
is easily rejected statistically. As Manser and McDonald point out, these two out-
comes are explained by the fact that the rationalizing utility function in Afriat’s the-
orem is piecewise linear, as are the indifference surfaces, so that substantial changes
in budget shares can be consistent with homotheticity.
*In particular, they employ the Stone-Geary specification described in Section II.A, but with heterogeneous $\beta_i$ parameters. Gorman aggregation holds for this specification if and only if these parameters are identical for all consumers. This specification allows a simple parameterization of consumer heterogeneity and hence of the "degree" of aggregation inconsistency, ranging all the way to "maximal" heterogeneity (a uniform distribution over the [0, 1] interval or, alternatively, preferences in which each consumer has just one positive $\beta_i$, equal of course to 1).
$= \beta_i(p)/p_i$, for all $i$,
for all h. The two cases in (64) generate the PIGL and PIGLOG systems, in which
case individual demands as well as aggregate demands have rank 2 and aggregate
demands are given by
or
Thus, in this special case of Muellbauer aggregation, we need information about the
second as well as the first moment of the distribution to determine aggregate demand.
But knowledge of these two statistics allows us to determine aggregate demand ex-
actly, no matter how many individuals are in the economy.
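The PIGLOG case makes the point concrete: if individual budget shares take the form $w_i^h = a_i(p) + b_i(p) \ln y^h$, then expenditure-weighted aggregate shares depend on the income distribution only through $\sum_h y^h$ and $\sum_h y^h \ln y^h$. A small sketch with hypothetical $a_i$ and $b_i$:

import numpy as np

a = np.array([0.40, 0.35, 0.25])         # a_i(p) at a fixed price vector; shares sum to one
b = np.array([0.05, -0.02, -0.03])       # b_i(p); the b_i sum to zero

def aggregate_shares(incomes):
    w = a + np.outer(np.log(incomes), b)                  # individual share systems
    return (incomes[:, None] * w).sum(0) / incomes.sum()  # expenditure-weighted aggregation

def shares_from_statistics(total_y, total_ylny):
    return a + b * total_ylny / total_y                   # needs only the two statistics

y = np.array([8.0, 12.0, 20.0, 40.0])
print(aggregate_shares(y))
print(shares_from_statistics(y.sum(), (y * np.log(y)).sum()))   # identical result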
This characterization of Muellbauer's results raises the question of whether it generalizes to cases where there are more than two aggregator ($\theta_t$) functions and where the forms of these functions are arbitrary. This is the motivation for the notion of exact aggregation, defined by the existence of $T$ symmetric* functions of individual incomes, say $\theta_t(y^1, \ldots, y^H)$, $t = 1, \ldots, T$, where $T < H$, such that†
*By symmetry, we mean anonymity-that is, that the values of the aggregator functions are unaffected by
an arbitrary permutation of the $y^h$ variables. Without this condition, calculations would be intractable
for large H .
†Obviously, if $T \geq H$, there is no economy of information and this structure imposes no restrictions on
the individual demand functions.
Theorem (Lau 1982, Gorman 1981). Given certain regularity conditions,† the aggregate demand functions satisfy (69) if and only if the individual demand functions have the following form:
Thus, for exact aggregation to hold, individual demand functions must be affine in
functions of income and identical across consumers up to the addition of a function
of prices that is independent of income. Moreover, the identical part of the function
must be multiplicatively separable into a function of prices and a function of income.
Finally, the symmetric functions, $\theta_t$, $t = 1, \ldots, T$, in the aggregate demand system must be additive functions of (identical) transformations, $c_t$, of individual incomes.
Note that exact aggregation, unlike Muellbauer (hence Gorman) aggregation, does not imply the existence of a representative consumer; that is, without additional restrictions, the aggregate demand functions (71) cannot be generated by the maximization of an aggregate utility function subject to an aggregate budget constraint. Thus, exact aggregation is weaker than the existence of a representative consumer.
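A sketch of the structure (with invented $c_t$ functions and coefficients) shows how aggregate demand can be computed from the $T$ summary statistics $\sum_h c_t(y^h)$ alone, without any representative consumer:

import numpy as np

c = [lambda y: y, lambda y: y**2]                 # T = 2 common income functions (a QES-like case)
B = np.array([[0.30, -0.002],                     # B[i, t], identical across households
              [0.70,  0.002]])

def d_h(h, y):
    alpha_h = np.array([0.05 * h, -0.05 * h])     # household-specific, income-independent term
    return alpha_h + B @ np.array([c_t(y) for c_t in c])

incomes = np.array([6.0, 9.0, 15.0])
direct = sum(d_h(h, y) for h, y in enumerate(incomes))

stats = np.array([sum(c_t(y) for y in incomes) for c_t in c])   # the T summary statistics
via_stats = sum(np.array([0.05 * h, -0.05 * h]) for h in range(3)) + B @ stats
print(np.allclose(direct, via_stats))             # True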
*Gorman uses this aggregation condition as motivation for his theorem on the maximal rank of demand
systems, discussed below. He first presented the paper at a London School of Economics workshop in
January 1977.
?See Lau (1982) and Heineke and Shefrin (1988) for the particulars.
Theorem (Gorman 1981). If the complete system of demand equations (72) reflects well-behaved preferences, then the rank of its coefficient matrix $B(p)$ is at most 3. Moreover, one of the following must hold (where $I_+$ is the set of nonnegative integers):*
Note that Muellbauer's PIGLOG and PIGL systems are generated as special cases of (i) or (ii), with $R = 2$ (hence rank 2). PIGLOG is a special case of (i) with $\rho_1 = 1$ and $\rho_2 = 0$. PIGL is a special case of (ii) with $\rho_2 = 0$. Similarly, the Gorman polar form (quasi-homotheticity) is a special case of (ii) with $R = 2$ (rank 2), $\rho_1 = -1$, and $\rho_2 = 0$. The rank-3 quadratic expenditure system is obtained by setting $\rho_1 = -1$, $\rho_2 = 0$, $\rho_3 = 1$ in (ii). Homotheticity is rank 1 with $\rho_1 = -1$ in (ii). In fact, virtually all consumer demand systems that have been estimated be-
long to the class of rank-2 demand systems satisfying the conditions of Gorman’s
theorem. Section IV contains more on rank-2 (and rank-3) demand systems. The key
point here, however, is that exact aggregation is considerably more general than the
existence of a representative agent; the latter is but one way of specifying demand
systems that can be consistently aggregated.
C. Household Attributes
In the Lau-Gorman exact aggregation theorem, heterogeneity of preferences enters only through the term $\alpha^h(p)$ in (70), which is independent of income. Additional heterogeneity can be introduced into the exact aggregation framework by incorporating an attribute vector $a^h$ into $\beta_{ir}(p)$ and $c_r(y^h)$, for each $r$. (Commonly used attributes include household size, age of household head, race, and geographical
*As noted by Gorman, the consumption space can divide into subsets with different forms among (i)-(iii) holding over different regions. Heineke and Shefrin (1987) have shown that, if we require these specifications to hold globally, boundedness of budget shares implies that the only integrable system is the homothetic (rank 1) specification. See Heineke (1979, 1993) and Heineke and Shefrin (1982, 1986, 1987, 1988, 1990) for extensions and clarifications of Gorman's theorem.
region.) For the sake of symmetry, assume that the heterogeneity of $\alpha^h$ is also captured by these attributes, so that*
*As heterogeneity of preferences is incorporated entirely through the heterogeneous attribute vectors, the demand functions do not require an $h$ superscript.
†More precisely, these functions separate into a finite sum of multiplicatively separable terms, so that the number of terms in the following structure may be larger than $R$; see Heineke and Shefrin (1988), who refer to this structure as the "finite basis property," for details.
Thus, specifying functional forms for $\gamma_{ir}(p)$ and $g_{ir}(y^h, a^h)$ in (77) and then summing across consumers of different types, defined by $a^h$, results in an aggregate demand system that is consistent with individual rationality and exact aggregation.
The question that immediately follows is: under what conditions do the parameters estimated from (77) allow us to recover (uniquely) the parameters of the microfunctions in (75)? Given a joint distribution of income and attributes, expected per capita demand is given by
Since the $\gamma_{ir}(p)$ terms enter individual and aggregate demand functions in the same way, the parameters in $\gamma_{ir}(p)$ estimated from (77) will be the parameters in $\gamma_{ir}(p)$ of (75). On the other hand, the term $E\big(\zeta_{ir}(a^h)\, g_{ir}(y^h, a^h)\big)$ in the aggregate demand function may not include the parameters in the microeconomic demand function.
Heineke and Shefrin (1990) show that (75) is recoverable from the estimation of (77) if the income/attribute functions satisfy the restrictions given in (79)-(81), in which $\alpha$ is an $n$-vector of parameters and $B_{pp}$ and $B_{pa}$ are (appropriately dimensioned) parameter matrices of the translog specification; under these restrictions the individual translog share equations (82) aggregate to the aggregate share system (83).
Note that (82) satisfies the conditions of exact aggregation; specifically, the
demand equations (given here in expenditure share form) are linear in the parameters
and the microequations (82) contain only parameters that appear in the aggregate
equation (83). The parameters in (83) can be estimated by using both cross-sectional
and time-series data. Thus, the individual expenditure shares can also be obtained
by replacing the unknown parameters in (82) with those estimated from (83).
As shown by Heineke and Shefrin (1990), imposing (81) is equivalent to imposing identification conditions; that is, the translog expenditure share system automatically possesses their restricted finite basis property. The JLS translog system is in fact a restricted version of (76)(i). As Heineke and Shefrin (1990) have pointed out, it restricts (76)(i) to being rank 2, when in fact the Gorman theorem allows the system to be up to rank 3. Also, JLS conflate the rank-2 specification and the requirement that there be only two functions of income and attributes. The restrictions on rank do not imply that the number of income/attribute functions must be only as great as the rank of the coefficient matrix $B^h(p)$ in (72).
The JLS translog specification can also be shown to incorporate the following
restrictions on (76)(i):
*In estimation, the attributes considered by JLS can be divided into discrete categories, allowing for the use of dummy variables. This implies further restrictions on the $\zeta_{ir}$ functions.
The translog system is thus a restricted version of (76)(i) and a much wider
class of similar demand functions is in fact compatible with exact aggregation.
D. Empirical Applications
Empirical applications of the theory of exact aggregation are based primarily on
the work of Jorgenson, Lau, and Stoker (1980, 1982).* Their econometric model
incorporates household attributes into the translog indirect utility function with a
set of 18 dummy variables for attributes like family size, age of head, region of res-
idence, race, and type of residence. They impose restrictions on the parameters of
the share equations such that conditions for exact aggregation hold and aggregate
across individuals to obtain the aggregate share demand functions. They estimate
this aggregate demand system by pooling cross-sectional data on expenditures by
individual households and aggregate annual time-series data on expenditure, price
levels, and statistics of the distribution of family income and demographic vari-
ables. In addition to estimating the model, they analyze the welfare changes from oil
price shocks.
Jorgenson, Lau, and Stoker use dummy variables and simple linear equa-
tions for the incorporation of demographic attributes. This method remains the most
commonly used one in applications. On the other hand, Pollak and Wales (1978,
1980) propose two alternative methods for inclusion of demographic variables: de-
mographic translating and demographic scaling. In both demographic translating
and demographic scaling, new parameters are introduced to capture the effects of
demographic heterogeneity. For the demographic translating procedure, n “translat-
ing parameters,” tl,T Z ,. . . , T,,,are added to the model and the original demand
function d? is modified as follows:?
where
*See also Jorgenson, Lau, and Stoker (1981), Stoker (1986a), Jorgenson and Slesnick (1984), and Nicol
(1994).
†This demand is assumed to be derived by utility maximization. The authors investigated the LES, QES, basic translog (i.e., the translog of Christensen, Jorgenson, and Lau 1975), and generalized translog demand systems. The primary empirical result of their work is that both the number and age of children
in a household have significant effects on consumption patterns.
$$d_i(p, y^h, a^h) = \sigma_i\, d_i^0(p_1\sigma_1, p_2\sigma_2, \ldots, p_n\sigma_n, y^h) \quad \forall i \;\forall h \qquad (87)$$
where
Note that, in (84) and (87), $\tau_i$ and $\sigma_i$ are the only parameters in the demand system
that depend on demographic variables. Unfortunately, as Pollak and Wales point out,
their aggregation procedure, unlike that of Jorgenson, Lau, and Stoker, is inappropri-
ate. Moreover, there do not appear to be any applications of demographic translating
and demographic scaling to appropriately aggregable demand systems.
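In their standard forms, sketched below with hypothetical parameter values (the original equations (84) and (87) give the precise definitions), translating shifts quantities by demographic-dependent amounts $\tau_i$ and adjusts income for the cost of those shifts, whereas scaling rescales quantities and prices by factors $\sigma_i$; both preserve adding up.

import numpy as np

gamma, beta = np.array([0.5, 1.0, 0.2]), np.array([0.3, 0.5, 0.2])
def d0(p, y):                                   # base demand system (an LES, for illustration)
    return gamma + beta * (y - p @ gamma) / p

def translated_demand(p, y, tau):               # demographic translating
    return tau + d0(p, y - p @ tau)

def scaled_demand(p, y, sigma):                 # demographic scaling
    return sigma * d0(p * sigma, y)

p, y = np.array([1.0, 2.0, 1.5]), 20.0
tau = np.array([0.4, 0.1, 0.0])                 # hypothetical demographic effects
sigma = np.array([1.2, 0.9, 1.0])
print(translated_demand(p, y, tau), p @ translated_demand(p, y, tau))   # still adds up to y
print(scaled_demand(p, y, sigma), p @ scaled_demand(p, y, sigma))       # still adds up to y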
In a related development that accounts for serial correlation, Blundell, Pa-
shardes, and Weber (1993) incorporate household demographic attributes and time-
dependent variables (time-trend and seasonal dummy components) into demand sys-
tems. The aggregate models they derive from this specification seem to perform quite
well compared to models using microlevel data in both forecasting efficacy and eval-
uation of aggregate consequences of public policy changes.
In demand system estimation, most authors include some type of dynamic
specification (commonly an AR(1) process for the error terms) to account for habit
formation. Few, however, include forms that adjust for the distribution of incomes.
In an interesting paper, Stoker (1986b) compares an LES model with the habit for-
mation structure with one that does not have any dynamics, but instead allows for
distributional effects on demand. He shows that the distributional effect is statisti-
cally significant and that it can displace AR(1) dynamics in the widely used models.
His results were confirmed and extended by Buse (1992), who used Canadian rather than U.S. data and estimated a quadratic expenditure system (QES) model as well as an LES model. He concluded that Stoker's results are widely applicable. Stoker (1993) provides further extension of this work.
To summarize, “exact” aggregation allows summary statistics of income and
attributes to be used to create aggregate functions that are consistent with aggre-
gation over individual demand functions generated by income-constrained utility
maximization. The conditions for existence of such functions and for their identifi-
ability have been briefly discussed here. Even if the conditions for the existence of
a Gorman representative agent do not hold, “exact” aggregation demonstrates the
conditions under which aggregate data may still be used for econometric estimation
when appropriate distributional information is available.
*The GPF is described in (10), the PIGL and PIGLOG were characterized by Muellbauer (58), and the QES was characterized by Pollak and Wales ((64) with the intercept term $\alpha_i(p)$ appended). Lewbel introduces three new demand systems: extended PIGL, extended PIGLOG, and LINLOG. The extensions to PIGL and PIGLOG demands involve the addition of an additive constant term and, in the extended PIGL case, the $A_i(p)$ term is multiplied by a differentiable function of prices. None of these has been used in empirical application.
where $q$ is the vector of log prices and $z$ is log income, we know that the rank of the matrix of coefficients $[b_{ri}]$ is at most 3. Lewbel shows that if we replace $\phi_r(z)$ with $\phi_r(Z)$, where $Z$ is the log of deflated income, then the system
has at most rank 4. In general, all the other properties that hold for Gorman demand systems of the form (72) hold for the deflated demand systems of (91). If $V(q, z)$ is the indirect utility function corresponding to a Gorman system, then $V(q, \Phi(Z))$, where $\Phi(Z)$ is a nonzero, bounded, continuously differentiable function, is the indirect utility function for the deflated demand system.
Lewbel (1991) constructed nonparametric tests for the rank of a demand system and applied these tests to family expenditure survey data from the U.K. and consumer expenditure surveys from the United States. To minimize income-correlated demographic variation, Lewbel selected only a subset of households that were fairly homogeneous in terms of household attributes. (Specifically, only married couples with two children, where the head of household was employed full-time, were chosen. Note that another approach would be to define rank in terms of the space spanned by functions of "attributes" as well as income. This is the approach explored in Section III.C. Here, Lewbel only considered households in one cross section of attributes.) Using this reduced data set, he found that, in the middle part of the distribution, the data are consistent with rank 2 and that the PIGLOG system gives a fairly good fit. In the tails of the distribution (the lower and upper 5%), however, he found that the data appear to be of higher rank and that rank-3 systems would be a better fit.
Hausman, Newey, and Powell (1995) (HNP) also find that rank-3 demand sys-
tems fit the data better than rank-2 systems. They are primarily concerned with the
problem of estimating systems of demand when there are errors in variables. This
is a common problem with survey data dealing with income and expenditure. Using
the specification for shares,
where the Pi’s are constants since the estimation is done at only one price situation,
HNP estimate demands for five categories of goods. They compare the results from
OLS and instrumental variable techniques using a repeated observation as an in-
strument. They find that the inclusion of the quadratic term gives better estimates of
the demand system than a rank-2 specification. To test whether or not a rank-4 spec-
ification gives any additional information, HNP add another term and reestimate the
system. Thus, the specification becomes
If the demand system is rank 3, the ratio $\beta_3/\beta_2$ should be constant. (In other words,
the addition of the last term provides no new information.) HNP find that, in fact,
this ratio is almost perfectly constant. Thus, Gorman’s rank condition seems to be
confirmed in the data for this specification.
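A schematic version of this rank check, using simulated data in place of the HNP survey data, regresses each budget share on powers of log income and asks whether the ratio of the added (cubic) coefficient to the quadratic coefficient is roughly the same across goods, as a rank-3 system implies.

import numpy as np

rng = np.random.default_rng(2)
n_obs, n_goods = 500, 5
log_y = rng.normal(3.0, 0.5, n_obs)

# simulate a rank-3 share system: the cubic column of B is proportional to the quadratic column
B2 = rng.uniform(0.01, 0.03, size=n_goods)                 # quadratic coefficients
B = np.column_stack([rng.normal(0.0, 0.05, (n_goods, 2)), B2, 0.4 * B2])
Z = np.column_stack([np.ones(n_obs), log_y, log_y**2, log_y**3])
W = Z @ B.T + rng.normal(0.0, 0.002, (n_obs, n_goods))     # noisy simulated budget shares

coef, *_ = np.linalg.lstsq(Z, W, rcond=None)               # OLS, equation by equation
print(coef[3] / coef[2])     # roughly the same value (about 0.4) for every good, as rank 3 implies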
All of these are rank-2 demand systems. Thus, even when the weaker condi-
tion of pairwise aggregation is considered, the rank-2 condition continues to hold.
Case (iii) is Muellbauer’s generalized linear demand structure.* Jerison’s aggre-
gation question, however, differs from Muellbauer’s in that (i) and (ii) are neither
stronger nor weaker than Muellbauer’s GL demands restricted to satisfy his aggre-
gation condition.
Freixas and Mas-Colell (1987) likewise find that the generalized linear cat-
egory of demands provides an important condition for the weak axiom of revealed
preference (WARP) to hold in the aggregate. Rather than requiring aggregate de-
mand to be rationalizable by a utility maximizing representative consumer, they ask
what conditions must be placed on Engel curves such that aggregate data satisfy
WARP. They assume that preferences are identical across consumers and they put
no restrictions on the income distribution.
Freixas and Mas-Colell introduce two conditions on Engel curves. The first
they call uniform curvature (UC). Uniform curvature essentially requires that goods
are either luxuries or necessities in all ranges of allowable income. No torsion (NT),
the other condition, requires that Engel curves lie in a plane through the origin; in
other words, Engel curves must have rank 2, or the generalized linear structure of
(42). They then prove the following theorem.
*Specifically, demands are analytic (i.e., demands can be represented by a power series arbitrarily closely
at any point) and the income distribution belongs to a convex, open set of income distributions confined
to the H - 1 unit simplex.
†Specifically, they are analytic with $\iota'H = 0 = \iota'H^h$ and $\iota'C = 0 = \iota'C^h$ (where $\iota$ is the appropriately dimensioned unit vector) for all consumers $h = 1, \ldots, H$.
$Note, however, that Jerison’s conditions rule out the additive terms that sum to zero in the Muellbauer
necessity conditions (see the footnote to Muellbauer's theorem on p. 192).
wise aggregation is not sufficient for PIPB. Blackorby and Russell also discuss the
relationship of their results to those of Muellbauer for the more general proportional
budgeting (PB). They show that aggregate share functions have the Muellbauer PIGL
and PIGLOG structure in aggregate expenditure. However, these could not have been
derived by aggregating over individual PIGL or PIGLOG demand structures. The
aggregation rule that emerges from an economy practicing proportional budgeting
is linear. This implies a greater restriction than the results of Muellbauer (see Sec-
tion II).
can be contrasted with the first approach in that they allow preferences to take any
form. Also, such approaches need not be based on any assumptions about individual
rationality or optimization.
The early result by Becker regarding the spread of agents across the budget hyper-
plane leads to the question of what restrictions on aggregate demand might result
from restricting the distribution of income while allowing preference heterogeneity
among agents. For the case where each consumer has a fixed share of total income,
Eisenberg (1961) showed that market demand could be generated by a multiplica-
tive social welfare function with exponential weights equal to each agent’s share of
aggregate income. Though preferences are allowed to differ between agents in this
formulation, they must be homothetic. Of course, any shift in the relative income
distribution would demand a new set of weights to rationalize the allocation.?
*As it turns out, Becker was incorrect in assuming that individual rationality gave downward-sloping
aggregate demand curves. The Debreu-Sonnenschein-Mantel (DSM) result 20 years after his work demonstrated this.
?Chipman (1974) clarified this result and proved it in a slightly different manner. See also Shapiro (1977).
where $\rho$ is the income distribution. A sufficient condition for the aggregate law of demand is that the Jacobian matrix
$$\left[\frac{\partial D(p)}{\partial p}\right]$$
be negative definite for all $p$. By the definition of (mean) market demand,
where $\delta$ is the compensated demand function (identical for all consumers here), $S$ is the mean Slutsky substitution matrix, and $M$ is the mean Slutsky income-effect matrix (the $T$ superscript denotes transposition). $S$ is negative semidefinite, since all the individual substitution matrices must be negative semidefinite. So the law of demand reduces to the condition that $M$ be positive definite. Hildenbrand shows that if we restrict the distribution of income of agents such that the density of incomes is monotone and nonincreasing, then all aggregate partial demand curves will be downward-sloping:
$$(p - q) \cdot \big(D(p) - D(q)\big) \le 0 \qquad (100)$$
Notice that restrictions are placed on the distribution of income of agents,
while no restrictions are placed on the form of the preference function. Preferences
here are assumed to satisfy the weak axiom of revealed preference and to be identical
across consumers, but no assumptions are placed on the form of Engel curves.
The restriction that all consumers have identical preferences is certainly an
undesirable one. Let us assume that preferences belong to some allowable set, A .
And let demand vary between consumers, where individual demands are now
$$d^a(p, y) \qquad (101)$$
where $a \in A$ indexes consumers. Market demand is
where $\rho$ is the joint distribution of incomes and attributes. The mean Slutsky matrix is now
*See Hildenbrand (1989) and Härdle, Hildenbrand, and Jerison (1991) for technical details.
Hildenbrand’s result does show that restricting the income distribution is, at some
level, analogous to restricting the shape of individual Engel curves. It leaves open
the possibility that some combination of restrictions can be shown to hold empirically
in the economy.
Jean-Michel Grandmont (1987) extends Hildenbrand’s result to a slightly
broader class of densities that allow for discontinuities and also allow for unbounded
densities, provided certain convergence conditions are met. Grandmont also shows
that a result similar to Hildenbrand’s can be derived by restricting the distribution
of preferences.
Härdle, Hildenbrand, and Jerison (1991) (HHJ) consider the aggregate data and the question of whether market demand adheres to the law of demand. They consider the mean demand function of (102). A sufficient condition for monotonicity of the mean demand function is that $S^*$ be negative semidefinite and $M^*$ be positive semidefinite. We know that $S^*$ will be negative semidefinite, since all the individual substitution effects are negative. Thus, positive semidefiniteness of $M^*$ is sufficient for the law of demand to hold in the aggregate.
Using the assumption of metonymy, HHJ are able to estimate the mean Slutsky income-effect matrix. Utilizing the symmetrized matrix $M$,
$$M = M^* + M^{*T} \qquad (104)$$
(which will only be positive definite when $M^*$ is), they estimate a related matrix, $\hat{M}^*$, which is identical to $M^*$ under the metonymy assumption. They consider nine commodity aggregates in U.K. expenditure data from 1969 through 1983 and show that in all cases (for all years and for all commodities) the matrix $\hat{M}^*$ is positive
definite. Unfortunately, it is hard to judge the robustness of this result since their tests
rely on the eigenvalues of the estimated matrix, and there is no available distribution
theory for these estimated eigenvalues. Using bootstrapping techniques, they give
some idea that their results may be robust.
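As a rough illustration of the kind of check involved, the sketch below is a schematic stand-in, not HHJ's metonymy-based average-derivative estimator: it forms a crude finite-difference analogue of the income-effect matrix from simulated household data, symmetrizes it as in (104), and bootstraps its smallest eigenvalue to gauge how fragile a "positive definite" finding would be. The data-generating process and the slope estimator are purely illustrative assumptions.

```python
import numpy as np

def income_effect_matrix(y, q):
    """y: (n,) incomes; q: (n, k) commodity demands. Crude analogue of M*."""
    order = np.argsort(y)
    y, q = y[order], q[order]
    dy = np.diff(y)
    keep = dy > 1e-12                              # drop ties created by resampling
    slopes = np.diff(q, axis=0)[keep] / dy[keep, None]   # finite-difference dq/dy
    mids = 0.5 * (q[1:] + q[:-1])[keep]
    M = slopes.T @ mids / len(slopes)              # average of (dq/dy) q^T
    return 0.5 * (M + M.T)                         # symmetrized, as in (104)

def bootstrap_min_eig(y, q, reps=500, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    eigs = []
    for _ in range(reps):
        idx = rng.integers(0, n, n)                # resample households
        eigs.append(np.linalg.eigvalsh(income_effect_matrix(y[idx], q[idx]))[0])
    return np.array(eigs)

# Fake data purely for illustration: Engel curves that fan out with income
rng = np.random.default_rng(1)
y = rng.gamma(2.0, 2.0, size=2000)
q = np.column_stack([0.3 * y, 0.5 * y + 0.05 * y**2, 0.2 * y]) * rng.lognormal(0, 0.1, (2000, 3))

eigs = bootstrap_min_eig(y, q)
print("share of bootstrap replications with smallest eigenvalue > 0:", (eigs > 0).mean())
```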
From nonparametric density estimates presented in their paper, HHJ clearly
show that the distribution of incomes is not decreasing. So decreasing density of
income is not driving the adherence to the law of demand observed in the data. HHJ
identify two possible explanations. One is that, at higher income levels, preferences
seem to be more heterogeneous. In other words, Engel curves tend to spread out at
higher income levels. This reduces the possibility of pathological outcomes. It would
also seem to provide some corroboration of Lewbel's results that at higher income
levels, rank-2 Engel curves do not describe demand adequately (see Section IV.A).
Also, the slopes of cross-product curves for the estimated \hat{M}^* matrix are fairly small
compared to the slopes of the own-product curves. Thus own-price effects seem to
be dominating.
Härdle and Hart (1992) develop the asymptotic theory necessary for testing the
positive definiteness of income-effect matrices using a bootstrapping approach.
Grandmont approaches heterogeneity through transformations of the commodity space: for a vector a = (a_1, \ldots, a_n), the a-transform of a commodity bundle x is

x_a = e^{a} \otimes x = (e^{a_1} x_1, \ldots, e^{a_n} x_n) \qquad (105)
These transforms can then be used to generate a linear structure on preference rela-
tions and/or demand functions. They are affine, since each point in the commodity
space is transformed linearly to a new point based on the values of the a-vector. If
a_i = a for every i, then the transformation is said to be homothetic, as each point in
the commodity space is transformed to a point on a ray passing through the original
point.
Grandmont considers one such equivalence class (i.e., one class of demand
functions that are all related to each other as a-transforms). Within that class, any
preference can be represented as an n-dimensional vector, a. If demands in the econ-
omy are all members of one such class, and if the distribution of a is flat enough, then
aggregate demand will be monotone and individual demand curves are downward-
sloping. Aggregate demand also obeys the weak axiom of revealed preference.
What is the meaning of this? The distribution of a represents the degree of
heterogeneity of preferences within any equivalence class. If the distribution of a is
not flat, then preferences are concentrated around some point and aggregate demand
may behave strangely depending on those individual preferences. If the distribution
of a is flat, then preferences are sufficiently heterogeneous and aggregate demand is
well behaved.
These results do not depend on any restrictions on individual demands except
that they obey homogeneity and Walras' law. Demands are not even required to be
preference-driven. Thus, individual rationality, as in Becker's formulation, is simply
unnecessary for aggregate demand regularity.
Grandmont does not consider heterogeneity across equivalence classes. Nor
does he consider heterogeneity in both preference and income dimensions. These ar-
eas seem promising for future research. Given the results of Hildenbrand and Grand-
mont, it seems that some restrictions on heterogeneity across multiple dimensions
should also yield regularity of aggregate demand. Perhaps the unrealistic restric-
tions on the distribution of income (that the density must be nonincreasing) may be
weakened and combined with some restriction on the distribution of preferences to
yield the law of demand.
Conditional on the quantities of fixed inputs or outputs of all firms, the aggregate
technology set is then defined (assuming no externalities exist) by

T(z^1, \ldots, z^F) = \sum_f T^f(z^f) \qquad (107)

so that

v \in T(z^1, \ldots, z^F) \iff G(v, z^1, \ldots, z^F) \le 0 \qquad (108)
*Parts of this section borrow shamelessly from the excellent editorial summaries of Gorman (1953b) (writ-
ten by Bill Schworm) and Gorman (1968b) (written by the editors) in Blackorby and Shorrocks (1995).
Thus, the aggregate technology set and the aggregate production function depend
on the quantity of each fixed input held by each firm (or each fixed output). From an
econometric point of view, this is a nightmare. Thus, it is standard practice (explicitly
or implicitly) to aggregate over these fixed inputs or outputs, in order to reduce the
dimensions of estimation problems to manageable proportions.
The existence of an aggregate fixed input or output for an individual firm is
represented formally by the existence of functions \tilde{G}^f and Z^f such that

G^f(v^f, z^f) = 0 \iff \tilde{G}^f(v^f, Z^f(z^f)) = 0 \qquad (109)

so that the fixed inputs enter the firm's technology only through the aggregate Z^f(z^f).*
B. Klein Aggregation
Several ways of simplifying the structure in (108), taking full account of the aggre-
gation problem, have been proposed. The first suggestion was that of Klein (1946a,
1946b), who proposed an aggregation procedure predicated on the assumption that
none of the inputs or outputs is efficiently allocated. He posited multiple types of
labor and capital inputs producing a set of outputs in accordance with the following
production function:
G^f(x^f, n^f, z^f) = 0 \quad \forall f \qquad (111)

where x^f, n^f, and z^f are the vectors of output, labor, and capital quantities, respec-
tively, for firm f. To simplify the analysis of economy-wide production functions,
Klein proposed the existence of functions \tilde{G}, X, N, and Z such that

\tilde{G}\bigl(X(x^1, \ldots, x^F),\, N(n^1, \ldots, n^F),\, Z(z^1, \ldots, z^F)\bigr) = 0 \iff G^f(x^f, n^f, z^f) = 0 \quad \forall f \qquad (112)
Thus, a problem of gargantuan proportions is reduced to one in which there are just
three variables in the economy's production function: aggregate output, aggregate
labor input, and aggregate capital input.
*The equivalence between separability and the functional structure in (109) was developed independently
by Leontief (1947a, 1947b) and Sono (1961) and subjected to a penetrating analysis by Gorman (1968a).
For a comprehensive treatment and a recent survey, see Blackorby, Primont, and Russell (1978, 1997).
The question posed by Klein was answered by Nataf (1948), who showed that
(112) holds if and only if each firm's production function can be written in the additively separable form

X^f(x^f) + N^f(n^f) + Z^f(z^f) = 0 \quad \forall f

in which case the aggregator functions can be chosen as

X(x^1, \ldots, x^F) := \sum_f X^f(x^f), \qquad N(n^1, \ldots, n^F) := \sum_f N^f(n^f), \qquad Z(z^1, \ldots, z^F) := \sum_f Z^f(z^f)

and

\tilde{G}\bigl(X(x^1, \ldots, x^F), N(n^1, \ldots, n^F), Z(z^1, \ldots, z^F)\bigr) := X(x^1, \ldots, x^F) + N(n^1, \ldots, n^F) + Z(z^1, \ldots, z^F) \qquad (116)
Gorman (1953b) noted that the Klein-Nataf structure is not very useful for
empirical implementation because the aggregator functions for output, labor, and
capital in (112) and (116) depend on the entire distribution of these variables among
firms. He suggested that, to be useful, these aggregate commodities should depend
only on the sum across firms of the component variables: i.e., that the aggregate
production function take the form

\tilde{G}\Bigl(\sum_f x^f,\; \sum_f n^f,\; \sum_f z^f\Bigr) = 0 \qquad (117)

He then showed that the necessary and sufficient condition for this aggregate pro-
duction structure is that the individual production equations have the affine form

\alpha \cdot x^f + \beta \cdot n^f + \gamma \cdot z^f + \theta^f = 0 \quad \forall f \qquad (118)
*Hence, the variables are, in Gorman's (1968a) terminology, completely separable in G^f: in addition to
each group being separable, arbitrary unions of groups are separable (so that, e.g., technical rates of
substitution between labor and capital inputs are independent of output quantities).
where \alpha, \beta, and \gamma are appropriately dimensioned vectors of parameters that are iden-
tical for all firms and \theta^f is an arbitrary, idiosyncratic scalar for firm f. In this case,
the economy production equation is

\alpha \cdot \sum_f x^f + \beta \cdot \sum_f n^f + \gamma \cdot \sum_f z^f + \sum_f \theta^f = 0

so that the Klein aggregates have the linear structures

X(x^1, \ldots, x^F) = \alpha \cdot \sum_f x^f, \qquad N(n^1, \ldots, n^F) = \beta \cdot \sum_f n^f

and

Z(z^1, \ldots, z^F) = \gamma \cdot \sum_f z^f
The restrictions on individual technologies that are necessary and sufficient for the
existence of Klein aggregates are too demanding to be useful in empirical analysis.
Even if we allow the aggregates to depend on the entire distribution of component
variables, as in (112), the individual technologies are linear in the firm-specific ag-
gregates, and if we more realistically require that the economy aggregates depend
only on the sum of component variables, as in (117), the individual technologies are
linear in each output and input variable (implying, e.g., linear isoquants and linear
production possibility surfaces).
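The force of the affine restriction can be seen in a small numerical sketch (all parameter values below are assumed for illustration): with affine firm technologies, the economy-wide production equation is unchanged by any reallocation of outputs or inputs across firms that preserves the economy totals.

```python
# Illustrative check that only the sums of firm quantities matter under the
# affine technologies alpha.x^f + beta.n^f + gamma.z^f + theta^f = 0.
import numpy as np

rng = np.random.default_rng(0)
F = 4                                   # number of firms (assumed)
alpha, beta, gamma = np.array([1.0, 0.5]), np.array([-0.7]), np.array([-0.3, -0.2])
theta = rng.normal(size=F)              # idiosyncratic firm intercepts

def economy_equation(x, n, z):
    """x: (F, 2) outputs, n: (F, 1) labor, z: (F, 2) capital."""
    return x.sum(0) @ alpha + n.sum(0) @ beta + z.sum(0) @ gamma + theta.sum()

x = rng.uniform(1, 2, (F, 2)); n = rng.uniform(1, 2, (F, 1)); z = rng.uniform(1, 2, (F, 2))
g1 = economy_equation(x, n, z)

x2 = x.copy(); x2[0] += 0.5; x2[1] -= 0.5   # reallocate output, totals fixed
g2 = economy_equation(x2, n, z)
print(np.isclose(g1, g2))               # True: only the sums matter
```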
Subsequent literature on “capital” aggregation has focused on the unreason-
ableness of Klein’s requirement that the allocation of outputs and inputs be en-
tirely arbitrary. In fact, May (1946) and Pu (1946) immediately pointed out that
competitive-equilibrium conditions-in particular, profit maximization by firms-
might be exploited to weaken the requirements for the existence of composite com-
modities in aggregate production technologies. In particular, Solow (1964) showed
how the efficient allocation of labor inputs might facilitate the aggregation of (fixed)
capital inputs. His ingenious insight was to note that, under certain conditions, the
efficient allocation of inputs not being aggregated can render efficient an arbitrary
allocation of fixed inputs as well.
The ultimate-and naturally elegant-solution to the problem of aggregating
over fixed inputs in the aggregate technology and in the presence of efficiently allo-
cated inputs was presented by Gorman (1965, 1968b). The existence of efficiently
allocated inputs and outputs allowed him to exploit duality theory to solve the prob-
lem (as in the case of the representative consumer in Section 11). Given the tech-
nology represented in (106), the variable profit function* of firm f, denoted P^f, is
defined by

P^f(p, z^f) := \max_{v^f} \{\, p \cdot v^f \mid G^f(v^f, z^f) = 0 \,\}

where p is the (competitive equilibrium) price vector for variable inputs and outputs.
For the economy, the variable profit function P is defined by

P(p, z^1, \ldots, z^F) := \max_{v} \{\, p \cdot v \mid G(v, z^1, \ldots, z^F) = 0 \,\} = \sum_f P^f(p, z^f)
where the second identity follows from the existence of a “representative firm” con-
ditional on the allocation of fixed inputs (i.e., the interchangeability of set summation
and optimization), discussed in Section II.A. Note that, if outputs are fixed and inputs
are variable, P is the cost function of the firm, showing minimal cost as a function of
input prices and the fixed output quantities.
Theorem (Gorman 1968b). The aggregator function in (110) exists if and only
if the individual variable profit functions (or cost functions) have the structure

P^f(p, z^f) = \Pi(p)\, Z^f(z^f) + \Lambda^f(p)

in which case the economy's variable profit function is

P(p, z^1, \ldots, z^F) = \Pi(p) \sum_f Z^f(z^f) + \sum_f \Lambda^f(p) =: \Pi(p)\, Z(z^1, \ldots, z^F) + \Lambda(p)
The similarity between this structure and the structure of the expenditure
function that is necessary and sufficient for the existence of a representative con-
sumer in Section II is palpable. In fact, as nicely elucidated in the editorial summary

N(u^1, \ldots, u^H) = \sum_h N^h(u^h)
and
v^f \in T^f(z_1^f, \ldots, z_R^f) \iff G^f(v^f, z_1^f, \ldots, z_R^f) = 0 \quad \forall f \qquad (132)

where z_r^f is the vector of fixed inputs of type r, r = 1, \ldots, R, held by firm f. The
aggregate technology set then depends on the distribution of quantities of each type
of fixed input among firms, and the question is when can the aggregate technology be
represented by the production equation

G\bigl(v, Z^1(z_1^1, \ldots, z_1^F), \ldots, Z^R(z_R^1, \ldots, z_R^F)\bigr) = 0 \qquad (133)
The answer, as elegantly shown by Gorman (1968b), is a straightforward generaliza-
tion of the earlier result: the variable profit functions of the individual firms must
be of the form

P^f(p, z_1^f, \ldots, z_R^f) = \sum_r \Pi^r(p)\, Z_r^f(z_r^f) + \Lambda^f(p)

and the variable profit function of the economy therefore has the structure

P\bigl(p, Z^1(z_1^1, \ldots, z_1^F), \ldots, Z^R(z_R^1, \ldots, z_R^F)\bigr) = \sum_r \Pi^r(p)\, Z^r(z_r^1, \ldots, z_r^F) + \Lambda(p)

where Z^r(z_r^1, \ldots, z_r^F) := \sum_f Z_r^f(z_r^f) and \Lambda(p) := \sum_f \Lambda^f(p).
Again, the similarity of this structure, which rationalizes the existence of mul-
tiple aggregate inputs in an economy-wide production function, with the structure of
exact aggregation of consumer demand functions with multiple income distribution
statistics, surveyed in Section III, is palpable. In fact, using the envelope theorem to
derive the net-supply functions of variable netput vectors for firm f, we obtain

v^f(p, z_1^f, \ldots, z_R^f) = \nabla_p P^f(p, z_1^f, \ldots, z_R^f) = \sum_r \nabla_p \Pi^r(p)\, Z_r^f(z_r^f) + \nabla_p \Lambda^f(p)
which has a structure that is very similar to that analyzed by Gorman (1981). Aggre-
gate (vector valued) net-supply functions are then given by

v(p, z_1^1, \ldots, z_R^F) = \sum_f v^f(p, z_1^f, \ldots, z_R^f) = \sum_r \nabla_p \Pi^r(p)\, Z^r(z_r^1, \ldots, z_r^F) + \nabla_p \Lambda(p)
E. Other Extensions
Econometric studies using aggregate data also commonly aggregate over variable in-
puts (e.g., different types of labor). Again, at the level of the individual firm the nec-
essary and sufficient condition for the existence of such aggregates is separability,
but at the aggregate level more is required. The necessary and sufficient conditions
for aggregation of efficiently allocated inputs are different from, but similar in many
respects to, the conditions for aggregation of fixed inputs. Moreover, as in the case of
fixed inputs, the conditions are less demanding if aggregation is required only over
a subset of the efficiently allocated inputs. In addition, aggregation of variable in-
puts is made more difficult if there exist fixed inputs in the technology. These issues
were analyzed in Gorman (1967, 1982a), and Blackorby and Schworm (1988a) pro-
vide an excellent survey and analysis of the various results obtained under different
assumptions about which subsets of efficiently allocated and fixed inputs are being
aggregated in the economy-wide technology.* To a large extent, the common theme
of affine structures runs through these results, but there are some surprises (which,
in the interest of space, are left to the reader to discover).
The use of common duality concepts in each of these aggregation problems en-
tails the use of strong convexity assumptions regarding the technology. The possible
inadequacy of this approach for long-run analysis led Gorman (1982b) to examine
aggregation conditions under constant returns to scale.
Joint aggregation over commodities and agents is also common in studies of
consumer behavior. As one would expect, the necessary and sufficient conditions,
as shown by Blackorby and Schworm (1988b), are stronger than those required for
either type of aggregation taken separately.
Finally, Gorman (1990) has more recently examined aggregation problems in
which the aggregate inputs for the economy depend on the quantities of all inputs,
perhaps because of externalities. These studies lead to structures that are quite simi-
lar to the affine structures above and, interestingly, reminiscent of the rank conditions
for Engel curves in Sections III and IV.
As Section I.A summarizes the literature, our concluding remarks will be limited
to two observations. First, when we began research on this survey, we expected the
chapter to be about half theory and half econometrics; the reader has undoubtedly
noted, however, that the theory of aggregation dominates. Thus, the main lesson we
take from this endeavor is that, while the theory of aggregation is fairly well devel-
oped, econometric application is in its infancy. The principal empirical literature-
most notably Jorgenson, Lau, and Stoker (1980, 1981, 1982)-maintains exact ag-
gregation conditions, generates results on demographics and demand, and studies
various issues of welfare economics. Less well developed is the testing of restric-
tions on preferences and on the distribution of incomes or preferences that are nec-
*Also see Fisher (1965, 1968a, 1968b, 1969, 1982, 1983) and Blackorby and Schworm (1984).
essary or sufficient for various types of aggregation consistency. The most promising
avenues for research in this direction have been forged by Lewbel (1991), Härdle,
Hildenbrand, and Jerison (1991), and Hausman, Newey, and Powell (1995). But it
seems that the potential for testing aggregation conditions has barely been tapped.
Second, just as the theory of aggregation has dominated this literature, it should
be obvious to the reader that one person, Terence Gorman, has in turn dominated the
search for theoretical results. His towering achievements in this area are testimony
to his penetrating intellect. Moreover, while many of his papers are challenging read-
ing, he has always kept his eye on the potential for empirical application. One would
hope that applied econometricians would focus more on testing the aggregation the-
ories promulgated primarily by Gorman.
ACKNOWLEDGMENTS
We are deeply indebted to Chuck Blackorby and Tony Shorrocks for their masterful
editorial effort in making Terence Gorman’s works, many previously unpublished,
available in Volume I of the Collected Works of W. M. Gorman; without this towering
volume, including the illuminating editorial introductions to Gorman’s papers, our
task would have been impossible. Nor would the chapter have been possible without
the investment in human capital accruing to the long collaboration between Blacko-
rby and Russell. We also thank Kusum Mundra for comments, and we are especially
grateful to Chuck Blackorby and Bill Schworm for reading the manuscript and for
many valuable suggestions. Needless to say, they are in no way responsible for any
remaining errors.
REFERENCES
Aczel, J. and J. Dhombres (1989), Functional Equations in Several Variables with Applications
to Mathematics, Information Theory and to the Natural and Social Sciences, Cambridge
University Press, Cambridge.
Afriat, S. N. (1967), The Construction of a Utility Function From Expenditure Data, Interna-
tional Economic Review, 8, 460-472.
Afriat, S. N. (1972), Efficiency Estimation of Production Functions, International Economic
Review, 13, 568-598.
Aiyagari, S. (1985), Observational Equivalence of the Overlapping Generations and Dis-
counted Dynamic Programming Frameworks for One-Sector Growth, Journal of Eco-
nomic Theory, 35,201-221.
Antonelli, G. B. (1886), Sulla Teoria Matematica della Economia Politica, nella tipografia del
Folchetto, Pisa; English translation in Preferences, Utility, and Demand (J. S.
Chipman, L. Hurwicz, M. K. Richter, and H. F. Sonnenschein, eds.), Harcourt Brace
Jovanovich, New York, 1971.
Appelbaum, E. (1982), The Estimation of the Degree of Oligopoly Power, Journal of Econo-
metrics, 19, 287-299.
Arrow, K. J. (1951), Social Choice and Individual Values, Wiley, New York.
Banks, J., R. Blundell, and A. Lewbel(1993), Quadratic Engel Curves and Welfare Measure-
ment, Working Paper, Institute of Fiscal Studies, London.
Becker, G. S.(1962), Irrational Behavior and Economic Theory, Journal of Political Economy,
70, 1-13.
Blackorby, C., R. Boyce, and R. R. Russell(1978), Estimation of Demand Systems Generated
by the Gorman Polar Form: A Generalization of the S-Branch Utility Tree, Economet-
rica, 46, 345-364.
Blackorby, C., R. Davidson, and W. Schworm (1993), Economies with a Two-Sector Represen-
tation, Economic Theory, 3, 717-734.
Blackorby, C., D. Primont, and R. R. Russell (1978), Duality, Separability, and Functional
Structure: Theory and Economic Applications, North-Holland/American Elsevier, New
York.
Blackorby, C., D. Primont, and R. R. Russell (1997), Separability, in Handbook of Utility
Theory (S. Barberà, P. J. Hammond, and C. Seidl, eds.), Kluwer, Boston.
Blackorby, C. and R. R. Russell (1993), Samuelson's "Shibboleth" Revisited: Proportional
Budgeting among Agents and Rank-Two Demand Systems, in Mathematical Modelling
in, Economics (W. E. Diewert, K. Spremann, and F. Stehling, eds.), Springer-Verlag,
Heidelberg, 3,546.
Blackorby, C. and R. R. Russell (1997), Two-Stage Budgeting: An Extension of Gorman’s
Theorem, Economic Theory, 9, 185-193.
Blackorby, C. and W. Schworm (1982), Aggregate Investment and Consistent Intertemporal
Technologies, Review of’Economic Studies, 49, 595-614.
Blackorby, C. and W. Schworm (1984), The Structure of Economies with Aggregate Measures
of Capital: A Complete Characterisation, Review of Economic Studies, 51,633-650.
Blackorby, C. and W. Schworm (1988a), The Existence of Input and Output Aggregates in
Aggregate Production Functions, Econometrica, 56, 613-643.
Blackorby, C. and W. Schworm (1988b),Consistent Commodity Aggregates in Market Demand
Equations, in Measurement in Economics (W. Eichhorn, ed.), Physica-Verlag, Heidel-
berg, 577-606.
Blackorby, C. and W. Schworm (1993), The Implications of Additive Consumer Preferences
in a Multi-Consumer Economy, Review ofEconomic Studies, 60,209-228.
Blackorby, C. and A. F. Shorrocks (1995), Separability and Aggregation: Collected Works of
W. M. Gorman, Vol. I, Clarendon Press, Oxford.
Blundell, R., P. Pashardes, and G. Weber (1993), What do We Learn About Consumer De-
mand Patterns from Micro Data? American Economic Review, 83, 570-597.
Borooah, V. K. and F. Van Der Ploeg (1986), Oligopoly Power in British Industry, Applied
Economics, 18,583-598.
Bronars, S.G. (1987), The Power of Nonparametric Tests of Preference Maximization, Econo-
metrica, 55,693-698.
Brown, M. and D. Heien (1972), The S-Branch Utility Tree: A Generalization of the Linear
Expenditure System, Econometrica, 40, 737-747.
Buse, A. (1992), Aggregation, Distribution and Dynamics in the Linear and Quadratic Expen-
diture Systems, Review of Economics and Statistics, 74, 45-53.
230 RUSSELLET AL.
Chavas, J.-P. (1993), On Aggregation and Its Implications for Aggregate Behavior, Ricerche
Economiche, 47,201-214.
Chipman, J. (1974), Homothetic Preferences and Aggregation, Journal of Economic Theory,
8,26-38.
Christensen, L., D. C. Jorgenson, and L. J. Lau (1975), Transcendental Logarithmic Utility
Functions, American Economic Review, 65,367-383.
Constantinides, G. (1982), Intertemporal Asset Pricing with Heterogeneous Consumers and
without Demand Aggregation, Journal of Business, 55,253-267.
Deaton, A. (1986), Demand Analysis, in Handbook of Mathematical Economics, Vol. 3 (K. J.
Arrow and M. D. Intriligator, eds.), North-Holland, Amsterdam, 1767-1839.
Deaton, A. S. and J. Muellbauer (1980), An Almost Ideal Demand System, American Eco-
nomic Review, 70,312-326.
Debreu, G. (1954),Valuation Equilibrium and Pareto Optimum, in Proceedings ofthe National
Academy of Sciences of the USA, 46,588-592.
Debreu, G. (1959), Theory of Value, Wiley, New York.
Debreu, G. (1974),Excess Demand Functions, Journal of Mathematical Economics, 1,15-21.
Diewert, W. E. (1971),An Application of the Shephard Duality Theorem: A Generalized Leon-
tief Production Function, Journal of Political Economy, 79, 481-507.
Diewert, W. E. (1973), Afriat and Revealed Preference Theory, Review of Economic Studies,
40,419-426.
Diewert, W. E. (1974), Applications of Duality Theory, in Frontiers of Quantitative Economics,
Vol. 2 (M. Intriligator and D. Kendrick, eds.), North-Holland, New York, 106-171.
Diewert, W. E. and C. Parkan (1985), Tests for the Consistency of Consumer Data, Journal of
Econometrics, 30, 127-147.
Eichenbaum, M., L. Hansen, and K. Singleton (1988), A Time Series Analysis of Representa-
tive Agent Models of Consumption and Leisure Choice Under Uncertainty, Quarterly
Journal of Economics, 103, 51-78.
Eisenberg, B. (1961), Aggregation of Utility Functions, Management Science, 7,337-350.
Fisher, F. (1965), Embodied Technical Change and the Existence of an Aggregate Capital
Stock, Review of Economic Studies, 32,263-288.
Fisher, F. (1968a),Embodied Technology and the Existence of Labour and Output Aggregates,
Review of Economic Studies, 35, 391-412.
Fisher, F. (1968b), Embodied Technology and Aggregation of Fixed and Movable Capital
Goods, Review of Economic Studies, 35, 417-428.
Fisher, F. (1969), The Existence of Aggregate Production Functions, Econometrica, 37, 553-
577.
Fisher, F. (1982), Aggregate Production Functions Revisited: The Mobility of Capital and the
Rigidity of Thought, Review of Economic Studies, 49, 615-626.
Fisher, F. (1983), On the Simultaneous Existence of Full and Partial Capital Aggregates, Re-
view of Economic Studies, 50, 197-208.
Fleissig, A. R., A. R. Hall, and J. J. Seater (1994),GARP, Separability, and the Representative
Consumer, Working Paper, Department of Economics, North Carolina State University.
Fortin, N. M. (1991), Fonctions de Production et Biais d'Agrégation, Annales d'Économie et
de Statistique, 20/21, 41-68.
Freixas, X. and A. Mas-Colell (1987), Engel Curves Leading to the Weak Axiom in the Ag-
gregate, Econometrica, 55, 515-531.
Hardle, W. and T. M. Stoker (1989), Investigating Smooth Multiple Regression by the Method
of Average Derivatives, Journal ofthe American Statistical Association, 84,986-995.
Hausman, J. A., W. K. Newey, and J. L. Powell(1995), Nonlinear Errors in Variables Estima-
tion of Some Engel Curves, Journal of Econometrics, 65,205-233.
Heineke, J. M. (1979), Exact Aggregation and Estimation, Economics Letters, 4, 157-162.
Heineke, J. M. (1993), Exact Aggregation and Consumer Demand Systems, Ricerche Eco-
nomiche, 47, 215-232.
Heineke, J. M. and H. M. Shefrin (1982), The Finite Basis Property and Exact Aggregation,
Economics Letters, 9, 209-213.
Heineke, J. M. and H. M. Shefrin (1986), On an Implication of a Theorem Due to Gorman,
Economics Letters, 21, 321-323.
Heineke, J. M. and H. M. Shefrin (1987), On Some Global Properties of Gorman Class Demand
Systems, Economics Letters, 25, 155-160.
Heineke, J. M. and H. M. Shefrin (1988), Exact Aggregation and the Finite Basis Property,
International Economic Review, 29, 525-538.
Heineke, J. M. and H. M. Shefrin (1990), Aggregation and Identification in Consumer Demand
Systems, Journal of Econometrics, 44, 377-390.
Hildenbrand, W. (1983), On the Law of Demand, Econometrica, 51,997-1019.
Hildenbrand, W. (1989), Facts and Ideas in Microeconomic Theory, European Economic Re-
view, 33,251-276.
Hildenbrand, W. (1993), Market Demand: Theory and Empirical Evidence, Princeton Univer-
sity Press, Princeton.
Houthakker, H. S. (1950), Revealed Preference and the Utility Function, Economica, 17, 159-
174.
Houthakker, H. S. (1963), Some Problems in the International Comparison of Consumption
Patterns, in Les Besoins de Biens de Consommation (R. Mosse, ed.), Centre National de
la Recherche Scientifique, Grenoble.
Howe, H., R. A. Pollak, and T. J. Wales (1979), Theory and Time Series Estimation of the
Quadratic Expenditure System, Econometrica, 47, 1231-1247.
Huang, C. (1987), An Intertemporal General Equilibrium Asset Pricing Model: The Case of
Diffusion Information, Econometrica, 55, 117-142.
Jerison, M. (1984a), Aggregation and Pairwise Aggregation of Demand When the Distribution
of Income Is Fixed, Journal of Economic Theory, 33, 1-31.
Jerison, M. (1984b), Social Welfare and the Unrepresentative Representative Consumer, manu-
script, SUNY Albany.
Jorgenson, D. W. (1986), Econometric Methods for Modeling Producer Behavior, in Handbook
oJ’Mathematica1 Economics, Vol. 3 (K. J. Arrow and M. D. Intriligator, eds.), North-
Holland, Amsterdam, 1841-191 5.
Jorgenson, D. W., L. J. Lau, and T. M. Stoker (1980), Welfare Comparison under Exact Aggre-
gation, American Economic Review, 70, 268-272,
Jorgenson, D. W., L. J. Lau, and T. M. Stoker (1981), Aggregate Consumer Behavior and In-
dividual Welfare, in Macroeconomic Analysis (D. Currie, R. Nobay, and D. Peel, eds.),
Croom-Helm, London, 35-61.
Jorgenson, D. W., L. J. Lau, and T. M. Stoker (1982), The Transcendental Logarithmic Model
of Aggregate Consumer Behavior, in Advances in Econometrics, Vol. 1 (R. Basmann and
G. Rhodes, eds.), JAI Press, Greenwich, 97-238.
Jorgenson, D. W. and D. T. Slesnick (1984), Aggregate Consumer Behaviour and the Measure-
ment of Inequality, Review of Economic Studies, 51, 369-392.
Kirman, A. (1992), Whom or What Does the Representative Individual Represent? Journal
of Economic Perspectives, 6, 117-136.
Kirman, A. and K. Koch (1986), Market Excess Demand Functions in Exchange Economies:
Identical Preferences and Collinear Endowments, Review of Economic Studies, 53,
457-463.
Klein, L. (1946a),Macroeconomics and the Theory of Rational Behaviour, Econometrica, 14,
93-108.
Klein, L. R. (1946b),Remarks on the Theory of Aggregation, Econometrica, 14,303-312.
Klein, L. R. and H. Rubin (1947-1948), A Constant Utility Index of the Cost of Living, Review
of Economic Studies, 15,84-87.
Koo, Y. C. (1963), An Empirical Test of Revealed Preference Theory, Econometrica, 31,646-
664.
Koopmans, T. C. (1957), Three Essays on the State of Economic Science, McGraw-Hill, New
York.
Kydland, F. E. and E. C. Prescott (1982), Time to Build and Aggregate Fluctuations, Econo-
metrica, 50, 1345-1370.
Lau, L. J . (1977a), Existence Conditions for Aggregate Demand Functions: The Case of a
Single Index, IMSSS Technical Report 248, Stanford University.
Lau, L. J. (1977b),Existence Conditions for Aggregate Demand Functions: The Case of Mul-
tiple Indexes, IMSSS Technical Report 249, Stanford University.
Lau, L. J. (1982), A Note on the Fundamental Theorem of Exact Aggregation, Economics Let-
ters, 9, 119-126.
Lau, L. J. (1986), Functional Forms in Econometric Model Building, in Handbook of Econo-
metrics, Vol. III (Z. Griliches and M. D. Intriligator, eds.), North-Holland, Amsterdam.
Leontief, W. W. (1947a), Introduction to a Theory of the Internal Structure of Functional Re-
lationships, Econometrica, 15, 361-373.
Leontief, W. W. (1947b), A Note on the Interrelation of Subsets of Independent Variables
of a Continuous Function with Continuous First Derivatives, Bulletin of the American
Mathematical Society, 53, 343-350.
Lewbel, A. (1987), Characterizing Some Gorman Engel Curves, Econometrica, 55, 1451-
1459.
Lewbel, A. (1989a), Nesting the AIDS and Translog Demand Systems, International Economic
Review, 30,349-356.
Lewbel, A. (1989b),A Demand System Rank Theorem, Econometrica, 57, 701-705.
Lewbel, A. (1990), Full Rank Demand Systems, International Economic Review, 31, 289-
300.
Lewbel, A. (1991), The Rank of Demand Systems: Theory and Nonparametric Estimation,
Econometrica, 59, 711-730.
Lewbel, A. (1994), An Examination of Werner Hildenbrand’s Market Demand, Journal of
Economic Literature, 32, 1832-1841.
Manser, M. E. and R. J. McDonald (1988), An Analysis of Substitution Bias in Measuring
Inflation, 1959-85, Econometrica, 56, 909-930.
Mantel, R. (1974),On the Characterization of Aggregate Excess Demand, Journal ofEconomic
Theory, 7, 348-353.
Sonnenschein, H. (1973), Do Walras' Identity and Continuity Characterize the Class of Com-
munity Excess Demand Functions? Journal of Economic Theory, 6,345-354.
Sonnenschein, H. (1974), Market Excess Demand Functions, Econometrica, 40, 549-563.
Sono, M. (1961), The Effect of Price Changes on the Demand and Supply of Separable Goods,
International Economic Review, 2, 239-271.
Stoker, T. M. (1984), Completeness, Distribution Restrictions, and the Form of Aggregate
Functions, Econometrica, 52,887-907.
Stoker, T. M. (1986a),The Distributional Welfare Effects of Rising Prices in the United States:
The 1970's Experience, American Economic Review, 76,335-349.
Stoker, T. M. (1986b), Simple Tests of Distributional Effects on Macroeconomic Equations,
Journal of Political Economy, 94,763-795.
Stoker, T. M. (1993), Empirical Approaches to the Problem of Aggregation over Individuals,
Journal of Economic Literature, 31, 1827-1874.
Stone, R. (1954), Linear Expenditure Systems and Demand Analysis: An Application to the
Pattern of British Demand, The Economic Journal, 64,511-527.
Theil, H. (1954),Linear Aggregation ofEconomic Relations, North-Holland, Amsterdam.
Van Daal, J. and A. H. Q. M. Merkies (1984),Aggregation in Economic Research, D. Reidel,
Dordrecht .
Van Daal, J. and A. H. Q. M. Merkies (1989), A Note on the Quadratic Expenditure Model,
Econometrica, 57, 1439-1443.
Varian, H. R. (1982), The Nonparametric Approach to Demand Analysis, Econometrica, 50,
945-973.
Varian, H. R. (1992),Microeconomic Analysis, W. W. Norton, New York.
Vilks, A. (1988a), Approximate Aggregation of Excess Demand Functions, Journal of Eco-
nomic Theory, 45,417-424.
Vilks, A. (1988b), Consistent Aggregation of a General Equilibrium Model, in Measurement
in Economics (Wolfgang Eichhorn, ed.), Physica-Verlag, Heidelberg, 691-703.
White, H. (1980), Using Least Squares to Approximate Unknown Regression Functions, In-
ternational Economic Review, 21, 149-170.
Spatial Dependence in Linear
Regression Models with an
Introduction to Spatial Econometrics
Luc Anselin
West Virginia University, Morgantown, West Virginia
Anil K. Bera
University of Illinois, Champaign, Illinois
I. INTRODUCTION
Econometric theory and practice have been dominated by a focus on the time di-
mension. In stark contrast to the voluminous literature on serial dependence over
time (e.g., the extensive review in King 1987), there is scant attention paid to its
counterpart in cross-sectional data, spatial autocorrelation. For example, there is
no reference to the concept nor to its relevance in estimation or specification test-
ing in any of the commonly cited econometrics texts, such as Judge et al. (1982),
Greene (1993), or Poirier (1995), or even in more advanced ones, such as Fomby
et al. (1984), Amemiya (1985), Judge et al. (1995), and Davidson and MacKinnon
(1993) (a rare exception is Johnston 1984). In contrast, spatial autocorrelation and
spatial statistics in general are widely accepted as highly relevant in the analysis of
cross-sectional data in the physical sciences, such as in statistical mechanics, ecol-
ogy, forestry, geology, soil science, medical imaging, and epidemiology (for a recent
review, see National Research Council 1991).
In spite of this lack of recognition in “mainstream” econometrics, applied
workers saw the need to explicitly deal with problems caused by spatial autocorrela-
tion in cross-sectional data used in the implementation of regional and multiregional
econometric models. In the early 1970s, the Belgian economist Jean Paelinck coined
the term “spatial econometrics” to designate a field of applied econometrics dealing
with estimation and specification problems that arose from this. In their classic book
Spatial Econometrics, Paelinck and Klaassen (1979) outlined five characteristics of
the field: (1) the role of spatial interdependence in spatial models; (2) the asymmetry
in spatial relations; (3) the importance of explanatory factors located in other spaces;
(4) differentiation between ex post and ex ante interaction; and (5) explicit model-
ing of space (Paelinck and Klaassen 1979, pp. 5-11; see also Hordijk and Paelinck
1976, Paelinck 1982). In Anselin (1988a, p. 7), spatial econometrics is defined more
broadly as “the collection of techniques that deal with the peculiarities caused by
space in the statistical analysis of regional science models.” The latter incorporate
regions, location and spatial interaction explicitly and form the basis of most recent
empirical work in urban and regional economics, real estate economics, transporta-
tion economics, and economic geography. The emphasis on the model as the starting
point differentiates spatial econometrics from the broader field of spatial statistics,
although they share a common methodological framework. Many of the contributions
to spatial econometrics have appeared in specialized journals in regional science
and analytical geography, such as the Journal of Regional Science, Regional Science
and Urban Economics, Papers in Regional Science, International Regional Science
Review, Geographical Analysis, and Environment and Planning A. Early reviews of
the relevant methodological issues are given in Hordijk (1974, 1979), Bartels and
Hordijk (1977), Arora and Brown (1977), Paelinck and Klaassen (1979), Bartels and
Ketellapper (1979), Cliff and Ord (1981), Blommestein (1983), and Anselin (1980,
1988a, 1988b). More recent collections of papers dealing with spatial econometric
issues are contained in Anselin (1992a), Anselin and Florax (1995a), and Anselin
and Rey (1997).
Recently, an attention to the spatial econometric perspective has started to ap-
pear in mainstream empirical economics as well. This focus on spatial dependence
has occurred in a range of fields in economics, not only in urban, real estate, and
regional economics, where the importance of location and spatial interaction is fun-
damental, but also in public economics, agricultural and environmental economics,
and industrial organization. Recent examples of empirical studies in mainstream
economics that explicitly incorporated spatial dependence are, among others, the
analysis of U.S. state expenditure patterns in Case et al. (1993), an examination of
recreation expenditures by municipalities in the Los Angeles region in Murdoch
et al. (1993), pricing in agricultural markets in LeSage (1993), potential spillovers
from public infrastructure investments in Holtz-Eakin (1994),the determination of
agricultural land values in Benirschka and Binkley (1994), the choice of retail sales
contracts by integrated oil companies in Pinkse and Slade (1995), strategic interac-
tion among local governments in Brueckner (1996), and models of nations’ decisions
to ratify environmental controls in Beron et al. (1996) and Murdoch et al. (1996).
Substantively, this follows from a renewed focus on Marshallian externalities, spa-
tial spillovers, copy-catting, and other forms of behavior where an economic actor
mimics or reacts to the actions of other actors, for example in the new economic
geography of Krugman (1991), in theories of endogenous growth (Romer 1986), and
in analyses of local political economy (Besley and Case 1995).Second, a number
of important policy issues have received an explicit spatial dimension, such as the
designation of target areas or enterprise zones in development economics and the
identification of underserved mortgage markets in urban areas. A more practical
reason is the increased availability of large socioeconomic data sets with detailed
spatial information, such as county-level economic information in the REIS CD-
ROM (Regional Economic Information System) of the U.S. Department of Commerce,
and tract -level data on mortgage transactions collected under the Housing Mortgage
Disclosure Act (HMDA) of 1975.
From a methodological viewpoint, spatial dependence is not only important
when it is part of the model, be it in a theoretical or policy framework, but it can
also arise due to certain misspecifications. For instance, often the cross-sectional
data used in model estimation and specification testing are imperfect, which may
cause spatial dependence as a side effect. For example, census tracts are not housing
markets and counties are not labor markets, but they are used as proxies to record
transactions in these markets. Specifically, a mismatch between the spatial unit of
observation and the spatial extent of the economic phenomena under consideration
will result in spatial measurement errors and spatial autocorrelation between these
errors in adjoining locations (Anselin 1988a).
In this chapter, we review the methodological issues related to the explicit
treatment of spatial dependence in linear regression models. Specifically, we focus
on the specification of the structure of spatial dependence (or spatial autocorrela-
tion), on the estimation of models with spatial dependence and on specification tests
to detect spatial dependence in regression models. Our review is organized accord-
ingly into three main sections. We have limited the review to cross-sectional settings
for linear regression models and do not consider dependence in space-time nor mod-
els for limited dependent variables. Whereas there is an established body of theory
and methodology to deal with the standard regression case, this is not (yet) the case
for techniques to analyze the other types of models. Both areas are currently the sub-
ject of active ongoing research (see, e.g., some of the papers in Anselin and Florax
1995a). Also, we have chosen to focus on a classical framework and do not consider
Bayesian approaches to spatial econometrics (e.g., Hepple 1995a, 1995b, LeSage
1997).
In our review, we attempt to outline the extent to which general econometric
principles can be applied to deal with spatial dependence. Spatial econometrics is
often erroneously considered to consist of a straightforward extension of techniques
to handle dependence in the time domain to two dimensions. In this chapter, we
emphasize the limitations of such a perspective and stress the need to explicitly
tackle the spatial aspects of model specification, estimation, and diagnostic testing.
We begin this review with a closer look at the concept of spatial dependence, or its
weaker expression, spatial autocorrelation, and how it differs from the more familiar
serial correlation in the time domain. While, in a strict sense, spatial autocorrela-
tion and spatial dependence clearly are not synonymous, we will use the terms inter-
changeably. In most applications, the weaker term autocorrelation (as a moment of
the joint distribution) is used and only seldom has the focus been on the joint density
as such (a recent exception is the semiparametric framework suggested in Brett and
Pinkse 1997).
In econometrics, an attention to serial correlation has been the domain of time-
series analysis and the typical focus of interest in the specification and estimation
of models for cross-sectional data is heteroskedasticity. Until recently, spatial auto-
correlation was largely ignored in this context, or treated in the form of groupwise
equicorrelation, e.g., as the result of certain survey designs (King and Evans 1986).
In other disciplines, primarily in physical sciences, such as geology (Isaaks and
Srivastava 1989, Cressie 1991) and ecology (Legendre 1993), but also in geogra-
phy (Griffith 1987, Haining 1990) and in social network analysis in sociology and
psychology (Dow et al. 1982, Doreian et al. 1984, Leenders 1995),the dependence
across “space” (in its most general sense) has been much more central. For example,
Tobler’s (1979) “first law of geography” states that “everything is related to every-
thing else, but closer things more so,” suggesting spatial dependence to be the rule
rather than exception. A large body of spatial statistical techniques has been devel-
oped to deal with such dependencies (for a recent comprehensive review, see Cressie
1993; other classic references are Cliff and Ord 1973, 1981, Ripley 1981, 1988, Up-
ton and Fingleton 1985, 1989). Useful in this respect is Cressie's (1993) taxonomy
of spatial data structures differentiating between point patterns, geostatistical data,
and lattice data. In the physical sciences, the dominant underlying assumption tends
to be that of a continuous spatial surface, necessitating the so-called geostatistical
perspective rather than discrete observation points (or regions) in space, for which
the so-called lattice perspective is relevant. The latter is more appropriate for eco-
nomic data, since it is to some extent an extension of the ordering of observations on
a one-dimensional time axis to an ordering in a two-dimensional space. It will be the
almost exclusive focus of our review.
The traditional emphasis in econometrics on heterogeneity in cross-sectional
data is not necessarily misplaced, since the distinction between spatial heterogene-
ity and spatial autocorrelation is not always obvious. More specifically, in a single
cross section the two may be observationally equivalent. For example, when a spatial
cluster of exceptionally large residuals is observed for a regression model, it cannot
be ascertained without further structure whether this is an instance of heteroskedas-
ticity (i.e., clustering of outliers) or spatial autocorrelation (a spatial stochastic pro-
cess yielding clustered outliers). This problem is known in the literature as "true"
versus "apparent" contagion.
Our main focus in this review will be on the second approach, the so-called
lattice perspective. For each data point, a relevant “neighborhood set” must be de-
fined, consisting of those other locations that (potentially) interact with it. For each
observation i, this yields a spatial ordering of locations j \in S_i (where S_i is the neigh-
borhood set), which can then be exploited to specify a spatial stochastic process. The
covariance structure between observations is thus not modeled directly, but follows
from the particular form of the stochastic process. We return to this issue below. First,
we review the operational specification of the neighborhood set for each observation
by means of a so-called spatial weights matrix.
B. Spatial Weights
A spatial weights matrix is an N by N positive and symmetric matrix W which ex-
presses for each observation (row) those locations (columns) that belong to its neigh-
borhood set as nonzero elements. More formally, w_{ij} = 1 when i and j are neighbors,
and w_{ij} = 0 otherwise. By convention, the diagonal elements of the weights matrix
are set to zero. For ease of interpretation, the weights matrix is often standardized
such that the elements of a row sum to one. The elements of a row-standardized
weights matrix thus equal w_{ij}^s = w_{ij} / \sum_j w_{ij}. This ensures that all weights are be-
tween 0 and 1 and facilitates the interpretation of operations with the weights matrix
as an averaging of neighboring values (see Section II.C). It also ensures that the spa-
tial parameters in many spatial stochastic processes are comparable between mod-
els. This is not intuitively obvious, but relates to constraints imposed in a maximum
likelihood estimation framework. For the latter to be valid, spatial autoregressive pa-
rameters must be constrained to lie in the interval 1/\omega_{\min} to 1/\omega_{\max}, where \omega_{\min} and
\omega_{\max} are respectively the smallest (on the real line) and largest eigenvalues of the ma-
trix W (Anselin 1982). For a row-standardized weights matrix, the largest eigenvalue
is always +1 (Ord 1975), which facilitates the interpretation of the autoregressive
coefficient as a "correlation" (for an alternative view, see Kelejian and Robinson
1995). A side effect of row standardization is that the resulting matrix is likely to
become asymmetric (since in general \sum_k w_{ik} \ne \sum_k w_{jk}), even though the original matrix may
have been symmetric. In the calculation of several estimators and test statistics, this
complicates computational matters considerably.
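The mechanics just described can be sketched in a few lines; the five-unit contiguity structure below is hypothetical, and in practice dedicated spatial libraries (such as PySAL) handle weights construction.

```python
# Minimal sketch: binary contiguity, row standardization, and the eigenvalue
# bounds within which a spatial autoregressive parameter must lie.
import numpy as np

pairs = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)]   # assumed neighbor pairs
n = 5
W = np.zeros((n, n))
for i, j in pairs:
    W[i, j] = W[j, i] = 1.0            # symmetric binary contiguity, zero diagonal

Ws = W / W.sum(axis=1, keepdims=True)  # row-standardized: each row sums to one

eig = np.sort(np.linalg.eigvals(W).real)
print("rho must lie in (1/%.3f, 1/%.3f)" % (eig[0], eig[-1]))

eig_s = np.linalg.eigvals(Ws).real
print("largest eigenvalue of row-standardized W:", round(eig_s.max(), 6))  # +1
print("row-standardized W symmetric?", np.allclose(Ws, Ws.T))              # generally False
```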
The specification of which elements are nonzero in the spatial weights matrix
is a matter of considerable arbitrariness and a wide range of suggestions have been
offered in the literature. The “traditional” approach relies on the geography or spa-
tial arrangement of the observations, designating areal units as "neighbors" when
they have a border in common (first-order contiguity) or are within a given distance
of each other; i.e., w_{ij} = 1 for d_{ij} \le \delta, where d_{ij} is the distance between units i
and j, and \delta is a distance cutoff value (distance-based contiguity). This geographic
approach has been generalized to so-called Cliff-Ord weights that consist of a func-
tion of the relative length of the common border, adjusted by the inverse distance
between two observations (Cliff and Ord 1973, 1981). Formally, Cliff-Ord weights
may be expressed as:

w_{ij} = \frac{b_{ij}^{\beta}}{d_{ij}^{\alpha}}

where b_{ij} is the share of the common border between units i and j in the perimeter of i
(and hence b_{ij} does not necessarily equal b_{ji}), and \alpha and \beta are parameters. More gen-
erally, the weights may be specified to express any measure of “potential interaction”
between units i and j (Anselin 1988a, Chap. 3). For example, this may be related
directly to spatial interaction theory and the notion of potential, with w_{ij} = 1/d_{ij}^{\alpha} or
w_{ij} = e^{-\beta d_{ij}}, or more complex distance metrics may be implemented (Anselin 1980,
Murdoch et al. 1993). Typically, the parameters of the distance function are set a
priori (e.g., \alpha = 2, to reflect a gravity function) and not estimated jointly with the
other coefficients in the model. Clearly, when they are estimated jointly, the resulting
specification will be highly nonlinear (Anselin 1980, Chap. 8, Ancot et al. 1986,
Bolduc et al. 1989, 1992, 1995).
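As a sketch of these distance-based alternatives, the coordinates, the distance cutoff, and the decay parameters below are all assumptions made for illustration.

```python
# Distance-band contiguity, inverse-distance weights (alpha = 2, set a priori),
# and exponential-decay weights, each with a zero diagonal; "economic distance"
# weights such as 1/|x_i - x_j| are constructed in the same way.
import numpy as np

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.5], [2.0, 2.0], [3.0, 0.5]])
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

band = ((d > 0) & (d <= 1.6)).astype(float)          # distance-band contiguity

with np.errstate(divide="ignore"):
    inv_dist = np.where(d > 0, 1.0 / d**2, 0.0)      # inverse distance, alpha = 2

exp_decay = np.where(d > 0, np.exp(-1.0 * d), 0.0)   # exponential decay, beta = 1

def row_standardize(W):
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)

print(row_standardize(inv_dist).sum(axis=1))         # connected rows sum to 1
```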
Other specifications of spatial weights are possible as well. In sociometrics,
the weights reflect whether or not two individuals belong to the same social network
(Doreian 1980). In economic applications, the use of weights based on "economic"
distance has been suggested, among others, in Case et al. (1993). Specifically, they
suggest to use weights (before row standardization) of the form w_{ij} = 1/|x_i - x_j|,
where x_i and x_j are observations on "meaningful" socioeconomic characteristics,
such as per capita income or percentage of the population in a given racial or ethnic
group.
It is important to keep in mind that, irrespective of how the spatial weights
are specified, the resulting spatial process must satisfy the necessary regularity con-
ditions such that asymptotics may be invoked to obtain the properties of estimators
and test statistics. For example, this requires constraints on the extent of the range of
interaction and/or the degree of heterogeneity implied by the weights matrices (the
so-called mixing conditions; Anselin 1988a, Chap. 5). Specifically, this means that
weights must be nonnegative and remain finite, and that they correspond to a proper
metric (Anselin 1980).Clearly, this may pose a problem with socioeconomic weights
when x_i = x_j for some observation pairs, which may be the case for poorly chosen
economic determinants (e.g., when two states have the same percentage in a given
racial group). Similarly, when multiple observations belong to the same area1 unit
(e.g., different banks located in the same county) the distance between them must
be set to something other than zero (or else 1/d_{ij} \to \infty). Finally, in the standard estima-
tion and testing approaches, the weights matrix is taken to be exogenous. Therefore,
indicators for the socioeconomic weights should be chosen with great care to en-
sure their exogeneity, unless their endogeneity is considered explicitly in the model
specification.
Operationally, the derivation of spatial weights from the location and spatial
arrangement of observations must be carried out by means of a geographic informa-
tion system, since for all but the smallest data sets a visual inspection of a map is im-
practical (for implementation details, see Anselin et al. 1993a, 1993b, Anselin 1995,
Can 1996). A mechanical construction of spatial weights, particularly when based
on a distance criterion, may easily result in observations becoming "unconnected"
or isolated islands. Consequently, the row in the weights matrix that corresponds to
these observations will consist of zero values. While not inherently invalidating es-
timation or testing procedures, the unconnected observations imply a loss of degrees
of freedom, since, for all practical purposes, they are eliminated from consideration
in any “spatial” model. This must be explicitly accounted for.
Row standardization ensures that a spatial lag operation yields a smoothing of the neighboring values, since
the positive weights sum to one.
Higher-order spatial lag operators are defined in a recursive manner, by ap-
plying the spatial weights matrix to a lower-order lagged variable. For example, a
second-order spatial lag is obtained as W(Wy), or W^2 y. However, in contrast to
time series, where such an operation is unambiguous, higher-order spatial operators
yield redundant and circular neighbor relations, which must be eliminated to ensure
proper estimation and inference (Blommestein 1985, Blommestein and Koper 1992,
Anselin and Smirnov 1996).
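A small sketch of the lag operation and of the circularity problem for higher-order lags; the four-unit chain below is an assumed example.

```python
# Wy averages the neighboring values of y; W @ W @ y is a second-order lag.
# The nonzero diagonal of W @ W reflects the redundant, circular neighbor
# relations (each unit becomes its own second-order neighbor) noted in the text.
import numpy as np

W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W /= W.sum(axis=1, keepdims=True)       # row-standardize

y = np.array([10.0, 20.0, 30.0, 40.0])
print("Wy   :", W @ y)                  # average of each unit's neighbors
print("W2y  :", W @ W @ y)              # second-order lag, computed recursively
print("diag(W @ W):", np.diag(W @ W))   # nonzero: self-neighbor paths to remove
```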
In spatial econometrics, spatial autocorrelation is modeled by means of a func-
tional relationship between a variable, y, or error term, \varepsilon, and its associated spatial
lag, respectively Wy for a spatially lagged dependent variable and W\varepsilon for a spatially
lagged error term. The resulting specifications are referred to as spatial lag and spa-
tial error models, the general properties of which we consider next.
(I - \rho W)y = X\beta + \varepsilon \qquad (6)

where (I - \rho W)y is a spatially filtered dependent variable, i.e., with the effect of
spatial autocorrelation taken out. This is roughly similar to first differencing of the
dependent variable in time series, except that a value of \rho = 1 is not in the allowable
parameter space for (3) and thus \rho must be estimated explicitly (Section III).
y = X\beta + \varepsilon \qquad (7)

i.e., a linear regression with error vector \varepsilon, and

\varepsilon = \lambda W \varepsilon + \xi \qquad (8)

where \lambda is the spatial autoregressive coefficient for the error lag W\varepsilon (to distinguish
the notation from the spatial autoregressive coefficient \rho in a spatial lag model),
and \xi is an uncorrelated and (without loss of generality) homoskedastic error term.
Alternatively, this may be expressed as

y = X\beta + (I - \lambda W)^{-1}\xi \qquad (9)

The resulting error covariance matrix is

E[\varepsilon \varepsilon'] = \sigma^2 (I - \lambda W)^{-1}(I - \lambda W')^{-1} \qquad (10)

which yields nonconstant diagonal elements in the error covariance matrix, thus inducing
heteroskedasticity in \varepsilon, irrespective of the heteroskedasticity of \xi (an illuminating
numerical illustration of this feature is given in McMillen 1992). We have a much
simpler situation for the case of autocorrelation in the time-series context, where the
model is written as \varepsilon_t = \lambda \varepsilon_{t-1} + \xi_t. Therefore, this is a special case of (8) with
W = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}
where each observation is connected to only its immediate past value. As we know,
for this case, \mathrm{Var}(\varepsilon_t) = \sigma^2/(1 - \lambda^2) for all t. That is, autocorrelation does not induce
heteroskedasticity. In a time-series model, heteroskedasticity can come only through
\xi_t, given the above AR(1) model.
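The contrast can be verified numerically. In the sketch below the irregular five-unit weights matrix is an assumed example, and a circular one-period lag matrix is used only as a convenient stand-in for the stationary time-series case.

```python
# diag of sigma^2 [(I - lam W)'(I - lam W)]^{-1}: varies across units for an
# irregular spatial W (heteroskedasticity), but is constant and close to
# sigma^2/(1 - lam^2) for the (assumed, circular) time-series lag matrix.
import numpy as np

def error_variances(W, lam, sigma2=1.0):
    A = np.eye(W.shape[0]) - lam * W
    return np.diag(sigma2 * np.linalg.inv(A.T @ A))

W_sp = np.array([[0, 1, 0, 0, 1],
                 [1, 0, 1, 1, 0],
                 [0, 1, 0, 1, 0],
                 [0, 1, 1, 0, 1],
                 [1, 0, 0, 1, 0]], dtype=float)
W_sp /= W_sp.sum(axis=1, keepdims=True)            # row-standardized, irregular degrees

n = 50
W_t = np.zeros((n, n))
for t in range(n):
    W_t[t, (t - 1) % n] = 1.0                      # circular one-period lag

lam = 0.5
v_sp = error_variances(W_sp, lam)
v_t = error_variances(W_t, lam)
print("spatial AR error variances :", np.round(v_sp, 3))                # unequal
print("time AR(1) variance spread :", round(v_t.max() - v_t.min(), 6))  # ~0
print("sigma^2 / (1 - lambda^2)   :", round(1 / (1 - lam**2), 3), "vs", round(v_t[0], 3))
```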
A second complicating factor in specification testing is the great degree of
similarity between a spatial lag and a spatial error model, as suggested by the error
covariance structure. In fact, after premultiplying both sides of (9) by (I - \lambda W) and
moving the spatial lag term to the right side, a spatial Durbin model results (Anselin
1980):

y = \lambda W y + X\beta - \lambda W X \beta + \xi \qquad (11)

This model has a spatial lag structure (but with the spatial autoregressive parameter
\lambda from (8)) with a well-behaved error term \xi. However, the equivalence between (7)-
(8) and (11) imposes a set of nonlinear common factor constraints on the coefficients.
Indeed, for (11) to be a proper spatial error model, the coefficients of the lagged ex-
ogenous variables WX must equal minus the product of the spatial autoregressive
coefficient \lambda and the coefficients of X, for a total of K constraints (for technical de-
tails, see Anselin 1988a, pp. 226-229).
Spatial error dependence may be interpreted as a nuisance (and the parameter
\lambda as a nuisance parameter) in the sense that it reflects spatial autocorrelation in
measurement errors or in variables that are otherwise not crucial to the model (i.e.,
the “ignored” variables spillover across the spatial units of observation). It primarily
causes a problem of inefficiency in the regression estimates, which may be remedied
by increasing the sample size or by exploiting consistent estimates of the nuisance
parameter. For example, this is the interpretation offered in the model of agricultural
land values in Benirschka and Binkley (1994).
The spatial autoregressive error model can also be expressed in terms of spa-
tially filtered variables, but slightly different from (6). After moving the spatial lag
variable in (11) to the left-hand side, the following expression results:

(I - \lambda W)y = (I - \lambda W)X\beta + \xi \qquad (12)

This is a regression model with spatially filtered dependent and explanatory vari-
ables and with an uncorrelated error term \xi, similar to first differencing of both y
and X in time-series models. As in the spatial lag model, \lambda = 1 is outside the pa-
rameter space and thus \lambda must be estimated jointly with the other coefficients of the
model (see Section III).
Several alternatives to the spatial autoregressive error process (8) have been
suggested in the literature, though none of them have been implemented much in
practice. A spatial moving average error process is specified as (Cliff and Ord 1981,
Haining 1988, 1990):

\varepsilon = \gamma W \xi + \xi \qquad (13)

where \gamma is the spatial moving average coefficient and \xi is an uncorrelated error term.
This process thus specifies the error term at each location to consist of a location-
specific part, \xi_i ("innovation"), as well as a weighted average (smoothing) of the
errors at neighboring locations, W\xi. The resulting error covariance matrix is

E[\varepsilon \varepsilon'] = \sigma^2 (I + \gamma W)(I + \gamma W') = \sigma^2 [I + \gamma(W + W') + \gamma^2 W W'] \qquad (14)

Note that in contrast to (10), the structure in (14) does not yield a full covariance ma-
trix. Nonzero covariances are only found for first-order (W + W') and second-order
(WW') neighbors, thus implying much less overall interaction than the autoregres-
sive process. Again, unless all observations have the same number of neighbors and
identical weights, the diagonal elements of (14) will not be constant, inducing het-
eroskedasticity in \varepsilon, irrespective of the nature of \xi.
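The difference in the implied range of interaction can be made concrete with a small sketch; the six-unit chain and the parameter values are assumptions.

```python
# The moving-average covariance sigma^2 (I + g W)(I + g W') is zero beyond
# second-order neighbors, while the autoregressive covariance
# sigma^2 [(I - l W)'(I - l W)]^{-1} is a full matrix.
import numpy as np

n = 6
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
W /= W.sum(axis=1, keepdims=True)        # row-standardized chain of 6 units

g, l = 0.5, 0.5
I = np.eye(n)
cov_ma = (I + g * W) @ (I + g * W).T
cov_ar = np.linalg.inv((I - l * W).T @ (I - l * W))

print("MA covariance, unit 0 row:", np.round(cov_ma[0], 3))   # zeros past 2nd-order neighbors
print("AR covariance, unit 0 row:", np.round(cov_ar[0], 3))   # nonzero everywhere
print("MA diagonal (heteroskedastic):", np.round(np.diag(cov_ma), 3))
```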
A very similar structure to (13) is the spatial error components model of Kele-
jian and Robinson (1993, 1995), in which the disturbance is a sum of two indepen-
dent error terms, one associated with the "region" (a smoothing of neighboring errors)
and one which is location-specific:

\varepsilon = W\xi + \psi \qquad (15)

with \xi and \psi as independent error components. The resulting error covariance is

E[\varepsilon \varepsilon'] = \sigma_{\psi}^2 I + \sigma_{\xi}^2 W W' \qquad (16)

where \sigma_{\psi}^2 and \sigma_{\xi}^2 are the variance components associated with respectively the
location-specific and regional error parts.
is even more limited than for (14), pertaining only to the first- and second-order
neighbors contained in the nonzero elements of WW’. Heteroskedasticity is implied
unless all locations have the same number of neighbors and identical weights, a sit-
uation excluded by the assumptions needed for the proper asymptotics in the model
(Kelejian and Robinson 1993, p. 301).
In sum, every type of spatially dependent error process induces heteroskedas-
ticity as well as spatially autocorrelated errors, which will greatly complicate spec-
ification testing in practice. Note that the “direct representation” approach based
on geostatistical principles does not suffer from this problem. For example, in Du-
bin (1988, 1992), the elements of the error covariance matrix are expressed di-
rectly as functions of the distance d_ij between the corresponding observations, e.g.,
E[ε_i ε_j] = γ₁ e^(−d_ij/γ₂), with γ₁ and γ₂ as parameters. Since e^(−d_ii/γ₂) = 1, irrespec-
tive of the value of γ₂, the errors ε will be homoskedastic unless explicitly modeled
otherwise.
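A hedged sketch of this direct representation is given below: the covariance matrix is built from interpoint distances as γ₁e^(−d_ij/γ₂), so its diagonal equals γ₁ for every observation, in contrast with the weights-based specifications above. The coordinates and parameter values are invented for illustration.

```python
# A sketch of the "direct representation" covariance used by Dubin (1988, 1992):
# E[e_i e_j] = g1 * exp(-d_ij / g2), so the diagonal equals g1 for every i
# (homoskedastic). Coordinates and parameter values are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(6, 2))                 # six point locations
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

g1, g2 = 2.0, 3.0                                        # gamma_1, gamma_2
cov = g1 * np.exp(-d / g2)                               # distance-decay covariance

print(np.round(np.diag(cov), 3))                         # constant diagonal = g1
print(np.allclose(cov, cov.T))                           # symmetric by construction
```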
for the moving-average part, in the same notation as above. For greater generality, a
regressive component Xβ can be added to (17) as well. The spatial autocorrelation
pattern resulting from this general formulation is highly complex. Models that imple-
ment aspects of this form are the second-order SAR specification in Brandsma and
Ketellapper (1979a) and higher-order SAR models in Blommestein (1983, 1985).
A slightly different specification combines a first-order spatial autoregressive
lag with a first-order spatial autoregressive error (Anselin 1980, Chap. 6; Anselin
1988a, pp. 60-65). It has been applied in a number of empirical studies, most no-
tably in the work of Case, such as the analysis of household demand (Case 1987,
1991), of innovation diffusion (Case 1992), and local public finance (Case et al.
1993, Besley and Case 1995). Formally, the model can be expressed as a combi-
nation of (3) with (8), although care must be taken to differentiate the weights matrix
used in the spatial lag process from that in the spatial error process:

y = ρW₁y + Xβ + ε   (19)
ε = λW₂ε + ξ   (20)

After some algebra, combining (20) and (19) yields the following reduced form:

y = ρW₁y + λW₂y − λρW₂W₁y + Xβ − λW₂Xβ + ξ   (21)
i.e., an extended form of the spatial Durbin specification but with an additional set of
nonlinear constraints on the parameters. Note that when W₁ and W₂ do not overlap, for
example when they pertain to different orders of contiguity, the product W₂W₁ = 0
and (21) reduces to a biparametric spatial lag formulation, albeit with additional
constraints on the parameters. On the other hand, when W₁ and W₂ are the same,
the parameters ρ and λ are only identified when at least one exogenous variable is
included in X (in addition to the constant term) and when the nonlinear constraints
are enforced (Anselin 1980, p. 176). When W₁ = W₂ = W, the model becomes

y = (ρ + λ)Wy − ρλW²y + Xβ − λWXβ + ξ   (22)
Similar to when serial dependence is present in the time domain, classical sam-
pling theory no longer holds for spatially autocorrelated data, and estimation and
inference must rely on the asymptotic properties of stochastic processes. In essence,
rather than considering N observations as independent pieces of information, they
are conceptualized as a single realization of a process. In order to carry out mean-
ingful inference on the parameters of such a process, constraints must be imposed
on both heterogeneity and the range of interaction. While many properties of esti-
mators for spatial process models may be based on the same principles as developed
for dependent (and heterogeneous) processes in the time domain (e.g., the formal
properties outlined in White 1984, 1994), there are some important differences as
well. Before covering specific estimation procedures, we discuss these differences
in some detail, focusing in particular on the notion of stationarity in space and the
error model is a special case of a nonspherical error term, both of which can be
tackled by means of generally established econometric theory, though not as direct
extensions of the time-series analog.
The emphasis on “simultaneity” in spatial econometrics differs somewhat from
the approach taken in spatial statistics, where conditional models are often consid-
ered to be more natural (Cressie 1993, p. 410). Again, the spatial case differs sub-
stantially from the time-series one since in space a conditional and simultaneous
approach are no longer equivalent (Brook 1964, Besag 1974, Cressie 1993, pp. 402-
410). More specifically, in the time domain a Markov chain stochastic process can be
expressed in terms of the joint density (ignoring a starting point to ease notation) as
where z refers to the vector of observations for all time points, and Qt is a function
that only contains the observation at t and at t - 1 (hence, a Markov chain). The
conditional density for this process is
illustrating the lack of memory of the process (i.e., the conditional density depends
only on the first-order lag). Due to the one-directional nature of dependence in time,
(23) and (24) are equivalent (Cressie 1993, p. 403). An extension of (23) to the spatial
domain may be formulated as

Prob[z] = ∏_{i=1}^{N} Q_i[z_i, z_j; j ∈ S_i]   (25)
where the z_j only refer to those locations that are part of the neighborhood set S_i of
i. A conditional specification would be

Prob[z_i | z_j, j ≠ i] = Prob[z_i | z_j; j ∈ S_i]   (26)
i.e., the conditional density of z;, given observations at all other locations only de-
pends on those locations in the neighborhood set of i. The fundamental result in this
respect goes back to Besag (1974),who showed that the conditional specification
only yields a proper joint distribution when the so-called Hammersley-Clifford the-
orem is satisfied, which imposes constraints on the type and range of dependencies
in (26). Also, while a joint density specification always yields a proper conditional
specification, the range of spatial interaction implied is not necessarily the same.
For example, Cressie (1993, p. 409) illustrates how a first-order symmetric spatial
autoregressive process corresponds with a conditional specification that includes
third-order neighbors (Haining 1990, pp. 89-90). Consequently, it does make a dif-
ference whether one approaches a spatially autocorrelated phenomenon by means of
(26) versus (25). This also has implications for the substantive interpretation of the
model results, as illustrated for an analysis of retail pricing of gasoline in Haining
(1984).
In practice, it is often easier to estimate a conditional model, especially for
nonnormal distributions (e.g., auto-Poisson, autologistic). Also, a conditional speci-
fication is more appropriate when the focus is on spatial prediction or interpolation.
For general estimation and inference, however, the constraints imposed on the type
and range of spatial interaction in order for the conditional density to be proper
are often highly impractical in empirical work. For example, an auto-Poisson model
(conditional model for spatially autocorrelated counts) only allows negative autocor-
relation and hence is inappropriate for any analysis of clustering in space.
In the remainder, our focus will be exclusively on simultaneously specified
models, which is a more natural approach from a spatial econometric perspective
(Anselin 1988a, Cressie 1993, p. 410).
in the same notation as used in Section II. This expression clearly illustrates why,
in contrast to the time-series case, ordinary least squares (i.e., the minimization of
the last term in (28)) is not maximum likelihood, since it ignores the Jacobian term.
From the usual first-order conditions, the ML estimates for β and σ² in a spatial lag
model are obtained as (for details, see Ord 1975, Anselin 1980, Chap. 4; Anselin
1988a, Chap. 6):

β_ML = (X'X)⁻¹X'(I − ρW)y   (29)
σ²_ML = (y − ρWy − Xβ_ML)'(y − ρWy − Xβ_ML)/N   (30)

Conditional upon ρ, these estimates are simply OLS applied to the spatially filtered
dependent variable and the explanatory variables in (6). Substitution of (29) and (30)
in the log-likelihood function yields a concentrated log-likelihood as a nonlinear
function of a single parameter ρ:
L_C(ρ) = const − (N/2) ln[(1/N)(e₀ − ρe_L)'(e₀ − ρe_L)] + Σ_i ln(1 − ρω_i)   (31)

where e₀ and e_L are the residuals in OLS regressions of y on X and of Wy on X,
respectively, and the ω_i are the eigenvalues of W. The inverse of the information
matrix yields the asymptotic variance matrix (32), which contains an off-diagonal
block X'W_A Xβ/σ² linking ρ and β, where W_A = W(I − ρW)⁻¹ to simplify notation.
Note that while the covariance between β and the error variance is zero, as in the
standard regression model, this is not the case for ρ and the error variance. This lack
of block diagonality in the information matrix for the spatial lag model will lead to
some interesting results on
the structure of specification tests, to which we turn in Section IV. It is yet another
distinguishing characteristic between the spatial case and its analog in time series.
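To make the mechanics concrete, the following sketch estimates the spatial lag model by scanning the concentrated log-likelihood (31) over ρ, using the Ord eigenvalue form of the Jacobian. It is a minimal illustration on simulated data, not the SpaceStat implementation discussed later; the function names and the toy weights matrix are our own.

```python
# A hedged sketch of ML estimation of the spatial lag model by scanning the
# concentrated log-likelihood over rho, using ln|I - rho*W| = sum ln(1 - rho*w_i).
# Data are simulated; names (concentrated_loglik, etc.) are illustrative.
import numpy as np

def concentrated_loglik(rho, y, X, Wy, omega):
    """L_C(rho) up to constants: -(N/2) ln(SSR/N) + sum ln(1 - rho*omega_i)."""
    n = len(y)
    Q, _ = np.linalg.qr(X)
    e0 = y - Q @ (Q.T @ y)           # OLS residuals of y on X
    eL = Wy - Q @ (Q.T @ Wy)         # OLS residuals of Wy on X
    u = e0 - rho * eL
    return -0.5 * n * np.log(u @ u / n) + np.sum(np.log(1.0 - rho * omega))

rng = np.random.default_rng(2)
n = 60
Wb = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)   # chain contiguity
W = Wb / Wb.sum(axis=1, keepdims=True)                           # row-standardized
omega = np.real(np.linalg.eigvals(W))    # real for symmetric-contiguity based W
X = np.column_stack([np.ones(n), rng.normal(size=n)])
rho_true, beta_true = 0.5, np.array([1.0, 2.0])
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + rng.normal(size=n))
Wy = W @ y

grid = np.linspace(-0.9, 0.9, 181)
ll = [concentrated_loglik(r, y, X, Wy, omega) for r in grid]
rho_hat = grid[int(np.argmax(ll))]
beta_hat, *_ = np.linalg.lstsq(X, y - rho_hat * Wy, rcond=None)   # as in (29)
print(rho_hat, np.round(beta_hat, 3))
```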
Maximum likelihood estimation of the models with spatial error autocorrela-
tion that were covered in Section II.E can be approached by considering them as
special cases of general parametrized nonspherical error terms, for which E[εε'] =
σ²Ω(θ), with θ as a vector of parameters. For example, from (10) for a spatial au-
toregressive error term, it follows that

Ω(λ) = [(I − λW)'(I − λW)]⁻¹   (33)

The log-likelihood then takes the usual form for a model with a nonspherical error
covariance,

L = −(N/2) ln(2πσ²) − (1/2) ln|Ω(θ)| − (1/(2σ²))(y − Xβ)'Ω(θ)⁻¹(y − Xβ)   (34)

for example, with Ω(λ) as in (33). First-order conditions yield the familiar general-
ized least-squares estimates for β, conditional upon λ:

β_GLS = [X'Ω(λ)⁻¹X]⁻¹X'Ω(λ)⁻¹y   (35)
where e = y − Xβ_GLS are the GLS residuals. For a spatial autoregressive error process,
∂Ω⁻¹/∂λ = −(W + W') + 2λW'W, and the solution of condition (36) can be obtained by
numerical means. Alternatively, the GLS expression for β and a similar solution of
the first-order conditions for σ² can be substituted into the log-likelihood function
to yield a concentrated log-likelihood as a nonlinear function of the autoregressive
parameter λ (for technical details, see Anselin 1980, Chap. 5):

L_C(λ) = const − (N/2) ln(u'u/N) + Σ_i ln(1 − λω_i)   (37)

with u'u = y_L'y_L − y_L'X_L[X_L'X_L]⁻¹X_L'y_L, and y_L and X_L as spatially filtered variables,
respectively y − λWy and X − λWX. The Jacobian term follows from ln|Ω(λ)⁻¹| =
2 ln|I − λW| and the Ord simplification in terms of the eigenvalues ω_i of W.
The asymptotic variance for the ML estimates conforms to the Magnus (1978)
and Breusch (1980) general form and is block diagonal between the regression
(β) and error variance parameters σ² and θ. For example, for a spatial autoregres-
sive error, the asymptotic variance for the regression coefficients is AsyVar[β] =
σ²[X_L'X_L]⁻¹. The variance block for the error parameters is

AsyVar[λ, σ²] = [ tr(W_B²) + tr(W_B'W_B)   tr(W_B)/σ² ; tr(W_B)/σ²   N/(2σ⁴) ]⁻¹   (38)

where, for ease of notation, W_B = W(I − λW)⁻¹. Due to the block-diagonal form
of the asymptotic variance matrix, knowledge of the precision of λ does not affect
the precision of the β estimates. Consequently, if the latter is the primary inter-
est, the complex inverse and trace expressions in (38) need not be computed, as
in Benirschka and Binkley (1994). A significance test for the spatial error parame-
ter can be based on a likelihood ratio test, in a straightforward way (Anselin 1988a,
Chap. 8).
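As a hedged illustration of such a likelihood ratio test, the sketch below maximizes the concentrated log-likelihood for the spatial error model over λ (using the filtered variables and the eigenvalue Jacobian, as in (37)) and compares it to the value at λ = 0; twice the difference is referred to a χ² distribution with one degree of freedom. Data and names are illustrative only.

```python
# A hedged sketch of the likelihood ratio test for the spatial error parameter:
# LR = 2[L(lambda_hat) - L(0)], asymptotically chi-squared with 1 df.
import numpy as np
from scipy import stats

def error_concentrated_loglik(lam, y, X, W, omega):
    n = len(y)
    yf, Xf = y - lam * (W @ y), X - lam * (W @ X)      # spatially filtered variables
    b, *_ = np.linalg.lstsq(Xf, yf, rcond=None)
    u = yf - Xf @ b
    return -0.5 * n * np.log(u @ u / n) + np.sum(np.log(1.0 - lam * omega))

rng = np.random.default_rng(3)
n = 80
Wb = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = Wb / Wb.sum(axis=1, keepdims=True)
omega = np.real(np.linalg.eigvals(W))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.linalg.solve(np.eye(n) - 0.6 * W, rng.normal(size=n))   # SAR(1) errors
y = X @ np.array([1.0, -1.5]) + eps

grid = np.linspace(-0.9, 0.9, 181)
ll = np.array([error_concentrated_loglik(l, y, X, W, omega) for l in grid])
lam_hat = grid[int(np.argmax(ll))]
LR = 2.0 * (ll.max() - error_concentrated_loglik(0.0, y, X, W, omega))
print(lam_hat, LR, stats.chi2.sf(LR, df=1))             # p-value of the LR test
```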
Higher-order spatial processes can be estimated using the same general prin-
ciples, although the resulting log-likelihood function will be highly nonlinear and the
use of a concentrated log-likelihood becomes less useful (Anselin 1980, Chap. 6).
The fit of spatial process models estimated by means of maximum likelihood
procedures should not be based on the traditional R 2 , which will be misleading in
the presence of spatial autocorrelation. Instead, the fit of the model may be assessed
by comparing the maximized log-likelihood or an adjusted form to take into account
the number of parameters in the models, such as the familiar AIC (Anselin 1988b).
C. GMM/IV Estimation
The view of a spatially lagged dependent variable Wy in the spatial lag model as a
form of endogeneity or simultaneity suggests an instrumental variable (IV) approach
to estimation (Anselin 1980, 1988a, Chap. 7; 1990b). Since the main problem is the
correlation between Wy and the error term in (3), the choice of proper instruments
for Wy will yield consistent estimates. However, as usual, the efficiency of these
estimates depends crucially on the choice of the instruments and may be poor in
small samples. On the other hand, in contrast to the maximum likelihood approach
just outlined, IV estimation does not require an assumption of normality.
Using the standard econometric results (for a review, see Bowden and Turk-
ington 1984), and with Q as an N by P matrix (P ≥ K + 1) of instruments (including
the K "exogenous" variables from X), the IV or 2SLS estimate follows as

β_IV = [Z'Q(Q'Q)⁻¹Q'Z]⁻¹Z'Q(Q'Q)⁻¹Q'y   (39)

with Z = [Wy X]. A generalized moments form that allows for a nonspherical error
covariance is

β_GMM = [Z'Q(Q'ΛQ)⁻¹Q'Z]⁻¹Z'Q(Q'ΛQ)⁻¹Q'y   (40)

where Λ is a consistent estimate for the error covariance matrix. The asymptotic vari-
ance for β_GMM is [Z'Q(Q'ΛQ)⁻¹Q'Z]⁻¹. For the spatial error components model,
Kelejian and Robinson (1993, pp. 302-304) suggest an estimate Λ̂ = ψ̂₁I +
ψ̂₂WW', with ψ̂₁ and ψ̂₂ as the least-squares estimates in an auxiliary regression
of the squared IV residuals (y − Zβ_IV) on a constant and the diagonal elements
of WW'.
A particularly attractive application of GLS-IV estimation in spatial lag mod-
els is a special case of the familiar White heteroskedasticity-consistent covariance
estimator (White 1984, Bowden and Turkington 1984, p. 91). The estimator is as in
(40), but Q'ΛQ is estimated by Q'Ω̂Q, where Ω̂ is a diagonal matrix of squared IV
residuals, in the usual fashion. This provides a way to obtain consistent estimates
for the spatial autoregressive parameter ρ in the presence of heteroskedasticity of
unknown form, often a needed feature in applied empirical work.
A crucial issue in instrumental variables estimation is the choice of the instru-
ments. In spatial econometrics, several suggestions have been made to guide the se-
lection of instruments for Wy (for a review, see Anselin 1988a, pp. 84-86; Land and
Deane 1992). Recently, Kelejian and Robinson (1993, p. 302) formally demonstrated
the consistency of β_GMM in the spatial lag model with instruments consisting of first-
order and higher-order spatially lagged explanatory variables (WX, W²X, etc.).
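A minimal sketch of this spatial two-stage least-squares estimator is given below, with the instrument set [X, WX, W²X]; the constant is excluded from the lagged instruments because, for a row-standardized W, its spatial lag simply reproduces the constant. The data are simulated and the helper name is our own.

```python
# A hedged sketch of IV/2SLS estimation of the spatial lag model with the
# Kelejian-Robinson instruments X, WX, W^2 X. Simulated data only.
import numpy as np

def spatial_2sls(y, X, W):
    Wy = W @ y
    Z = np.column_stack([Wy, X])                              # endogenous Wy plus X
    Q = np.column_stack([X, W @ X[:, 1:], W @ W @ X[:, 1:]])  # instrument set
    PQ = Q @ np.linalg.solve(Q.T @ Q, Q.T)                    # projection on instruments
    return np.linalg.solve(Z.T @ PQ @ Z, Z.T @ PQ @ y)        # (rho, beta') stacked

rng = np.random.default_rng(4)
n = 100
Wb = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = Wb / Wb.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
rho, beta = 0.4, np.array([1.0, 2.0])
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + rng.normal(size=n))

print(np.round(spatial_2sls(y, X, W), 3))                     # roughly [0.4, 1.0, 2.0]
```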
An important feature of the instrumental variables approach is that estima-
tion can easily be carried out by means of standard econometric software, provided
that the spatial lags can be computed as the result of common matrix manipulations
(Anselin and Hudak 1992). In contrast, the maximum likelihood approach requires
specialized routines to implement the nonlinear optimization of the log-likelihood
(or concentrated log-likelihood). We next turn to some operational issues related to
this.
less sparse matrix routines can be exploited (avoiding the need to store a full N by
N matrix). This is increasingly the case in state-of-the-art matrix algebra packages
(e.g., Matlab, Gauss), but still fairly uncommon in application-oriented economet-
ric software; hence, the computation of spatial lags will typically necessitate some
programming effort on the part of the user (the construction of spatial lags based on
sparse spatial weights formats in SpaceStat is discussed in Anselin 1995). Once the
spatially lagged dependent variables are computed, IV estimation of the spatial lag
model can be carried out with any standard econometric package.
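The following sketch, using scipy.sparse as one possible set of sparse matrix routines, illustrates how a spatially lagged variable can be computed for a large number of observations without ever forming a full N by N array. The chain-contiguity weights are purely illustrative.

```python
# A hedged sketch of constructing a spatially lagged variable with sparse
# matrix routines, storing only the nonzero weights.
import numpy as np
from scipy import sparse

n = 100_000                                        # large number of observations
rows = np.arange(n - 1)
# binary contiguity for a chain of locations, stored in sparse form
Wb = sparse.coo_matrix(
    (np.ones(2 * (n - 1)),
     (np.concatenate([rows, rows + 1]), np.concatenate([rows + 1, rows]))),
    shape=(n, n),
).tocsr()
row_sums = np.asarray(Wb.sum(axis=1)).ravel()
W = sparse.diags(1.0 / row_sums) @ Wb              # row-standardized weights

x = np.random.default_rng(5).normal(size=n)
Wx = W @ x                                         # the spatial lag as a sparse product
print(Wx[:3])
```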
The other major operational issue pertains only to maximum likelihood esti-
mation. It is the need to manipulate large matrices of dimension equal to the number
of observations in the asymptotic variance matrices (32) and (38) and in the Jaco-
bian term (27) of the log-likelihoods (31) and (37). In contrast to the time-series case,
the matrix W is not triangular and hence a host of computational simplifications are
not applicable. The problem is most serious in the computation of the asymptotic
variance matrix of the maximum likelihood estimates. The inverse matrices in both
W_A = W(I − ρW)⁻¹ of (32) and W_B = W(I − λW)⁻¹ of (38) are full matrices which
do not lend themselves to the application of sparse matrix algorithms. For low values
of the autoregressive parameters, a power expansion of (I − ρW)⁻¹ or (I − λW)⁻¹
may be a reasonable approximation to the inverse, e.g., (I − ρW)⁻¹ = Σ_k ρᵏWᵏ +
error, with k = 0, 1, ..., K, such that ρᴷ < δ, where δ is a sufficiently small value.
However, this will involve some computing effort in the construction of the powers of
the weights matrices and is increasingly burdensome for higher values of the autore-
gressive parameter. In general, for all practical purposes, the size of the problem for
which an asymptotic variance matrix can be computed is constrained by the largest
matrix inverse that can be carried out with acceptable numerical precision in a given
software/hardware environment. In current desktop settings, this typically ranges
from a few hundred to a few thousand observations. While this makes it impossible
to compute asymptotic t-tests for all the parameters in spatial models with very large
numbers of observations, it does not preclude asymptotic inference. In fact, as we ar-
gued in Section III.B, due to the block diagonality of the asymptotic variance matrix
in the spatial error case, asymptotic t-statistics can be constructed for the estimated
β coefficients without knowledge of the precision of the autoregressive parameter λ
(see also Benirschka and Binkley 1994, Pace and Barry 1996). Inference on the au-
toregressive parameter can be based on a likelihood ratio test (Anselin 1988a, Chap.
6). A similar approach can be taken in the spatial lag model. However, in contrast
to the error case, asymptotic t-tests can no longer be constructed for the estimated β
coefficients, since the asymptotic variance matrix (32) is not block diagonal. Instead,
likelihood ratio tests must be considered explicitly for any subset of coefficients of
interest (requiring a separate optimization for each specification; see Pace and Barry
1997).
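The quality of the power expansion can be checked directly, as in the hedged sketch below: the truncated series Σ_{k≤K} ρᵏWᵏ is compared with the exact inverse for increasing values of ρ, showing how the approximation deteriorates for larger autoregressive parameters. The weights matrix and truncation choices are illustrative.

```python
# A numerical check of the power expansion: (I - rho*W)^{-1} approximated by
# sum_{k=0}^{K} rho^k W^k, accurate for modest rho but worse as rho grows.
import numpy as np

def inv_power_series(W, rho, K):
    n = W.shape[0]
    approx, term = np.eye(n), np.eye(n)
    for _ in range(K):
        term = rho * term @ W                     # builds rho^k W^k recursively
        approx += term
    return approx

n = 30
Wb = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = Wb / Wb.sum(axis=1, keepdims=True)

for rho in (0.2, 0.5, 0.8):
    exact = np.linalg.inv(np.eye(n) - rho * W)
    err = np.max(np.abs(exact - inv_power_series(W, rho, K=15)))
    print(rho, f"{err:.2e}")                      # approximation error grows with rho
```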
With the primary objective of obtaining consistent estimates for the parameters
in spatial regression models, a number of authors have suggested ways to manipu-
late popular statistical and econometric software packages in order to maximize the
log-likelihoods (28) and (37). Examples of such efforts are routines for ML estima-
tion of the spatial lag and spatial autoregressive error model in Systat, SAS, Gauss,
Limdep, Shazam, Rats and S-PLUS (Bivand 1992, Griffith 1993, Anselin and Hu-
dak 1992, Anselin et al. 1993b). The common theme among these approaches is to
find a way to convert the log-likelihoods for the spatial models to a form amenable
for use with standard nonlinear optimization routines. Such routines proceed incre-
mentally, in the sense that the likelihood is built up from a sum of elements that
correspond to individual observations. At first sight, the Jacobian term in the spatial
models would preclude this. However, taking advantage of the Ord decomposition in
terms of eigenvalues, pseudo-observations can be constructed for the elements of the
Jacobian. Specifically, each term 1 − ρω_i is considered to correspond to a pseudo-
variable ω_i, and is summed over all "observations." For example, for the spatial lag
model, the log-likelihood (ignoring constant terms) can be expressed as

L = Σ_{i=1}^{N} [ ln(1 − ρω_i) − (1/2) ln σ² − (y_i − ρ(Wy)_i − x_i'β)²/(2σ²) ]
which fits the format expected by most nonlinear optimization routines. Examples
of practical implementations are listed in Anselin and Hudak (1992, Table 10, p.
533) and extensive source code for various econometric software packages is given
in Anselin et al. (1993b).
One problem with this approach is that the asymptotic variance matrices com-
puted by the routines tend to be based on a numerical approximation and do not
necessarily correspond to the analytical expressions in (32) and (38). This may lead
to slight differences in inference depending on the software package that is used
(Anselin and Hudak 1992, Table 10, p. 533). An alternative approach that does not
require the computation of eigenvalues is based on sparse matrix algorithms to effi-
ciently compute the determinant of the Jacobian at each iteration of the optimization
routine. While this allows the estimation of models for very large data sets (tens of
thousands of observations), for example, by using the specialized routines in the Mat-
lab software, this does not solve the asymptotic variance matrix problem. Inference
therefore must be based on likelihood ratio statistics (for details and implementation,
see Pace and Barry 1996,1997).
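As a sketch of the sparse-matrix idea (not the specific routines of Pace and Barry), the log of the Jacobian determinant ln|I − ρW| can be obtained from a sparse LU factorization at each trial value of ρ, as below. Since det(I − ρW) is positive for admissible ρ, the sum of the logs of the absolute pivots of U suffices; the weights matrix is illustrative.

```python
# A hedged sketch: ln|I - rho*W| from a sparse LU factorization, avoiding
# eigenvalue computations for large N. Illustrative chain weights.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

n = 2000
rows = np.arange(n - 1)
Wb = sparse.coo_matrix(
    (np.ones(2 * (n - 1)),
     (np.concatenate([rows, rows + 1]), np.concatenate([rows + 1, rows]))),
    shape=(n, n),
).tocsr()
W = sparse.diags(1.0 / np.asarray(Wb.sum(axis=1)).ravel()) @ Wb

def log_det_jacobian(rho):
    A = (sparse.eye(n, format="csc") - rho * W).tocsc()
    lu = splu(A)
    # L has a unit diagonal, so ln|det A| is the sum of ln|u_ii| (permutations
    # only affect the sign, and det(I - rho*W) > 0 for admissible rho).
    return np.sum(np.log(np.abs(lu.U.diagonal())))

for rho in (0.2, 0.5, 0.8):
    print(rho, round(log_det_jacobian(rho), 4))
```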
To illustrate the various spatial models and their estimation, the results for
the parameters in a simple spatial model of crime estimated for 49 neighborhoods
in Columbus, Ohio, are presented in Table 1 . The model and results are based on
Anselin (1988a, pp. 187-196) and have been used in a number of papers to bench-
mark different estimators and specification tests (e.g., McMillen 1992, Getis 1995,
Anselin et al. 1996, LeSage 1997). The data are also available for downloading via
the internet from http://www.rri.wvu.edu/spacestat.htm. The estimates reported in
Table 1 include OLS in the standard regression model, OLS (inconsistent), ML, IV,
and heteroskedastic-robust IV for the spatial lag model, and ML for the spatial error
model. The spatial lags for the exogenous variables ( W X ) were used as instruments
in the IV estimation. In addition to the estimates and their standard errors, the fit of
the different specifications estimated by ML is compared by means of the maximized
log-likelihood. For OLS and the IV estimates, the R2 is listed. However, this should
be interpreted with caution, since R2 is inappropriate as a measure of fit when spa-
tial dependence is present. All estimates were obtained by means of the Spacestat
software.
A detailed interpretation of the results in Table 1 is beyond the scope of this
chapter, but a few noteworthy features may be pointed out. The two spatial models
provide a superior fit relative to OLS, strongly suggesting the presence of spatial de-
pendence. Of the two, the spatial lag model fits better, indicating it is the preferred
alternative. Given the lack of an underlying behavioral model (unless one is willing to
make heroic assumptions to avoid the ecological fallacy problem), the results should
be interpreted as providing consistent estimates for the coefficients of income and
housing value after the spatial dependence in the crime variable is filtered out. The
most affected coefficient (besides the constant term) pertains to the income variable,
and is lowered by about a third while remaining highly significant. The estimates
for the autoregressive coefficient vary substantially between the inconsistent and
biased OLS and the consistent estimates, but the Lag-IV coefficient has a consider-
ably higher standard error. In some instances, OLS can thus yield “better” estimates
in an MSE sense relative to IV. Diagnostics in the Lag-ML model indicate strong re-
maining presence of heteroskedasticity (the spatial Breusch-Pagan test from Anselin
1988a, p. 123, yields a highly significant value of 25.35, p < 0.00001). The robust
Lag-GIVE estimates support the importance of this effect: the estimate for the au-
toregressive parameter is quite close to the ML value while obtaining a significantly
smaller standard error relative to both OLS and the nonrobust IV. Moreover, the es-
timate for the Housing variable is no longer significant. This again illustrates the
complex interaction between heterogeneity and spatial dependence.
threads. As in the standard case, most of the tests for dependence in the spatial
model can be constructed based on the OLS residuals. In our discussion we will em-
phasize the similarities and the differences between specification testing in spatial
econometric models and the standard case.
We start the remainder of the section with a discussion of Moran’s I statistic
and stress its close connection to the familiar Durbin-Watson test. Moran’s I was
not developed with any specific kind of dependence as the alternative hypothesis,
although it has been found to have power against a wide range of forms of spatial
dependence. We next consider a test developed in the same spirit by Kelejian and
Robinson (1992). This is followed by a focus on tests for specific alternative hypothe-
ses in the form of either spatial lag or spatial error dependence. Tests for these two
kinds of autocorrelations are not independent even asymptotically, and their sepa-
rate applications when other or both kinds of autocorrelations are present will lead
to unreliable inference. Therefore, it is natural to discuss a test for joint lag and er-
ror autocorrelations. However, the problem with such a test is that we cannot make
any specific inference regarding the exact nature of dependence when the joint null
hypothesis is rejected. One approach to deal with this problem is to test for spa-
tial error autocorrelation after estimating a spatial lag model, and vice versa. This,
however, requires ML estimation, and the simplicity of tests based on OLS residuals
is lost. We therefore consider a recently developed set of diagnostics in which the
OLS-based RS test for error (lag) dependence is adjusted to take into account the
local presence of lag (error) dependence (Anselin et al. 1996). We then provide a
brief review of the small-sample properties of the various tests. Finally, the section
closes with a discussion of implementation issues and our illustrative example of
the spatial model of crime.
A. Moran’s I Test
Moran’s (1950a, 1950b) I test was originally developed as a two-dimensional analog
of the test of significance of the serial correlation coefficient in univariate time series.
Cliff and Ord (1972, 1973) formally presented Moran's I statistic as

I = (N/S₀) · (e'We)/(e'e)   (42)

where e = y − Xβ̂ is a vector of OLS residuals, β̂ = (X'X)⁻¹X'y, W is the spatial
weights matrix, N is the number of observations, and S₀ is a standardization factor
equal to the sum of the spatial weights, Σ_i Σ_j w_ij. For a row-standardized weights
matrix W, S₀ simplifies to N (since each row sum equals 1) and the statistic becomes

I = e'We / e'e   (43)
Moran did not derive the statistic from any basic principle; instead, it was suggested
as a simple test for correlation between nearest neighbors which generalized one
of his earlier tests in Moran (1948). Consequently, the test could be given different
interpretations. The first striking characteristic is the similarity between Moran’s I
and the familiar Durbin-Watson (DW) statistic

DW = e'Ae / e'e   (44)
where

A = [  1 −1  0 ⋯  0  0  0
      −1  2 −1 ⋯  0  0  0
       0 −1  2 ⋯  0  0  0
       ⋮               ⋮
       0  0  0 ⋯ −1  2 −1
       0  0  0 ⋯  0 −1  1 ]
Therefore, both statistics equal the ratio of quadratic forms in OLS residuals and they
differ only in the specification of the interconnectedness between the observations
(neighboring locations). It is well known that the DW test is a uniformly most power-
ful (UMP) test for one-sided alternatives with error distribution ε_t = λε_{t−1} + ξ_t (see,
e.g., King 1987). Similarly, Moran's I possesses some optimality properties. More
precisely, Cliff and Ord (1972) established a link between the LR and I tests. If we
take the alternative model as (8), i.e.,

ε = λWε + ξ

then the LR statistic for testing H₀: λ = 0 against the alternative H_a: λ = λ₁, when
ε and σ² are known, is proportional to a ratio of quadratic forms in ε, with ε'Wε in
the numerator (45). Under the null hypothesis, the first two moments of I for regres-
sion residuals are

E(I) = tr(MW)/(N − K)   (46)

V(I) = [tr(MWMW') + tr((MW)²) + {tr(MW)}²] / [(N − K)(N − K + 2)] − [E(I)]²   (47)

where M = I − X(X'X)⁻¹X', and W is a row-standardized weights matrix.
It is possible to develop a finite-sample-bound test for I following Durbin and
Watson (1950,1951). However, for I , we need to make the bounds independent of not
only X but also of the weight matrix W . This poses some difficulties. Tiefelsdorf and
Boots (1995), using the results of Imhof (1961) and Koerts and Abrahamse (1968),
showed that exact critical values of I can be computed by numerical integration.
They first expressed I in terms of the eigenvalues γ₁, γ₂, ..., γ_{N−K} of MW, other
than the K zeros, and N − K independent N(0, 1) variables δ₁, δ₂, ..., δ_{N−K}; more
specifically,

I = Σ_{i=1}^{N−K} γ_i δ_i² / Σ_{i=1}^{N−K} δ_i²   (48)

Then

Pr[I ≤ I₀] = Pr[ Σ_{i=1}^{N−K} (γ_i − I₀) δ_i² ≤ 0 ]   (49)

where the right-hand-side probability can be written, following Imhof (1961), as an
integral (50) in the quantities γ_i − I₀.
The integral in (50) can be evaluated by numerical integration (for more on this, see
Tiefelsdorf and Boots 1995).
It is instructive to note that the computation of exact critical values of the DW
statistic involves the same calculations as for Moran's I except that the γ_i are the
eigenvalues of MA, where A is the fixed matrix given in (44). Even with the recent
dramatic advances in computer technology, it will take some time for practitioners
to use the above numerical integration technique to implement Moran’s I test.
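Short of exact integration, Moran's I is routinely assessed with a normal approximation based on the moments (46)-(47). The following hedged sketch computes I for OLS residuals with a row-standardized W, together with the corresponding z-value and two-sided p-value; the data and names are illustrative.

```python
# Moran's I for OLS residuals with a row-standardized W, using the moments
# (46)-(47) and a normal approximation (instead of exact Imhof integration).
import numpy as np
from scipy import stats

def morans_i(y, X, W):
    n, k = X.shape
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)      # residual maker
    e = M @ y
    I = (e @ (W @ e)) / (e @ e)                            # statistic (43)
    A = M @ W
    EI = np.trace(A) / (n - k)                             # expectation (46)
    VI = (np.trace(A @ M @ W.T) + np.trace(A @ A) + np.trace(A) ** 2) \
         / ((n - k) * (n - k + 2)) - EI ** 2               # variance (47)
    z = (I - EI) / np.sqrt(VI)
    return I, z, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(6)
n = 60
Wb = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = Wb / Wb.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.linalg.solve(np.eye(n) - 0.6 * W, rng.normal(size=n))   # spatial AR errors
y = X @ np.array([1.0, 1.0]) + eps
print(morans_i(y, X, W))
```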
B. Kelejian-Robinson Test
The test developed by Kelejian and Robinson (1992) is in the same spirit as Moran's
I in the sense that it is not based on an explicit specification of the generating process
of the disturbance term. At the same time the test does not require the model to be
linear or the disturbance term to be normally distributed. Although the test does not
attempt to identify the cause of spatial dependence, Kelejian and Robinson (1992)
made the following assumption about spatial autocorrelation:
KR = α̂'Z'Zα̂ / σ̂⁴

Since for the implementation of the test we need the distribution only under the null
hypothesis, it is legitimate to replace σ⁴ by a consistent estimate under α = 0.
Note that under H₀, ĉ'ĉ/h_N →_p σ⁴, where →_p means convergence in probability.
Therefore, an asymptotically equivalent form of the test is

h_N · ĉ'Z(Z'Z)⁻¹Z'ĉ / ĉ'ĉ   (54)

which has the familiar NR² form. Here R² is the uncentered coefficient of determi-
nation of ĉ on Z and h_N is the sample size of this regression.
It is also not difficult to see an algebraic connection between KR and Moran's
I. From (43),

I² = (e'We)² / (e'e)²   (55)

which is a function of the cross products e_k e_l of the OLS residuals for neighboring
pairs, while the corresponding expression for KR in (56) weights such cross products
by the elements p_kl of Z(Z'Z)⁻¹Z'. Given that the ĉ's contain terms like e_k e_l,
k < l, it appears that the I² and KR statistics have a similar algebraic structure.
and we test λ = 0. All three general principles of testing, namely LR, W, and RS,
can be applied. Of the three, the RS test as described in Rao (1947) is the most
convenient one to use since it requires estimation only under the null hypothesis.
That is, the RS test can be based on the OLS estimation of the regression model (7).
Silvey (1959) derived the RS test using the Lagrange multiplier(s) of a constrained
optimization problem.
Burridge (1980) used Silvey's form to test λ = 0, although the Rao score
form, namely

RS = d(θ̃)'I(θ̃)⁻¹d(θ̃)   (57)
is more popular and much easier to use. In (57), d(θ) = ∂L(θ)/∂θ is the score
vector, I(θ) = −E[∂²L(θ)/∂θ∂θ'] is the information matrix, L(θ) is the log-
likelihood function, and θ̃ is the restricted (under the tested hypothesis) maximum
likelihood estimator of the parameter vector θ. For the spatial error autocorrelation
model, θ = (β', σ², λ)' and the log-likelihood function is given in (34). The test is
essentially based on the score with respect to λ, i.e., on

d_λ = ∂L/∂λ |_{λ=0} = e'We/σ̃²   (58)

where e is the vector of OLS residuals and σ̃² = e'e/N.
We can immediately see the connection of this to Moran's I statistic. After computing
I(θ̃) under H₀, from (36), we have the test statistic

RS_λ = [e'We/σ̃²]² / T   (59)

where T = tr[(W' + W)W]. Therefore, the test requires only OLS estimates and,
under H₀, RS_λ →_d χ²₁. It is interesting to put W = W_T (Section II.E) and obtain
T = N − 1 and RS_λ = (N − 1)ρ̂₁², where ρ̂₁ = Σ_t e_t e_{t−1} / Σ_t e²_{t−1} in the time-series
context. Burridge (1980) derived the RS test (59) using the estimates of the Lagrange
multiplier of the constrained optimization problem, following Silvey (1959).
of the spatial model are resolved. This also raises the question whether RS_λ will
be inferior to other asymptotically equivalent tests such as LR and W, with respect
to power, since it does not use the precise information contained in the alternative
hypothesis. In the context of the standard regression model, Monte Carlo results of
Godfrey (1981) and Bera and McKenzie (1986) suggest that there is no setback in
the performance of the RS test compared to the LR test. In Section IV.G, we discuss the
finite-sample performance of RS_λ and other tests.
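A hedged sketch of the computation of RS_λ from OLS residuals is given below: the statistic (59) requires only the residual vector, the weights matrix, and the trace term T = tr[(W' + W)W], and is referred to a χ²₁ distribution. Names and data are illustrative.

```python
# The Rao score (Lagrange multiplier) test (59) against spatial error
# autocorrelation, computed from OLS residuals only. Simulated data.
import numpy as np
from scipy import stats

def rs_error(y, X, W):
    n = len(y)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)       # OLS residuals
    sigma2 = e @ e / n
    T = np.trace((W.T + W) @ W)
    stat = (e @ (W @ e) / sigma2) ** 2 / T
    return stat, stats.chi2.sf(stat, df=1)

rng = np.random.default_rng(7)
n = 80
Wb = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = Wb / Wb.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.linalg.solve(np.eye(n) - 0.5 * W, rng.normal(size=n))   # SAR(1) errors
y = X @ np.array([1.0, 1.0]) + eps
print(rs_error(y, X, W))                                 # large statistic, small p
```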
Computationally, the W and LR tests are more demanding since they require
ML estimation under the alternative, and the explicit forms of the tests are more
complicated. For instance, let us consider the W test, which can be computed using
the ML estimate λ̂ obtained by maximizing (34) with respect to β, σ², and λ. The
W statistic (Anselin 1988a, p. 104) takes the familiar form of the squared estimate λ̂
divided by its asymptotic variance, which involves the trace expressions appearing
in (38).
The appearance of the last term in (66) differentiates the spatial dependence situation
from the serial correlation case for time-series data.
RS_{λ₁...λq} = Σ_{l=1}^{q} [e'W_l e/σ̃²]² / T_l   (68)

where W_l is the weights matrix for the lth-order process and T_l = tr[(W_l' + W_l)W_l].
For the spatial lag alternative, the corresponding score is

d_ρ = ∂L/∂ρ |_{ρ=0} = e'Wy/σ̃²
The inverse of the information matrix is given in (30). The complicating feature of
this matrix is that even under ρ = 0, it is not block diagonal; the (ρ, β) term is
equal to (X'WXβ)/σ², obtained by putting ρ = 0, i.e., W_A = W. This absence
of block diagonality causes two problems. First, as we mentioned in Section II, the
presence of spatial dependence implies that a sample contains less information than
an independent counterpart. This can now be easily demonstrated using (30). In the
absence of dependence (ρ = 0 in (3)), the ML estimate of β will have variance
σ²(X'X)⁻¹, which is the inverse of the information. But when ρ ≠ 0, to compute the
variance of the ML estimate of β we need to add a positive-definite part to σ²(X'X)⁻¹
due to the absence of block diagonality. Second, to obtain the asymptotic variance of d_ρ,
even under ρ = 0, from (30) we cannot ignore one of the off-diagonal terms. This
was not the case for d_λ in Section IV.C. The asymptotic variance of d_λ was obtained
just using the (2,2) element of (36) (see (59)). For the spatial lag model, the asymptotic
variance of d_ρ is obtained from the reciprocal of the (1,1) element of
If ML estimation is already performed, LR_ρ is much easier to compute than its Wald
counterpart. Under ρ = 0 both Wald and LR statistics will be asymptotically dis-
tributed as χ²₁.
E. Testing in the Possible Presence of Both Spatial Error and
Lag Autocorrelation
The tests described in Sections IV.C and IV.D can be termed one-directional
tests in the sense that they are designed to test a single specification assuming cor-
rect specification for the rest of the model. For example, we discussed the RS_λ, WS_λ,
and LR_λ statistics for the null hypothesis H₀: λ = 0 assuming that ρ = 0. Because
of the nature of the information matrix, these tests will not be valid, even asymptot-
ically, when ρ ≠ 0. For instance, we noted that under the null H₀: λ = 0 all
three statistics are asymptotically distributed as central χ² with one degree of
freedom. This result is valid only when ρ = 0. To evaluate the effects of nonzero ρ
on RS_λ, WS_λ, and LR_λ, let us write the model when both the spatial error and lag
autocorrelation are present:

y = ρW₁y + Xβ + ε,   ε = λW₂ε + ξ   (73)

where W₁ and W₂ are spatial weights matrices associated with the spatially lagged
dependent variable and the spatial autoregressive disturbances, respectively. Recall
from Section II.F that for model (73) to be identified, it is necessary that W₁ ≠ W₂ or
that the matrix X contain at least one exogenous variable in addition to the constant
term. An alternative specification with a spatial moving-average error process for ε, as in
(13), has no such problems, and it also leads to identical results in terms of the test statistics
discussed here. Using the results of Davidson and MacKinnon (1987) and Saikko-
nen (1989), we evaluate the impact of the local presence of ρ on the asymptotic null
distribution of RS_λ, LR_λ, and WS_λ. Let ρ = δ/√N, δ < ∞; then it can be shown
that under H₀: λ = 0, all three tests asymptotically converge to a noncentral χ²₁,
with the noncentrality parameter given in (75), where T_{1j} = tr[(W₁' + W₁)W_j],
j = 1, 2 (note that T₁₂ = T₂₁). Therefore, the tests will
reject the null of error autocorrelation even when λ = 0, due to the local presence
of the lag dependence. In a similar way we can express the asymptotic distributions
of RS_ρ, LR_ρ, and WS_ρ under ρ = 0 and the local presence of error dependence, say,
λ = τ/√N, τ < ∞. In this case the distributions remain χ²₁, but with the noncentrality
parameter given in (76). A joint test of H₀: λ = ρ = 0, denoted RS_λρ, can also be
based on the OLS results; it takes the form given in (77),
where E = (D/σ²)T₂₂ − (T₁₂)². Note that this joint test not only depends on d_λ
and d_ρ but also on their interaction factor with a coefficient T₁₂. Expression (77) ap-
pears to be somewhat complicated but can be computed quite easily using only OLS
residuals. Also, the expression simplifies greatly when the spatial weights matrices
W₁ and W₂ are assumed to be the same, which is the case in most applications. Under
W₁ = W₂ = W, T₁₁ = T₂₁ = T₁₂ = T = tr[(W' + W)W], and (77) reduces to
where the "hat" denotes quantities evaluated at the maximum likelihood esti-
mates of the parameters of the model y = ρW₁y + Xβ + ξ obtained by means
of nonlinear optimization. In (79), T₂₁A stands for tr[W₂W₁A⁻¹ + W₂'W₁A⁻¹], with
A = I − ρ̂W₁. Under H₀: λ = 0, RS_λ|ρ will converge to a central χ² with one
degree of freedom. Similarly, an RS test can be developed for H₀: ρ = 0 in the
presence of error dependence (Anselin et al. 1996). This test statistic, RS_ρ|λ, can be
written as in (80), where ε̂ is the vector of residuals in the ML estimation of the null
model with spatial AR errors, y = Xβ + (I − λW₂)⁻¹ξ, with θ̂ = (β̂', λ̂, σ̂²)', and
B = I − λ̂W₂. The terms in the denominator of (80) are
and Var(θ̂) is the estimated variance-covariance matrix for the parameter vector θ̂.
It is also possible to obtain the W and LR statistics in the above three cases,
though these will involve the estimation of a spatial model with two parameters, re-
quiring considerably more complex nonlinear optimization. In contrast, RS_λ|ρ and
RS_ρ|λ are theoretically valid statistics that have the potential to identify the possible
source(s) of misspecification and can be derived from the results of the maximization
of the log-likelihood functions (32)and (26). However, this is clearly more computa-
tionally demanding than tests based on OLS residuals. We now turn to an approach
that accomplishes carrying out the tests without maximum likelihood estimation of
λ and ρ.
Comparing RS*_λ in (81) and RS_λ in (59), it is clear that the adjusted test modifies
RS_λ by correcting for the presence of ρ through d̂_ρ and T₁₂, where the latter quantity
represents the covariance between d_λ and d_ρ. Under H₀: λ = 0 (and ρ = δ/√N),
RS*_λ converges to a central χ²₁ distribution; i.e., RS*_λ has the same asymptotic distri-
bution as RS_λ based on the correct specification. This therefore produces asymptoti-
cally the correct size in the presence of local lag dependence. Also, as noted for RS*_λ,
we only need OLS estimation, thus circumventing direct estimation of the nuisance
parameter ρ. However, there is a price to be paid for robustification and simplicity
in estimation. Consider the case when there is no lag dependence (ρ = 0), but only
spatial error dependence through λ = τ/√N. Under this setup, the noncentrality
parameters of RS_λ and RS*_λ are respectively τ²T₂₂/N and τ²(T₂₂ − T₁₂²D⁻¹)/N.
Since τ²T₁₂²D⁻¹/N ≥ 0, the asymptotic power of RS*_λ will be less than that of
RS_λ when there is no lag dependence. The above quantity can be regarded as a cost
of robustification. Once again, note its dependence on T₁₂. It is also instructive to
compare RS*_λ with Anselin's RS_λ|ρ in (79). It is readily seen that RS_λ|ρ does not
have the mean correction factor. RS_λ|ρ uses the restricted ML estimator of ρ (under
λ = 0), for which d̂_ρ = 0. We may view RS_λ|ρ as the spatial version of Durbin's h
statistic, which can also be derived from the general RS principle. Unlike Durbin's
h, however, RS_λ|ρ cannot be computed using the OLS residuals.
In a similar way, the adjusted score test for H₀: ρ = 0, in the presence of local
misspecification involving a spatially dependent error process, can be expressed as

RS*_ρ = [d̂_ρ − d̂_λ]² / (D̂ − T)   (84)
unadjusted and adjusted tests and exploiting the result (85),it is possible to identify
the exact nature of dependence in practice (Anselin et al. 1996). Finally, we should
mention that because of the complexity of the Wald and LR tests, it is not possible
to derive their adjusted versions that would be valid under local misspecification.
Of course, it is not computationally prohibitive to obtain these tests after the joint
estimation of both λ and ρ.
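The practical appeal of the adjusted tests is that, like their unadjusted counterparts, they require only OLS quantities. The sketch below computes RS_λ, RS_ρ, and their adjusted versions for the common case W₁ = W₂ = W, following the expressions in Anselin et al. (1996) to the best of our reading; the function and variable names are our own and the data are simulated.

```python
# A hedged sketch of the one-directional and adjusted (robust) score tests for
# the case W1 = W2 = W, following Anselin et al. (1996): the adjusted statistics
# correct d_lambda and d_rho for each other's local presence.
import numpy as np
from scipy import stats

def lm_diagnostics(y, X, W):
    n = len(y)
    XtX_inv_Xt = np.linalg.solve(X.T @ X, X.T)
    beta = XtX_inv_Xt @ y
    e = y - X @ beta                                   # OLS residuals
    s2 = e @ e / n
    M = np.eye(n) - X @ XtX_inv_Xt
    T = np.trace((W.T + W) @ W)
    WXb = W @ X @ beta
    D = (WXb @ (M @ WXb)) / s2 + T                     # variance term for the lag score
    d_lam = e @ (W @ e) / s2                           # score for lambda
    d_rho = e @ (W @ y) / s2                           # score for rho
    tests = {
        "RS_lambda": d_lam ** 2 / T,
        "RS_rho": d_rho ** 2 / D,
        "RS_lambda_star": (d_lam - (T / D) * d_rho) ** 2 / (T * (1.0 - T / D)),
        "RS_rho_star": (d_rho - d_lam) ** 2 / (D - T),
    }
    return {k: (v, stats.chi2.sf(v, df=1)) for k, v in tests.items()}

rng = np.random.default_rng(8)
n = 120
Wb = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W = Wb / Wb.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W,
                    X @ np.array([1.0, 1.0]) + rng.normal(size=n))   # lag process
for name, (stat, p) in lm_diagnostics(y, X, W).items():
    print(f"{name:16s} {stat:8.3f}  p = {p:.4f}")
```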
all tests, with a higher power obtained in the rook case. Third, as in Anselin and
Rey (1991), higher powers were achieved by lag tests relative to tests against error
dependence. This is important, since the consequences of ignoring lag dependence
are more serious. Fourth, the KR statistic did not perform well. For example, when
the errors were generated as lognormal, it significantly over-rejected the true null hy-
pothesis in all configurations. There are two possible explanations. One is its higher
degrees of freedom. Another is that its power depends on the degree of autocorre-
lation in the explanatory variables, which substitute for the weights matrix (compare
(55) and (56)). It is interesting to note that White's (1980) test for heteroskedasticity,
which is very similar to KR, encounters problems of the same type. Fifth, the most
striking result is that the adjusted tests RS*_λ and RS*_ρ performed remarkably well.
They had reasonable empirical sizes, remaining within the confidence interval in all
cases. In terms of power they performed exactly the way they were supposed to. For
instance, when the data were generated under ρ > 0, λ = 0, although RS_ρ had the
most power, the power of RS*_ρ was very close to that of RS_ρ. That is, the price paid
for adjustments that were not needed turned out to be small. The real superiority of
RS*_ρ was revealed when λ > 0 and ρ = 0. It yielded low rejection frequencies even
for λ = 0.9. The correction for error dependence in RS*_ρ worked in the right direc-
tion when no lagged dependence was present, for all configurations. When ρ > 0,
the power function of RS*_ρ was seen to be almost unaffected by the values of λ, even
for those far away from zero (global misspecification). For these alternatives RS_λρ
also had good power, but could not point to the correct alternative when only one
kind of dependence is present. RS*_λ also performed well, though not as spectacularly
as RS;. The adjusted tests thus seem more appropriate to test for lag dependence
in the presence of error correlation than for the reverse case. Again, this is impor-
tant since ignoring lag dependence has more severe consequences. Based on these
results, Anselin and Florax (1995b) suggested a simple decision rule. When RS_ρ is
more significant than RS_λ, and RS*_ρ is significant while RS*_λ is not, a lag depen-
dence is the likely alternative. In a similar way the presence of error dependence can
be identified through RS*_λ. Finally, the finite-sample performance of tests against
higher-order dependence RS_λ₁λ₂ (see (68)) and the joint test RS_λρ were satisfactory,
although these types of tests provide less insightful guidance for an effective specifi-
cation search. For joint and higher-order alternatives, these tests are optimal, and in
practice they should be used along with the unadjusted and adjusted one-directional
tests.
the latter is significant at p slightly higher than 0.05. In other words, the impression
of spatial error autocorrelation that may be given by an uncritical interpretation of
Moran’s I is spurious, since no evidence of such autocorrelation remains after ro-
bustifying for spatial lag dependence. Instead, a spatial lag model is the suggested
alternative, consistent with the estimation results in Table 1.
V. CONCLUSIONS
ACKNOWLEDGMENTS
We would like to thank Aman Ullah and an anonymous referee for helpful sugges-
tions, and Robert Rozovsky for very able research assistance. We also would like
to thank Naoko Miki for her help in preparing the manuscript. However, we retain
the responsibility for any remaining errors. The first author acknowledges ongoing
support for the development of spatial econometric methods by the U.S. National Sci-
ence Foundation, notably through grants SES 87-21875, SES 89-21385, and SBR
94-10612 as well as grant SES 88-10917 to the National Center for Geographic In-
formation and Analysis (NCGIA). The second author acknowledges financial support
by the Bureau of Economic and Business Research of the University of Illinois.
REFERENCES
Albert, P. and L. M. McShane (1995), A Generalized Estimating Equations Approach for Spa-
tially Correlated Binary Data: Applications to the Analysis of Neuroimaging Data, Bio-
metrics, 51,627-638.
Amemiya, T. (1985), Advanced Econometrics, Harvard University Press, Cambridge, MA.
Ancot, J-P., J. Paelinck, and J. Prins (1986), Some New Estimators in Spatial Econometrics,
Economics Letters, 21,245-249.
Anselin, L. (1980), Estimation Methods for Spatial Autoregressive Structures, Regional Sci-
ence Dissertation and Monograph Series 8, Cornell University, Ithaca, NY.
Anselin, L. (1982), A Note on Small Sample Properties of Estimators in a First-Order Spatial
Autoregressive Model, Environment and Planning A , 14, 1023-1030.
Anselin, L. (1988a), Spatial Econometrics: Methods and Models, Kluwer, Dordrecht.
Anselin, L. (1988b), Model Validation in Spatial Econometrics: A Review and Evaluation of
Alternative Approaches, International Regional Science Review, 11, 279-316.
Anselin, L. (1988c), Lagrange Multiplier Test Diagnostics for Spatial Dependence and Spatial
Heterogeneity, Geographical Analysis, 20, 1-17.
Anselin, L. (1990a), Spatial Dependence and Spatial Structural Instability in Applied Re-
gression Analysis, Journal of Regional Science, 30, 185-207.
Anselin, L. (1990b), Some Robust Approaches to Testing and Estimation in Spatial Econo-
metrics, Regional Science and Urban Economics, 20, 141-163.
Anselin, L. (ed.) (1992a), Space and Applied Econometrics. Special Issue, Regional Science
and Urban Economics, 22.
Anselin, L. (1992b), SpaceStat: A Program for the Analysis of Spatial Data, National Center
for Geographic Information and Analysis, University of California, Santa Barbara, CA.
Anselin, L. (1995), SpaceStat Version 1.80 User’s Guide, Regional Research Institute, West
Virginia University, Morgantown, WV.
Anselin, L., A. K. Bera, R. Florax, and M. J. Yoon (1996), Simple Diagnostic Tests for Spatial
Dependence, Regional Science and Urban Economics, 26, 77-104.
Anselin, L. and A. Can (1996), Spatial Effects in Models of Mortgage Origination, Paper pre-
sented at the Mid Year AREUEA Conference, Washington, DC, May 28-29.
Anselin, L., R. Dodson, and S. Hudak (1993a), Linking GIS and Spatial Data Analysis in
Practice, Geographical Systems, 1,3-23.
Anselin, L., R. Dodson, and S. Hudak (1993b), Spatial Data Analysis and GIS: Interfacing
GIS and Econometric Software, Technical Report 93-7, National Center for Geographic
Information and Analysis, University of California, Santa Barbara.
Anselin, L. and R. Florax (eds.) (1995a), New Directions in Spatial Econometrics, Springer-
Verlag, Berlin.
Anselin, L. and R. Florax (1995b), Small Sample Properties of Tests for Spatial Dependence
in Regression Models: Some Further Results, in L. Anselin and R. Florax (eds.), New
Directions in Spatial Econometrics, Springer-Verlag, Berlin, 21-74.
Anselin, L. and S. Hudak (1992), Spatial Econometrics in Practice, a Review of Software
Options, Regional Science and Urban Economics, 22,509-536.
Anselin, L. and S. Rey (1991), Properties of Tests for Spatial Dependence in Linear Regression
Models, Geographical Analysis, 23, 112-131.
Anselin, L. and S. Rey (eds.) (1997), Spatial Econometrics. Special Issue, International Re-
gional Science Review, 20.
Anselin, L. and 0. Smirnov (1996), Efficient Algorithms for Constructing Proper Higher Order
Spatial Lag Operators, Journal of Regional Science, 36, 67-89.
Arora, S. and M. Brown (1977), Alternative Approaches to Spatial Autocorrelation: An Im-
provement over Current Practice, International Regional Science Review, 2, 67-78.
Bartels, C. P. A. and L. Hordijk (1977), On the Power of the Generalized Moran Contigu-
ity Coefficient in Testing for Spatial Autocorrelation among Regression Disturbances,
Regional Science and Urban Economics, 7,83-101.
Bartels, C. and R. Ketellapper (eds.) (1979), Exploratory and Explanatory Analysis of Spatial
Data, Martinus Nijhoff, Boston.
Benirschka, M. and J. K. Binkley (1994), Land Price Volatility in a Geographically Dispersed
Market, American Journal of Agricultural Economics, 76, 185-195.
Bera, A. K. and C. M. Jarque (1982), Model Specification Tests: A Simultaneous Approach,
Journal of Econometrics, 20, 59-82.
Bera, A. K. and C. R. McKenzie (1986), Alternative Forms and Properties of the Score Test,
Journal ofApplied Statistics, 13, 13-25.
Bera, A. K. and C. R. McKenzie (1987), Additivity and Separability of the Lagrange Multiplier,
Likelihood Ratio and Wald Tests, Journal of Quantitative Economics, 3, 53-63.
Beron, K. J., J. C. Murdoch, and W. P. M. Vijverberg (1996), Why Cooperate? An Interdepen-
dent Probit Model of Network Correlations, Working Paper, School of Social Sciences,
University of Texas at Dallas, Richardson, TX.
Bera, A. K. and A. Ullah (1991), Rao's Score Test in Econometrics, Journal of Quantitative
Economics, 7, 189-220.
Bera, A. K. and M. J. Yoon (1993), Specification Testing with Misspecified Alternatives, Econo-
metric Theory, 9, 649-658.
Besag, J. (1974), Spatial Interaction and the Statistical Analysis of Lattice Systems, Journal
of the Royal Statistical Society, B, 36, 192-225.
Besley, T. and A. Case (1995), Incumbent Behavior: Vote-Seeking, Tax-Setting, and Yardstick
Competition, American Economic Review, 85, 25-45.
Bivand, R. (1992), Systat Compatible Software for Modeling Spatial Dependence among Ob-
servations, Computers and Geosciences, 18,951-963.
Blommestein, H. (1983), Specification and Estimation of Spatial Econometric Models: A Dis-
cussion of Alternative Strategies for Spatial Economic Modelling, Regional Science and
Urban Economics, 13,250-271.
Blommestein, H. (1985), Elimination of Circular Routes in Spatial Dynamic Regression Equa-
tions, Regional Science and Urban Economics, 15, 121-130.
Blommestein, H. J. and N. A. Koper (1992), Recursive Algorithms for the Elimination of
Redundant Paths in Spatial Lag Operators, Journal of Regional Science, 32,91-111.
Bolduc, D., M. G. Dagenais, and M. J. Gaudry (1989), Spatially Autocorrelated Errors in
Origin-Destination Models: A New Specification Applied to Urban Travel Demand in
Winnipeg, Transportation Research, B 23,361-372.
Bolduc, D., R. Laferrière, and G. Santarossa (1992), Spatial Autoregressive Error Components
in Travel Flow Models, Regional Science and Urban Economics, 22,371-385.
Bolduc, D., R. Laferrière, and G. Santarossa (1995), Spatial Autoregressive Error Components
in Travel Flow Models, an Application to Aggregate Mode Choice, in L. Anselin and R.
Florax (eds.), New Directions in Spatial Econometrics, Springer-Verlag, Berlin, 96-108.
Bowden, R. J. and D. A. Turkington (1984), Instrumental Variables, Cambridge University
Press, Cambridge.
Box, G. E. P., G. M. Jenkins, and G. C. Reinsel(1994), Time Series Analysis, Forecasting and
Control, 3rd ed., Prentice Hall, Englewood Cliffs, NJ.
Brandsma, A. S. and R. H. Ketellapper (1979a), A Biparametric Approach to Spatial Auto-
correlation, Environment and Planning A , 11, 51-58.
Brandsma, A. S. and R. H. Ketellapper (1979b), Further Evidence on Alternative Procedures
for Testing of Spatial Autocorrelation among Regression Disturbances, in C. Bartels
and R. Ketellapper (eds.), Exploratory and Explanatory Analysis in Spatial Data, Mar-
tin Nijhoff, Boston, 111-136.
Brett, C. and C. A. P. Pinkse (1997), Those Taxes Are All over the Map! A Test for Spatial Inde-
pendence of Municipal Tax Rates in British Columbia, International Regional Science
Review, 20, 131-151.
Breusch, T. (1980), Useful Invariance Results for Generalized Regression Models, Journal of
Econometrics, 13,327-340.
Brook, D. (1964), On the Distinction between the Conditional Probability and Joint Probability
Approaches in the Specification of Nearest Neighbor Systems, Biometrika, 51, 481-
483.
Brueckner, J. K. (1996), Testing for Strategic Interaction among Local Governments: The Case
of Growth Controls, Discussion Paper, Department of Economics, University of Illinois,
Champaign.
Burridge, P. (1980), On the Cliff-Ord Test for Spatial Autocorrelation, Journal of the Royal
Statistical Society B, 42, 107-108.
Can, A. (1992), Specification and Estimation of Hedonic Housing Price Models, Regional
Science and Urban Economics, 22, 453-474.
Can, A. (1996), Weight Matrices and Spatial Autocorrelation Statistics Using a Topological
Vector Data Model, International Journal of Geographical Information Systems, 10,
1009-1017.
Can, A. and I. F. Megbolugbe (1997),Spatial Dependence and House Price Index Construc-
tion, Journal of Real Estate Finance and Economics, 14,203-222.
Case, A. (1987),On the Use of Spatial Autoregressive Models in Demand Analysis, Discus-
sion Paper 135, Research Program in Development Studies, Woodrow Wilson School,
Princeton University.
Case, A. (1991),Spatial Patterns in Household Demand, Econometrica, 59,953-965.
Case, A. (1992), Neighborhood Influence and Technological Change, Regional Science and
Urban Economics, 22, 491-508.
Case, A. C., H. S. Rosen, and J. R. Hines (1993), Budget Spillovers and Fiscal Policy In-
terdependence: Evidence from the States, Journal of Public Economics, 52, 285-
307.
Cliff, A. and J. K. Ord (1972), Testing for Spatial Autocorrelation among Regression Residuals,
Geographical Analysis, 4, 267-284.
Cliff, A. and J. K. Ord (1973),Spatial Autocorrelation, Pion, London.
Cliff, A. and J. K. Ord (1981),Spatial Processes: Models and Applications, Pion, London.
Cressie, N. (1991), Geostatistical Analysis of Spatial Data, in National Research Council,
Spatial Statistics and Digital Image Analysis, National Academy Press, Washington,
DC, 87-108.
Cressie, N. (1993), Statistics for Spatial Data, Wiley, New York.
Dasgupta, S. and M. D. Perlman (1974),Power of the Noncentral F-test: Effect of Additional
Variate on Hotelling’s T2-test,Journal of the American Statistical Association, 69,174-
180.
Davidson, R. and J. G. MacKinnon (1987), Implicit Alternatives and Local Power of Test
Statistics, Econometrica, 55, 1305-1329.
Davidson, R. and J. G. MacKinnon (1993),Estimation and Inference in Econometrics, Oxford
University Press, New York.
Deutsch, C. V. and A. G. Journel (1992), GSLIB: Geostatistical Software Library and User's
Guide, Oxford University Press, Oxford.
Doreian, P. (1980), Linear Models with Spatially Distributed Data, Spatial Disturbances or
Spatial Effects, Sociological Methods and Research, 9 , 2 9 4 0 .
Doreian, P., K. Teuter, and C-H. Wang (1984),Network Autocorrelation Models, Sociological
Methods and Research, 13, 155-200.
Dow, M. M., M. L. Burton, and D. R. White (1982), Network Autocorrelation: A Simulation
Study of a Foundational Problem in Regression and Survey Study Research, Social
Networks, 4, 169-200.
Dubin, R. (1988),Estimation of Regression Coefficients in the Presence of Spatially Autocor-
related Error Terms, Review of Economics and Statistics, 70,466-474.
Dubin, R. (1992), Spatial Autocorrelation and Neighborhood Quality, Regional Science and
Urban Economics, 22,433-452.
Durbin, J. and G. S. Watson (1950),Testing for Serial Correlation in Least Squares Regression
I, Biometrika, 37,409-428.
Durbin, J. and G. S. Watson (1951), Testing for Serial Correlation in Least Squares Regression
11, Biometrika, 38, 159-179.
Florax, R. and S. Rey (1995), The Impacts of Misspecified Spatial Interaction in Linear Re-
gression Models, in L. Anselin and R. Florax (eds.), New Directions in Spatial Econo-
metrics, Springer-Verlag, Berlin, 111-135.
Fomby, T. B., R. C. Hill, and S. R. Johnson (1984), Advanced Econometric Methods, Springer-
Verlag, New York.
Getis, A. (1995), Spatial Filtering in a Regression Framework: Examples Using Data on Ur-
ban Crime, Regional Inequality, and Government Expenditures, in L. Anselin and
R. Florax (eds.), New Directions in Spatial Econometrics, Springer-Verlag, Berlin, 172-
185.
Godfrey, L. (1981), On the Invariance of the Lagrange Multiplier Test with Respect to Certain
Changes in the Alternative Hypothesis, Econometrica, 49, 1443-1455.
Godfrey, L. (1988), MisspeclJication Tests in Econometrics, Cambridge University Press, New
York.
Greene, W. H. (1993), Econometric Analysis, 2nd ed., Macmillan, New York.
Griffith, D. A. (1987), Spatial Autocorrelation, A Primer, Association of American Geogra-
phers, Washington, DC.
Griffith, D. A. (1993), Spatial Regression Analysis on the PC: Spatial Statistics Using SAS,
Association of American Geographers, Washington, DC.
Haining, R. (1984), Testing a Spatial Interacting-Markets Hypothesis, The Review of Eco-
nomics and Statistics, 66, 576-583.
Haining, R. (1988), Estimating Spatial Means with an Application to Remotely Sensed Data,
Communications in Statistics: Theory and Methods, 17, 573-597.
Haining, R. (1990), Spatial Data Analysis in the Social and Environmental Sciences, Cam-
bridge University Press, Cambridge.
Heijmans, R. and J. Magnus (1986a), On the First-Order Efficiency and Asymptotic Normality
of Maximum Likelihood Estimators Obtained from Dependent Observations, Statistica
Neerlandica, 40, 169-188.
Heijmans, R. and J. Magnus (1986b), Asymptotic Normality of Maximum Likelihood Estima-
tors Obtained from Normally Distributed but Dependent Observations, Econometric
Theory, 2,374-412.
Hepple, L. W. (1995a), Bayesian Techniques in Spatial and Network Econometrics. 1: Model
Comparison and Posterior Odds, Environment and Planning A , 27, 447-469.
Hepple, L. W. (1995b), Bayesian Techniques in Spatial and Network Econometrics. 2: Com-
putational Methods and Algorithms, Environment and Planning A , 27,615-644.
Holtz-Eakin, D. (1994), Public-Sector Capital and the Productivity Puzzle, Review of Eco-
nomics and Statistics, 76, 12-21.
Hooper, P. and G . Hewings (1981), Some Properties of Space-Time Processes, Geographical
Analysis, 13,203-223.
Hordijk, L. (1974), Spatial Correlation in the Disturbances of a Linear Interregional Model,
Regional and Urban Economics, 4, 117-140.
Hordijk, L. (1979), Problems in Estimating Econometric Relations in Space, Papers, Regional
Science Association, 42,99-11.5.
Hordijk, L. and J. Paelinck (1976), Some Principles and Results in Spatial Econometrics,
Recherches Economiques de Louvain, 42, 175-197.
IN LINEAR REGRESSION
SPATIAL DEPENDENCE MODELS 287
Huang, J. S. (1984). The Autoregressive Moving Average Model for Spatial Analysis, Aus-
tralian Journal of Statistics, 26, 169-178.
Imhof, J. P. (1961), Computing the Distribution of Quadratic Forms in Normal Variables,
Biometrika, 48,419-426.
Isaaks, E. H. and R. M. Srivastava (1989), An Introduction to Applied Ceostatistics, Oxford
University Press, Oxford.
Johnson, N. L., and S. Kotz (1%9),Distributions in Statistics: Discrete Distributions, Houghton
Mifflin, Boston.
Johnston, J. (1984), Econometric Models, McGraw-Hill, New York.
Judge, G., R. C. Hill, W. E. Griffiths, H. Lutkepohl, and T-C. Lee (1982), Introduction to the
Theory and Practice of Econometrics, Wiley, New York.
Judge, G., W. E. Griffiths, R. C. Hill, H. Lutkepohl, and T.-C. Lee (1985), The Theory and
Practice of Econometrics, 2nd ed., Wiley, New York.
Kelejian, H. and D. Robinson (1992),Spatial Autocorrelation: A New Computationally Simple
Test with an Application to Per Capita Country Police Expenditures, Regional Science
and Urban Economics, 22,317-331.
Kelejian, H. H. and D. P. Robinson (1993), A Suggested Method of Estimation for Spatial
Interdependent Models with Autocorrelated Errors, and a n Application to a County
Expenditure Model, Papers in Regional Science, 72,297-312.
Kelejian, H. H. and D. P. Robinson (1995), Spatial Correlation: A Suggested Alternative to
the Autoregressive Model, in L. Anselin and R. Florax (eds.), New Directions in Spatial
Econometrics, Springer-Verlag, Berlin, 75-95.
King, M. L. (1981), A Small Sample Property of the Cliff-Ord Test for Spatial Correlation,
Journal of the Royal Statistical Society B , 43,263-264.
King, M. L. (1987), Testing for Autocorrelation in Linear Regression Models: A Survey, in
M. L. King and D. E. A. Giles (eds.), Specijication Analysis in the Linear Model, Rout-
ledge and Kegan Paul, London, 19-73.
King, M. L. and M. A. Evans (1986), Testing for Block Effects in Regression Models Based
on Survey Data, Journal ofthe American Statistical Association, 81,677-679.
King, M. L. and G. H. Hillier (1985), Locally Best Invariant Tests of the Error Covariance
Matrix of the Linear Regression Model, Journal ofthe Royal Statistical Society B , 47,
98-102.
Koerts, J. and A. P. I. Abrahamse (1968), On the Power of the BLUS Procedure, Journal of
the American Statistical Association, 63, 1227-1236.
Krugman, P. (1991), Increasing Returns and Economic Geography, Journal ofPolitica1 Econ-
omy, 99,483499.
Land, K. and G. Deane (1992), On the Large-Sample Estimation of Regression Models with
Spatial- or Network-Effects Terms: A Two Stage Least Squares Approach, in P. Marsden
(ed.), Sociological Methodology, Jossey Bass, San Francisco, 221-248.
Leenders, R. T. (1995), Structure and Injluence. Statistical Models for the Dynamics of Actor
Attributes, Network Structure and Their Interdependence, Thesis Publishers, Amster-
dam.
Legendre, P. (1993), Spatial Autocorrelation: Trouble or New Paradigm, Ecology, 74, 1659-
1673.
LeSage, J. P. (1993), Spatial Modeling of Agricultural Markets, American Journal of Agricul-
tural Economics, 75, 1211-1216.
288 ANSELINAND BERA
PANEL DATA METHODS

Badi H. Baltagi
Texas A&M University, College Station, Texas
1. INTRODUCTION
Panel data refers to data sets consisting of multiple observations on each sampling
unit. This could be generated by pooling time-series observations across a variety
of cross-sectional units including countries, states, regions, firms, or randomly sam-
pled individuals or households. Two well-known examples in the United States are
the Panel Study of Income Dynamics (PSID) and the National Longitudinal Survey
(NLS). The PSID began in 1968 with 4802 families, including an oversampling of
poor households. Annual interviews were conducted and socioeconomic character-
istics of each of the families and of roughly 31000 individuals who have been in
these or derivative families were recorded. The list of variables collected is over
5000. The NLS followed five distinct segments of the labor force. The original sam-
ples include 5020 older men, 5225 young men, 5083 mature women, 5159 young
women, and 12,686 youths. There was an oversampling of blacks, Hispanics, poor
whites, and military in the youths survey. The list of variables collected runs into
the thousands. Panel data sets have also been constructed from the U.S. Current
Population Survey (CPS), which is a monthly national household survey conducted
by the Census Bureau. The CPS generates the unemployment rate and other la-
bor force statistics. Compared with the NLS and PSID data sets, the CPS contains
fewer variables, spans a shorter period, and does not follow movers. However, it
covers a much larger sample and is representative of all demographic groups. Eu-
ropean panel data sets include the German Socio-Economic Panel, the Swedish
study of household market and nonmarket activities, and the Intomart Dutch panel
of households.
Some of the benefits and limitations of using panel data sets are listed in Hsiao
(1986). Obvious benefits are a much larger data set with more variability and less collinearity among the variables.
new wave and increases the degree of computational difficulty in the estimation of
qualitative limited dependent variable panel data models (Baltagi 1995b).
Although random coefficient regressions can be used in the estimation and specification of panel data models (Swamy 1971, Hsiao 1986, Dielman 1989), most panel data applications have been limited to a simple regression with error components disturbances, typically of the form

$$y_{it} = x_{it}'\beta + u_{it}, \qquad u_{it} = \mu_i + \lambda_t + \nu_{it}$$

where $\mu_i$ denotes an unobservable individual-specific effect, $\lambda_t$ an unobservable time effect, and $\nu_{it}$ a remainder disturbance.
The best quadratic unbiased (BQU) estimators of the variance components are ANOVA-type estimators based on the true disturbances, and these are minimum variance unbiased (MVU) under normality of the disturbances. One can obtain feasible estimates of the variance components by replacing the true disturbances by OLS residuals (Wallace and Hussain 1969). Alternatively, one could substitute the fixed-effects residuals as proposed by Amemiya (1971). In fact, Amemiya (1971) shows that the Wallace and Hussain (1969) estimates of the variance components have a different asymptotic distribution from that knowing the true disturbances, while the Amemiya (1971) estimates of the variance components have the same asymptotic distribution as that knowing the true disturbances. Other estimators of the variance components were proposed by Swamy and Arora (1972) and Fuller and Battese (1974). Maximum likelihood estimation (MLE) under normality of the disturbances is derived by Amemiya (1971). The first-order conditions are nonlinear, but can be solved using an iterative GLS scheme (Breusch 1987). Finally, one can apply Rao's (1972) minimum norm quadratic unbiased estimation (MINQUE) methods. These methods are surveyed in Baltagi (1995b). Wallace and Hussain (1969) compare the RE and FE estimators of $\beta$ in the case of nonstochastic (repetitive) $x_{it}$'s and find that both are (i) asymptotically normal, (ii) consistent and unbiased, and that (iii) $\hat{\beta}_{RE}$ has a smaller generalized variance (i.e., is more efficient) in finite samples. In the case of nonstochastic (nonrepetitive) $x_{it}$'s they find that both $\hat{\beta}_{RE}$ and $\hat{\beta}_{FE}$ are consistent, asymptotically unbiased, and have equivalent asymptotic variance-covariance matrices, as both $N$ and $T \to \infty$. Under the random effects model, GLS based on the true variance components is BLUE, and all the feasible GLS estimators considered are asymptotically efficient as $N$ and $T \to \infty$. Maddala and Mount (1973) compared OLS, FE, RE, and MLE methods using Monte Carlo experiments. They found little to choose among the various feasible GLS estimators in small samples and argued in favor of methods that were easier to compute. MINQUE was dismissed as more difficult to compute, and the applied researcher, given one shot at the data, was warned to compute at least two methods of estimation. If these methods give different results, the authors diagnose misspecification. Taylor (1980) derived exact finite-sample results for the one-way error component model ignoring the time effects. He found the following important results. (1) Feasible GLS is more efficient than the FE estimator for all but the fewest degrees of freedom. (2) The variance of feasible GLS is never more than 17% above the Cramér-Rao lower bound. (3) More efficient estimators of the variance components do not necessarily yield more efficient feasible GLS estimators. These finite-sample results are confirmed by the Monte Carlo experiments carried out by Baltagi (1981a).
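To fix ideas, the following minimal sketch (not taken from the chapter; balanced panel, no constant in X, numpy only) implements a feasible GLS estimator of this type: the variance components are estimated from within and between residuals in the spirit of Swamy and Arora (1972), and GLS is then computed as OLS on quasi-demeaned data.

```python
import numpy as np

def one_way_fgls(y, X, ids):
    """Feasible GLS for the one-way error components model
    y_it = x_it' beta + mu_i + nu_it (balanced panel, no constant in X).
    A sketch in the spirit of Swamy and Arora (1972)."""
    y, X, ids = map(np.asarray, (y, X, ids))
    groups = np.unique(ids)
    N, k = len(groups), X.shape[1]
    T = len(y) // N
    pos = np.searchsorted(groups, ids)            # map each observation to its individual

    y_bar = np.array([y[ids == g].mean() for g in groups])          # individual means
    X_bar = np.vstack([X[ids == g].mean(axis=0) for g in groups])

    # Within (fixed-effects) regression on deviations from individual means
    y_w, X_w = y - y_bar[pos], X - X_bar[pos]
    b_w = np.linalg.lstsq(X_w, y_w, rcond=None)[0]
    e_w = y_w - X_w @ b_w
    s2_nu = e_w @ e_w / (N * (T - 1) - k)         # remainder variance sigma_nu^2

    # Between regression on individual means identifies sigma_1^2 = T sigma_mu^2 + sigma_nu^2
    b_b = np.linalg.lstsq(X_bar, y_bar, rcond=None)[0]
    e_b = y_bar - X_bar @ b_b
    s2_one = T * (e_b @ e_b) / (N - k)
    s2_mu = max((s2_one - s2_nu) / T, 0.0)        # truncate at zero if negative

    # GLS as OLS on quasi-demeaned data with theta = 1 - sigma_nu / sigma_1
    theta = 1.0 - np.sqrt(s2_nu / (s2_nu + T * s2_mu))
    beta_gls = np.linalg.lstsq(X - theta * X_bar[pos],
                               y - theta * y_bar[pos], rcond=None)[0]
    return beta_gls, s2_mu, s2_nu
```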
One test for the usefulness of panel data models is their ability to predict. For
the RE model, the best linear unbiased predictor (BLUP) was derived by Wansbeek
and Kapteyn (1978) and Taub (1979). This derivation was generalized by Baltagi
and Li (1992) to the RE model with serially correlated remainder disturbances. More
recently, Baillie and Baltagi (1995) derived the asymptotic mean square prediction
error for the FE and RE predictors as well as two other misspecified predictors and
compared their performance using Monte Carlo experiments.
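For the one-way RE model without serial correlation, the BLUP that this literature builds on takes the familiar form (a sketch of the standard result; $\bar{\hat{u}}_{i\cdot}$ denotes the average GLS residual for individual $i$):

$$\hat{y}_{i,T+S} = x_{i,T+S}'\hat{\beta}_{GLS} + \frac{T\sigma_\mu^{2}}{T\sigma_\mu^{2}+\sigma_\nu^{2}}\,\bar{\hat{u}}_{i\cdot},$$

so the GLS prediction is shifted toward individual $i$'s own average residual, with a weight that grows with the share of the individual effect in the total error variance.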
Fixed versus random effects has generated a lively debate in the biometrics liter-
ature. In econometrics, see Mundlak (1978). The random and fixed effects models
yield different estimation results, especially if T is small and N is large. A speci-
fication test based on the difference between these estimates is given by Hausman
(1978). The null hypothesis is that the individual and time effects are not correlated
with the xit’s. The basic idea behind this test is that the fixed effects estimator f i F E is
consistent whether the effects are or are not correlated with the xit’s. This is true be-
cause the fixed effects transformation described by y~~wipes out the pi and A, effects
from the model. However, if the null hypothesis is true, the fixed effects estimator is
not efficient under the RE specification, because it relies only on the within variation
in the data. On the other hand, the RE estimator &E is efficient under the null hy-
pothesis but is biased and inconsistent when the effects are correlated with the xLL’s.
BRE
The difference between these estimators ij = f i -~ ~ tend to zero in probability
limits under the null hypothesis and is nonzero under the alternative. The variance of
this difference is equal to the difference in variances, var(4) = var(&E)-var(fiRE),
since cov(4, BKE)= 0 under the null hypothesis. Hausman’s test statistic is based
upon m = ij’[var(Q)]-‘Q and is asymptotically distributed as x’
with k degrees of
freedom under the null hypothesis.* The Hausman test can also be computed as a
variable addition test by running y* on the regressor matrices X* and 8 testing that
I
the coefficients of are zero using the usual F-test. This test was generalized by
Arellano (1993) to make it robust to heteroskedasticity and autocorrelation of arbitrary forms. In fact, if either heteroskedasticity or serial correlation is present, the variances of the FE and RE estimators are not valid and the corresponding Hausman test statistic is inappropriate. Ahn and Low (1996) show that the Hausman test statistic can be obtained as $NT \cdot R^2$ from the regression of GLS residuals on $X$ and $\bar{X}$, where the latter denotes the matrix of regressors averaged over time. Also, an alternative generalized method of moments (GMM) test is recommended for testing the joint null hypothesis of exogeneity of the regressors and the stability of regression parameters over time. If the regression parameters are nonstationary over time, then both $\hat{\beta}_{RE}$ and $\hat{\beta}_{FE}$ are inconsistent even though the regressors may be exogenous.
Ahn and Low perform Monte Carlo experiments which show that both the Haus-
*For the one-way error components model with individual effects only, Hausman and Taylor (1981) show that Hausman's specification test can also be based on two other contrasts that yield numerically identical results. Kang (1985) extends this analysis to the two-way error components model.
man and the alternative GMM test have good power in detecting endogeneity of the
regressors. However, the alternative GMM test dominates if the coefficients of the
regressors are nonstationary. Li and Stengos (1992) propose a Hausman specification test based on $\sqrt{N}$-consistent semiparametric estimators, which they apply in the context of a dynamic panel data model.
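The contrast form of the Hausman statistic described above is simple to compute once FE and RE fits are available; the sketch below (numpy/scipy, hypothetical inputs) is one way to do it.

```python
import numpy as np
from scipy import stats

def hausman_test(b_fe, b_re, V_fe, V_re):
    """Hausman (1978) specification test from FE and RE slope estimates and
    their covariance matrices (a minimal sketch; inputs are hypothetical).
    Under the null of no correlation between the effects and the regressors,
    m = q'[var(q)]^{-1} q is asymptotically chi-squared with k degrees of freedom."""
    q = np.asarray(b_fe) - np.asarray(b_re)
    V_q = np.asarray(V_fe) - np.asarray(V_re)   # var(q) = var(b_FE) - var(b_RE) under H0
    m = float(q @ np.linalg.solve(V_q, q))
    k = q.size
    p_value = stats.chi2.sf(m, k)
    return m, k, p_value
```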
recreation demand model that a Stein-rule estimator gives better forecast risk performance than the pooled or individual cross-sectional estimates. More recently, the
fundamental assumption underlying pooled homogeneous parameters models has
been called into question. For example, Robertson and Symons (1992) warned about
the bias from pooled estimators when the estimated model is dynamic and homoge-
neous when in fact the true model is static and heterogeneous. Pesaran and Smith
(1995) argued in favor of heterogeneous estimators rather than pooled estimators for
panels with large N and T . They showed that when the true model is dynamic and
heterogeneous, the pooled estimators are inconsistent, whereas an average estimator
of heterogeneous parameters can lead to consistent estimates as long as both N and
T tend to infinity. Using a different approach, Maddala, Srivastava, and Li (1994) ar-
gued that shrinkage estimators are superior to either heterogeneous or homogeneous
parameter estimates especially for prediction purposes. In this case, one shrinks
the individual heterogeneous estimates toward the pooled estimate using weights
depending on their corresponding variance-covariance matrices. Baltagi and Griffin
(1997) compare the short-run and long-run forecast performance of the pooled homo-
geneous, individual heterogeneous, and shrinkage estimators for a dynamic demand
for gasoline across 18 OECD countries. Based on 1-, 5-, and 10-year forecasts,
the results support the case for pooling. Alternative tests for structural change in
panel data include Han and Park (1989), who used the cumulative sum and cusum
of squares to test for structural change based on recursive residuals. They find no
structural break over the period 1958-1976 in U.S. foreign trade of manufacturing
goods.
Testing for random individual effects is of utmost importance in panel data
applications. Ignoring these effects leads to huge bias in estimation (Moulton 1986). A popular Lagrange multiplier (LM) test for the significance of the random effects, $H_0^a: \sigma_\mu^2 = 0$, was derived by Breusch and Pagan (1980). This test statistic can be easily computed using least-squares residuals. This assumes that the alternative hypothesis is two-sided, when we know that the variance components are nonnegative. A one-sided version of this test is given by Honda (1985). This is shown to be uniformly most powerful and robust to nonnormality. However, Moulton and Randolph (1989) showed that the asymptotic N(0, 1) approximation for this one-sided LM statistic can be poor even in large samples. They suggest an alternative standardized Lagrange multiplier (SLM) test whose asymptotic critical values are generally closer to the exact critical values than those of the LM test. This SLM test statistic centers and scales the one-sided LM statistic so that its mean is zero and its variance is one.
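For concreteness, in the balanced one-way model these statistics can be written directly in terms of the OLS residuals $e_{it}$ (a sketch of the standard formulas, not a quotation of the chapter's notation):

$$\mathrm{LM} = \frac{NT}{2(T-1)}\left[\frac{\sum_{i=1}^{N}\left(\sum_{t=1}^{T} e_{it}\right)^{2}}{\sum_{i=1}^{N}\sum_{t=1}^{T} e_{it}^{2}} - 1\right]^{2}, \qquad
\mathrm{HO} = \sqrt{\frac{NT}{2(T-1)}}\left[\frac{\sum_{i}\left(\sum_{t} e_{it}\right)^{2}}{\sum_{i}\sum_{t} e_{it}^{2}} - 1\right],$$

where the two-sided LM statistic is asymptotically $\chi^2_1$ and Honda's one-sided version HO is asymptotically N(0, 1) under $H_0^a$.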
For $H_0^b: \sigma_\mu^2 = \sigma_\lambda^2 = 0$, the two-sided LM test is given by Breusch and Pagan (1980) and is distributed as $\chi_2^2$ under the null. Honda (1985) does not derive a uniformly most powerful one-sided test for $H_0^b$, but he suggests a "handy" one-sided test which is distributed as N(0, 1) under $H_0^b$. Later, Honda (1991) derives the SLM version of this one-sided test. Baltagi, Chang, and Li (1992) derive a locally mean most powerful (LMMP) one-sided test for $H_0^b$, and its SLM version is given by Baltagi (1995b). Under $H_0^b: \sigma_\mu^2 = \sigma_\lambda^2 = 0$, these standardized Lagrange multiplier statistics are asymptotically N(0, 1) and their asymptotic critical values should be closer to the exact critical values than those of the corresponding unstandardized tests. Al-
ternatively, one can perform a likelihood ratio test or an ANOVA-type F-test. Both
tests have the same asymptotic distribution as their LM counterparts. Moulton and
Randolph (1989) find that although the F-test is not locally most powerful, its power
function is close to the power function of the exact LM test and is therefore rec-
ommended. A comparison of these various testing procedures using Monte Carlo ex-
periments is given by Baltagi, Chang, and Li (1992). Recent developments include a
generalization by Li and Stengos (1994) of the Breusch-Pagan test to the case where
the remainder error is heteroskedastic of unknown form. Also, Baltagi and Chang
(1996) propose a simple ANOVA F-statistic based on recursive residuals to test for
random individual effects.
For incomplete (or unbalanced) panels, the Breusch-Pagan test can be eas-
ily extended; see Moulton and Randolph (1989) for the one-way error components
model and Baltagi and Li (1990) for the two-way error components model. For non-
linear models, Baltagi (1996) suggests a simple method for testing for zero random
individual and time effects using a Gauss-Newton regression. In case the regression
model is linear, this test amounts to a variable addition test, i.e., running the original
regression with two additional regressors. The first is the average of the least-squares
residuals over time, while the second is the average of the least-squares residuals
over individuals. The test statistic becomes the F-statistic for the significance of the
two additional regressors.
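In the linear case this variable-addition check is straightforward to code; the sketch below (hypothetical helper names, balanced or unbalanced panel) augments the original regressors with the two residual averages and computes the F-statistic for their joint significance.

```python
import numpy as np
from scipy import stats

def joint_effects_f_test(y, X, ids, times):
    """Variable-addition test for zero random individual and time effects
    in a linear panel regression (a sketch of the idea in Baltagi 1996)."""
    ids, times = np.asarray(ids), np.asarray(times)

    def ols_resid(yv, Xv):
        b = np.linalg.lstsq(Xv, yv, rcond=None)[0]
        return yv - Xv @ b

    e = ols_resid(y, X)                                   # least-squares residuals
    # First added regressor: average residual over time for each individual
    uid = np.unique(ids)
    e_i = np.array([e[ids == i].mean() for i in uid])
    z1 = e_i[np.searchsorted(uid, ids)]
    # Second added regressor: average residual over individuals for each period
    ut = np.unique(times)
    e_t = np.array([e[times == t].mean() for t in ut])
    z2 = e_t[np.searchsorted(ut, times)]

    X_aug = np.column_stack([X, z1, z2])
    rss_r = e @ e                                          # restricted RSS
    e_u = ols_resid(y, X_aug)
    rss_u = e_u @ e_u                                      # unrestricted RSS
    q, df = 2, len(y) - X_aug.shape[1]
    F = ((rss_r - rss_u) / q) / (rss_u / df)
    return F, stats.f.sf(F, q, df)
```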
Baltagi and Li (1995) derive three LM test statistics that jointly test for serial correlation and individual effects. The first LM statistic jointly tests for zero first-order serial correlation and random individual effects, the second LM statistic tests for zero first-order serial correlation assuming fixed individual effects, and the third LM statistic tests for zero first-order serial correlation assuming random individual effects. In all three cases, Baltagi and Li (1995) showed that the corresponding LM statistic is the same whether the alternative is AR(1) or MA(1). In addition, Baltagi and Li (1995) derive two simple tests for distinguishing between AR(1) and MA(1)
remainder disturbances in error components regressions and perform Monte Carlo
experiments to study the performance of these tests. For the fixed-effects model,
Bhargava, Franzini, and Narendranathan (1982) derived a modified Durbin-Watson
test statistic based on FE residuals to test for first-order serial correlation and a test
for random walk based on differenced OLS residuals. Chesher (1984) derived a score
test for neglected heterogeneity, which is viewed as causing parameter variation.
Also, Hamerle (1990) and Orme (1993) suggest a score test for neglected hetero-
geneity for qualitative limited dependent-variable panel data models.
Holtz-Eakin (1988) derives a simple test for the presence of individual ef-
fects in dynamic (autoregressive) panel data models, while Holtz-Eakin, Newey, and
Rosen (1988) formulate a coherent set of procedures for estimating and testing VAR
(vector autoregression) with panel data. Arellano and Bond (1991) consider tests
for serial correlation and overidentification restrictions in a dynamic random-effects
model, while Arellano (1990) considers testing covariance restrictions for error com-
ponents or first-difference structures with white noise, MA, or AR schemes.
Chamberlain (1982, 1984) finds that the fixed effects specification imposes
testable restrictions on coefficients from regressions of all leads and lags of the de-
pendent variable on all leads and lags of independent variables. These overidenti-
fication restrictions are testable using minimum chi-squared statistics. Angrist and
Newey (1991) show that, in the standard fixed effects model, this overidentifica-
tion test statistic is simply the degrees of freedom times the R2 from a regression
of within residuals on all leads and lags of the independent variables. They apply
this test to models of the union-wage effect using five years of data from the National
Longitudinal Survey of Youth and to a conventional human capital earnings function
estimating the return to schooling. They do not reject a fixed effect specification in
the union-wage example, but they do reject it in the return to schooling example.
Testing for unit roots using panel data has been recently reconsidered by Quah
(1994), Levin and Lin (1996), and Im, Pesaran, and Shin (1996). This has been ap-
plied by MacDonald (1996) to real exchange rates for 17 OECD countries based
on a wholesale price index, and 23 OECD countries based on a consumer price in-
dex, all over the period 1973-1992. The null hypothesis that real exchange rates
contain a unit root is rejected. Earlier applications include Boumahdi and Thomas (1991), who apply a likelihood ratio unit root panel data test to assess the efficiency of the French capital market. Using 140 French stock prices observed weekly from January 1973 to February 1986 (T = 671) on the Paris Stock Exchange, Boumahdi and Thomas (1991) do not reject the null hypothesis of a unit root. Also, Breitung and Meyer (1994) apply panel data unit root tests to contract wages negotiated at the firm and industry level in western Germany over the period 1972-1987. They find that
both firm and industry wages possess a unit root in the autoregressive representation.
However, there is weak evidence for a cointegration relationship.
The standard error components model assumes that the disturbances are homoskedastic across individuals. This may be an unrealistic assumption and has been relaxed by Mazodier and Trognon (1978) and Baltagi and Griffin (1988). A more general heteroskedastic model is given by Randolph (1988) in the context of unbalanced panels. Also, Li and Stengos (1994)
proposed estimating a one-way error component model with heteroskedasticity of
unknown form using adaptive estimation techniques.
The error components regression model has also been generalized to allow for serial correlation in the remainder disturbances by Lillard and Willis (1978), Revankar (1979), MaCurdy (1982), Baltagi and Li (1991, 1995), and Galbraith and Zinde-Walsh (1995). Chamberlain (1982, 1984) allows for arbitrary serial correlation
and heteroskedastic patterns by viewing each time period as an equation and treating
the panel as a multivariate setup. Also, Kiefer (1980), Schmidt (1983), Arellano
(1987), and Chowdhury (1994) extend the fixed-effects model to cover cases with an
arbitrary intertemporal covariance matrix.
The normality assumption on the error components disturbances may be un-
tenable. Horowitz and Markatou (1996) show how to carry out nonparametric estimation of the densities of the error components. Using data from the Current Population Survey, they estimate an earnings model and show that the probability that individuals with low earnings will become high earners in the future is much lower than that obtained under the assumption of normality. One drawback of this nonparametric estimator is its slow convergence at a rate of $1/\log N$, where $N$ is the number of individuals. Monte Carlo results suggest that this estimator should be used for $N$
larger than 1000.
Micro panel data on households, individuals, and firms are highly likely to
exhibit measurement error; see Duncan and Hill (1985), who found serious measurement error in average hourly earnings in the Panel Study of Income Dynamics. Using panel
data, Griliches and Hausman (1986) showed that one can identify and estimate a va-
riety of errors in variables models without the use of external instruments. Griliches
and Hausman suggest differencing the data $j$ periods apart, thus generating "different-lengths" difference estimators. These transformations wipe out the
individual effect, but they may aggravate the measurement error bias. One can cal-
culate the bias of the different-lengths differenced estimators and use this infor-
mation to obtain consistent estimators of the regression coefficients. Extensions of
this model include Kao and Schnell (1987a, 1987b), Wansbeek and Koning (1989),
Hsiao (1991), Wansbeek and Kapteyn (1992), and Biorn (1992). See also Baltagi and
Pinnoi (1995) for an application to the productivity of the public capital stock.
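To see why comparing difference estimators of different lengths is informative, consider the textbook case of a single regressor measured with classical error $\eta_{it}$ with variance $\sigma_\eta^2$ (a sketch under standard assumptions, not the chapter's notation): the $j$-period difference estimator has probability limit

$$\operatorname{plim}\hat{\beta}_{(j)} = \beta\left(1 - \frac{2\sigma_\eta^{2}}{\operatorname{var}(x_{it}-x_{i,t-j})}\right).$$

Since $\operatorname{var}(x_{it}-x_{i,t-j})$ typically grows with $j$ while the numerator stays fixed, estimates based on different difference lengths can be combined to back out both $\beta$ and $\sigma_\eta^2$ without external instruments.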
The error components model has been extended to the seemingly unrelated
regressions case by Avery (1977), Baltagi (1980), Magnus (1982), Prucha (1984),
and Kinal and Lahiri (1990). Some applications include Howrey and Varian (1984)
on the estimation of a system of demand equations for electricity by time of day, and
Sickles (1985) on the analysis of productivity growth in the U.S. airlines industry.
For simultaneous equations with error components, Baltagi (1981b) derives the error components two-stage (EC2SLS) and three-stage (EC3SLS) least-squares estimators, while Prucha (1985) derives the full-information MLE under the normality assumption. These estimators are surveyed in Krishnakumar (1988). Monte Carlo experiments are given by Baltagi (1984) and Mátyás and Lovrics (1990). Recent ap-
plications of EC2SLS and EC3SLS include (i) an econometric rational-expectations
macroeconomic model for developing countries with capital controls (Haque, Lahiri,
and Montiel 1993), and (ii) an econometric model measuring income and price elas-
ticities of foreign trade for developing countries (Kinal and Lahiri 1993).
Mundlak (1978) considered the case where the endogeneity is solely attributed
to the individual effects. In this case, Mundlak showed that if these individual ef-
fects are a linear function of the averages of all the explanatory variables across
time, then the GLS estimator of this model coincides with the FE estimator. Mund-
lak’s (1978) formulation assumes that all the explanatory variables are related to the
individual effects. The random-effects model, on the other hand, assumes no corre-
lation between the explanatory variables and the individual effects. Instead of this "all or nothing" correlation among the $x_{it}$'s and the $\mu_i$'s, Hausman and Taylor (1981) consider a model where some of the explanatory variables are related to the $\mu_i$'s. In particular, they consider

$$y_{it} = x_{it}'\beta + z_i'\gamma + \mu_i + \nu_{it} \qquad (4)$$

where the $z_i$'s are cross-sectional time-invariant variables. Hausman and Taylor (1981), hereafter HT, split the matrices $X$ and $Z$ into two sets of variables: $X = [X_1; X_2]$ and $Z = [Z_1; Z_2]$, where $X_1$ is $n \times k_1$, $X_2$ is $n \times k_2$, $Z_1$ is $n \times g_1$, $Z_2$ is $n \times g_2$, and $n = NT$. The terms $X_1$ and $Z_1$ are assumed exogenous in that they are not correlated with $\mu_i$ and $\nu_{it}$, while $X_2$ and $Z_2$ are endogenous because they are correlated with the $\mu_i$'s but not the $\nu_{it}$'s. The within transformation would sweep out the $\mu_i$'s and remove the bias, but in the process it would also remove the $Z_i$'s, and hence the within estimator will not give an estimate of the $\gamma$'s. To get around that, Hausman and Taylor (1981) suggest an instrumental variable estimator that uses $\tilde{X}_1$, $\tilde{X}_2$, $\bar{X}_1$, and $Z_1$ as instruments, where a tilde denotes deviations from individual means and a bar denotes individual means. Therefore, the matrix of regressors $X_1$ is used twice, once as averages and another time as deviations from averages. This is an advantage of panel data, allowing instruments from within the model. The order condition for identification gives the result that the number of $X_1$'s ($k_1$) must be at least as large as the number of $Z_2$'s ($g_2$). With stronger exogeneity assumptions between $X$ and the $\mu_i$'s, Amemiya and MaCurdy (1986) and Breusch, Mizon, and Schmidt
(1989) suggest more efficient instrumental variable (IV) estimators. Cornwell and
Rupert (1988) apply these IV methods to a returns to schooling example based on a
panel of 595 individuals drawn from the PSID over the period 1976-1982. Recently, Metcalf (1996) shows that for the Hausman-Taylor model given in (4), using fewer instruments may lead to a more powerful Hausman specification test. Asymptotically, more instruments lead to more efficient estimators. However, the asymptotic bias of the inefficient estimator will also be greater as the null hypothesis of no correlation is violated. The increase in bias more than offsets the increase in variance. Since the test statistic is linear in variance but quadratic in bias, its power will increase.
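As a concrete illustration of the instrument set just described, the following minimal sketch (balanced panel, numpy only, hypothetical helper names) builds the Hausman-Taylor instruments from the within deviations of X1 and X2, the individual means of X1, and Z1.

```python
import numpy as np

def ht_instruments(X1, X2, Z1, ids):
    """Build the Hausman-Taylor instrument matrix [X1_dev, X2_dev, X1_means, Z1]:
    within deviations of the time-varying regressors, individual means of the
    exogenous time-varying regressors, and the exogenous time-invariant
    regressors (a sketch of the instrument set, not the full estimator)."""
    ids = np.asarray(ids)
    groups = np.unique(ids)
    pos = np.searchsorted(groups, ids)

    def means(M):                        # individual means, spread back to observations
        return np.vstack([M[ids == g].mean(axis=0) for g in groups])[pos]

    X1_dev = X1 - means(X1)              # deviations from individual means
    X2_dev = X2 - means(X2)
    return np.column_stack([X1_dev, X2_dev, means(X1), Z1])
```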
Cornwell, Schmidt, and Wyhowski (1992) consider a simultaneous equation
model with error components that distinguishes between two types of exogenous vari-
ables, namely singly exogenous and doubly exogenous variables. A singly exogenous
variable is correlated with the individual effects but not with the remainder noise,
while a doubly exogenous variable is uncorrelated with both the effects and the re-
mainder disturbance term. For this encompassing model with two types of exogene-
ity, Cornwell, Schmidt, and Wyhowski (1992) extend the three instrumental vari-
able estimators considered above and give them a GMM interpretation. Wyhowski
(1994) extends these results to the two-way error components model, while Revankar
(1992) establishes conditions for exact equivalence of instrumental variables in a
simultaneous-equation two-way error components model.
*In particular, the assumptions made on the initial values are of utmost importance (Anderson and Hsiao
1982, Bhargava and Sargan 1983, Hsiao 1986). Hsiao (1986) summarizes the consistency properties of
the MLE and GLS under a RE dynamic model depending on the initial values assumption and the way
in which N and T tend to infinity.
Arellano (1989) finds that for simple dynamic error components models the estimator that uses differences ($\Delta y_{i,t-2}$) rather than levels ($y_{i,t-2}$) for instruments has a singularity point and very large variances over a significant range of parameter values. In contrast, the estimator that uses instruments in levels, i.e., $y_{i,t-2}$, has no singularities and much
smaller variances and is therefore recommended. Additional instruments can be ob-
tained in a dynamic panel data model if one utilizes the orthogonality conditions
that exist between lagged values of $y_{it}$ and the disturbances $\nu_{it}$ (Holtz-Eakin 1988, Holtz-Eakin, Newey, and Rosen 1988, Arellano and Bond 1991). Based on these additional moments, Arellano and Bond (1991) suggest a GMM estimator and propose a Sargan-type test for overidentifying restrictions.* Arellano and Bover (1995) develop a unifying GMM framework for looking at efficient IV estimators for dynamic panel data models. They do that in the context of the Hausman and Taylor (1981) model given in (4). Ahn and Schmidt (1995) show that under the standard assumptions used in a dynamic panel data model, there are additional moment conditions that are ignored by the IV estimators suggested by Arellano and Bond (1991). They show how these additional restrictions can be utilized in a GMM framework. Ahn and Schmidt (1995) also consider the dynamic version of the Hausman and Taylor
(1981) model and show how one can make efficient use of exogenous variables as
instruments. In particular, they show that the strong exogeneity assumption implies
more orthogonality conditions which lie in the deviations from mean space. These
are irrelevant in the static Hausman-Taylor model but are relevant for the dynamic
version of that model.
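To make the moment conditions concrete: for the standard dynamic model with serially uncorrelated $\nu_{it}$, first differencing removes $\mu_i$, and lagged levels dated $t-2$ and earlier are orthogonal to the differenced disturbance (a sketch of the conditions exploited by Arellano and Bond 1991):

$$E\left[y_{i,t-s}\,\Delta\nu_{it}\right] = 0, \qquad t = 3, \ldots, T, \quad s \geq 2,$$

so the number of available instruments grows with $t$, which is what the GMM estimator and the associated Sargan-type overidentification test exploit.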
An alternative approach to estimating dynamic panel data models has been suggested by Keane and Runkle (1992). Drawing upon the forward filtering idea from the time-series literature, this method of estimation first transforms the model to eliminate the general and arbitrary serial correlation pattern in the data. By doing so, one can use the set of original predetermined instruments to obtain consistent parameter estimates of the model. First differencing is also used in dynamic panel data models to get rid of the individual-specific effects, and the resulting first-differenced errors are serially correlated of an MA(1) type with a unit root if the original $\nu_{it}$'s are classical errors. In this case, there will be a gain in efficiency from performing the Keane and Runkle filtering procedure on the FD model. Underlying this estimation procedure are two important hypotheses that are testable. The first is $H_A$: the set of instruments is strictly exogenous. In order to test $H_A$, Keane and Runkle propose a test based on the difference between fixed-effects 2SLS (FE-2SLS) and first-difference 2SLS (FD-2SLS). FE-2SLS is consistent only if $H_A$ is true. In fact, if the matrix of instruments contains predetermined variables, then FE-2SLS would not be consistent.
*Bhargava (1991) gives sufficient conditions for the identification of static and dynamic panel data models
with endogenous regressors.
Incomplete panels are more likely to be the norm in typical economic empirical
settings. For example, if one is collecting data on a set of countries over time, a re-
searcher may find some countries can be traced back longer than others. Similarly,
in collecting data on firms over time, a researcher may find that some firms have
dropped out of the market while new entrants emerged over the sample period ob-
served. For randomly missing observations, unbalanced panels have been dealt with
in Fuller and Battese (1974), Baltagi (1985), Wansbeek and Kapteyn (1989), and
Baltagi and Chang (1994).* For the unbalanced one-way error component model,
GLS can still be performed as a least-squares regression. However, BQU estimators
of the variance components are a function of the variance components themselves.
Still, unbalanced ANOVA methods are available (Searle 1987). Baltagi and Chang
(1994) performed extensive Monte Carlo experiments varying the degree of unbal-
ancedness in the panel as well as the variance components. Some of the main results
include the following: (i) As far as the estimation of the regression coefficients is concerned, the simple ANOVA-type feasible GLS estimators compare well with the more complicated estimators such as MLE and MINQUE and are never more than 4% above the MSE of true GLS. (ii) For the estimation of the remainder variance component $\sigma_\nu^2$, these methods show little difference in relative MSE performance. However, for the estimation of the individual-specific variance component $\sigma_\mu^2$, the ANOVA-type estimators perform poorly relative to MLE and MINQUE methods when the variance
of the variance components, in the MSE sense, do not necessarily imply better es-
timates of the regression coefficients. This echoes similar findings in the balanced
panel data case. (iv) Extracting a balanced panel out of an unbalanced panel, by either maximizing the number of households observed or the total number of observations, leads in both cases to an enormous loss in efficiency and is not recommended.†
For an empirical application, see Mendelsohn et al. (1992), who use panel data on
repeated single-family home sales in the harbor area surrounding New Bedford, Mas-
sachusetts, over the period 1969 to 1988 to study the damage associated with prox-
imity to a hazardous waste site. Mendelsohn et al. (1992) find a significant reduction
in housing values, between $7000 and $10,000 (1989 dollars), as a result of these
houses’ proximity to hazardous waste sites. The extension of the unbalanced error
components model to the two-way model including time effects is more involved.
Wansbeek and Kapteyn (1989) derive the FE, MLE, and a feasible GLS estimator
based on quadratic unbiased estimators of the variance components and compare
their performance using Monte Carlo experiments.
*Other methods of dealing with missing data include (i) imputing the missing values and analyzing the filled-in data by complete panel data methods, and (ii) discarding the nonrespondents and weighting
the respondents to compensate for the loss of cases; see Little (1988) and the section on nonresponse
adjustments in Kasprzyk et al. (1989).
†Chowdhury (1991) showed that for the fixed effects error component model, the within estimator based on the entire unbalanced panel is efficient relative to any within estimator based on a sub-balanced pattern. Also, Mátyás and Lovrics (1991) performed some Monte Carlo experiments to compare the loss in efficiency of FE and GLS based on the entire incomplete panel data and the complete subpanel. They find the loss in efficiency is negligible if $NT > 250$, but serious for $NT < 150$.
Rotating panels attempt to keep the same number of households in the survey
by replacing the fraction of households that drop from the sample in each period by
an equal number of freshly surveyed households. This is a necessity in surveys where
a high rate of attrition is expected from one period to the next. For the estimation of general rotation schemes as well as maximum likelihood estimation under normality, see Biorn (1981). Estimation of the consumer price index in the United States is based on a complex rotating panel survey, with 20% of the sample being replaced by rotation each year (Valliant 1991). With rotating panels, the fresh group of individuals that are added to the panel with each wave provides a means of testing for time-in-sample
bias effects. This has been done for various labor force characteristics in the Current
Population Survey. For example, several studies have found that the first rotation
reported an unemployment rate 10% higher than that of the full sample (Bailar 1975).
While the findings indicate a pervasive effect of rotation group bias in panel surveys,
the survey conditions do not remain the same in practice, and hence it is hard to
disentangle the effects of time-in-sample bias from other effects.
For some countries, panel data may not exist. Instead, the researcher may find annual household surveys based on a large random sample of the population. Ex-
British Family Expenditure Survey, which surveys about 7000 households annually.
Examples of repeated surveys in the United States include the Current Population
Survey and the National Crime Survey. For these repeated cross-sectional surveys,
it may be impossible to track the same household over time as required in a genuine
panel. Instead, Deaton (1985) suggests tracking cohorts and estimating economic
relationships based on cohort means rather than individual observations. One co-
hort could be the set of all males born between 1945 and 1950. This age cohort is
well defined and can be easily identified from the data. Deaton (1985) argued that
these pseudo panels do not suffer the attrition problem that plagues genuine panels,
and may be available over longer time periods compared to genuine panels.* For this pseudo panel with $T$ observations on $C$ cohorts, the fixed effects estimator $\hat{\beta}_{FE}$, based on the within-"cohort" transformation, is a natural candidate for estimating $\beta$. However, Deaton (1985) argued that these sample-based averages of the cohort
means can only estimate the unobserved population cohort means with measurement
error. Therefore, one has to correct the within estimator for measurement error using
estimates of the errors in measurement variance-covariance matrix obtained from the
individual data. Details are given in Deaton (1985). There is an obvious trade-off in
the construction of a pseudo panel. The larger the number of cohorts, the smaller is
*Blundell and Meghir (1990) also argue that pseudo panels allow the estimation of life-cycle models
which are free from aggregation bias. In addition, Moffitt (1993) explains that a lot of researchers in
the United States prefer to use pseudo panels like the Current Population Survey because it has larger, more representative samples and the questions asked are more consistently defined over time than the available U.S. panels.
PANELDATAMETHODS 307
the number of individuals per cohort. In this case, C is large and the pseudo panel is
based on a large number of observations. However, the fact that the average cohort
size $n_c = N/C$ is not large implies that the sample cohort averages are not precise estimates of the population cohort means. In this case, we have a large number $C$ of imprecise observations. In contrast, a pseudo panel constructed with a smaller number of cohorts and therefore more individuals per cohort is trading a large pseudo panel with imprecise observations for a smaller pseudo panel with more precise observations. Verbeek and Nijman (1992b) find that $n_c \to \infty$ is a crucial condition for the consistency of the within estimator and that the bias of the within estimator may be substantial even for large $n_c$. On the other hand, Deaton's estimator is consistent for $\beta$, for finite $n_c$, when either $C$ or $T$ tends to infinity.
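The construction of the cohort-level data itself is mechanical; the sketch below (pandas, hypothetical column names) collapses repeated cross sections into cohort-year cell means and keeps the cell sizes $n_c$ that govern the precision trade-off just discussed.

```python
import pandas as pd

def build_pseudo_panel(df, cohort_col, year_col, value_cols):
    """Collapse repeated cross sections into a pseudo panel of cohort-year
    cell means (a minimal sketch of the construction used by Deaton 1985).
    df         : pooled cross-sectional micro data
    cohort_col : cohort identifier, e.g., a year-of-birth band
    year_col   : survey year
    value_cols : variables to average within each cohort-year cell"""
    grouped = df.groupby([cohort_col, year_col])
    cells = grouped[value_cols].mean()
    cells["n_cell"] = grouped.size()       # cell size, needed to gauge precision
    return cells.reset_index()

# Hypothetical usage: cohorts defined by five-year birth bands
# df["cohort"] = (df["birth_year"] // 5) * 5
# panel = build_pseudo_panel(df, "cohort", "year", ["log_wage", "hours", "age"])
```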
Moffitt (1993) extends Deaton's (1985) analysis to the estimation of dynamic
models with repeated cross sections. Moffitt illustrates his estimation method for the
linear fixed-effects life-cycle model of labor supply using repeated cross sections
from the U.S. Current Population Survey. The sample included white males, ages
20-59, drawn from 21 waves over the period 1968 to 1988. In order to keep the es-
timation problem manageable, the data was randomly subsampled to include a total
of 15,500 observations. Moffitt concludes that there is a considerable amount of par-
simony achieved in the specification of age and cohort effects. Also, individual char-
acteristics are considerably more important than either age, cohort, or year effects.
Blundell, Meghir, and Neves (1993) use the annual U.K. Family Expenditure Sur-
vey covering the period 1970-1984 to study the intertemporal labor supply and con-
sumption of married women. The total number of households considered was 43,671.
These were allocated to 10 different cohorts depending on the year of birth. The aver-
age number of observations per cohort was 364. Their findings indicate reasonably
sized intertemporal labor supply elasticities. Collado (1995) proposed a GMM es-
timator corrected for measurement error to deal with a dynamic pseudo-panel data model. This estimator is consistent as $C$ tends to infinity for fixed $T$ and $n_c$.
In many economic studies, the dependent variable is discrete, indicating, for exam-
ple, that a household purchased a car or that an individual is unemployed or that he
or she joined the union. For example, let $y_{it} = 1$ if the $i$th individual participates in the labor force at time $t$. This occurs if $y_{it}^*$, the difference between the $i$th individual's offered wage and his or her unobserved reservation wage, is positive. This can be described more formally as follows:

$$y_{it} = 1 \quad \text{if } y_{it}^* > 0, \qquad y_{it} = 0 \quad \text{otherwise}$$

where

$$y_{it}^* = x_{it}'\beta + u_{it}, \qquad u_{it} = \mu_i + \nu_{it} \qquad (7)$$

That is, $y_{it}^*$ can be explained by a set of regressors $x_{it}$ and error components disturbances. In this case,

$$\Pr[y_{it} = 1] = \Pr[y_{it}^* > 0] = \Pr[u_{it} > -x_{it}'\beta] = F(x_{it}'\beta)$$

The last equality holds as long as the density function describing the cumulative distribution function $F$ is symmetric around zero. For panel data, the presence of individual effects complicates matters significantly. For the one-way error component model with random individual effects, $E(u_{it}u_{is}) = \sigma_\mu^2$ for $t \neq s$, $t, s = 1, 2, \ldots, T$, and the joint likelihood of $(y_{i1}, \ldots, y_{iT})$ can no longer be written as the product of the marginal likelihoods of the $y_{it}$'s. This complicates the derivation of maximum
likelihood and will now involve bivariate numerical integration. On the other hand,
if there are no random individual effects, the joint likelihood will be the product of
the marginals and one can proceed as in the usual cross-sectional limited depen-
dent variable case. For the fixed effects model with a limited dependent variable, the model is nonlinear and it is not possible to get rid of the $\mu_i$'s by taking differences or performing the FE transformation; as a result, $\beta$ and $\sigma_\nu^2$ cannot be estimated consistently for $T$ fixed, since the inconsistency in the $\mu_i$'s is transmitted to $\beta$ and $\sigma_\nu^2$ (Hsiao 1986). The usual solution around this incidental parameters ($\mu_i$'s) problem is to find a minimal sufficient statistic for the $\mu_i$'s which does not depend on the $\beta$'s. Since the maximum likelihood estimates are in general functions of these minimum sufficient statistics, one can obtain the latter by differentiating the log-likelihood function with respect to $\mu_i$. For the logit model, this yields the result that $\sum_{t=1}^{T} y_{it}$ is a minimum sufficient statistic for $\mu_i$. Chamberlain (1980) suggests maximizing the conditional likelihood function rather than the unconditional likelihood function. For the fixed-effects logit model, this approach results in a computationally convenient estimator. However, the computations rise geometrically with $T$ and are excessive for $T > 10$.
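For the special case of $T = 2$, the conditional approach has a particularly simple form: conditioning on $y_{i1} + y_{i2} = 1$, the probability that the "switch" occurs in period 2 is a logit in the change of the regressors, so the estimator can be sketched as an ordinary logit on differenced data (statsmodels used purely for illustration).

```python
import numpy as np
import statsmodels.api as sm

def conditional_fe_logit_T2(y1, y2, x1, x2):
    """Chamberlain-type conditional fixed-effects logit for T = 2 (a sketch).
    Keep only 'switchers' (y_i1 + y_i2 = 1); among them,
    Pr[y_i2 = 1 | y_i1 + y_i2 = 1] = Lambda((x_i2 - x_i1)' beta),
    so beta is estimated by a logit of the period-2 indicator on delta-x
    without a constant (the fixed effect mu_i has been conditioned out)."""
    switchers = (y1 + y2) == 1
    dx = (x2 - x1)[switchers]                 # change in regressors for switchers
    d = y2[switchers].astype(float)           # 1 if the '1' occurred in period 2
    res = sm.Logit(d, dx).fit(disp=0)
    return res.params
```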
In order to test for fixed individual effects, one can perform a Hausman-type
test based on the difference between Chamberlain’s conditional maximum likelihood
estimator and the usual logit maximum likelihood estimator, ignoring the individual
effects. The latter estimator is consistent and efficient only under the null of no in-
dividual effects and inconsistent under the alternative. Chamberlain’s estimator is
consistent whether $H_0$ is true or not, but it is inefficient under $H_0$ because it may not
use all the data. Both estimators can be easily obtained from the usual logit maxi-
mum likelihood routines. The constant is dropped and estimates of the asymptotic
*In cases where the conditional likelihood function is not feasible, as in the fixed-effects probit case, Manski (1987) suggests a conditional version of his maximum score estimator which, under fairly general conditions, provides a strongly consistent estimator of $\beta$.
These are generalized score and Wald tests employed to detect omitted variables,
neglected dynamics, heteroskedasticity, nonnormality, and random-coefficient vari-
ations. The performance of these tests in small samples is investigated using Monte
Carlo experiments. Also, an empirical example on the probability of self-employment
in West Germany is given which uses a random sample of 1926 working men selected
from the German Socio-Economic Panel and observed over the period 1984-1989.
Heckman and MaCurdy (1980) consider a fixed-effects tobit model to esti-
mate a life-cycle model of female labor supply. They argue that the individual ef-
fects have a specific meaning in a life-cycle model and therefore cannot be assumed
independent of the xit’s. Hence, a fixed effects rather than a random-effects speci-
fication is appropriate. For this fixed-effects tobit model, the model is given by (7),
-
with uit IIN(0, at) and
yit = yz if y: > 0
(9)
=0 otherwise
where yit could be the expenditures on a car. This will be zero at time t, if the ith
individual does not buy a car. In the latter case all we know is that yz 5 O.* As in
the fixed-effects probit model, the pi’s cannot be swept away and as a result /3 and
cr: cannot be estimated consistently for T fixed, since the inconsistency in the p i ’ s
is transmitted to /Iand 0:. Heckman and MaCurdy (1980) suggest estimating the
log-likelihood using iterative methods. Recently, Honoré (1992) suggested trimmed
least absolute deviations and trimmed least-squares estimators for truncated and
censored regression models with fixed effects. These are semiparametric estimators
with no distributional assumptions necessary on the error term. The main assumption
is that the remainder error $\nu_{it}$ is independent and identically distributed conditional on the $x_{it}$'s and the $\mu_i$'s, for $t = 1, \ldots, T$. Honoré (1992) exploits the symmetry in
the distribution of the latent variables and finds that when the true values of the pa-
rameters are known, trimming can transmit the same symmetry in distribution to the
observed variables. This generates orthogonality conditions which must hold at the
true value of the parameters. Therefore, the resulting GMM estimator is consistent
provided the orthogonality conditions are satisfied at a unique point in the parameter
space. Honoré (1992) shows that these estimators are consistent and asymptotically normal. Monte Carlo results show that as long as $N \geq 200$, the asymptotic distri-
bution is a good approximation of the small-sample distribution. However, if N is
*Researchers may also be interested in panel data economic relationships where the dependent variable
is a count of some individual actions or events, such as the number of patents filed, the number of drugs
introduced, or the number of jobs held. These models can be estimated by using Poisson panel data
regressions (Hausman, Hall, and Griliches 1984).
*For good surveys of simulation methods, see Hajivassiliou and Ruud (1994) for limited dependent vari-
able models and Gourieroux and Monfort (1993) with special reference to panel data. The methods
surveyed include simulation of the likelihood, simulation of the moment functions, and simulation of the
score.
points. When maximum likelihood methods are not feasible, the MSM estimator out-
performs the simulated maximum likelihood estimator even when the highly accurate
GHK probability simulator is used. Keane (1993) applies the MSM estimator to the
same data set used by Keane, Moffitt, and Runkle (1988) to study the cyclical be-
havior of real wages. He finds that the Keane, Moffitt, and Runkle conclusion of a
weakly procyclical movement in the real wage appears to be robust to relaxation of
the equicorrelation assumption.
Heckman (1981a, 1981b, 1981c) emphasizes the importance of distinguish-
ing between “true state dependence” and “spurious state dependence” in dynamic
models of individual behavior. In the true case, once an individual experiences an
event, his or her preferences change and he or she will behave differently in the fu-
ture as compared with an identical individual that has not experienced this event
in the past. In the spurious case, past experience has no effect on the probability of
experiencing the event in the future. However, one cannot properly control for all
the variables that distinguish one individual’s decision from another. In this case,
past experience which is a good proxy for these omitted variables shows up as a sig-
nificant determinant of the future probability of occurrence of this event. Testing for
true versus spurious state dependence is therefore important in these studies, but it
is complicated by the presence of the individual effects or heterogeneity. In fact, even
if there is no state dependence, $\Pr[y_{it} \mid x_{it}, y_{i,t-1}] \neq \Pr[y_{it} \mid x_{it}]$ as long as there are random individual effects present in the model. If, in addition to the absence of state dependence, there is also no heterogeneity, then $\Pr[y_{it} \mid x_{it}, y_{i,t-1}] = \Pr[y_{it} \mid x_{it}]$. A test for this equality can be based on a test for $\gamma = 0$ in the model

$$\Pr[y_{it} = 1 \mid x_{it}, y_{i,t-1}] = F(x_{it}'\beta + \gamma y_{i,t-1}) \qquad (11)$$
by using standard maximum likelihood techniques. If $\gamma = 0$ is not rejected, we
ignore the heterogeneity issue and proceed as in conventional limited dependent
variable models not worrying about the panel data nature of the data. However, re-
jecting the null does not necessarily imply that there is heterogeneity since $\gamma$ can
be different from zero due to serial correlation in the remainder error or due to state
dependence. In order to test for time dependence one has to condition on the individ-
ual effects, i.e., test $\Pr[y_{it} \mid y_{i,t-1}, x_{it}, \mu_i] = \Pr[y_{it} \mid x_{it}, \mu_i]$. This can be implemented following the work of Lee (1987) and Maddala (1987). In fact, if $\gamma = 0$ is rejected, Hsiao (1996) suggests testing for time dependence against heterogeneity. If heterogeneity is rejected, the model is misspecified. If heterogeneity is not rejected, then one estimates the model correcting for heterogeneity. See Heckman (1981c) for an application to married women's employment decisions based on a three-year sample from the Panel Study of Income Dynamics. One of the main findings of this study is that neglecting heterogeneity in dynamic models overstates the effect of past experience on labor market participation.
In many surveys, nonrandomly missing data may occur due to a variety of
self-selection rules. One such self-selection rule is the problem of nonresponse of
the economic agent. Nonresponse occurs, for example, when the individual refuses
to participate in the survey or refuses to answer particular questions. This problem
occurs in cross-sectional studies, but it gets aggravated in panel surveys. After all,
panel surveys are repeated cross-sectional interviews. So, in addition to the above
kinds of nonresponse, one may encounter individuals that refuse to participate in
subsequent interviews or simply move or die. Individuals leaving the survey cause
attrition in the panel. This distorts the random design of the survey and questions the
representativeness of the observed sample in drawing inference about the popula-
tion we are studying. Inference based on the balanced subpanel is inefficient even with randomly missing data since it throws away data. With nonrandomly missing data,
this inference is misleading because it is no longer representative of the population.
Verbeek and Nijman (1996) survey the reasons for nonresponse and distinguish be-
tween “ignorable” and “nonignorable” selection rules. This is important because, if
the selection rule is ignorable for the parameters of interest, one can use the standard
panel data methods for consistent estimation. If the selection rule is nonignorable,
then one has to take into account the mechanism that causes the missing observa-
tions in order to obtain consistent estimates of the parameters of interest.
We now consider a simple model of nonresponse in panel data. Following the
work of Hausman and Wise (1979), Ridder (1990), and Verbeek and Nijman (1996),
we assume that $y_{it}$ given by Eq. (1) is observed if a latent variable $r_{it}^* \geq 0$. This latent variable depends on a set of explanatory variables $z_{it}$, possibly including some of the $x_{it}$'s.
REFERENCES
Ahn, S. C. and S. Low (1996), A Reformulation of the Hausman Test for Regression Models
with Pooled Cross-Section Time-Series Data, Journal of Econometrics, 71, 309-319.
Ahn, S. C. and P. Schmidt (1995), Efficient Estimation of Models for Dynamic Panel Data,
Journal of Econometrics, 68, 5-27.
Amemiya, T. (1971), The Estimation of the Variances in a Variance-Components Model, In-
ternational Economic Review, 12, 1-13.
Amemiya, T.and T. E. MaCurdy (1986), Instrumental-Variable Estimation of an Error Com-
ponents Model, Econometricu, 54,869-881.
Anderson, T. W. and C. Hsiao (1982), Formulation and Estimation of Dynamic Models Using
Panel Data, Journal of Econometrics, 18, 47-82.
Angrist, J. D. and W. K.Newey (1991), Over-identification Tests in Earnings Functions with
Fixed Effects, Journal of Business und Economic Statistics, 9,317-323.
Arellano, M. (1987), computing Robust Standard Errors for Within-Groups Estimators, Oxford
Bulletin of Economics and Statistics, 49, 431-434.
Arellano, M. (1989), A Note on the Anderson-Hsiao Estimator for Panel Data, Economics
Letters, 31,337-341.
Arellano, M. (1990), Some Testing for Autocorrelation in Dynamic Random Effects Models,
Review of Economic Studies, 57, 127-134.
Arellano, M.(1993), On the Testing of Correlated Effects with Panel Data, Journal of Econo-
metrics, 59, 87-97.
Arellano, M. and S. Bond (1991), Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations, Review of Economic Studies,
58,277-97.
Arellano, M. and 0. Bover (1995), Another Look at the Instrumental Variables Estimation of
Error-Component Models, Journal ofEconometrics, 68,2941.
Avery, R. B. (1977), Error Components and Seemingly Unrelated Regressions, Econometrica,
45,199-209.
Avery, R. B., L. P. Hansen, and V. J. Hotz (1983), Multiperiod Probit Models and Orthogonality
Condition Estimation, International Economic Review, 2 4 , 2 1-35.
Bailar, B. A. (1975), The Effects of Rotation Group Bias on Estimates from Panel Survey,
Journal of the American Statistical Association, 70,23-30.
316 BALTAGI
Baillie, R. and B. H. Baltagi (1995), Prediction from the Regression Model with One-way
Error Components, Working Paper, Department of Economics, Texas A&M University,
College Station, Texas.
Baltagi, B. H. (1980), On Seemingly Unrelated Regressions with Error Components, Econo-
metrica, 48, 1547-1551.
Baltagi, B. H. (1981a), Pooling: An Experimental Study of Alternative Testing and Estimation
Procedures in a Two-way Error Components Model, Journal of Econometrics, 17,Zl-
49.
Baltagi, B. H. (1981b), Simultaneous Equations with Error Components, Journal of Econo-
rnetrics, 17, 189-200.
Baltagi, B. H. (1984), A Monte Carlo Study for Pooling Time-Series of Cross-Section Data in
the Simultaneous Equations Model, tnternational Economic Review, 25,603-624.
Baltagi, B. H. (1985), Pooling Cross-Sections with Unequal Time-Series Lengths, Economics
Letters, 18, 133-136.
Baltagi, B. H. (1995a), Editor’s Introduction: Panel Data, Journal of Econometrics, 68, 1-4.
Baltagi, B. H. (1995b), Econometric Analysis of Panel Data, Wiley, Chichester.
Baltagi, B. H. (1996), Testing for Random Individual and Time Effects Using a Gauss-Newton
Regression, Economics Letters, 50, 189-192.
Baltagi, B. H. and Y. J. Chang (1994), Incomplete Panels: A Comparative Study of Alternative
Estimators for the Unbalanced One- Way Error Component Regression Model, Journal
of Econometrics, 62,67-89.
Baltagi, B. H. and Y. J. Chang (1996), Testing for Random Individual Effects Using Recursive
Residuals, Econometric Reviews, 15,331-338.
Baltagi, B. H., Y. J. Chang, and Q. Li (1992), Monte Carlo Evidence on Panel Data Regressions
with AR(1) Disturbances and an Arbitrary Variance on the Initial Observations, Journal
of Econometrics, 52, 371-380.
Baltagi, B. H. and J. M. Griffin (1988), A Generalized Error Component Model with Het-
eroscedastic Disturbances, International Economic Review, 29, 74.5-753.
Baltagi, B. H. and J. M. Griffin (1995), A Dynamic Demand Model for Liquor: The Case for
Pooling, Review of Economics and Statistics, 77, 545-553.
Baltagi, B. H. and J. M. Griffin (1997), Pooled Estimators vs. Their Heterogeneous Counter-
parts in the Context of Dynamic Demand for Gasoline, Journal of Econornetrics, 77,
303-327.
Baltagi, B. H., J. Hidalgo, and Q. Li (1996), A Non-parametric Test for Poolability Using Panel
Data, Journal of Econometrics, 75, 345-367.
Baltagi, B. H. and Q. Li (1990), A Lagrange Multiplier Test for the Error Components Model
with Incomplete Panels, Econometric Reviews, 9, 103-107.
Baltagi, B. H. and Q. Li (1991), A Transformation That Will Circumvent the Problem of Au-
tocorrelation in an Error Component Model, Journal of Econometrics, 48,385-393.
Baltagi, B. H. and Q. Li (1992), Prediction in the One-way Error Component Model with Serial
Correlation, Journal of Forecasting, 11, 561-567.
Baltagi, B. H. and Q. Li (1995), Testing AR(1) against MA(1) Disturbances in an Error Com-
ponent Model, Journal of Econometrics, 68, 133-151.
Baltagi, B. H. and N. Pinnoi (1995), Public Capital Stock and State Productivity Growth:,
Further Evidence from an Error Components Model, Empirical Economics, 20, 351-
359.
PANELDATAMETHODS 3 I7
Becketti, S., W. Gould, L. Lillard, and F. Welch (1988), The Panel Study of Income Dynamics
after Fourteen Years: An Evaluation, Journal of Labor Economics, 6,472-492.
Bhargava, A. (1991), Identification and Panel Data Models with Endogenous Regressors, Re-
view of Economic Studies, 58, 129-140.
Bhargava, A., L. Franzini, and W. Narendranathan (1982),Serial Correlation and Fixed Effects
Model, Review of Economic Studies, 49, 533-549.
Bhargava, A. and J. D. Sargan (1983),Estimating Dynamic Random Effects Models from Panel
Data Covering Short Time Periods, Econometrica, 51, 1635-1659.
Biorn, E. (1981),Estimating Economic Relations from Incomplete Cross-Section/Time-Series
Data, Journal of Econometrics, 16,221-236.
Biorn, E. (1992), The Bias of Some Estimators for Panel Data Models with Measurement Er-
rors, Empirical Economics, 15,221-236.
Bjorklund, A. (1985), Unemployment and Mental Health: Some Evidence from Panel Data,
The Journal of Human Resources, 20, 469-483.
Blundell, R. W. and C. H. Meghir (1990), Panel Data and Life-Cycle Models, in J. Hartog, G.
Ridder, and J. Theeuwes (eds.), Panel Data and Labor Market Studies, North-Holland,
Amsterdam, 231-252.
Blundell, R., C. Meghir, and P. Neves (1993), Labor Supply and Intertemporal Substitution,
Journal of Econometrics, 59, 137-160.
Boumahdi, R. and A. Thomas (1991), Testing for Unit Roots Using Panel Data: Application
to the French Stock Market Efficiency, Economics Letters, 37, 77-79.
Breitung, J. and W. Meyer (1994),Testing for Unit Roots in Panel Data: Are Wages on Different
Bargaining Levels Cointegrated?, Applied Economics, 26,353-361.
Breusch, T. S. (1987), Maximum Likelihood Estimation of Random Effects Models, Journal
uf Econometrics, 36,383-389.
Breusch, T. S., G. E. Mizon, and P. Schmidt (1989), Efficient Estimation Using Panel Data,
Econometrica, 57, 695-700.
Breusch, T. S. and A. R. Pagan (1980), The Lagrange Multiplier Test and Its Applications to
Model Specification in Econometrics, Review uf Econometric Studies, 47,239-253.
Butler, J. S. and R. Moffitt (1982), A Computationally Efficient Quadrature Procedure for the
One Factor Multinominal Probit Model, Econometrica, 50, 761-764.
Chamberlain, G. (1980), Analysis of Covariance with Qualitative Data, Review of Economic
Studies, 47,225-238.
Chamberlain, G. (1982), Multivariate Regression Models for Panel Data, Journal of Econo-
metrics, 18, 5-46.
Chamberlain, G. (1984), Panel Data, in Z. Griliches and M. Intrilligator (eds.), Handbook of
Econometrics, North-Holland, Amsterdam.
Chesher, A. (1984),Testing for Neglected Heterogeneity. Econometrica, 52,865-872.
Chow, G. C. (1960),Tests of Equality between Sets of Coefficients in Two Linear Regressions,
Econometrica, 28, 591-605.
Chowdhury, G. (1991), A Comparison of Covariance Estimators for Complete and Incomplete
Panel Data Models, Oxford Bulletin of Economics and Statistics, 53,88-93.
Chowdhury, G. (1994), Fixed Effects with Interpersonal and Intertemporal Covariance, Em-
pirical Economics, 19,523-532.
Collado, M. D. (1995), Estimating Dynamic Models from Time-Series of Cross-Sections, Uni-
versity of Carlos 111 d e Madrid, Madrid, Spain.
322 BALTAGI
Wansbeek, T. J. and A. Kapteyn (1992), Simple Estimators for Dynamic Panel Data Models
with Errors in Variables, in R. Bewley and T. Van Hoa (eds.), Contributions to Consumer
Demand and Econometrics: Essays in Honor of Henri Theil, St. Martin’s Press, New
York, 238-251.
Wansbeek, T. J. and R. H. Koning (1989), Measurement Error and Panel Data, Statistica Neer-
landica, 4 5 , 8 5 9 2 .
Wooldridge, J. M. (1995), Selection Corrections for Panel Data Models under Conditional
Mean Independence Assumptions, Journal of Econometrics, 68,115-132.
Wyhowski, D. J. (1994), Estimation of a Panel Data Model in the Presence of Correlation
between Regressors and a Two-way Error Component, Econometric Theory, 10, 130-
139.
Ziemer, R. F. and M. E. Wetzstein (1983), A Stein-Rule Method for Pooling Data, Economics
Letters, 11, 137-143.
This page intentionally left blank
Econometric Analysis in
Complex Surveys
1. INTRODUCTION
In the last five decades there has been a significant growth of research in econo-
metric methods and their application in various areas of economics. Indeed, in the
last two decades, the tremendous growth in econometrics has dichotomized the sub-
ject into cross-sectional (micro) econometrics and time-series (macro) econometrics.
Whereas the new cross-sectional methodology was partly due to the nature of the data
and the empirical issues in microbased labor economics and industrial organiza-
tion, the new time-series methodology was an outcome of the challenging empirical
issues and data problems in macroeconomics and finance. Despite these develop-
ments, econometric inference methods (especially in cross-sectional econometrics)
have been confined to the assumptions that data is generated as a simple random
sample with replacement or that it is coming from an infinite population (Johnston
1991, Greene 1993).These assumptions are certainly not valid in the case of survey
data used in development and labor economics. Surveys usually have a well-defined
frame consisting of a finite population of individuals, households, or villages. Sample
data for analysis is generated from this finite population using a sampling design dif-
ferent from random sampling with replacement (RSWR). Sampling schemes such as
systematic sampling, stratified random sampling, and cluster sampling may be used
alone or in combination. These have been the subject of four decades of extensive
work in statistics literature (Kish 1965, Cochran 1953, Sukhatme 1984, Levy and
Lemeshow 1991, Thompson 1992).
The history of survey sampling can be traced back to the early eighteenth cen-
tury, and even earlier (see Hansen 1987 and Deaton 1994 for detailed references).
3 25
326 ULLAH AND BREUNIG
metric analysis is carried on under the false assumption of RSWR; although see the
excellent works of Pudney (1989), Deaton (1994),and Howes and Lanjouw (1994)
for notable exceptions. This is especially a matter of concern in development eco-
nomics where measures of income inequality, poverty and elasticities are used in pol-
icymaking by governments and international agencies. We think there are perhaps
two reasons for this state of affairs. One is the statistical complexity of the various
sampling designs for an average development economist; the second is a complete
lack of exposure to the statistical literature on survey design in econometrics texts.
Given this deficiency a systematic development of the parametric and nonparametric
econometric inference (estimation and testing) of various econometric models, under
various practical sampling designs, is urgently needed. This is an ambitious project
and it is by no means attempted here. Instead, this chapter is a modest beginning in
this direction. Some new results are also presented. Essentially, our objectives are
as follows. The first is to provide a unified econometric framework of the five decades
of diverse statistical literature on estimating the population mean. We refer to this
as the mean model. The second is to explore the implications of results from the
mean model for the linear regression model and the nonparemetric kernel density
estimation. The third is to explore the implications of misspecifying the sampling
design on the properties of econometric estimators. It is hoped that this chapter will
contribute to further development of econometric inference results in other practical
econometric models and for other parameters of interest.
In Section I1 we present the estimation of the finite population mean, the den-
sity, and linear regression coefficients under RSWR and random sampling without
replacement (RSWOR). Section 111 deals with stratified sampling. Section IV con-
siders cluster sampling, systematic sampling, and two-stage sampling. In Section V
we give some limited simulation results. Finally, in the Appendix we provide some
technical details of the results in Section 11.
where Y; is the ith population observation, Ui is the ith error, and j3 and o2are the
population mean and variance, respectively, given by
N
N-1
S2
1= 1
N i= I
N
328 ULLAH AND 6REUNlG
where
N
i= 1
N-1
The errors U ; are nonsampling errors which sum to zero by the definition of B in
(2). Therefore, U ; and Y, are nonstochastic variables. However, if we treat the finite
population model (1) as generated from an infinite population or superpopulation
model, then U ; and Y, are stochastic. This case is not considered here.
A random sample without replacement of size n, often referred to in the liter-
ature as a simple random sample (SRS), is taken from the above finite population of
size N. We denote the sample observations as yi and write (1) for these observations
as
is the probability that the rth population unit is selected in the ith draw and
is the probability that the (T-,s)th unit is selected in the (i, j)th draw where i, j =
1 , . . . , n and r, s = 1 , . . . , N ( z # j , r # s). These probabilities provide
N
ITr = Cnr(i)
=-
n
N
i= 1
the probability of selection of the rth population unit in the sample of size n, and
ifj .
the probability of selection of the (r, s)th population unit in the sample.
In view of (4) to (7), we get
and, for i # j,
Eu! = ( + 3)a4 = -
~ 2
1
N
E(Yi- /?)4
i= 1
1 N N
and for i # j # k # 1,
2y] (r3
____-
( N - 1) ( N - 2)
01234
2(Y2 +3) - N 04
(N - 1)(N - 2)
330 ULIAH AND BREUNIG
where y1 and y 2 are Pearson's measures of skewness and excess kurtosis. For nor-
mal distribution, y1 = y 2 = 0. The outcomes (8)to (11) indicate that RSWOR
represents a set of n identically distributed but correlated random variables yi.
In the case of RSWR the draws are independent, so
B. Estimation of Parameters
c = a2(1- p ) I + _
[ P
_
1 - p"']
i= 1
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 33 I
where the last equality gives the familiar expression of the variance of the sample
mean under RSWOR. The term 1 - n/N is known as the finite population correction
(fpc). For n + N , V ( b ) + 0.
The efficient generalized least-squares estimator of is
s = (l’x-ll)-+’x-ly = ( L ’ q ’ l ’ y = ji (19)
where the second equality follows by using (16).Thus, under (15),the two estimators
and their variances are the same.
When the sampling is RSWR or the population is infinite, C = 0’1 because
p = 0. In this case, Eb = B and
o2
V(6)= -
n
which also follows from the last equality of (18)where n/N + 0 as N + 00. From
(18)and (20)
The above results indicate that the LS estimator b is unbiased for both RSWOR
and RSWR. However, if the sampling is actually without replacement, the variance
formula in (20) is wrong and gives an overestimate of the correct variance (18).To
obtain the correct variance one needs to deflate (20) by ( N - n ) / (N - 1).For example
if n = 20 and N = 40 the correct variance will be approximately 50% smaller than
the wrong variance. The smallness of the variance of ~ R S V ( I O Ris due to the negative
correlation p.
In order to calculate the variance of b we look into an unbiased estimator of
S2. This is given by (using (13))
where M = I - u‘/n is an idempotent matrix. From the result ( 1 16) in the Appendix
it is straightforward to show that s2 is an unbaised estimator,
332 ULLAH AND 6REUNlG
V ( S 2 )= 1
n
[{ y2 + 3 - n ?")( N - 1 J 0 4 (24)
n+l N
N-1
and
where
n
y2
+ 20+-
"1
+ 48 - 4y18'/' + 2-n - 1
n + l +20-) n
(35)
n-1 n-1
)
- 4e1l2 y 3 + 4 y l ___ 4-75@ - 10883/2y1
( n-1 1
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 335
From (34)we observe that the bias goes to zero as n + 00. Further, the asymp-
totic MSE (AMSE) is
Thus, for distributions with y2 > 0 (y2 < 0) the AMSE will be above (below) the
AMSE under normality. Further, we note that the bias, up to 0(a3)and O(n-’), is
positive for negatively skewed distributions, but the bias to O ( d ) is negative for pos-
itively skewed distributions. However, to O ( n - l ) , the bias is negative for positively
skewed distributions provided
2
cv = 01’2< ,)q (37)
Generally, 8 will give an underestimate of the true 0 for positively skewed distribu-
tions. Since income distributions are generally positively skewed, it is possible that,
in the past, the use of 8 in measuring income inequality gave an underestimation
of inequality. In view of this, Breunig (1996) hazuggested an estimator of 0 which
adjusts the bias of 8. This is given as = 8 - Bias(8), where BTs(8) is the bias of
8 in (34)with 8 replaced by 8 and y1 by p1 = C(Yi - T)’/ns^”.Although we do not
attempt it here, it will be interesting to analyze these results for sampling without
replacement. This could be done by using (29)and (30).Another possible extension
involves the use of the geometric mean, useful when the data are expressed in ra-
tio form. We could then formulate the sample cv as the ratio of the sample geometric
mean to the sample standard deviation. The authors are unaware of any development
of the finite-sample moments for such a statistic, even under random sampling with
replacement.
and then obtain the LS estimator of /3 by minimizing the weighted squared error
U‘WU = (y - @ ) ’ W ( y - $I) ,
where W = Diag(w1, . . . , w,) is an n x n stochas-
tic diagonal weight matrix whose elements wi,known as the normalized expansion
factors, satisfy l‘wl = wi = 1.This gives the weighted LS estimator of /3 as
336 ULLAH AND BREUNIG
The stochastic weights w;are chosen such that (using (4) and (6))the sample is rep-
resentative of population in the sense that the sample mean, on average, is identical
to the population mean. That is,
/ n \ n N
This gives
1
w; = - (41)
Nni
and b , = N-’ C;(yJni), where ni is the probability of selection of the ith popu-
lation unit in the sample. When n; = n / N we get b , = b as given in (17).
An alternative way to obtain (41) is to write Eb, = E ( x ’ ; wiy;) = E ( x F
w;diY,)= Cr(wi.Ed;)U; = winiYi where di is a dummy random variable
which is 1 when Y, is in the sample, and 0 otherwise. Since the probability of selection
of the ith population unit in the sample is n;, we can verify that
Let us consider the mean of the conditional population of Y given a vector of k vari-
ables X I , . . . , X k as
Y = x*p +U (44)
where Y is an N x 1vector and X * is an N x k matrix. The model (44)is a conditional
mean model if the conditional mean of the nonsampling errors, E ( U l X ) , is zero or
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 337
y=xp+u (45)
where y is an n x 1 vector and X is an n x k matrix. Under RSWOR, E ( u ( X ) = 0,
and V ( u l x ) = E, where C is as given in (15).The LS and GLS estimators of /I and
their variances, respectively, are
1 "
k; = -
Nh
l
cN
i= I
d;K; (49)
where k; = h,'K((y; -y)/h,) and h = n / N . Thus the finite population density esti-
mation problem reduces to the problem of estimating the population mean discussed
in (1) to (3).It therefore follows from the results in Section 1I.A that
E f ( y ) = (Nh)-"(Ed,)
(E-K ('lb'))
N;
____ (51)
+f*
The details of (51) and (52) can be worked out by following Rosenblatt (1956) and
Pagan and Ullah (1995).For the asymptotic theory in the parametric, finite popula-
tion models, see Fuller (1984)and the references therein.
In the regression context we can write the nonparametric versions of the finite
population and sample models in (44)and (45), respectively as
y; = rn(X;) + ui (53)
and
y = ax,)&J +U (55)
where Z ( x , ) = [ l (x - LX,)], 6 ( x , ) = [ m ( x , ) rn'(x,)]', and V ( u ) = C as given
in (46).The local least-squares estimation of 6 ( x , ) can then be carried on by mini-
mizing U'C-'/'KC-'/~U.The properties of $(,) so obtained can be worked out by
using the procedures developed in Ruppert and Wand (1994).
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 339
yi = pi + Ui (59)
where is an Ni x 1 vector of observations. The population size is N = Ni. E, M
and
Here, Mi is the matrix M in (22) with L = i i . The parameter of interest may, however,
be the overall mean of the population. That is,
M N, M
But n;/N;,the probability of selection of the jth population element in the sam-
ple of size ni in the ith stratum, may not be constant across strata. Therefore the
weighted LS described in Section 1I.D will be more useful here. For this we mini-
mize CYi wi,(y;j - j3 - s:)’with the restriction that w;jp; = 0 and
Cp;’ w;;= 1. This gives
M n;
and
This inflation or expansion factor w;;is chosen such that E b = #? and Eb; = pi.
This gives
This is the same as V(b,,) or V ( b ) if n;/n = N ; / N and #?; = 6. In this case the
population is homogeneous and the combined sample is a simple random sample.
342 ULLAH AND BREUNIG
In general, V ( b s ~ s >
) V(bst),especially if within-strata heterogeneity is low and
between-strata heterogeneity is high. To see this, consider V ( b s L for
) case of propor-
tional stratified sampling, where n; = nN;/N. In this case
Alternatively,
From the above analysis it is clear that if the sample observations y;i are gen-
erated by stratified random sampling then they should be reweighted to resemble
the population by replicating (inflating) sampling units, using the inflation or ex-
pansion factor, and treating the enlarged sample as if it were the population. The
inflation factor, O i j , for each sampling unit j in the ith stratum is the reciprocal of
its sampling probability; that is, Oi, = l/nij = N i / n ; . If we multiply each sample
observation by its inflation factor 8;;. we obtain an unbiased estimate of the popu-
lation total. Alternatively if we multiply the sample observations by their weights
wij = Oy/ Oi, = l/Nnij, the normalized inflation factor, we get an unbiased
estimate of population mean, as shown in (72) and Section 11. Exactly the same pro-
cedures can be used to obtain estimates of medians, variances, and other parameters.
We will examine the weighting for regression parameters.
B. Regression Model
Suppose now that the parameters of interest are no longer population means, but the
parameters of a linear regression model
i= 1
By the analogy of the population mean case one may consider 0; = N i / N . If the
population is stratified on some economic grounds such as rural and urban, then the
estimates of will be useful in their own right. However, if the population is divided
into a large number of strata on administrative grounds, then studying /?will be more
meaningful. The estimates of are
bi = (xi'xL>-'xL!yi (80)
with V ( b ; ) = aT(Xl!Xi)-'. If we have RSWOR bi = ( X ~ C ~ ' X ; ) - ' X ( C ~ with
'yi
V ( b ; ) = ( X f C ; ' X i ) - ' , X i is C with 1 replaced by 1; and n by ni.
To estimate /3 we write
/ M \-' M M
and
so that the bias does not depend on the sample values; wi is the same as in (83) with
pi replaced by 0i. Furthermore if Bi - p is random, Q; = Q or pi = p, then the bias
vanishes. For Q; = Q the bias vanishes when wi = 8;. Note that for = /?both
b, and b p are unbiased but b p will be more efficient by the Gauss Markov theorem
(DuMouchel and Duncan 1983). Kish and Frankel (1974), however, argue in favor
of b, and the parameter of interest
i= 1
y= xp + m y + U (89)
and testing for y = 0 by the standard F-test. Alternatively, one can combine the two
by using Stein-type shrinkage estimators as
bs = (1 - +) bp + i b ,
The properties of bs are not known, but they can be developed by following Judge
and Bock (1978) or Vinod and Ullah (1981).
Magee et al. (1996)suggest an alternative estimator to weighted least squares
when the sampling probabilities are known but the form of the sample design is un-
known. They propose a conditional maximum likelihood estimator, which, under cer-
tain conditions, is superior (in the mean-squared error sense) to WLS or OLS. They
treat the weights (sampling probabilities) as having been generated by a stochas-
tic process and independently distributed throughout the population, an assumption
which is violated under either stratification or clustered sampling. They suggest us-
ing their ML estimator in any case, since information about the sample is usually
unknown. Magee (1996) and Magee et al. (1996) suggest a way to improve weighted
least-squares regression for survey data. Including weights often injects additional
346 ULlAH AND BREUNIG
heteroskedasticity into the model and the WLS estimator, though consistent, often
has a high variance. Magee (1996)suggests creating new weights by multiplying the
weights by a function parameterized so as to minimize variance. Again, indepen-
dence across the sample is assumed, necessitating some adjusting of the procedure
for use under stratification or clustered sampling.
For further discussion of regression, including inference on finite population
and superpopulation parameters, Kalton (1983) provides a clear and readable in-
troduction, Pfefferman (1993) reviews some of the recent work on regression mod-
els and weighting, Selden (1994) considers the case of weighted, generalized least
squares in the mean model case, and Godambe (1995,1997)provides a more general
model.
The estimators b p , b,, and bs are all weighted averages of 6;. Another weighted
average of bi follows by considering pi - /3 to be random with mean vector 0 and
diagonal covariance matrix A,. Thus, heterogeneity across strata is due to variance
+ +
only. In this case, X ; ( p ; - p ) ui has the variance a: X;AX,/ and we can get the
GLS estimator of j3 as
i= 1
+
where wi = ( [ C I ( A p &)-']-'(As + i$)-';& = 0f(XL!Xi)-').This is the well-
known Swamy (1971) estimator. The estimators b p and 6, will both be inefficient
under this scenario. Their standard errors need to be calculated by using a White
(1980) kind of adjustment.
The above procedures are useful when heterogeneity across strata is present.
This is important when the number of strata are large. If the number of strata are
small, one could do separate regressions, combine them by allowing stratum-specific
intercepts, or use strata dummies with the same variables.
Although we do not explore questions of prediction here, they will be important
in choosing the parameters of interest. The census parameter, /?in (79)would be of
interest when one would like to predict the change in the independent variable for a
small change in the dependent variable for every member of society. If a particular
policy would affect some strata differently than others, this might not be a parameter
which aids in prediction.
For the purpose of prediction, however, nonparametric analysis will be more
useful since it directly estimates the regression function and thus avoids the prob-
lem of defining the parameter of interest. This is not taken up here, but will be an
interesting subject of a future study.
Also not considered here are the usual diagnostic tests used in both parametric
and nonparametric ecomometric analysis. Obviously, these too must be adapted to
account for complex sampling procedures.
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 347
is an unbiased estimator of a:, where iiij = yij . b,. In practice, one could consider
i2'a/n, which is consistent. Further, for i = 1, . m,
* 7
gives
Deaton (1994) provides numerical examples of the effect of ignoring p in the calcu-
lation of standard errors by 6,. He showed that for estimated food price elasticities
in Pakistani villages, p is between .3 and .6, leading to underestimation of V ( b c ) by
a factor greater than 2 when the mean cluster size is 12. Now we turn to the case of
two-stage sampling where, within each selected cluster, we pick a sample of n; < Ni
units. The probability of selection of every element in the chosen cluster is mni/MNi.
It is easy to verify that the estimator of the mean /3 is 62s = E'; b;/m, where bi is
C:i yi;/ni. For the variance of b2s, see Kish (1965).
B. Systematic Sampling
Systematic sampling is one of the most common techniques used in development eco-
nomics. In systematic sampling, the sampling units are (usually) arranged in random
order with respect to the variable of interest. Of the first K units, one is selected at
random. Then every Kth unit is sampled in order. This sampling design is the easiest
to implement because it involves drawing only one sample. Systematic sampling can
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 349
3 = /9 + uj, j = 1, . . . , n
The LS estimator of the mean, when cluster k is chosen, is
In general, we will not be able to estimate this variance. In the case where our data
consists of one systematic sample, the population mean, p, is unknown as are the
remaining K - 1 unsampled clusters. In some surveys, resampling is possible. In
this case, information can be gathered about the within and across-cluster hetero-
geneity and an approximation for V ( b s y s ) as a function of the intracluster correlation
coefficient, p :
C. Regression Model
In the regression model
6, = b u = (X’X)-’X’y, ) (X’X)-’X’StX(X’X)-’
V ( b ~ s= (106)
and
V(b,) = a 2 ( X ’ X ) - ’ d (108)
which reduces to the variance in the mean model where k = 1 and X = 1. Moulton
(1990)and Deaton (1994)provide examples of potential underestimation, using the
usual variance o~(x’x)-’.
The results for the case of observations in X changing with clusters or when
n; # n is not well developed in the literature, although see Pfefferman and Smith
(1985)who provide an upper bound of V(bc).Also, the efficiency of b ~ compared u
to b u needs to be analyzed.
An alternative to estimating St by first estimating p and cr2 is to consider $2 to
be of unknown form and then estimate X’QX consistently by X ’ h X , as suggested
by White (1980)and Arellano (1987).
where ni is the probability of selection of the ith cluster in the sample. For RSWR,
7 t i = m/M. We note that wij = (Z'ij)-l, where Pi; is the probability of selecting
yi; = probability of selecting the ith cluster x probability of selecting the jth unit
of the population in the sample given the ith cluster is selected. This gives
If the cluster is chosen with probability proportional to size of the cluster (i.e., 7ri =
mNJN = mei,
1 "
b, = - bi
m
i=1
In practice 7ri is proportional to estimated size N;*. This gives
where 87 = N r / N * , N* = N:.
When we consider single-stage sampling or cluster sampling, the estimator bw
can be written as
yi = XiBi + ui
bi = (XfXi)-'Xfyi, and we can consider b, = C y ( B i b i / n i ) as before (Konijn
1962).For cluster sampling we can again consider 6, = m-l E';"bi/ni.
We can also estimate j3 by considering to be random so that yi = X i s ; +
+ +
= X;B Xi& U i , where 6; = B; - B. This will involve Swamy-type random
coefficient estimation of B described in Section I11 (also see Porter 1973).
Fuller (1975)estimates the one-way error component model
yi = Xi/3 + eili + ui
to capture the correlation across the elements of the ith cluster, where ei has mean
zero and constant variance.
352 ULIAH AND BREUNIG
V. SIMULATION
In this section, we present a summary of the results from a detailed simulation of the
mean model under complex sampling. The two primary objectives of the simulation
are (1) to illustrate the effect of ignoring sample design in data analysis, and (2)
to ascertain the properties of our estimators under various sample designs where
analytical results do not exist.
The first step in our simulation was the creation of several finite 44populations.”
The populations created ranged in size from 50 to 20,000, with means ranging from
1 to 2000. They were drawn from an “infinite” population of randomly distributed,
normal numbers. From these finite “populations” we then drew n observations, us-
ing the sampling design in question. For the questions under consideration in this
section, the shape of the distribution is irrelevant, so only normal random numbers
were considered. For analyzing other variables, such as the distribution of V ( s 2 )
considered in Section I, the shape of the distribution does matter, and conclusions
based on simulations using only normally distributed populations should be made
with caution.
To demonstrate the first point, a sample of size n was drawn using the sample
design of interest (stratified, clustered, etc.), then b and V ( b ) were estimated using
the information on how the sample was drawn. Then taking this same sample and
ignoring the sample design, we calculated ~ R S W Rand V(bRSWR)-i.e., treating the
sample as if it were a random sample drawn with replacement. These values are aver-
aged over 1000 repetitions. We then compare the average bias (b)and bias ( ~ R S W K )
and the ratio of the averages of the two variances, which can be interpreted as the de-
gree of over- or underestimation arising from ignoring (or misspecifying) the sample
design.
The second type of simulation we have undertaken, to answer the second ques-
tion raised above, involves drawing a separate sample for each sampling design of
interest. From these distinct samples, we calculate b and V ( b ) for each of the sample
designs. After r repetitions, we compute the average bias and the simulation variance
of the estimator for each design.
By way of example, let us consider a simulation of both kinds comparing sam-
pling under finite population and infinite population from Section I. Recall that
Following the first method, we take 1000 samples of size n from our population and
calculate V(bs~s),using the fact that the sample has been drawn without replace-
ment from a finite population. Then we calculate V ( ~ R S Wand
R ) compare the ratio of
the average of these two over the 1000 repetitions.
The ratio should be the inverse of the finite population correction. Indeed
this is confirmed in the results in Table 1. Results from the second type of simu-
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 353
Sample size
Population
size 5 10 20 50 100
lation are presented in Table 2. Here, two separate samples are drawn from the same
population-one under SRS, the other under sampling with replacement. Results for
10,000 repetitions are reported. As expected, the results closely approximate those
in Table 1. One way to interpret these results is that for the same sample size, sam-
pling without replacement is more precise than sampling with replacement. (Since
the variance of the estimator b under SRS is, on average, smaller.) We can also think
of the ratio as representing the “cost” of assuming that sampling is from an infinite
population when in fact sampling is from a finite population.
Tables 3 through 6 present comparisons between RSWR and stratified sam-
pling without replacement, conducted under the first method described. In Table 3,
we consider two strata, each with a population of 1000, where a sample of size rt;
is drawn from each stratum. Since rtl # rt2 and both strata have a population of
1000, the sampling probability is unequal. As we saw in Section 111, the unweighted
estimator of /3 will be biased. As we can see from Table 3, the more unequal the
Sample size
Population
size 5 10 20 50 100
sampling probabilities, the greater the bias in the unweighted estimator, ~ R S W R and
,
the greater the ratio of mean-squared errors. Most data in labor and development
economics is stratified, and the most common case is unequal sampling probabil-
ities, either by design or because of different rates of nonresponse across strata.
Thus, as the simulation shows, a potentially serious bias problem exists even in cal-
culating a simple mean. The general intuition behind these results extends to the
regression case.
In Tables 4 to 6 , we present results from the stratified case, but with equal
probabilities of selection in both strata. In Table 4, we see that even though ~ R S W R
# of Ratio of Expected
clusters Total True variances kish design
sampled sample size pop. Pop Var(bCLu)/ effect
(4 (n = c * = m) mean p P var(bRSWR) (4
5 100 1000 .12 .088 3.13 3.28
1000 .17 .13 5.00 4.23
1000 .26 20 6.48 5.94
1000 .44 .344 7.24 9.36
1000 .48 .38 10.02 10.12
10 200 1000 .12 .ll 2.39 3.28
1000 .17 .15 3.89 4.23
1000 .26 .24 5.78 5.94
1000 .44 .41 10.36 9.36
1000 .48 .45 11.78 10.12
25 500 1000 .12 .119 3.69 3.28
1000 .17 .169 4.29 4.23
1000 .26 .256 5.42 5.94
1000 .44 .439 9.69 9.36
1000 .48 .48 10.09 10.12
nomics to encounter stratified samples where the urban mean income is three times
that of rural mean income. In the case where = 200 and 82 = 600, we see that
this can lead to an overestimate of the variance of the population mean by a factor of
20. It increases for large sample sizes, because the finite population correction has a
proportionally larger effect. From Table 6, we see that stratification does not improve
efficiency when the strata have the same mean regardless of the difference in within-
stratum variance. The small increases in efficiency that we see are the result of the
increasing effect of the finite population correction as the sample size increases. In
other simulations, not reported here, we show that assuming a stratified structure
when the data does not have one leads to no gains in efficiency. This result follows
intuitively from the results in Table 6.
Table 7 presents the cost of ignoring the one-stage clustered sample design
and assuming that the sample is actually a RSWR. Here we have drawn a clustered
sample from one stratum with a population of 1000,which is divided into 50 clusters,
each of size 10. We present the ratio of variances, V ( b c ) / V ( b ~ s w where
~ ) , we have
calculated V ( b c ) using the estimated sample value of 8. We compare this to the
+
expected Kish design effect 1 (rn - 1 ) p given our knowledge of the true value of
p: 8 gives a slight underestimation, which disappears as n + N .
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 357
Ratio of variances
# of Total True
clusters sample pop. Population SRSI SRS/ Clustered/
sampled size mean P clustered systematic systematic
Ratio of
Sample size variances Ratio of MSEs
(for each Strata Strata Bias Bias Vur(bsyS)/ MsE(bRswR)/
stratum) means pop- bw bsys VWbfU) M S E ( bW )
5 (2,900) 100 3.21 -0.09 1.06 1.03
(1000, 1000) 500 0.09 -0.54 1.05 1.02
25 (2,900) 100 4.11 -3.11 2.55 1.78
(1000, 1000) 500 -0.17 -0.04 1.46 1.23
50 (2,900) 100 1.89 0.30 1.89 1.44
(1000, 1000) 500 -0.16 0.77 1.74 1.37
358 ULLAH AND BREUNIG
tematic sampling in a stratified population. Results are for 10,000 repetitions. The
systematic sample does not perform as well as the stratified random sample since
stratification will more evenly cover the population over repeated sampling.
As the results of Sections I through IV demonstrate, problems of inference and
estimation arise when data is gathered under a complex sampling design. The simu-
lation helps to demonstrate that these are of more than trivial interest. Unequal sam-
pling probabilities are the rule, not the exception, and treating such data as having
been drawn under RSWR will lead to biased estimators. Even where the dispropor-
tion is 2 to 1, this leads to large bias as shown in Table 3.
As different strata will usually have different parameter means, ignoring strat-
ification will lead to large overestimates of the true variance of our estimate of ,!?. A
recent survey of income in Kenya showed that average rural income was one-third
that of average urban income. Ignoring stratification when calculating a population
mean in this case will lead to confidence intervals which are 20 times too wide. The
exact opposite problem occurs in clustering. Intraclass correlation coefficients of .5
are common in developing country studies. The simulation shows that ignoring the
sample design leads to an underestimate of the variance by a factor of 10, more if
the average cluster size is greater than 20.
Bias problems and misestimates of standard errors are exacerbated in more
complex sample designs which combine different aspects of stratification, cluster-
ing, and systematic sampling. Clearly, the same problems will arise in the regression
case. The simulation demonstrates that assuming away sample design effects as triv-
ial is unjustified. Instead, more careful attention should be paid to using available
methods of analysis and information on sampling to construct unbiased and more
precise estimates.
VI. APPENDIX
A. Some Useful Expectations
Suppose the elements of an n x 1 vector U satisfy (8)to (11). Let A and B be n x n
symmetric matrices of known constants, b an n x 1 vector of known constants, 1 an
*
n x 1 vector unit of elements, and A B the Hadamard product of A and B. Then
we can verify the following expectations:
E(u’Au) = E
[:+x ] [: + $
= a’tr A
uiujaij
pa’(1’ A1
=E
- tr A )
u?aii ~i~jaij] (116)
r n n n 1
E ( U’Au)(u’Bu)
B. Proof of Proposition 2
Let us use (2) and (17) in (28) and write the estimator 8 = ( G ) 2 of 8 = (cv)’ as
360 ULLAH AND BREUNIG
where U is the error vector where moments are determined by the moments of U in
+ +
(8)to (11).Then y’My = a2v’Mu and y’My = nB2 0(2/3u’i) 02u’Mv, and these
give, up to 0(04),
= a2a2 + 2 , s + a4a4
where
3 -
04a4 = -(u’Mu)(u‘Mu)
n284
ACKNOWLEDGMENTS
The research on this topic is an outcome of the first author’s visit to the Policy
Research Division of the World Bank and helpful discussions with C. Howes, E.
Jimenez, and M. Ravallion. The authors are thankful for comments by participants of
seminars at York University, University of Windsor, UCR, and University of Guelph
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 36 I
and participants in the Canadian Econometric Studies Group 19% meeting. Special
thanks to Gordon Anderson, David Giles, V. P. Godambe, and Lonnie Magee. Fi-
nancial support from the Academic Senate, UCR, is gratefully acknowledged by the
first author.
REFERENCES
Judge, G. G. and M. E. Bock (1978), The Statistical Implications of Pre-Test and Stein Rule
Estimators in Econometrics, North-Holland, Amsterdam.
Kadane, J. B. (1971), Comparisons of k-Class Estimators When the Disturbances are Small,
Econometrica, 39, 723-737.
Kakwani, N. (1980), Income Inequality and Poverty: Methods of Estimation and Policy Appli-
cations, World Bank, Oxford University Press, Oxford.
Kalton, G. (1983), Introduction to Survey Sampling, Sage University Paper #35.
Kiaer, A. N. (1897), The Representative Method of Statistical Surveys (translation 1976, orig-
inal in Norwegian), Kristiania Videnskabsselskabets Skrifter, Historisk-jlosojske Klasse,
4: (34-56), Statistisk Sentralbyra, Oslo.
Kish, L. (1965), Survey Sampling, Wiley, New York.
Kish, L. and M. R. Frankel(l974) Inference from Complex Samples, Journal of Royal Statis-
tical Society, Series B, 1, 1-35.
Kloek, T. (1981), OLS Estimation in a Model Where a Microvariable Is Explained by Aggre-
gates and Contemporaneous Disturbances are Equi-correlated, Econometric, 49,205-
207.
Konjin, M. S. (1962), Regression Analysis in Sample Surveys, Journal of the American Sta-
tistical Association, 57, 590-606.
Levy, P. S. and S. Lemeshow (1991), Sampling of Populations: Methods and Applications, 2nd
ed., Wiley, New York.
Magee, L. (1W6), Improving Survey-Weighted Least Squares Regression, manuscript, Mc-
Master University.
Magee, L., A. L. Robb, and J. B. Burbidge (1996), On the Use of Sampling Weights
When Estimating Regression Models with Survey Data, manuscript, McMaster Univer-
sity.
McKay, A. T. (1932), Distributions of the Coefficient of Variation and the Extended t-Distri-
bution, Journal of the Royal Statistical Society, 95,695-698.
Moulton, B. R. (1990), An Illustration of a Pitfall in Estimating the Effects of Aggregate Vari-
ables on Microunits, Review of Economics and Statistics, 72,334-338.
Neuts, M. (1982), On the Coefficient of Variation of Mixtures of Probability Distributions,
Communications in Statistics-Simulation and Computation, 11,649-657.
Neyman, J. (1938), Contribution to the Theory of Sampling Human Populations, Journal of
the American Statistical Association, 33, 101-1 16.
Neyman, J. (1934), On the Two Different Aspects of the Representative Method: The Method of
Stratified Sampling and the Method of Purposive Selection, Journal of Royal Statistical
Society, 97, 558-606.
Pagan, A. and A. Ullah (199S), Nonparametric Econometrics, manuscript, Australian National
University, Australia.
Pfefferman, D. (1993), The Role of Sampling Weights When Modeling Survey Data, Interna-
tional Statistical Review, 61, 317-337.
Pfefferman, D. and T. M. F. Smith (1985), Regression Models for Grouped Populations in
Cross-Sectional Surveys, International Statistical Review. 53, 37-59.
Porter, R. D. (1973), On the Use of Survey Sample Weights in the Linear Model, Annals of
Economics and Social Measurement, 212, 141-158.
Prasad, B. and M.P. Singh (1992), Unbiased Estimators of Finite Population Variance Using
Auxiliary Information in Sample Surveys, Communications in Statistics-Theory and
Methods, 21,1367-1376.
ECONOMETRIC ANALYSIS
IN COMPLEX SURVEYS 363
Pudney, S.(1989),Modeling Individual Choice: The Econometrics of Corners, Kinks, and Holes,
Basil Blackwell, Oxford.
Rosenblatt, M. (1956), Remarks on Some Nonparametric Estimates of Density Function, An-
nals of Mathematical Statistics, 27, 832-837.
Ruppert, D. and M. P. Wand (1994), Multivariate Locally Weighted Least Squares Regression,
The Annals of Statistics, 22, 1346-1370.
Selden, T. (1994), Weighted Generalized Least Squares Estimation for Complex Survey Data,
Economics Letters, 46, 1 4 .
Sen, A. (1992), Inequality Reexamined, Russell Sage Foundation, Oxford University Press,
New York.
Singh, M. (1993), Behaviour of Sample Coefficient of Variation Drawn from Several Distribu-
tions, Sankhya: The Indian Journal of Statistics, 55,65-76.
Srivastava, V. K., T. D. Dwivedi, M. Beluisky, and R. Tiwari (1980), A Numerical Comparison
of Exact Large Sample and Small Disturbance Approximations of Properties of K-Class
Estimators, International Economic Review, 21,249-252.
Stephan, F. F. (1948),History of the Uses of Modern Sampling Procedures, Journal ofAmerican
Statistical Association, 43.
Sukhatme, P. V. (1984), Sampling Theory of Surveys with Applications, Iowa State University
Press, Ames, IA.
Swamy, P. A. V. B. (1971), Statistical Inference in Random Coeficient Regression Models,
Springer-Verlag, Berlin.
Thompson, S.(1992), Sampling, Wiley, New York.
Tschuprow, A. (1923), On the Mathematical Expectation of the Moments of Frequency Distri-
butions in the Case of Correlated Observation, Metron, 2,461493,646-680.
Ullah, A. and R. Breunig (19%), On the Bias of the Standard Errors of the LS Residual and
the Regression Coefficients under Normal and Nonnormal Errors, Econometric Theory,
Problems and Solutions, 12, 868.
Vinod, H.D. and A. Ullah (1981), Recent Advances in Regression Methods, Marcel Dekker,
New York.
Warren, W. G. (1982),On the Adequacy of the Chi-Squared Approximation for the Coefficient
of Variation, Communications in Statistics-Simulation and Computation, 11, 659-
666.
White, €I. (1980). A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct
Test for Heteroskedasticity, Econometrica, 48, 817-838.
Yates, F. and I. Zacapony (193.5), The Estimation of the Efficiency of Sampling with Special
Reference to Sampling for Yield in Cereal Experiments, Journal of Agricultural Sci-
ence, 25,543-577.
Zinde-Walsh, V. and A. Ullah (1987),On Robustness ofTests of Linear Restrictions in Regres-
sion Models with Elliptical Error Distributions, in Time-Series Econometric Modeling
(I. B. Macneil and G . J. Umphrey, ttds.), Reidel, Holland.
This page intentionally left blank
10
Information Recovery
In Simultaneous-Equations Statistical Models
Amos Golan
University of California at Berkeley, Berkeley, California and
American University, Washington, D.C.
George judge
University of California at Berkeley, Berkeley, California
Douglas Miller
Iowa State University, Ames, Iowa
1. INTRODUCTION
For the last five decades a significant portion of econometric effort has been directed
to
Despite the productive efforts of many, questions remain concerning the insecure
assumptions underlying the sampling theory, likelihood, and asymptotic approaches
and the usefulness of traditional multiple-equation estimation and inference proce-
dures in helping us find order when using the partial-incomplete underlying eco-
nomic data that is normally found in practice. Against this backdrop, we propose a
new method of estimation in multiple-equation statistical models that is widely ap-
plicable because it does not require the specification of a parametric family for the
likelihood function. The estimation rule is robust with respect to likelihood, is flexi-
3 65
366 GOLANETAL.
ble with respect to the dynamic, stochastic, and feedback nature of economic data as
well as to the introduction of prior information, and is computationally simple. Using
linear and quadratic risk measures, we compare the finite-sample performance of this
method to other widely used traditional estimation rules.
The organization of the chapter is as follows: In Section I1 the simultaneous-
equations statistical model is introduced, traditional estimation rules are identified,
and corresponding asymptotic and finite-sample performances are noted. In Sec-
tion 111 the maximum-entropy approach to recovering information in the case of
inverse problems with noise is formulated and corresponding asymptotic sampling
properties are developed. In Section IV sampling experiments are proposed as a
basis for comparing finite-sample performance of the alternative estimation rules,
and the resulting empirical sampling-risk results are evaluated. Section V contains
summary comments and recommendations.
To provide a format for analyzing the SESM that reflects an instantaneous feedback
mechanism between some of the variables in the stochastic system of equations,
consider a statistical model consistent with the data generation process for a system
of G simultaneous equations:
where nr -B or
D has a joint density function f ( [ O ] , 52) and Y has some corresponding density func-
tion f ( Y l n , 52).
For single-equation-analysis purposes, it is conventional to assume that r con-
tains -1's on the diagonal and to rewrite the ith equation as
INFORMATIONRECOVERY
IN STATISTICAL MODELS 367
where yi and x represent the endogenous jointly determined variables in the ith
equation and X ; represents the exogenous predetermined variables in the ith equa-
tion, XT represents the exogenous predetermined variables appearing in the system
but not included in equation i, and X represents the K exogenous predetermined
+
variables in the system of equations. Let 2; be a T x (G; K ; ) matrix representing
the G; endogenous Y; and K ; exogenous predetermined X ; variables that appear in
the ith equation with nonzero coefficients. Further, 6; = (gl, bl)' is a (Gi + Ki)-
dimensional vector of unknown and unobservable parameters corresponding to the
endogenous and exogenous variables in the ith equation, and ei is a 3"-dimensional
random vector for the ith equation that is traditionally assumed to have mean 0 and
scale parameter oi;. The variables yi, Y;, Xi are observed, and 6; and e; are unob-
served and unobservable.
Given the ith equation (4), the complete system of G equations may be writ-
[:]
ten as
+ (5)
eCI
or compactly as
y=Zb+e
where, in the context of the traditional SESM, y and e are GT-dimensional random
+
vectors and 6 is an unknown and unobservable Ci(Gi K;)-dimensional vector.
Traditionally e is assumed to have mean 0 and cov(e) = C @I I T , where C is a
G x G unknown covariance matrix. Given (5) and (6),the corresponding system of
reduced form equations may be written as
i
When prior information is available, consistent with the set of discrete points
s$ in (9), this may be specified by corresponding prior probabilities q$ =
T I T
[qikl, q&2, . . . , q;,,,]’. Also, consistent with the set of discrete points V i l , we may
specify corresponding prior probabilities uit = [ u i t l ,u i ~. ., . , U ; ~ J ] ’ .For the com-
plete system of reduced-form equations, the statistical model (7) is reparameter-
ized as
rn
with rn = 1 , 2 , .. . , M and M 2 2.
To simplify notation, we also use the definitions
and
y = ( I G @ x ) s X p n I/w+ (16)
the structural equation to reduced-form restrictions (3),reflecting (13) and (14),
m n n r n
370 GOLANET AL.
and
where pi is the vector of Lagrange multipliers associated with (17) that refers to equa-
tion i, and Q(.) and @(.) are the partition (normalization) functions for the proba-
bilities.
Finally, using (9), (lO), and (12) yields
subject to (16)-(20). Carrying through the first-order conditions and solving the re-
sulting system yields the estimated probabilities
and
In a similar way, estimates are provided for the structural parameters pJ' and p B .
Finally, the point estimates 5,d, and 6 (or 'i/ and 8)
are recovered as in (25)-
(27).
As shown in Chapter 6 of Golan, Judge, and Miller (1996), for the traditional lin-
ear statistical (regression) model, the Hessian matrix of the GCE problem is posi-
tive definite for pi, wi >> 0, and thus satisfies the sufficient condition for a unique
global minimum. When prior information does not exist, both q i k and uit become
uniform (e.g., q i k m = 1/M for all rn and k ) and the GCE solution is equivalent
to the GME solution. Although the GME-GCE solutions do not have a closed
form, the dual unconstrained formulation proposed by Miller (1994) and Golan,
Judge, and Miller (1996) may be used to evaluate the sampling behavior of the
solutions within the extremum of M-estimation framework (Huber, 1981). In gen-
eral, the GME-GCE solutions may be viewed as discrete members of the ex-
ponential family of probability distributions, and these functional forms may be
used to relate the original parameter vector, pi, to the dual parameters, Ai
and pi. The large-sample properties of the GME-GCE estimators are based on
the asymptotic behavior of the dual parameters, and the relationship follows the
corresponding results in the exponential family literature (Brown 1986, Johansen
1979).
Following Golan, Judge, and Miller (1996, Chap. 6), we develop the dual un-
constrained GME-SESM. Given the Lagrangian for the optimization problem (15)-
(20), we substitute the maximum-entropy posteriors probabilities (21)-(24) into the
Lagrangian where, forsimplicity, we use pB and p)' instead of p'. Further, since these
372 GOWNETAL.
posteriors already satisfy the adding-up requirements, Eqs. (18)-(20) are omitted
from the Lagrangian. Using some simple algebra, one gets the dual unconstrained
problem. Specifically,
L ( X , p ) = -pn’ In pn - pDrIn pD - p ~In’ p~ - wrIn w
i k m
r 1
i k i t
i k i n
Minimizing the dual unconstrained GME model with respect to X and p yields fi
and p , which, in turn, yield fin, pJ’,p p , and*.Investigating the concentrated, or
dual, objective function (31) reveals the following properties. First, letting G = 1,
the system reduces to the simple (one equation) linear statistical model where the
last two terms disappear and the summation and indices involving i are deleted from
the first three, Thus, we have the GME estimator for the linear statistical model.
Second, the first three terms correspond to the reduced-form system of equations
and involve the data y’X and the sum of the partition functions for 7r and w, re-
lNFORMATlON RECOVERYIN STATISTICAL MODELS 373
spectively. This part can be viewed as an empirical likelihood function (Golan and
Judge, 1996a) for the reduced-form equations. Third, the last two terms correspond
to the definition (3) or its reparameterized form (17). There is no noise component
involved in these two terms, so they are related to the classical (pure) maximum-
entropy formulation.
What remains is to show that (i) π̂ is a statistically consistent estimator of π, and (ii) γ̂ and β̂ are internally consistent estimates of γ and β. Part (i) is a trivial generalization of Golan, Judge, and Miller (1996, Proposition 6.3, p. 104). Part (ii) follows the principle of classical (pure) maximum entropy. That is, given the estimated π, which serves as the data for the (pure) ME problem, the entropies of γ and β are maximized. This ensures estimates that can be realized in the greatest number of ways consistent with the data (Jaynes 1957a, b; Levine 1980; Golan, Judge, and Miller 1996, Chap. 3). Furthermore, for those equations that are exactly identified, the maximum-entropy approach yields the exact mathematical inversion.
An alternative consistency motivation may be based on the following heuristic argument: Given the value of π, the problem (15)-(20) can be viewed as equivalent to maximizing (15) subject to

y − (I_G ⊗ X)π = Vw    (32)
π = S^π p^π    (33)

and (17)-(20). The solution is then one of choosing p^π with maximum entropy −p^π' ln p^π to satisfy (32), with the remainder of the problem (w and p^δ) being separable from the choice of p^π. Given the continuity of the constraint functions in the original problem, and since π̂ converges in probability to π, it is reasonable to view the problem of deriving w and p^δ as a continuous mapping of π̂, so that consistency carries over to the structural estimates.
B. Remarks
In applied work, emphasis is often focused on one structural equation in the system of equations. For this situation it is traditional to use the 2SLS, method-of-moments (Hansen 1982), or LIML estimators. Within a GME context several possibilities exist, and a range of these is discussed in Chapter 12 of Golan, Judge, and Miller (1996). One straightforward GME possibility is just to make use of the information (y_i, Z_i) in the structural equation of interest. Although this formulation ignores the exogenous variables in the remainder of the system, the sampling results provided by Golan, Judge, and Miller (1996) suggest it performs well relative to traditional sampling-theory competitors.
A GME single-equation formulation and estimation rule consistent with the objectives of LIML may be developed as a special case of the complete system of equations, using the relaxed moment condition

Z_i'X(X'X)^{-1}X'y_i = Z_i'X(X'X)^{-1}X'Z_i S_i p_i + Z_i'X(X'X)^{-1}X'V_i w_i    (34)

Setting the noise component to zero for all observations yields the pure moment condition

Z_i'X(X'X)^{-1}X'y_i = Z_i'X(X'X)^{-1}X'Z_i S_i p_i    (35)
which, with the addition of a reparameterization for δ_i, is identical to the usual first-order conditions for the GMM estimator. If (35) replaces the relaxed moment condition (34) as the consistency relation in the GME formulation, then traditional GMM estimates for δ_i result. If the relaxed moment relation (34) is used in the GME-GCE formulation, and the bounded parameter space S contains the true parameter vector δ_i, then asymptotically the resulting estimates have, under standard regularity conditions, the same large-sample properties as the GMM estimators (Judge et al. 1988, pp. 641-643). In finite samples, sampling results presented in Golan, Judge, and Miller (1996, Chap. 12) suggest that, under a squared error measure, GME is a superior performing estimator.
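As a point of reference, the pure moment condition (35), combined with the usual reparameterization of δ_i, leads to the familiar 2SLS/GMM estimator for the single equation. The sketch below simply solves that pure moment condition directly; the variable names (y_i, Z_i, X) are placeholders and the snippet is not part of the experiments reported here.

    import numpy as np

    def iv_from_pure_moments(y_i, Z_i, X):
        """Solve Z_i'X(X'X)^{-1}X'y_i = Z_i'X(X'X)^{-1}X'Z_i delta_i for delta_i,
        i.e. the 2SLS estimator implied by the pure moment condition (35)."""
        P = X @ np.linalg.solve(X.T @ X, X.T)      # projection onto the column space of X
        A = Z_i.T @ P @ Z_i                        # Z_i'X(X'X)^{-1}X'Z_i
        b = Z_i.T @ P @ y_i                        # Z_i'X(X'X)^{-1}X'y_i
        return np.linalg.solve(A, b)

    # Tiny illustration with simulated data (hypothetical dimensions):
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 4))                  # exogenous variables (instruments)
    Z_i = np.column_stack([X[:, :2] @ [0.5, -0.3] + rng.normal(size=100), X[:, 2]])
    y_i = Z_i @ np.array([1.0, -0.5]) + rng.normal(size=100)
    print(iv_from_pure_moments(y_i, Z_i, X))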
Although analytic small-sample results are available in a few cases, much of the information that we have about the finite-sample properties of simultaneous-equations estimation rules comes from sampling experiments conducted over a four-decade period. Despite the usefulness of these studies, many questions are unresolved. For example, when considered in a loss-risk context, the rankings of traditional estimation rules remain somewhat in doubt. To obtain some experience with the MaxEnt estimators specified in Section III, and to gauge how they compare performance-wise with traditional sampling-theory rules, we conducted a limited range of sampling experiments. These experiments focused on some of the special characteristics of nonexperimental economic data, such as small samples, collinear relations among variables, and the lack of independence between some of the right-hand-side variables and the equation errors. As a basis for judging estimator performance, we use the quadratic loss measure ||θ̂ − θ||² for some unknown θ.
A. Sampling Design
In the sampling experiments, we work with a linear simultaneous-equations model involving three structural equations. The model follows Tsurumi (1990) and is a modification of the model structure employed by Cragg (1967). In the context of (1), the structural coefficient matrices Γ and B (with −1's on the diagonal of Γ as normalizations), the matrix C, and the error covariance matrix in (37) are assigned fixed numerical values taken from Tsurumi (1990).
For comparison purposes, the results for an alternative covariance and drawings from a t(3) distribution are also reported. In addition to the zero and normalization restrictions, the support spaces specified for the structural and reduced-form parameters and equation errors are s^γ_k = s^β_k = [−5, −2.5, 0, 2.5, 5]' for k = 2, 3, ..., 7; s^γ_k = s^β_k = [−20, −10, 0, 10, 20]' for k = 1; and s^π_in = [−2, −1, 0, 1, 2]' for i, n = 1, 2, 3. The errors' support V is specified as v_i = [−3σ_{y_i}, 0, 3σ_{y_i}]', where σ_{y_i} is the empirical standard deviation of y_i. For all experiments, the sampling performances (empirical risks) are reported for the whole system of structural equations and for a single parameter in (1). For comparison purposes, 3SLS results that use the correct covariance matrix (37) are reported. To provide results when the analysis focuses only on one equation, a GME formulation using only the information in (y_i, Z_i) is reported.
B. Sampling Results
The sampling results for a range of experiments are summarized in Table 1. The results for the base experiment (T = 20, κ(X'X) = 1, and normal errors) are given in the first row of Table 1. Focusing on the MSE results for the reduced-form parameter vector π, the unrestricted LS empirical MSE is 125, and thus close to its theoretical value. In contrast, the GME estimator of π that takes account of zero restrictions in the system yields MSE(π̂) = 4.11. This reduction in MSE relative to the traditional unrestricted π estimator is impressive. In terms of the structural parameters, note, relative to the 3SLS estimator with known error covariance, the superior empirical risk performance of the GME estimator. The empirical sampling variability of the GME estimator is given in parentheses for the structural parameters Γ and B. These results reflect the relative stability of the GME estimator, even under conditions of nonnormal errors or a high condition number. Intuitively speaking, the significant improvement of the GME relative to the 3SLS is due to (i) shrinkage possibilities for both the signal and noise components, (ii) use of a dual loss function, and (iii) avoiding distributional assumptions (restrictions).
To reflect the sampling characteristics of the GME and 3SLS (with known covariance) estimators, we follow Tsurumi (1990) and focus on the γ12 parameter in (1) and give in Figure 1 a frequency plot for the two estimators. Relative to the 3SLS known error covariance estimator, the high concentration of γ12 and the restricted range variability of the GME estimates are nicely reflected in the empirical histogram. The empirical bias of the 3SLS (known covariance) estimator is slightly smaller than that of the GME estimator.
In terms of results for (1), the GME results indicate the empirical-risk gain when information from the whole system is used (column 4) relative to using only information from (y_i, Z_i) in column 7. In contrast, note the empirical-risk superiority of the GME (y_i, Z_i) estimation rule over the 3SLS (known covariance) estimation rule for (1). Also note that the GME (y_i, Z_i) estimation rule remains stable and is concentrated near γ12 as the condition number or error distribution changes. Finally, we repeated the experiment involving T = 20 and κ(X'X) = 1, but using relaxed moment constraints in place of the exact data constraints.
Table 1  Empirical-risk results from 1000 experiments using the SESM, GME, and 3SLS estimators, with MSE performance measures (table entries not reproduced).
Figure 1  Empirical histogram of the GME and 3SLS estimates of γ12.
The superior MSE performance of the GME estimation rule appears to hold over a range of conditions normally found in practice.
V. CONCLUDING REMARKS
We propose a new GME method for recovering the unknown parameters in a simultaneous-equations statistical model that (i) is robust with respect to the likelihood; (ii) is flexible with respect to introducing sample and nonsample information; (iii) works well in both ill-posed (e.g., collinear X's) and well-posed problems and with small samples of data; (iv) has the usual asymptotic sampling properties; (v) in finite samples, under a squared-error loss measure and relative to traditional estimators, is a high-performing estimation rule; and (vi) is computationally simple.
In contrast to traditional estimation for simultaneous-equations models, it per-
mits the sample information to be introduced in either a data or moment form. It per-
mits information recovery in case of nonlinear and/or nonstationary expectational
models (Golan, Judge, and Karp 1996) and with discrete and/or limited endogenous
regressors (Golan, Judge, and Perloff 1995, 1996). Using the normalized entropy concept provides a basis for selecting among alternative competing statistical models (Golan and Judge 1996b). By employing the entropy measure for each of the
unknown endogenous and exogenous variables, when all the support spaces s are
defined to be symmetric about zero, it is possible to identify the extraneous variables
in each of the G equations. This problem will be further developed in future work.
The finite-sample results reported suggest, relative to the 3SLS rule with known
covariance, the superior performance of the GME rule under selected experimental
designs. What is needed at this point are extensive sampling experiments that make a sharp comparison with traditional sampling-theory and Bayes estimators for the SESM.
ACKNOWLEDGMENTS
REFERENCES
Diagnostic Testing in Econometrics

Linda F. DeBenedictis
Ministry of Human Resources, Victoria, British Columbia, Canada

David E. A. Giles
University of Victoria, Victoria, British Columbia, Canada

I. INTRODUCTION
This idea underlies the well-known family of Hausman (1978) tests and the information matrix tests of White (1982, 1987). This approach to specification testing is based on the stance that, in practice, there is generally little information about the precise form of any misspecification in the model. Accordingly, no specific alternative specifica-
tion is postulated, and a pure significance test is used. This stands in contrast with
testing procedures in which an explicit alternative hypothesis is stated, and used in
the construction and implementation of the test (even though a rejection of the null
hypothesis need not lead one to accept the stated alternative). In the latter case, we
frequently “nest” the null within the alternative specification and then test whether
the associated parametric restrictions are consistent with the evidence in the data.
The use of likelihood ratio, Wald, and Lagrange multiplier tests, for example, in this
situation are common and well understood.
As noted, specification tests which do not involve the formulation of a specific
alternative hypothesis are pure significance tests. They require the construction of a
sample statistic whose null distribution is known, at least approximately or asymp-
totically. This statistic is then used to test the consistency of the null with the sample
evidence. In the following discussion we will encounter tests which involve a spe-
cific alternative hypothesis, although the latter may involve the use of proxy variables
to allow for uncertainties in the alternative specification. Our subsequent focus on
the RESET test involves a procedure which really falls somewhere between these
two categories, in that although a specific alternative hypothesis is formulated, it is
largely a device to facilitate a test of a null specification. Accordingly, it should be
kept in mind that the test is essentially a “destructive” one, rather than a “construc-
tive” one, in the sense that a rejection of the null hypothesis (and hence of the model’s
specification) generally will not suggest any specific way of reformulating the model
in a satisfactory form. This is certainly a limitation on its usefulness, so it is all the
more important that it should have good power properties. If the null specification
is to be rejected, with minimal direction as to how the model should be respecified,
then at least one would hope that we are rejecting for the right reason(s). Accord-
ingly, in our reconsideration of the RESET test in Sections III and IV we emphasize
power properties in a range of circumstances.
Variable-addition tests are based on the idea that if the model specification
is “complete,” then additions to the model should have an insignificant impact, in
some sense. As noted by Pagan and Hall (1983)and Pagan (1984), there are many
forms that such additions can take. For instance, consider a standard linear multiple
regression model, with k fixed regressors and T observations:
y = Xβ + u    (1)

where it may be assumed that (y | X) ~ N[Xβ, σ²I_T]. One could test this specification in terms of the adequacy of the assumed conditional mean of y, namely Xβ; or one might test the adequacy of the assumed conditional covariance matrix, σ²I_T.
The assumed normality could be tested with reference to higher-order moments, as in Jarque and Bera (1980). In most of these cases, tests can be constructed by fitting auxiliary regressions which include suitable augmentation terms, and then testing the significance of the latter.
For instance, the basic model (1) may be augmented with a T × p matrix of additional regressors, W, with coefficient vector γ:

y = Xβ + Wγ + u    (2)

and we then test the hypothesis that γ = 0. This assumes, of course, that W is known and observable. In the event that it is not, a matrix of corresponding proxy variables, W*, may be substituted for W, and (2) may be written as

y = Xβ + W*γ + u    (3)

and we could again test if γ = 0. As Pagan (1984, p. 106) notes, the effect of this substitution will show up in terms of the power of the test that is being performed.
An alternative way of viewing (2) (or (3), if the appropriate substitution of the proxy variables is made) is by way of an auxiliary regression with the residuals from (1) as the dependent variable:

û = Xβ* + Wγ + v    (4)

Suppose, for example, that the errors in (1) follow a first-order autoregressive process, u_t = ρu_{t−1} + ε_t, where |ρ| < 1. We wish to test the hypothesis that ρ = 0. The mean
of y in (1) is then conditional on the past history of y and X so it is conditional on
previous values of the errors. Accordingly, the natural variable-addition test would
involve setting W in (2) to be just the lagged value of u. Of course, the latter is unobservable, so the proxy variable approach of (3) would be used in practice, with W* comprising just the lagged OLS residual series (u*_{t-1}) from the basic specification, (1). Of course, in the case of a higher-order AR process, extra lags of u* would be used in the construction of W*, and we would again test if γ = 0. It is important
to note that the same form of variable-addition tests would be used if the alternative
hypothesis is that the errors follow a moving-average process, and such tests are gen-
erally powerful against both alternatives. The standard Durbin-Watson test can be
linked to this approach to testing for model misspecification, and various other stan-
dard tests for serial independence in the context of dynamic models, such as those
of Godfrey (1978), Breusch (1978), and Durbin (1970), can all be derived in this
general manner. Tests for structural stability which can be given a variable-addition
interpretation include those of Salkever (1976), where the variables that are used to
augment the basic model are suitably defined “zero-one” dummy variables. Further,
the well-known tests for regressor exogeneity proposed by Durbin (1954), Wu (1973), and Hausman (1978) can also be reexpressed as variable-addition tests which use appropriate instrumental variables in the construction of the proxy matrix W* (e.g., Pagan 1984, pp. 114-115).
The problem of testing between (nonnested) models is one which has attracted
considerable attention during the last 20 years (e.g., McAleer 1987,1995). Such tests
frequently can be interpreted as variable-addition tests which focus on the specifica-
tion of the conditional mean of the model. By way of illustration, recall that in model (1) the conditional mean of y (given X, and the past history of the regressors and of y) is Xβ. Suppose that there is a competing model for explaining y, with a conditional mean of X⁺μ, where X and X⁺ are nonnested, and μ is a conformable vector of unknown parameters. To test one specification of the model against the other, there are various ways of applying the variable-addition principle. One obvious possibility (assuming an adequate number of degrees of freedom) would be to assign W* = X⁺ in (3), and then apply a conventional F-test. This is the approach suggested by Pesaran (1974) in one of the earlier contributions to this aspect of the econometrics literature. Another possibility, which is less demanding on degrees of freedom, is to set W* = X⁺(X⁺'X⁺)^{-1}X⁺'y (that is, using the ordinary least-squares (OLS) estimate of the conditional mean from the second model as the proxy variable), which gives us the J-test of Davidson and MacKinnon (1981). There have been numer-
ous variants on the latter theme, as discussed by McAleer (1987), largely with the
intention of improving the small-sample powers of the associated variable-addition
tests.
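To illustrate the mechanics of a variable-addition test of this kind, the following sketch implements the J-test idea: the OLS fitted values from the competing model are added to (1) and their joint significance is examined with the usual F-test. It is a minimal illustration with simulated data and hypothetical regressor names, not a reproduction of any procedure reported in this chapter.

    import numpy as np
    from scipy import stats

    def variable_addition_test(y, X, W_star):
        """F-test of gamma = 0 in y = X beta + W* gamma + u (Eqs. (2)/(3))."""
        T = len(y)
        X_aug = np.column_stack([X, W_star])
        rss = lambda A: np.sum((y - A @ np.linalg.lstsq(A, y, rcond=None)[0]) ** 2)
        RSS, USS = rss(X), rss(X_aug)          # restricted and unrestricted sums of squares
        p, k = W_star.shape[1], X.shape[1]
        F = ((RSS - USS) / p) / (USS / (T - k - p))
        return F, 1.0 - stats.f.cdf(F, p, T - k - p)

    # J-test style use: W* is the fitted value from the competing model with regressors X_plus.
    rng = np.random.default_rng(2)
    T = 60
    x1, x2 = rng.normal(size=T), rng.normal(size=T)
    y = 1.0 + 0.8 * x1 + rng.normal(size=T)
    X = np.column_stack([np.ones(T), x1])              # null model regressors
    X_plus = np.column_stack([np.ones(T), x2])         # competing (nonnested) regressors
    W_star = (X_plus @ np.linalg.lstsq(X_plus, y, rcond=None)[0]).reshape(-1, 1)
    print(variable_addition_test(y, X, W_star))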
Our main concern is with variable-addition tests which address possible mis-
specification of the functional form of the model or the omission of relevant explanatory effects. The treatment of the latter issue fits naturally into the framework of Eqs. (1)-(4). The key decision that has to be made in order to implement a variable-
addition test in this case is the choice of W (or, more likely, W*). If we have some
idea what effects may have been omitted wrongly, then this determines the choice of
the “additional” variables, and if we were to make a perfect choice then the usual
F-test of y = 0 would be exact and uniformly most powerful (UMP). Of course,
this really misses the entire point of our present discussion, which is based on the
premise that we have specified the model to the best of our knowledge and ability,
but are still concerned that there may be some further, unknown, omitted effects. In
this case, some ingenuity may be required in the construction of W or W * , which
is what makes the RESET test (and our modification of this procedure in this chap-
ter) of particular interest. We leave a more detailed discussion of the RESET test to
Section III.
In many cases, testing the basic model for a possible misspecification of its
functional form can be considered in terms of testing for omitted effects in the con-
ditional mean. This is trivially clear if, for example, the fitted model includes simply a regressor, x_t, but the correct specification involves a polynomial in x_t. Constructing W* with columns made up of powers of x_t would provide an optimal test in this case. Similarly, if the fitted model included x_t as a regressor, but the correct specification involved some (observable) transformation of x_t, such as log(x_t), then (2)
could be constructed so as to include both the regressor and its transformation, and
the significance of the latter could be tested in the usual way. Again, of course, this
would be feasible only if one had some prior information about the likely nature of the
misspecification of the functional form (see also Godfrey, McAleer, and McKenzie 1988).
More generally, suppose that model (1) is being considered, but in fact the correct specification has heteroskedastic errors, with variances

σ_t² = σ² + z_t'φ    (7)

where z_t is an observation on a vector of r known variables, and φ is r × 1. We then test the hypothesis that φ = 0. To make this test operational, (7) needs to be reformulated as a "regression relationship" with an observable "dependent variable" and a stochastic "error term." The squared tth element of u in (1) gives us σ_t², on average, so it is natural to use the corresponding squared OLS residuals on the left side of (7). Then

(u_t*)² = σ² + z_t'φ + [(u_t*)² − σ_t²] = σ² + z_t'φ + v_t    (8)

where v_t = (u_t*)² − σ_t². Equation (8) can be estimated by OLS to give estimates of σ² and φ and to provide a natural test of φ = 0. The (asymptotic) legitimacy of the usual t-test (or F-test) in this capacity is established, for example, by Amemiya (1977).
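A compact way to see the auxiliary regression in (8) at work is sketched below: the squared OLS residuals from (1) are regressed on a constant and the z_t variables, and the joint significance of φ is assessed. The data and the choice of z_t here are hypothetical, and the asymptotic chi-square form of the statistic is used for simplicity.

    import numpy as np
    from scipy import stats

    def squared_residual_test(y, X, Z):
        """Variable-addition test for heteroskedasticity, as in Eq. (8):
        regress squared OLS residuals on [1, Z] and test phi = 0."""
        T = len(y)
        u = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]     # OLS residuals from (1)
        d = u ** 2                                            # "dependent variable" (u_t*)^2
        A0, A1 = np.ones((T, 1)), np.column_stack([np.ones(T), Z])
        rss = lambda A: np.sum((d - A @ np.linalg.lstsq(A, d, rcond=None)[0]) ** 2)
        r = Z.shape[1]
        F = ((rss(A0) - rss(A1)) / r) / (rss(A1) / (T - 1 - r))
        return r * F, 1.0 - stats.chi2.cdf(r * F, r)          # asymptotic chi-square version

    # Illustration: variance increasing in a hypothetical scale variable z.
    rng = np.random.default_rng(3)
    T = 200
    x = rng.normal(size=T)
    z = np.abs(rng.normal(size=T))
    y = 1.0 + x + rng.normal(size=T) * np.sqrt(1.0 + 2.0 * z)
    print(squared_residual_test(y, np.column_stack([np.ones(T), x]), z.reshape(-1, 1)))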
So the approach in (8) is essentially analogous to the variable-addition approach in the case of (2) for the conditional mean of the model. As was the situation there, in practice we might not be able to measure the z_t vector, and a replacement proxy vector, z_t*, might be used instead. Then the counterpart to (3) would be

(u_t*)² = σ² + z_t*'φ + (z_t − z_t*)'φ + v_t = σ² + z_t*'φ + v_t*    (9)

where we again test if φ = 0, and the choice of z_t* determines the particular form of heteroskedasticity against which we are testing.
For example, if z* is an appropriately defined scalar dummy variable, then we can test against a single break in the value of the error variance at a known point. This same idea also relates to the more general homoskedasticity tests of Harrison and McCabe (1979) and Breusch and Pagan (1979). Similarly, Garbade's (1977) test for systematic structural change can be expressed as the above type of variable-addition test, with z_t* = tx_t²; and Engle's (1982) test against ARCH(1) errors amounts to a variable-addition test with z_t* = (u*_{t−1})². Higher-order ARCH and GARCH processes* can be accommodated by including additional lags of (u_t*)² in the definition of z*. Pagan (1984, pp. 115-118) provides further details as well as other examples of specification tests which can be given a variable-addition interpretation with respect to the conditional variance of the errors.
D. Multiple Testing
Variable-addition tests have an important distributional characteristic which we have not yet discussed. To see this, first note that under the assumptions of model (1), the UMP test of γ = 0 will be a standard F-test if X and W (or W*) are both nonstochastic and of full column rank. In the event that either the original or "additional" regressors are stochastic (and correlated with the errors), and/or the errors are nonnormal, the usual F-statistic for testing if γ = 0 can be scaled to form a statistic which will be asymptotically chi-square. More specifically, if there are T observations and if rank(X) = k and rank(W) = p, then the usual F-statistic will be F_{p,v} under the null (where v = T − k − p). Then pF will be asymptotically χ²_p under the null.† Now, suppose that we test the model's specification by means of a variable-addition test based on (2) and denote the usual test statistic by F^W. Then, suppose we consider a second test for misspecification by fitting the "augmented" model

y = Xβ + Zδ + u    (10)

where rank(Z) = q, say. In the latter case, denote the statistic for testing if δ = 0 by F^Z. Asymptotically, pF^W is χ²_p, and qF^Z is χ²_q, under the respective null hypotheses.
Now, from the usual properties of independent chi-square statistics, we know that if the above two tests are independent, then pF^W + qF^Z is asymptotically χ²_{p+q} under the null that γ = δ = 0. As discussed by Bera and McKenzie (1987) and Eastwood and Godfrey (1992, p. 120), independence of the tests requires that plim

*Lee (1991) shows the equivalence of ARCH(p) and GARCH(p, q) tests under the null, for a constant q, where p is any natural number.
†Its distribution under the alternative is discussed in Section III.
*Essentially, this follows from Basu's (1955) independence theorem. For example, see Anderson (1971, pp. 34-43, 116-134, 270-276) and the asymptotic extensions discussed by Mizon (1977a, 1977b).
†For some general discussion of this point, see Phillips and McCabe (1983), Pagan and Hall (1983), and Pagan (1984, pp. 116-117, 125-127). Phillips and McCabe (1989) also provide extensions to other tests where the statistics can be expressed as ratios of quadratic forms in a normal random vector.
‡For a more comprehensive discussion of this point, see Giles and Giles (1993, pp. 176-180).
the two-part test for the specification of the conditional mean of the model will differ
from the sizes nominally assigned to either of its component parts, because of the
randomization of the choice of second-stage test which results from the application
of the first-stage test (for the specification of the conditional variance of the model).
More generally, specification tests of the variable-addition type may not be in-
dependent of each other. As noted by Pagan (1984, pp. 125-127), it is unusual for
tests which focus on the same conditional moment to be mutually independent. One
is more likely to encounter such independence between tests which relate to different
moments of the underlying process (as in the discussion above). In such cases there
are essentially two options open. The first is to construct joint variable-addition tests
of the various forms of misspecification that are of interest. This may be a somewhat
daunting task, and although some progress along these lines has been made (e.g.,
Bera and Jarque 1982), there is still little allowance for this in the standard econo-
metrics computer packages. The second option is to apply separate variable-addition
tests for the individual types of model misspecification, and then adopt an “induced
testing strategy” by rejecting the model if at least one of the individual test statistics
is significant. Generally, in view of the associated nonindependence and the likely
complexity of the joint distribution of the individual test statistics, the best that one
can do is to compute bounds on the overall significance level for the “induced test.”
The standard approach in this case would be to use Bonferroni inequalities (e.g.,
David 1981, Schwager 1984), though generally such bounds may be quite wide and,
hence, relatively uninformative. A brief discussion of some related issues is given
by Kramer and Sonnberger (1986, pp. 147-155), and Savin (1984) deals specifically
with the relationship between multiple t-tests and the F-test. This, of course, is di-
rectly relevant to the case of certain variable-addition tests for the specification of
the model’s conditional mean.
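The Bonferroni route mentioned above is easy to operationalize: if m separate variable-addition tests are applied and the model is rejected when any one of them is significant, running each test at level α/m bounds the overall significance level of the induced test by α. The helper below is a generic sketch of this bookkeeping, not a procedure taken from this chapter.

    def bonferroni_induced_test(p_values, overall_alpha=0.05):
        """Induced test: reject the model if any individual p-value is below
        overall_alpha / m, which bounds the overall size by overall_alpha."""
        m = len(p_values)
        threshold = overall_alpha / m
        decisions = [p < threshold for p in p_values]
        return any(decisions), threshold, decisions

    # Example: p-values from separate tests on the mean, variance, and normality.
    print(bonferroni_induced_test([0.012, 0.40, 0.09], overall_alpha=0.05))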
There are several further distributional issues which are important in the context
of variable-addition tests. In view of our subsequent emphasis on the RESET test
in this chapter, it is convenient and appropriate to explore these issues briefly in
the context of tests which focus on the conditional mean of the model. However, it
should be recognized that the general points that are made in the rest of this section
also apply to variable-addition tests relating to other moments of the data-generating
process. Under our earlier assumptions, the basic form of the test in which we are now interested is an F-test of γ = 0 in the context of model (2). In that model, if W is truly the precise representation of the omitted effects, then the F-test will be UMP. Working, instead, with the matrix of proxy variables W*, as in (3), does not affect the null distribution of the test statistic in general, but it does affect the power of the test, of course. Indeed, the reduction in power associated with the use of the proxy variables increases as the correlations between the columns of W and those
of W* decrease. Ohtani and Giles (1993) provide some exact results relating to this phenomenon under very general distributional assumptions, and find the reduction in power to be more pronounced as the error distribution departs from normality. They also show that, regardless of the degree of nonnormality, the test can be biased* as the hypothesis error grows, and they prove that the usual null distribution for the F-statistic for testing if γ = 0 still holds even under these more general conditions.
Of course, in practice, the whole point of the analysis is that the existence,
form, and degree of model misspecification are unknown. Although the general form
of W* will be chosen to reflect the type of misspecification against which one is
testing, the extent to which W* is a “good” proxy for W (and hence for the omitted
effect) will not be able to be determined exactly. This being the case, in general it is
difficult to make specific statements about the power of such variable-addition tests.
As long as W* is correlated with W (asymptotically), a variable-addition test based
on model (3) will be consistent. That is, for a given degree of specification error,
as the sample size grows the power of the test will approach unity. In view of the
immediately preceding comments, the convergence path will depend on the forms of
W and W * .
The essential consistency of a basic variable-addition test of γ = 0 in (2) is readily established. Following Eastwood and Godfrey (1992, pp. 123-125), and assuming that X'X, W'W, and X'W are each O_p(T), the consistency of a test based on F^W (as defined in Section II.D, and assuming independent and homoskedastic disturbances) is ensured if plim(F^W/T) ≠ 0 under the alternative. Now, as is well known, we can write

F^W = [(RSS − USS)/p] / [USS/(T − k − p)]    (11)

where RSS denotes the sum of the squared residuals when (2) is estimated by OLS subject to the restriction that γ = 0, and USS denotes the corresponding sum of squares when (2) is estimated by unrestricted OLS. Under quite weak conditions the denominator in (11) converges in probability to σ² (by Khintchine's theorem). So, by Slutsky's theorem, in order to show that the test is consistent, it is sufficient to establish that plim[(RSS − USS)/T] ≠ 0 under the alternative.† Now, for our problem we can write

RSS − USS = (Rθ̂)'[R(X*'X*)^{-1}R']^{-1}(Rθ̂)    (12)
*A biased test is one whose power can fall below its significance level in some region of the relevant parameter space.
†Clearly, this plim is zero if the null is true, because then both RSS/T and USS/T are consistent estimators of σ².
where R = [0 : I_p] selects the last p coefficients, θ̂ is the unrestricted OLS estimator of the full coefficient vector in (2) (so that Rθ̂ = γ̂), and X* = (X : W). Given our assumption about the orders in probability of the data matrices, we can write plim(X*'X*/T) = Q*, say, where Q* is finite and nonsingular. Then it follows immediately from (12) that plim[(RSS − USS)/T] > 0 if γ ≠ 0, so the test is consistent. It is now clear why consistency is retained if W* is substituted for W, as long as these two matrices are asymptotically correlated. It is also clear that this result will still hold even if W is random or if W* is random (as in the case of a RESET test involving some function of the OLS prediction vector from (1) in the construction of W*).
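Spelling out the limiting step referred to above (under the stated assumptions, and using the reconstruction of (12) given here), the probability limit of the scaled difference in sums of squares is strictly positive under the alternative:

\[
\operatorname*{plim}_{T\to\infty}\frac{\mathrm{RSS}-\mathrm{USS}}{T}
 = \operatorname*{plim}\;\hat{\gamma}'\Bigl[R\bigl(X^{*\prime}X^{*}/T\bigr)^{-1}R'\Bigr]^{-1}\hat{\gamma}
 = \gamma'\bigl[R\,Q^{*-1}R'\bigr]^{-1}\gamma \;>\; 0
 \quad\text{whenever } \gamma \neq 0,
\]

since \(Q^{*}\) (and hence \(R\,Q^{*-1}R'\)) is finite, nonsingular, and positive definite, and \(\hat{\gamma}\) converges in probability to \(\gamma\) under the alternative.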
Godfrey (1988, pp. 102-106) discusses another important issue that arises in
this context, and which is highly pertinent for our own analysis of the RESET test in
this chapter. In general, if we test one of the specifications in (2)-(4) against model
(l), by testing if y = 0, it is likely that in fact the true data-generating process differs
from both the null and maintained hypotheses. That is, we will generally be testing
against an incorrect alternative specification, In such cases, the determination of
even the asymptotic power of the usual F-test (or its large-sample chi-square coun-
terpart) is not straightforward, and is best approached by considering a sequence of
local alternatives. Not surprisingly, it turns out that the asymptotic power of the test
depends on the (unknown) extent to which the maintained model differs from the
true data-generating process.
Of course, in practice the errors in the model may be serially correlated and/or
heteroskedastic, in which case variable-addition tests of this type generally will be
inconsistent, and their power properties need to be considered afresh in either large
or small samples. Early work by Thursby (1979, 1982) suggested that the RESET
test might be robust to autocorrelated errors, but as noted by Pagan (1984, p. 127)
and explored by Porter and Kashyap (1984), this is clearly not the case. We abstract
from this situation in the development of a new version of the RESET test later in this
chapter, but it is a topic that is being dealt with in some detail in our current research.
Finally, we should keep in mind that a consistent test need not necessarily have high
power in small samples, so this remains an issue of substance when considering
specific variable-addition tests.
Finally, it is worth commenting on the problem of discriminating between two
or more variable-addition tests, each of which is consistent in the sense described
above. If the significance level for the tests is fixed (as opposed to being allowed to
decrease as the sample size increases), then there are at least two fairly standard
ways of dealing with this issue. These involve bounding the powers of the tests away
from unity as the sample size grows without limit. The first approach is to use the
“approximate slope” analysis of Bahadur (1960,1967). This amounts to determining
how the asymptotic significance level of the test must be reduced if the power of
the test is to be held constant under some fixed alternative. The test statistics are
O_p(T), and they are compared against critical values which increase with T. The approximate slope for a test statistic, S, which is asymptotically chi-square, is just plim(S/T).
Among the many “diagnostic tests” that econometricians routinely use, some variant
or other of the RESET test is widely employed to test for a nonzero mean of the error
term. That is, it tests implicitly whether a regression model is correctly specified in
terms of the regressors that have been included. Among the reasons for the popularity
of this test are that it is easily implemented and that it is an exact test whose statistic
follows an F-distribution under the null. The construction of the test does, however,
require a choice to be made over the nature of certain “augmenting regressors” that
are employed to model the misspecification, as we saw in Section II.B. Depending on
this choice, the RESET test statistic has a nonnull distribution which may be doubly
noncentral F or totally nonstandard. Although this has no bearing on the size of the
test, it has obvious implications for its power.
The most common construction of the RESET test involves augmenting the
regression of interest with powers of the prediction vector from a least-squares re-
gression of the original specification and testing their joint significance. As a result
*For example, see Geweke (1981), Magee (1987), and Eastwood and Godfrey (1992, p. 132).
of the Monte Carlo evidence provided by Ramsey and Gilbert (1972) and Thursby
(1989), for example, it is common for the second, third, and fourth powers of the
prediction vector to be used in this way.* Essentially, Ramsey’s original sugges-
tion, following earlier work by Anscombe (1961), involves approximating the un-
known nonzero mean of the errors, which reflects the extent of the model mis-
specification, by some analytic function of the conditional mean of the model. The
specific construction of the RESET test noted above then invokes a polynomial ap-
proximation, with the least-squares estimator of the conditional mean replacing its
true counterpart.
Other possibilities include using powers and/or cross products of the individ-
ual regressors, rather than powers of the prediction vector, to form the augmenting
terms. Thursby and Schmidt (1977) provide simulation results which appear to fa-
vor this approach. However, all of the variants of the RESET test that have been
proposed to date appear to rely on the use of local approximations, essentially of a
Taylor series type, of the conditional mean of the regression. Intuitively, there may be
gains in terms of the test’s performance if a global approximation were used instead.
This chapter pursues this intuition by suggesting the use of an (essentially unbiased)
Fourier flexible approximation. This suggestion captures the spirit of the develop-
ment of cost and production function modeling, and the associated transition from
polynomial functions (e.g., Johnston 1960) to Translog functions (e.g., Christensen et al. 1971, 1973) and then to Fourier functional forms (e.g., Gallant 1981, Mitchell and Onvural 1995, 1996).
Although Ramsey (1969) proposed a battery of specification tests for the lin-
ear regression model, with the passage of time and the associated development of the
testing literature, the RESET test is the one which has survived. Ramsey’s original
discussion was based on the use of Theil’s (1965, 1968) “BLUS” residuals, but the
analysis was subsequently recast in terms of the usual OLS residuals (e.g., Ramsey
and Schmidt 1976, Ramsey 1983), and we will follow the latter convention in this
chapter. As Godfrey (1988, p. 106) emphasizes, one of the principles which underlies the RESET test is that the researcher has only the same amount of information available when testing the specification of a regression model as was available when the model was originally formulated and estimated. Accordingly, direct tests against
new theories, perhaps embodying additional variables, are ruled out.
A convenient way of discussing and implementing the standard RESET test is as follows. Suppose that the regression model under consideration is (1), which we reproduce here:

y = Xβ + u    (13)
*For instance, the SHAZAM (1993) package adopts this approach. Clearly, the first power cannot be used as an extra regressor in the "augmented" equation, as the design matrix would then be perfectly collinear.
Suppose that the misspecification of (13) implies a nonzero error mean, ξ, which is approximated by Zθ, where Z is a T × p matrix of test variables:

y = Xβ + Zθ + ε    (14)

We then test if ξ = 0 by testing if θ = 0, using the usual F-test for restrictions on a subset of the regression coefficients. Different choices of the T × p matrix Z lead to different variants of the RESET test. As noted, the most common choice is to construct Z to have tth row vector

Z_t = [(Xb)_t², (Xb)_t³, ..., (Xb)_t^{p+1}]    (15)

In this case, the power of the RESET test can be computed exactly for any given degree of misspecification, ξ, by recognizing that
*In view of the Monte Carlo evidence provided by Thursby and Schmidt (1977), in principle it would also be interesting to consider multivariate Fourier expansions in terms of the original regressors.
val of length 2π. Mitchell and Onvural (1995, 1996) and other authors use a linear transformation.* In our case, this amounts to constructing

w_t = 2π[(Xb)_t − (Xb)_min] / [(Xb)_max − (Xb)_min]    (19)

where (Xb)_min and (Xb)_max are respectively the smallest and largest elements of the prediction vector. We also consider an alternative sinusoidal transformation (20), based on Box (1966). The Z matrix for (2) is then constructed to have tth row vector

Z_t = [sin(w_t), cos(w_t), sin(2w_t), cos(2w_t), ..., sin(p'w_t), cos(p'w_t)]    (21)

for some arbitrary truncation level p'. This recognizes that the Fourier approximation, g(x), of a continuously differentiable function, f(x), is

g(x) = u_0 + Σ_j [u_j cos(jx) + v_j sin(jx)]    (22)

where†

u_0 = (1/2π) ∫ f(x) dx,   u_j = (1/π) ∫ f(x) cos(jx) dx,   v_j = (1/π) ∫ f(x) sin(jx) dx    (23)
*If the data have to be positive, as in the case of cost functions, then a positive interval such as [c, c + 2π] would be appropriate.
†The range of summation in (22) is from j = 1 to j = ∞; the ranges of integration in (23) are each from −π to +π.
In our case, the u_j's and v_j's in (23) correspond to elements of θ in (14), so they are estimated as part of the testing procedure.* Note that Z is random here, but only through b. Our FRESET test involves constructing Z in this way and then testing if θ = 0 in (14). Under this null, the FRESET test statistic is still central F with 2p' and T − k − 2p' degrees of freedom. Its nonnull distribution will depend upon the form of the model misspecification, the nature of the regressors, and the choice of p'. This distribution will differ from that of the RESET test statistic based on a (truncated) Taylor series of Xb. In the following discussion we use the titles FRESETL and FRESETS to refer to the FRESET tests based on the linear transformation (19) and the sinusoidal transformation (20), respectively.
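The construction just described is easy to prototype. The sketch below builds the RESET variables of (15) and the FRESETL variables of (19) and (21) from the OLS prediction vector and returns the corresponding F-statistics; it is a simplified illustration (with the usual F-test coded directly), not the SHAZAM code used for the experiments reported below.

    import numpy as np
    from scipy import stats

    def f_test_addition(y, X, Z):
        """F-test of theta = 0 in y = X beta + Z theta + eps (Eq. (14))."""
        T, k, p = len(y), X.shape[1], Z.shape[1]
        rss = lambda A: np.sum((y - A @ np.linalg.lstsq(A, y, rcond=None)[0]) ** 2)
        RSS, USS = rss(X), rss(np.column_stack([X, Z]))
        F = ((RSS - USS) / p) / (USS / (T - k - p))
        return F, 1.0 - stats.f.cdf(F, p, T - k - p)

    def reset_vars(yhat, p=3):
        """Powers of the prediction vector, Eq. (15): yhat^2, ..., yhat^(p+1)."""
        return np.column_stack([yhat ** j for j in range(2, p + 2)])

    def fresetl_vars(yhat, p_prime=3):
        """Linear transformation (19) and Fourier terms (21)."""
        w = 2.0 * np.pi * (yhat - yhat.min()) / (yhat.max() - yhat.min())
        return np.column_stack(
            [f(j * w) for j in range(1, p_prime + 1) for f in (np.sin, np.cos)])

    # Illustration on simulated data with an omitted static effect (cf. model 1).
    rng = np.random.default_rng(4)
    T = 50
    x2, x3, x4 = rng.normal(size=T), rng.normal(size=T), rng.normal(size=T)
    y = 1.0 - 0.4 * x3 + x4 + 2.0 * x2 + rng.normal(size=T)   # DGP includes x2
    X = np.column_stack([np.ones(T), x3, x4])                  # fitted (null) model omits x2
    yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    print("RESET  :", f_test_addition(y, X, reset_vars(yhat)))
    print("FRESETL:", f_test_addition(y, X, fresetl_vars(yhat)))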
V. A M O N T E CARLO EXPERIMENT
We have undertaken a Monte Carlo experiment to compare the properties of the new
FRESET tests with those of the conventional RESET test for some different types
of model misspecification. Table 1 shows the different formulations of the tests that
we have considered in all parts of the experiment. Effectively, we have considered choices† of p = 1, 2, or 3 and p' = 2 or 3. In the case of the RESET test, the variables whose significance is tested (i.e., the "extra" Z variables which are added to the basic model) comprise powers of the prediction vector from the original model
under test, as in (15). For the FRESET test they comprise sines and cosines of mul-
tiples of this vector, as in (21), once the linear transformation (19)or the sinusoidal
transformation (20) has been used for the FRESETL and FRESETS variants, respectively.
Three models form the basis of our experiment, and these are summarized in
Table 2. In each case we show a particular data-generating process (DGP), or “true”
model specification, together with the model that is actually fitted to the data. The
latter "null" model is the one whose specification is being tested. Our model 1 allows for misspecification through static variable omission, and corresponds to models 6-8 (depending on the value of γ) of Thursby and Schmidt (1977, p. 638). Our model 2 allows for a static misspecification of the functional form, and our model 3 involves the omission of a dynamic effect. In each case, x₂, x₃, and x₄ are as in Ramsey and
*The parameter u_0 gets "absorbed" into the coefficient of the intercept in the model (14).
†We found that setting p' > 3 resulted in a singular matrix when constructing the FRESET tests. Eastwood and Gallant (1991) suggest that setting the number of parameters equal to the sample size raised to the two-thirds power will ensure consistency and asymptotic normality when estimating a Fourier function. Setting p' = 2 or p' = 3 is broadly in keeping with this for our sample sizes. As Mitchell and Onvural (1996) note, increasing p' will increase the variance of test statistics. In the context of the FRESET test it seems wise to limit the value of p'.
Table 1  Formulations of the RESET and FRESET tests

Test                                   p or p'   Test variables (Z_t)
RESET                                     1      ŷ_t²
                                          2      ŷ_t², ŷ_t³
                                          3      ŷ_t², ŷ_t³, ŷ_t⁴
FRESETL (linear transformation)           2      sin(w_t), cos(w_t), sin(2w_t), cos(2w_t)
                                          3      sin(w_t), cos(w_t), sin(2w_t), cos(2w_t), sin(3w_t), cos(3w_t)
FRESETS (sinusoidal transformation)       2      sin(w_t), cos(w_t), sin(2w_t), cos(2w_t)
                                          3      sin(w_t), cos(w_t), sin(2w_t), cos(2w_t), sin(3w_t), cos(3w_t)
Table 2  Models

Model 1 (omitted variable; omitted static effect)
  DGP:  y_t = 1.0 − 0.4x_{3t} + x_{4t} + γx_{2t} + u_t
  Null: y_t = β₀ + β₃x_{3t} + β₄x_{4t} + u_t

Model 2 (incorrect functional form; omitted multiplicative effect)
  DGP:  y_t = 1.0 − 0.4x_{3t} + x_{4t}(1 + γx_{2t}) + u_t
  Null: y_t = β₀ + β₃x_{3t} + β₄x_{4t} + u_t

Model 3 (incorrect functional form; omitted dynamic effect)
  DGP:  y_t = 1.0 − 0.4x_{3t} + x_{4t} + γy_{t−1} + u_t
  Null: y_t = β₀ + β₃x_{3t} + β₄x_{4t} + u_t
Gilbert (1972) and Thursby and Schmidt (1977),* and sample sizes of T = 20 and
T = 50 have been considered.
Various values of γ were considered in the range [−8.0, +8.0] in models 1 and 2, though in the latter the graphs and tables reported relate to a "narrower" range as the results "stabilize" quite quickly. Values of γ in the (stationary) range [−0.9, +0.9] were considered in model 3. If γ = 0 the fitted (null) model is correctly specified. Other values of γ generate varying degrees of model misspecification, and we are interested in the probability that each test rejects the null model (by rejecting the null hypothesis that θ = 0 in (14)) when γ ≠ 0. For convenience, we will term these rejection rates "powers" in the ensuing discussion.
*Ramsey and Gilhert (1972, p. 185) provide data for two series, xi and x g , for a sample size of T = 10.
(We follow them in “repeating” these values to generate regressors which are “fixed in repeated samples”
+
when considering samples of size T = 20, SO.) As in Thurshy and Schmidt (1977), xg = xi x 2 ; and
x4 = x;/10.
[Figures: rejection-rate ("power") curves for the RESET, FRESETL, and FRESETS tests.]
However, care should be taken over their interpretation in the present context. Strictly, of course, the power of the RESET or FRESET test is the rejection probability when θ ≠ 0. As noted in Section II, this power can be determined (in principle), as the test statistics are typically doubly noncentral F when θ ≠ 0. Only in the very special case where Zθ in (14) exactly coincides with the specification error, ξ, would "powers" of the sort that we are computing and reporting actually correspond to the formal powers of the tests. (In model 1, for example, this would require that the RESET test be applied by using just x₂, fortuitously, as the only "augmenting" variable rather than augmenting
in general, the shapes of the various graphs reported in the next section do not accord
with that for the true power curve for an F-test.
The error term, u, was generated to be standard normal, though of course the tests are scale invariant and so the results are invariant to the value of the true error variance. The tests were conducted at the 5% and 10% significance levels. As the RESET and FRESET test statistics are exactly F-distributed if γ = 0, there is no "size distortion" if the appropriate F critical values are used: the nominal and true significance levels coincide. Precise critical values were generated using the Davies (1980) algorithm as coded in the DISTRIB command in the SHAZAM (1993) package. Each component of the experiment is based on 5000 replications. Accordingly, from the properties of a binomial proportion, the standard error associated with a rejection probability, π (in Tables 3-5), takes the value [π(1 − π)/5000]^{1/2}, which takes its maximum value of 0.0071 when π = 0.5. The simulations were undertaken using SHAZAM code on both a PC and a DEC Alpha 3000/400.
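The structure of each experiment (fit the null model, apply the test, and record the rejection rate over replications) can be sketched as follows for model 1. The code is a self-contained illustration with normal errors and hypothetical regressors; it uses a simple RESET variant with p = 3 coded directly, not the SHAZAM implementation described above.

    import numpy as np
    from scipy import stats

    def reset_pvalue(y, X, p=3):
        """RESET: augment (13) with powers 2..p+1 of the OLS prediction vector."""
        T, k = X.shape
        yhat = X @ np.linalg.lstsq(X, y, rcond=None)[0]
        Z = np.column_stack([yhat ** j for j in range(2, p + 2)])
        rss = lambda A: np.sum((y - A @ np.linalg.lstsq(A, y, rcond=None)[0]) ** 2)
        USS = rss(np.column_stack([X, Z]))
        F = ((rss(X) - USS) / p) / (USS / (T - k - p))
        return 1.0 - stats.f.cdf(F, p, T - k - p)

    def rejection_rate(gamma, T=20, reps=5000, alpha=0.10, seed=0):
        rng = np.random.default_rng(seed)
        x2, x3, x4 = (rng.normal(size=T) for _ in range(3))   # regressors fixed across replications
        X = np.column_stack([np.ones(T), x3, x4])              # fitted (null) model omits x2
        rejections = 0
        for _ in range(reps):
            y = 1.0 - 0.4 * x3 + x4 + gamma * x2 + rng.normal(size=T)   # model 1 DGP
            rejections += reset_pvalue(y, X) < alpha
        return rejections / reps

    print(rejection_rate(0.0), rejection_rate(2.0))             # size and "power" at gamma = 2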
In this section, we present the results of Monte Carlo experiments designed to gather
evidence on the power of the various RESET and the above FRESET tests in Table 1
for each model in Table 2. The experimental results and the graphs of the rejection
rates are given below. For convenience we will refer to the graphs below as “power”
curves. As discussed, it is important to note that these are not conventional power
curves. Only results for the p = 3 case of the RESET test and the p' = 3 cases of the FRESET tests in Table 1 are presented in detail, for reasons which will become evident. The entries
in Tables 3-5 are the proportions of rejections of the null hypothesis. Not surpris-
ingly, for all models considered, the RESET, FRESETL, and FRESETS tests exhibit
higher rejection rates at the 10% significance level than at the 5% level. However,
the “pattern” of the power curves is insensitive to the choice of significance level,
hence we will focus on the 10% significance level in the remainder of the discussion.
Generally, the patterns of the power curves differ only marginally when the
sample size is increased from 20 to 50. The results for a sample size of 50 display
higher power than the comparable sample-size-20 results, reflecting the consistency
Table 3  Model 1, p = 3, p' = 3

                          T = 20                                          T = 50
           5%                        10%                       5%                        10%
γ      RESET FRESETS FRESETL    RESET FRESETS FRESETL    RESET FRESETS FRESETL    RESET FRESETS FRESETL
-8.0 0.188 0.893 I .WO 0.710 1.000 0.942 1.ooo 0.998 1.000 1.Ooo 1.OOO 0.998
-7.5 0.209 0.909 1.ooo 0.709 1.000 0.946 1.ooo 0.999 1.000 1.000 1.Ooo 0.999
-7.0 0.233 0.913 1.000 0.708 1.ooo 0.944 1.ooo 0.999 1.ooo 1.ooo 1.OOO 1.WO
-6.5 0.258 0.874 1.000 0.705 1.000 0.918 1.000 0.998 1.000 1.000 1.OOO 0.999
-6.0 0.284 0.868 1.ooo 0.700 1.ooo 0.915 1.ooo 0.996 1.ooo 1.ooo 1.Ooo 0.999
-5.5 0.309 0.870 1.000 0.695 1.ooo 0.911 1.000 0.995 1.000 1.000 1.oOO 0.998
-5.0 0.331 0.879 1.ooo 0.687 1.ooo 0.925 1.ooo 0.998 1.ooo 1.000 1.000 0.999
-4.5 0.350 0.801 1.ooo 0.674 1.ooo 0.869 1.000 0.992 1.ooo 1.000 1.Ooo 0.996
-4.0 0.366 0.8% 1.000 0.658 1.000 0.937 1.ooo 0.998 1.000 1.000 1.OOO 0.998
-3.5 0.378 0.872 1.ooo 0.639 1.000 0.918 1.ooo 0.994 1.ooo 1.000 1.000 0.996
-3.0 0.384 0.795 0.997 0.609 1.ooo 0.870 0.999 0.998 1.000 1.000 LOO0 0.998
-2.5 0.369 0.767 0.965 0.576 0.992 0.854 0.996 0.995 1.ooo 0.999 1.OOO 0.998
-2.0 0.337 0.646 0.850 0.516 0.930 0.766 0.975 0.991 1.000 0.992 1.000 0.995
-1.5 0.266 0.414 0.580 0.419 0.738 0.565 0.866 0.941 0.997 0.941 0.998 0.%5
-1.0 0.137 0.216 0.254 0.235 0.392 0.334 0.479 0.706 0.791 0.617 0.882 0.806
-0.5 0.064 0.092 0.091 0.133 0.168 0.161 0.152 0.207 0.233 0.247 0.352 0.315
0.0 0.046 0.050 0.054 0.101 0.106 0.105 0.052 0.054 0.053 0.104 0.102 0.101
0.5 0.141 0.095 0.096 0.233 0.173 0.173 0.339 0.237 0.249 0.471 0.366 0.355
1.o 0.117 0.204 0.176 0.206 0.289 0.332 0.409 0.713 0.643 0.556 0.766 0.817
1.5 0.061 0.381 0.267 0.126 0.413 0.528 0.365 0.916 0.907 0.524 0.954 0.953
2.0 0.037 0.617 0.608 0.091 0.745 0.749 0.406 0.987 0.9% 0.593 0.999 0.992
2.5 0.027 0.720 0.679 0.083 0.771 0.807 0.559 0.980 0.997 0.748 1.000 0.989
3.0 0.02 1 0.778 0.983 0.085 0.994 0.866 0.727 0.998 1.000 0.881 1.Ooo 0.999
3.5 0.019 0.960 0.992 0.087 0.999 0.979 0.854 1.000 1.000 0.956 1.000 1.000
4.0 0.017 0.829 0.999 0.089 1.ooo 0.887 0.935 0.995 1.000 0.987 1.000 0.997
4.5 0.014 0.853 1.000 0.091 1.ooo 0.910 0.976 0.998 1.000 0.998 1.OOO 0.999
5.0 0.011' 0.857 1.000 0.093 1.000 0.908 0.993 0.995 1.000 0.999 1.OOO 0.996
5.5 0.009 0.879 1.000 0.094 1.000 0.929 0.998 0.996 1. O N 1.000 1.OOO 0.997
6.0 0.007 0.915 1.000 0.095 1.ooo 0.956 0.999 1.000 1.000 1.000 1.000 1.000
6.5 0.005 0.920 1.000 0.096 1.ooo 0.949 1.000 0.999 1.000 1.000 1.000 0.999
7.O 0.005 0.846 1.000 0.097 1.000 0.907 1.ooo 0.999 1.000 1.000 LOO0 1.000
7.5 0.003 0.811 1.000 0.098 1.ooo 0.903 1.ooo 0.994 1.000 1.000 1.Ooo 0.997
8.0 0.002 0.913 1.ooo 0.096 1.ooo 0.946 1.000 0.999 1.000 1.000 1.OOO 0.999
of the tests. This is also in accord with the fact that the larger sample size yields
“smoother” power curves in the FRESETL and FRESETS cases for models 1 and 2,
as in Figs. 1 to 4.
The probability of rejecting the null hypothesis when specification error is present depends, in part, on the number of variables included in Z_t. In general, our results indicate, regardless of the type of misspecification, that the use of p = 3 in the construction of Z, as in (15), yields the most powerful RESET test. However, this does not always hold, such as in model 3, where the RESET test with only the term ŷ_t² included in the auxiliary regression yields higher power for relatively large positive misspecification (γ > 0.3) and large negative misspecification (γ < −0.7).
The FRESETL and FRESETS tests with p’ = 3 terms are generally the most
powerful of these tests. The pattern of the power curves tends to fluctuate less and
the results indicate higher rejection rates than in the comparable p’ = 2 case. This
is not surprising, as we would expect a better degree of approximation to the omitted
effect as more terms are included. However, the ability to increase the number of
test variables included in the auxiliary regression is constrained by the degrees of
freedom. We focus primarily on the RESET test with p = 3 and the FRESET tests
with p' = 3.
In all cases, the FRESET tests perform equally as well as, and in many cases
yield higher powers than, the comparable RESET tests. A comparison of the rejec-
tion rates of the various tests for the three models considered indicates FRESETL is
the most powerful test for models 1 and 2. The FRESETS test yields higher power
for model 3 than the FRESETL and RESET tests, with the exception of high levels
of misspecification, where FRESETL exhibits higher rejection rates. The FRESETS
test yields higher rejection rates than the comparable RESET test for models 1 and 2,
with two exceptions. First, model 1 in the presence of a high degree of misspecifica-
tion; second, model 2 in the presence of positive levels of misspecification ( y > 0).
However, FRESETL yields higher rejection rates than the RESET test for the two
exceptions. The FRESETL test dominates the RESET test for model 3, as in Figs. 5
and 6.
The power of the RESET test is excellent for models 1 and 2, and p = 3, with
sample size 50. Then for larger coefficients of the omitted variable, the proportion of
rejections increases to 100%. For model 1, the use of the squares, cubes, and fourth
powers of the predicted values as the test variables for the RESET test results in
power which generally increases as the coefficient of the omitted variable becomes
increasingly negative. In the presence of positive coefficients of the omitted vari-
able, the rejection rate generally increases initially as the level of misspecification
increases but decreases as the coefficient of the omitted variable continues to in-
crease. However, power begins to increase marginally again at moderate levels of misspecification (γ = 3).
Our results for model 2 indicate that the power of the RESET test increases as the coefficient of the omitted variable increases for lower and higher levels of misspecification.
Table 4  Model 2, p = 3, p' = 3

                          T = 20                                          T = 50
           5%                        10%                       5%                        10%
γ      RESET FRESETS FRESETL    RESET FRESETS FRESETL    RESET FRESETS FRESETL    RESET FRESETS FRESETL
-1.0 1.0oO 0.891 1.Ooo 1.Ooo 1.Ooo 0.922 1.Ooo 0.995 1.Ooo 1.Ooo 1.Ooo 0.998
-0.9 1.Ooo 0.865 1.Ooo 1.Ooo 1.Ooo 0.902 1.Ooo 0.989 1.Ooo 1.Ooo 1.Ooo 0.992
-0.8 1.Ooo 0.924 0.994 1.0oO 0.999 0.950 1.Ooo 0.998 1.Ooo 1.Ooo 1.Ooo 0.999
-0.7 0.996 0.910 0.832 1.Ooo 0.954 0.937 1.Ooo 0.994 1.Ooo 1.Ooo LOO0 0.996
-0.6 0.883 0.938 0.843 0.980 0.942 0.958 1.Ooo 0.998 1.Ooo 1.Ooo 1.Ooo 0.998
-0.5 0.339 0.931 0.996 0.604 0.999 0.953 1.Ooo 0.996 1.m 1.Ooo 1.Ooo 0.997
-0.4 0.013 0.926 1.o00 0.037 1.Ooo 0.950 0.309 0.998 1.Ooo 0.481 1.Ooo 0.999
-0.3 0.243 0.866 0.999 0.391 1.Ooo 0.909 0.870 0.989 1.Ooo 0.926 1.Ooo 0.994
-0.2 0.880 0.735 0.874 0.945 0.947 0.847 1.Ooo 0.994 1.Ooo 1.Ooo 1.Ooo 0.997
-0.1 0.404 0.221 0.23 1 0.552 0.363 0.340 0.899 0.697 0.776 0.952 0.862 0.792
0.0 0.046 0.050 0.054 0.101 0.105 0.106 0.043 0.M9 0.048 0.094 0.101 0.105
0.1 0.387 0.216 0.290 0.539 0.428 0.339 0.898 0.694 0.847 0.947 0.917 0.785
0.2 0.014 0.647 0.809 0.032 0.902 0.766 0.148 0.975 1.Ooo 0.290 1.0oO 0.985
0.3 0.022 0.851 0.980 0.076 0.995 0.899 0.729 0.991 1.Ooo 0.888 1.Ooo 0.9%
0.4 0.015 0.892 0.945 0.090 0.973 0.930 0.955 0.993 1.Ooo 0.994 1.Ooo 0.995
0.5 0.005 0.825 0.940 0.047 0.969 0.873 0.984 0.98 1 1.Ooo 0.998 1.Ooo 0.988
0.6 0.001 0.912 1.Ooo 0.021 1.000 0.938 0.994 0.993 1.Ooo 1.Ooo 1.Ooo 0.995
0.7 0.001 0.831 1.Ooo 0.014 1.Ooo 0.873 0.999 0.988 1.Ooo 1.Ooo 1.Ooo 0.992
0.8 0.Ooo 0.840 1.Ooo 0.020 1.Ooo 0.882 1.000 0.991 1.Ooo 1.Ooo 1.Ooo 0.994
0.9 0.001 0.910 1.Ooo 0.041 1.Ooo 0.939 1.Ooo 0.998 1.Ooo 1.Ooo 1.Ooo 0.999
1.o 0.001 0.936 1.000 0.099 1.000 0.960 1.Ooo 0.997 1.Ooo 1.Ooo 1.Ooo 0.999
Table 5  Model 3, p = 3, p' = 3

                          T = 20                                          T = 50
           5%                        10%                       5%                        10%
γ      RESET FRESETS FRESETL    RESET FRESETS FRESETL    RESET FRESETS FRESETL    RESET FRESETS FRESETL
-0.9 O.OO0 0.858 0.989 0.Ooo 0.995 0.899 0.Ooo 0.997 1.Ooo 0.099 1.Ooo 0.998
-0.8 0.001 0.849 0.987 0.009 0.998 0.891 0.065 0.990 1.Ooo 0.278 1.Ooo 0.993
-0.7 0.020 0.851 0.913 0.069 0.970 0.909 0.586 0.995 1.Ooo 0.815 1.0oO 0.998
-0.6 0.055 0.709 0.672 0.147 0.825 0.819 0.816 0.994 1.Ooo 0.924 1.Ooo 0.996
-0.5 0.073 0.573 0.402 0.173 0.572 0.708 0.801 0.986 0.998 0.908 1.OO0 0.993
-0.4 0.068 0.482 0.252 0.160 0.376 0.609 0.674 0.964 0.934 0.805 0.967 0.982
-0.3 0.060 0.337 0.159 0.129 0.254 0.474 0.449 0.896 0.631 0.603 0.734 0.936
-0.2 0.052 0.174 0.094 0.104 0.167 0.286 0.222 0.617 0.293 0.338 0.410 0.740
-0.1 0.046 0.077 0.064 0.095 0.120 0.150 0.090 0.200 0.108 0.160 0.191 0.309
0.0 0.045 0.053 0.052 0.096 0.102 0.101 0.059 0.055 0.051 0.101 0.103 0.109
0.1 0.042 0.074 0.057 0.094 0.110 0.136 0.069 0.179 0.110 0.130 0.192 0.285
0.2 0.031 0.156 0.085 0.069 0.150 0.249 0.099 0.591 0.329 0.172 0.427 0.712
0.3 0.020 0.255 0.132 0.047 0.211 0.373 0.142 0.840 0.4% 0.241 0.575 0.900
0.4 0.014 0.327 0.181 0.037 0.272 0.462 0.233 0.836 0.527 0.372 0.m 0.882
0.5 0.011 0.360 0.197 0.039 0.287 0.478 0.382 0.839 0.518 0.555 0.618 0.883
0.6 0.011 0.357 0.172 0.037 0.251 0.469 0.524 0.909 0.588 0.695 0.700 0.941
0.7 0.007 0.285 0.122 0.029 0.184 0.388 0.550 0.883 0.722 0.741 0.820 0.928
0.8 0.003 0.162 0.076 0.013 0.115 0.242 0.371 0.786 0.659 0.593 0.781 0.861
0.9 O.OO0 0.037 0.019 0.002 0.037 0.059 0.034 0.277 0.157 0.100 0.250 0.381
This result generally holds but, as can be seen in Figure 7, it is possible
for power to decrease as the level of misspecification increases. The test yields low
power at positive levels of misspecification for a sample size of 20 when there is an
omitted multiplicative variable. For model 3 and both sample sizes, the rejection
rate initially increases as the coefficient of the omitted variable increases and then
falls as the degree of misspecification continues to increase.
The powers of the FRESETL and FRESETS tests are excellent for models 1
and 2, when p’ = 3. The proportion of rejections increases to 100% as the coeffi-
cient of the omitted variable increases, with the exception of the FRESETS test* for
model 1. The inclusion of three sine and three cosine terms of the predicted values
as the test variables for the FRESETL and FRESETS tests results in power gener-
ally increasing as the coefficient of the omitted variable increases for models 1 and
2 with both sample sizes. However, as can be seen by Figure 8, it is possible for the
rejection rate to decrease as the coefficient of the omitted variable increases in the
p‘ = 2 case. For models 1 and 2 the “power” curve increases at a faster rate and is
“smoother” for sample size 50.
*For FRESETS, the number of rejections is greater than 90% for higher levels of misspecification.
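To make the construction of these variable-addition tests concrete, the following is a minimal sketch (our own illustration, not the authors' code) of an F-type test in which the null regression is augmented with functions of the fitted values; fourier_terms mimics the idea of using sine and cosine terms of the rescaled predicted values, and the rescaling to [0, 2π] is an assumption made purely for this sketch.

```python
import numpy as np
from scipy import stats

def variable_addition_F(y, X, Z):
    """F-test of H0: the coefficients on the added test variables Z are all zero
    in the augmented regression of y on [X, Z]."""
    n, k = X.shape
    q = Z.shape[1]

    def rss(M):
        b, *_ = np.linalg.lstsq(M, y, rcond=None)
        e = y - M @ b
        return e @ e

    rss_r = rss(X)                        # restricted (null) specification
    rss_u = rss(np.hstack([X, Z]))        # augmented specification
    F = ((rss_r - rss_u) / q) / (rss_u / (n - k - q))
    p_value = stats.f.sf(F, q, n - k - q)
    return F, p_value

def reset_terms(yhat, powers=(2, 3, 4)):
    """Conventional RESET test variables: powers of the fitted values."""
    return np.column_stack([yhat ** p for p in powers])

def fourier_terms(yhat, n_terms=3):
    """Fourier-type test variables: sines and cosines of the fitted values,
    rescaled to [0, 2*pi] (the rescaling is an assumption of this sketch)."""
    z = 2 * np.pi * (yhat - yhat.min()) / (yhat.max() - yhat.min())
    return np.column_stack([f(j * z) for j in range(1, n_terms + 1)
                            for f in (np.sin, np.cos)])
```

With Z = fourier_terms(X @ b, 3), this gives an F statistic with six numerator degrees of freedom, analogous in spirit to the Fourier-based tests discussed above.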
[Figure 8]
For model 3, our results indicate that the rejection rate increases initially as
the coefficient of the omitted variable increases and then decreases as the level of
misspecification continues to increase for positive omitted coefficients of the lagged
variable. However, the rejection rate increases as misspecification increases for neg-
ative coefficients of the omitted lagged variable.
Finally, we have also considered the inclusion of an irrelevant explanatory
variable in order to examine the robustness of the RESET and FRESET tests to an
overspecification of the model. We consider model 1 where the DGP now becomes
y_t = β_0 + β_3 x_{3t} + β_4 x_{4t} + v_t
and the "null" becomes
y_t = 1.0 - 0.4 x_{3t} + β_4 x_{4t} + γ x_{2t} + u_t    (25)
In this case, the coefficient (γ) of the redundant regressor is freely estimated and
therefore we cannot consider a range of preassigned γ values. Our results indicate
that the power results differ negligibly from the true significance levels, as the re-
jection rates fall within two maximum standard deviations of the size.* That is, the
tests appear to be quite robust to a simple overspecification of the model.
VII. CONCLUSIONS
the corresponding tests when the robust error covariance matrix estimators of White
(1980) and Newey and West (1987) are used in their construction. In these cases,
as well as in the case of models in which the null specification is dynamic and/or
nonlinear in the parameters, asymptotically valid (chi-square) counterparts to the
RESET and FRESET tests are readily constructed, as in Section II.D. The finite-
sample qualities of these variants of the tests are also under investigation by the
authors.
ACKNOWLEDGMENTS
We are grateful to Karlyn Mitchell for helpful correspondence in relation to the use of
Fourier functional forms and for supplying unpublished material, to Lindsay Tedds
for her excellent research assistance, and to Peter Kennedy, John Small, and Michael
Veall for their helpful comments. This research was supported by University of Vic-
toria Internal Research Grant 1-41566.
REFERENCES
Jarque, C. J. and A. K. Bera (1980), Efficient Tests for Normality, Homoscedasticity and Serial
Independence of Regression Residuals, Economics Letters, 6, 255-259.
Jarque, C. J. and A. K. Bera (1987), A Test for Normality of Observations and Regression
Residuals, International Statistical Review, 55, 163-172.
Johnston, J. (1960), Statistical Cost Analysis, McGraw-Hill, New York.
Kramer, W. and H. Sonnberger (1986), The Linear Regression Model under Test, Physica-
Verlag, Heidelberg.
Lee, J. H. H. (1991), A Lagrange Multiplier Test for GARCH Models, Economics Letters, 37,
265-271.
MacKinnon, J. G. and L. Magee (1990), Transforming the Dependent Variable in Regression
Models, International Economic Review, 31, 315-339.
Magee, L. (1987), Approximating the Approximate Slopes of LR, W and LM Test Statistics,
Econometric Theory, 3, 247-271.
McAleer, M. (1987), Specification Tests for Separate Models: A Survey, in M. L. King and
D. E. A. Giles (eds.), Specification Analysis in the Linear Model, Routledge and Kegan
Paul, London, 146-196.
McAleer, M. (1995), Sherlock Holmes and the Search for Truth: A Diagnostic Tale, reprinted
in L. Oxley et al. (eds.), Surveys in Econometrics, Blackwell, Oxford, 91-138.
Milliken, G. A. and F. A. Graybill (1970), Extensions of the General Linear Hypothesis Model,
Journal of the American Statistical Association, 65, 797-807.
Mitchell, K. and N. M. Onvural (1995), Fourier Flexible Cost Functions: An Exposition and
Illustration Using North Carolina S & L’s, mimeo., College of Management, North Car-
olina State University.
Mitchell, K. and N. M. Onvural (1996), Economies of Scale and Scope at Large Commercial
Banks: Evidence from the Fourier Flexible Functional Form, Journal of Money, Credit
and Banking, 28, 178-199.
Mizon, G. E. (1977a), Inferential Procedures in Nonlinear Models: An Application in a U.K.
Cross Section Study of Factor Substitution and Returns to Scale, Econometrica, 45,
1221-1242.
Mizon, G. E. (1977b), Model Selection Procedures, in M. J. Artis and A. R. Nobay (eds.),
Studies in Modern Economic Analysis, Blackwell, Oxford, 97-120.
Newey, W. and K. West (1987), A Simple, Positive-Definite, Heteroskedasticity and Autocor-
relation Consistent Covariance Matrix, Econometrica, 55, 703-708.
Ohtani, K. and J. A. Giles (1993), Testing Linear Restrictions on Coefficients in a Linear Model
with Proxy Variables and Spherically Symmetric Disturbances, Journal of Economet-
rics, 57, 393-406.
Pagan, A. R. (1984), Model Evaluation by Variable Addition, in D. F. Hendry and K. F. Wallis
(eds.), Econometrics and Quantitative Economics, Blackwell, Oxford, 103-133.
Pagan, A. R. and A. D. Hall (1983), Diagnostic Tests as Residual Analysis, Econometric Re-
views, 2, 159-218.
Pesaran, M. H. (1974), On the Problem of Model Selection, Review of Economic Studies, 41,
153-171.
Phillips, G. D. A. and B. P. M. McCabe (1983), The Independence of Tests for Structural
Change in Econometric Models, Economics Letters, 12, 283-287.
Phillips, G. D. A. and B. P. M. McCabe (1989), A Sequential Approach to Testing for Structural
Change in Econometric Models, in W. Kramer (ed.), Econometrics of Structural Change,
Physica-Verlag, Heidelberg, 87-101.
Applications of the Bootstrap

Michael R. Veall
McMaster University, Hamilton, Ontario, Canada

I. INTRODUCTION
We shall suggest that there are two principal stages in the development of the
bootstrap. In the first stage, the bootstrap was used as a replacement for analytic
methods of calculating standard errors, confidence intervals, etc. This makes obvi-
ous sense when there are no available analytic methods (the “better than nothing”
case). This first-stage bootstrap has also sometimes been used by researchers under
the impression that its finite-sample properties would prove superior to the analytic
alternatives. While this is sometimes true, there is little general theoretical support
for this position. Section II focuses on this first stage of development. In Section
III a number of applications are described, emphasizing cases such as confidence
intervals for forecasts and inference after specification search, where the bootstrap
may be used because there is no good alternative. Section IV emphasizes the second
stage of the bootstrap literature concerning cases where asymptotic, analytic tools
are available but in which bootstrap refinements are used to improve finite-sample
performance. These are most valuable when the original estimation is plagued by
substantial bias. Section V considers more applications, sometimes comparing the
results of different approaches. The cases of system estimation and nonlinearities are
emphasized. Section VI summarizes briefly and concludes. A principal conclusion
is that the bootstrap, while no panacea, may be an important step toward a style of
econometric practice that routinely checks the applicability of inferential tools that
are not exact in the statistical sense and hence depend on some form of potentially
unreliable asymptotic approximation. This argument for simulation is in addition to
that of McCloskey and Ziliak (1996), who suggest simulation as a tool for elucidating
the economic meaning of econometric results.
Finally a caveat: no paper could now cover all the published applications of
the bootstrap in empirical economics, let alone the theoretical developments which
may prove relevant to future practice. This chapter instead tries to describe the
development of the bootstrap along the lines indicated, using the vocabulary most
economists will know from their experience with regression methods, with emphasis
on the standard techniques involving confidence intervals or hypothesis tests. The
focus is on what an economic statistician or econometrician can reasonably expect
from the bootstrap at its current stage of development. Hence, only a few areas of
bootstrap application are considered, including inference problems when a system of
equations has been estimated or when the estimate of interest is a nonlinear function
of the parameter estimates. These are chosen to illustrate the strengths and weak-
nesses of the bootstrap approach. There are a number of surveys that cover many
applications, including Veall (1989), Jeong and Maddala (1993), Vinod (1993), and
Li and Maddala (1996). The last survey is particularly recommended for researchers
interested in bootstrapping in time-series contexts, with potential problems due to
unit roots, lagged dependent variables, and serial correlation. There are also books
that treat the statistical background of the bootstrap in more detail, with Efron’s early
monograph (1982) still a useful reference. Efron and Tibshirani (1993) is probably
the most straightforward reference for econometric practitioners, while some of the
discussion in Hall (1992) is at a more difficult theoretical level and includes more
emphasis on the nature of bootstrap approximation. (LePage and Billard 1992 also
include theoretical papers on the limits of the bootstrap: See, for example, Arcones
and Giné 1992, who prove that the bootstrap distribution of any M-estimator con-
verges weakly to the true limiting distribution in probability. This provides the basis
for a similar result in Hahn 1996 for generalized method of moments estimators.)
It should be noted that the Hall and Efron/Tibshirani books represent different ap-
proaches on a key aspect of bootstrap practice, as will be discussed further in Sec-
tions IV, V, and VI.
Y_i = X_i β + u_i
    = X_i β̂ + e_i,    i = 1, ..., n    (1)

where Y_i and the row k-vector X_i are observations on the dependent and independent variables respectively (with the first element of X_i a 1), β and β̂ are column vectors of parameters and their estimates respectively, u_i are random errors which will be assumed to be independently and identically distributed (i.i.d.) with mean zero and variance σ², and e_i are regression residuals. The bootstrap estimate of the variance matrix in this case is calculated by the following simulation experiment.
Step 1: Create an artificial sample:

Y_i* = X_i β̂ + e_i*,    i = 1, ..., n    (2)

The elements e_i* are created by resampling from the e_i, that is, drawing from the e_i randomly but with replacement. (We have assumed there is an intercept so that the e_i have mean zero; otherwise we would have to work with deviations in e_i by subtracting off the mean.) The artificial sample can be thought of as the result of one trial of a Monte Carlo experiment where the independent variables are set as the actual values of X_i, the parameters are set as the actual data estimates β̂, and the disturbances are the draws from the empirical distribution function of the residuals e_i, that is, the discrete probability distribution function that attaches probability 1/n to each value e_i.
Step 2: Estimate β*1 from artificial sample 1.
Step 3: Repeat Steps 1 and 2 B times.
Step 4: The sampling distribution of the β*j - β̂ over the B Monte Carlo bootstrap trials is an estimate of the sampling distribution of β̂ - β. In particular, the variance of β̂ is estimated by the sample variance matrix of the β*j over the B bootstrap samples.
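The four steps can be summarized in a short Python sketch; the function name and the use of numpy are our own choices, and the code is an illustration of the algorithm as described rather than code from the chapter.

```python
import numpy as np

def residual_bootstrap_variance(y, X, B=999, seed=0):
    """Residual-resampling bootstrap estimate of the variance matrix of the
    OLS coefficient vector (Steps 1-4 above)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat                 # residuals (mean zero if X contains an intercept)
    beta_star = np.empty((B, k))
    for j in range(B):
        e_star = rng.choice(e, size=n, replace=True)   # Step 1: resample residuals
        y_star = X @ beta_hat + e_star                 # Step 1: artificial sample
        beta_star[j], *_ = np.linalg.lstsq(X, y_star, rcond=None)  # Step 2
    # Step 4: sample variance matrix of the beta* over the B bootstrap samples
    return np.cov(beta_star, rowvar=False), beta_star
```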
An alternative, known as the parametric bootstrap, is identical to the above
except the disturbances are drawn from a particular distribution with its parame-
ters set as estimates from the residuals, most commonly a normal random number
generator with mean zero and variance equal to the residual variance. It is also pos-
sible to generate the disturbances as weighted averages of draws from the empirical
distribution function of the residuals and the parametric distribution; this method
may be called the smoothed bootstrap with the weights determining the degree of
smoothing. Other techniques in Monte Carlo analysis, such as importance sampling
and antithetic variates, can also be applied to the bootstrap. (See Efron and Tib-
shirani 1993 for some discussion and references.) These alternatives have the same
essential properties as the ordinary bootstrap, which will remain our focus as it is
the method of resampling that has almost exclusively been employed in econometric
applications.
Why have we bothered to show this simple result, particularly as it seems the
bootstrap is redundant in this case, in that it is not essentially different from the
analytic estimate? First, the fact that the bootstrap is so close to the right answer
in a case where we know the right answer is reassuring given that the bootstrap is
also proposed in other contexts in which analytic formulas are not known. Second,
that the bootstrap estimate has divisor n when we know that the exact test requires
(among other things) that the divisor be n - k emphasizes that the justification for
the use of the simple bootstrap is typically only asymptotic, although in this case
a simple rescaling eliminates the discrepancy. (We shall refer to this issue later.)
Third, it shows that the bootstrap estimate of the variance, just like the standard an-
alytic variance estimate, will only be “correct” for inference if the initial model is
correctly specified. For example, the disturbances must be identically and indepen-
dently distributed; hence practitioners should check these assumptions by suitable
diagnostics before bootstrapping. Finally, the exercise emphasizes that the bootstrap
does not create any additional information. It is simply a computational device to uti-
lize information already in the original sample.
The type of bootstrap just described in the regression context was called “resid-
ual resampling” by Kiviet (1984), because it kept the fixed structure of the indepen-
dent variables and only resampled on the residuals. Alternatively we could use what
Kiviet called “complete resampling,” where resampling is from the row (k + 1)-vectors (Y_i, X_i) and then the bootstrap algorithm proceeds as usual. As discussed
in Jeong and Maddala (1993, pp. 577-578), this method should give a consistent
estimate of the variance-covariance matrix of the estimated coefficients even in the
presence of random X’s and/or heteroskedasticity. The discussion in Efron and Tib-
shirani (1993, Chap. 9) goes in a different direction, pointing out that the bootstrap
will yield an estimate of the sampling distribution of β̂ provided only that the original observation vectors were chosen from the underlying (k + 1)-variate probability distribution. This does not even depend on the existence of a “true” linear model as in the first line of (1). Naturally, however, if there is no true model, interpretation will be difficult: we shall have an estimate of the sampling distribution of β̂ with-
out knowing its relationship, if any, to the parameter being estimated, if any. Even
granting knowledge of the true model, the assumption of random draws in the com-
plete resample precludes most time-series estimation; it, however, may be a natural
approach to use in cross-sectional estimation.
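A sketch of complete resampling under the same caveats; only Step 1 changes, with the rows (Y_i, X_i) drawn jointly with replacement:

```python
import numpy as np

def pairs_bootstrap(y, X, B=999, seed=0):
    """'Complete resampling': resample the (y_i, X_i) rows jointly, then re-estimate."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta_star = np.empty((B, k))
    for j in range(B):
        idx = rng.integers(0, n, size=n)     # draw n row indices with replacement
        beta_star[j], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return beta_star
```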
C. An Example
As an example consider the Theil (1971) textile data, where Y_t, the log of Netherlands annual textile consumption for 1923-1939, is regressed on X_{t1}, the log of real income per capita, and X_{t2}, the log of the relative price of textiles:
able by determining whether corresponding confidence regions cover the null hy-
pothesis parameter values. Noting this equivalence, we shall concentrate our dis-
cussion on bootstrapping confidence regions. Moreover for simplicity and because
the theory is better developed in this area, we focus on confidence regions involv-
ing one parameter, which of course correspond to t-statistics in a hypothesis-testing
framework.
One way to use the bootstrap in inference would be to use the bootstrap stan-
dard errors instead of analytic standard errors in the basic confidence interval for-
mula. We call these bootstrap standard error confidence intervals, and they are of the
form β̂_i ± t_{α/2}^{n-k} s.e.*(β̂_i). As the bootstrapped standard errors were similar to the analytic standard errors in the linear regression context, bootstrap confidence intervals of this form will not be much different from their analytic counterparts. An-
other more popular bootstrap confidence interval has been the percentile method.
In this, the artificial sample values of the statistic of interest, such as a regression
coefficient, are sorted and the 1 - 2α confidence intervals are set as the αth and
(1 - α)th percentiles. While some find it intuitively more pleasing, in terms of re-
gression coefficient estimates these intervals tend to be similar to analytic or boot-
strap standard error confidence intervals, even when the disturbances are nonnormal
(although the combination of severe nonnormality, few degrees of freedom, and very
small a can lead to substantial differences). Regression estimates are weighted av-
erages, and very commonly this ensures (by way of a central limit theorem) that the distribution function of β̂* is approximately normal, as it is for β̂. Hence for
constructing confidence intervals of linear regression coefficients, there is relatively
little to be gained by bootstrap methods.
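For a single coefficient, these two interval types can be computed directly from the bootstrap replications; the sketch below assumes beta_star_i is the vector of B bootstrap estimates produced by either resampling scheme sketched earlier (again, our own illustrative code).

```python
import numpy as np
from scipy import stats

def bootstrap_se_interval(beta_hat_i, beta_star_i, a=0.025, df=None):
    """Bootstrap standard error confidence interval: beta_hat +/- critical value
    times the bootstrap standard error (nominal coverage 1 - 2a)."""
    se_star = beta_star_i.std(ddof=1)
    crit = stats.t.ppf(1 - a, df) if df is not None else stats.norm.ppf(1 - a)
    return beta_hat_i - crit * se_star, beta_hat_i + crit * se_star

def percentile_interval(beta_star_i, a=0.025):
    """Ordinary bootstrap percentile interval: the a-th and (1 - a)-th percentiles
    of the replications, with intended coverage 1 - 2a."""
    return np.percentile(beta_star_i, [100 * a, 100 * (1 - a)])
```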
More generally, Hall (1992) demonstrates that in most cases (essentially for
any root-n-consistent statistic that may be expanded in Edgeworth form), the end-
points of the bootstrap percentile confidence intervals and of the bootstrap standard
error confidence intervals are accurate only to O(n^{-1/2}), which in general is the
accuracy that any analytic asymptotic method can be expected to achieve. Similar
results hold for the accuracy of the tail coverage-that is, the degree of approxima-
tion of the area in each tail outside the confidence interval to its putative value of
a. Given this, the best case for using these types of bootstrap methods is when there
is really no alternative. We shall discuss two cases: (1) forecasting from the linear
regression model and (2) inference after a specification search.
One context where the bootstrap may be very useful is in the construction of confidence intervals for forecasts. If we stay with the linear regression context, the forecast is x_{n+1} β̂, where x_{n+1} is the row vector of observations on the k right-hand-side variables for period n + 1. The forecast error e^f_{n+1} is

e^f_{n+1} = y_{n+1} - x_{n+1} β̂ = u_{n+1} - x_{n+1}(β̂ - β)    (4)
While the variance of this expression can be estimated analytically, note that in con-
structing confidence intervals, a central limit theorem does not ensure normality:
the second term of (4) may tend to the normal distribution as n grows large, but the
first will only be normal if the disturbances themselves are normal. To deal with this
nonnormality is difficult analytically but is straightforward using the bootstrap. Fol-
lowing Freedman and Peters (1984a, 1984b), we bootstrap just as above only we
focus on the forecast x_{n+1} β*j as the statistic of interest. We then calculate a “sim-
ulated actual” by adding an additional single bootstrapped residual to the actual
forecast x_{n+1} β̂. The difference between the forecast and the simulated actual is the
simulated forecast error; we obtain an estimate of the probability distribution of the
forecast error by repeating the process B times.
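A hedged sketch of this forecast-error bootstrap, following the description above; the percentile construction of the final interval is one simple choice, not necessarily the one used in the studies cited, and the function name is ours.

```python
import numpy as np

def bootstrap_forecast_interval(y, X, x_new, B=999, seed=0):
    """Bootstrap the forecast error for a new observation x_new (row vector of
    length k) using residual resampling, and return a 95% percentile interval
    for the actual value in the forecast period."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_hat
    point_forecast = x_new @ beta_hat
    errors = np.empty(B)
    for j in range(B):
        e_star = rng.choice(e, size=n, replace=True)
        beta_star, *_ = np.linalg.lstsq(X, X @ beta_hat + e_star, rcond=None)
        simulated_actual = point_forecast + rng.choice(e)    # actual forecast + one resampled residual
        errors[j] = x_new @ beta_star - simulated_actual     # simulated forecast error
    lo, hi = np.percentile(errors, [2.5, 97.5])
    return point_forecast - hi, point_forecast - lo
```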
Moreover, unlike the standard OLS case, in bootstrap simulation it is easy to
incorporate uncertainty in the x_{n+1} in the forecast confidence intervals, the impor-
tance of which is stressed by Feldstein (1971). Early contributions to this approach
are the stochastic simulation methods of, for example, Brown and Mariano (1984)
and especially Fair (1979, 1980), who, independently of the bootstrap literature,
proposed the same bootstrap method of evaluating forecast uncertainty. In addition,
Fair proposes a modification to the basic bootstrap uncertainty measures to make
allowance for specification error and applies the method to macroeconomic forecast-
ing in the United States. Freedman and Peters (1984a) use the bootstrap technique
to develop forecast standard errors in a generalized least-squares application in-
volving United States electricity consumption by region. Veall (1987a) applies the
method, with emphasis on the percentile method and the uncertainty in the inde-
pendent variable forecasts, to forecasting the demand for electricity in Ontario in a
time-series context; Veall (1987b) is a Monte Carlo study that confirms the reliability
of the approach for this problem. Bernard and Veall (1987) extend the same exercise
for Quebec emphasizing the dynamics still more. Prescott and Stengos (1987) use
the same approach for studying the United States supply of pork.
sponse to initial results may be an important part of empirical modeling, the result-
ing “pretesting” may lead to the coefficients and standard errors estimated from the
eventual model being seriously biased.
There have been a few attempts to use simulation methods as a way of treat-
ing the specification error problem. The work of Fair (1979, 1980), which began to
deal with specification uncertainty, has already been cited in the prediction con-
text. Efron and Gong (1983)attempted to grapple with the problem of specification
search by studying the sampling distribution for estimates from an entire data-mining
procedure by bootstrap simulation and provide an example relating to hepatitis di-
agnosis. The entire decision tree of the investigator is laid out (e.g., step 1: esti-
mate whole model; step 2: drop all variables whose coefficients have t-statistics less
than 1, etc.) and then applied on the data. Then the same entire decision tree is
applied to each of the bootstrap samples, generated either by the complete resam-
pling method or the residual resampling method using either the first-stage or the
final model.
Brownstone (1990) and Veall (1992) apply this technique to econometric ex-
amples, the former also estimating the standard errors of Stein-James shrinkage
estimators in this context. (See Vinod and Raj 1988 for an application involving
ridge regression.) Freedman, Navidi, and Peters (1988) and Dijkstra and Veldkamp
(1988) study this kind of method for stylized data-mining procedures and, in general,
find that the simulation method does not yield accurate standard errors for the data-
mining estimator. The results of Freedman, Navidi, and Peters are particularly dis-
couraging, although it should be noted that their example is based on an initial stage
of estimating 75 coefficients from 100 observations. It must also be remembered that
the method is only valid if the estimation procedure can be modeled as if it were a
prespecified decision tree: if new hypotheses and approaches were entertained only
after seeing the first set of results, then, strictly speaking, even bootstrapping the
entire procedure as run does not solve the fundamental problem. Nonetheless this
method does seem to be the only feasible possibility at this time for dealing with the
pretest issue in any real estimation context. It does seem a minimal requirement for
any estimated econometric model that if Monte Carlo samples are generated from it
and the entire data-mining procedure is applied to those samples, the results over the
Monte Carlo samples should be consistent in all important respects with the results
from the original data.
Finally, in a Bayesian context, Geweke (1986) proposes a useful method of im-
plementing strong priors on coefficient signs in regression methods, essentially by
bootstrapping an unrestricted regression model but basing all estimates on the arti-
ficial samples for which the estimates meet the sign restrictions. Hence, if the only
prior is that the income coefficient is positive, the estimate of the income coefficient
will be the average of all the positive estimates over the bootstrap samples and the
standard error will be the standard deviation of these estimates. Some researchers
may wonder why they should ignore the negative estimates in this context, but this is
the consequence of the prior they purport to hold. Chalfant, Gray, and White (1991)
is an application of the same technique.
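A minimal sketch of this sign-restriction device, assuming the matrix of bootstrap replications has already been generated by one of the schemes sketched earlier; the function name and summary choices are ours.

```python
import numpy as np

def sign_restricted_summary(beta_star, positive_cols):
    """Geweke-style use of sign priors: keep only the bootstrap replications in
    which the indicated coefficients are positive, then summarize those draws.
    Returns the mean, standard deviation, and the fraction of draws retained.
    (Fails if no replication satisfies the restrictions.)"""
    keep = np.all(beta_star[:, positive_cols] > 0, axis=1)
    kept = beta_star[keep]
    return kept.mean(axis=0), kept.std(axis=0, ddof=1), keep.mean()
```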
better idea of how the analytic standard errors are biased, and it therefore seems
natural to come up with a “bias-adjusted” standard error by multiplying the original
analytic standard error estimate by the bias factor. We note that, in the linear regres-
sion context, if standard errors have been based on a variance estimate in which n
rather than n - k has been used as the divisor, the Freedman and Peters approach
will “automatically” adjust the standard error for degrees of freedom. Marais (1986)
refines this approach and finds the method can be quite accurate in the context of
estimating systems of regression equations. However, because the purpose of stan-
dard errors is for inference, the Freedman and Peters approach has been eclipsed
by approaches that directly adjust the potential bias in the test procedures or in the
confidence intervals.
where β*(α) is the 100αth percentile of the β*'s over the B bootstrap runs. If we set α1 = α and α2 = 1 - α, we have the ordinary bootstrap percentile confidence intervals with intended coverage 1 - 2α but with no adjustment for bias. Efron proposes the BCa confidence intervals where instead we implement (5) with

α1 = Φ(w0(α))  and  α2 = Φ(w0(1 - α))

where

w0(α) = ẑ0 + (ẑ0 + z^(α)) / (1 - â(ẑ0 + z^(α)))

and

ẑ0 = Φ^{-1}(P*(β̂))

where Φ is the standard normal cumulative distribution function, P* is the empirical cumulative distribution function of the β*'s, z^(α) is the 100αth percentile of the standard normal distribution, w0(1 - α) is defined analogously to w0(α), and â is called the acceleration constant and will be discussed.
Consider first the case where â = 0, which yields what have been called the bias-corrected (BC) percentile confidence intervals. This method primarily deals with bias in the coefficient estimates. If there were no tendency toward bias in the coefficient estimate β̂, we should expect that in the empirical distribution function of the β*'s, half should be above β̂ and half below; i.e., that P*(β̂) = .5 and hence that ẑ0 = Φ^{-1}(.5) = 0. This in turn implies that α1 = α and α2 = 1 - α. Hence, with no coefficient bias, the BC percentile confidence intervals are just the ordinary bootstrap percentile confidence intervals. Now suppose P*(β̂) exceeded .5, suggesting a negative bias in the coefficient estimates. Therefore ẑ0 will be positive and α1 = Φ(2ẑ0 + z^(α)) and α2 = Φ(2ẑ0 + z^(1-α)) will exceed α and 1 - α respectively. Hence, the BC method adjusts to the downward bias in the coefficient estimate by shifting the entire confidence interval up. (One option to adjust the coefficient estimate itself is to use α = .5 in the above formula.)
The role of the acceleration constant â is less obvious, but it is partially related to bias in the estimation of the standard error. For example, if ẑ0 = 0 and given that z^(α) is negative and z^(1-α) is positive, it can be shown that changing â from zero to a small positive value will widen the BCa bootstrap percentile confidence intervals.
More generally, we could argue that the usual normal approximation is

(m(β̂) - m(β)) / [se_{β0}(m(β̂)) (1 + â m(β))] + ẑ0  ~  N(0, 1)    (10)

for some increasing transformation m, where se_{β0}(m(β̂)) denotes the standard error of m(β̂) when the true value β equals any conveniently chosen β0: recall the point of the exercise is that in finite samples, se(m(β̂)) will depend on the value of β, and the approximation in the denominator in (10) attempts to capture this. Efron (1987) or Efron and Tibshirani (1993, pp. 326-327) show that if we use the normalizing transform m, calculate confidence intervals based on the normal distribution, and then transform back using m^{-1}, we obtain the BCa intervals except that â and ẑ0 need to be estimated. Note that m does not need to be known. These papers also argue that in one-parameter families, a good approximation for â is one-sixth of the skewness coefficient of the score function of β, evaluated at β̂;
for multiparameter families, they offer a formula based on the infinitesimal jackknife. However, most econometricians will prefer, at least computationally, the simpler jackknife formula (Efron and Tibshirani, 1993, p. 186):

â = Σ (β̂_J - β̂_(i))³ / {6 [Σ (β̂_J - β̂_(i))²]^{3/2}}

where the summations run from 1 to n, β̂_(i) is calculated on a sample with the ith observation deleted, and β̂_J, the jackknife estimator of β, is the average of the β̂_(i).
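Putting these pieces together, here is a sketch of the BCa interval for a scalar parameter using the jackknife estimate of the acceleration constant; it follows the standard Efron and Tibshirani construction with our own variable names, and it does not handle degenerate cases (for example, all replications falling on one side of the estimate).

```python
import numpy as np
from scipy import stats

def bca_interval(estimate, beta_star, jackknife_estimates, alpha=0.05):
    """BCa bootstrap percentile interval with nominal coverage 1 - alpha.
    beta_star: bootstrap replications; jackknife_estimates: leave-one-out estimates."""
    # Bias-correction constant z0 from the share of replications below the estimate
    z0 = stats.norm.ppf((beta_star < estimate).mean())
    # Acceleration constant a-hat from the jackknife (one-sixth skewness formula)
    d = jackknife_estimates.mean() - jackknife_estimates
    a_hat = (d ** 3).sum() / (6.0 * (d ** 2).sum() ** 1.5)

    def adjusted(q):
        z = stats.norm.ppf(q)
        return stats.norm.cdf(z0 + (z0 + z) / (1 - a_hat * (z0 + z)))

    a1, a2 = adjusted(alpha / 2), adjusted(1 - alpha / 2)
    return np.percentile(beta_star, [100 * a1, 100 * a2])
```

Setting a_hat to zero reproduces the BC intervals discussed above, and setting both a_hat and z0 to zero reproduces the ordinary percentile intervals.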
C. Percentile-t Methods
where t*^(α) is the αth percentile of the t*'s. Essentially this technique uses the boot-
strap to create its own critical values instead of using those supplied by the usual
t-distribution.
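A sketch of the percentile-t construction for one regression coefficient under residual resampling, assuming the usual studentized form t*_j = (β*_j - β̂)/s.e.*_j (the chapter's own display defining t* is not legible in this copy, so this is an assumption):

```python
import numpy as np

def percentile_t_interval(y, X, col, B=999, alpha=0.05, seed=0):
    """Percentile-t (bootstrap-t) confidence interval for coefficient `col`."""
    rng = np.random.default_rng(seed)
    n, k = X.shape

    def ols(yv):
        b, *_ = np.linalg.lstsq(X, yv, rcond=None)
        e = yv - X @ b
        s2 = e @ e / (n - k)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[col, col])
        return b, e, se

    b_hat, e_hat, se_hat = ols(y)
    t_star = np.empty(B)
    for j in range(B):
        y_star = X @ b_hat + rng.choice(e_hat, size=n, replace=True)
        b_s, _, se_s = ols(y_star)
        t_star[j] = (b_s[col] - b_hat[col]) / se_s   # studentized bootstrap statistic
    lo, hi = np.percentile(t_star, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    # The bootstrap quantiles replace the usual t-table critical values
    return b_hat[col] - hi * se_hat, b_hat[col] - lo * se_hat
```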
The percentile-t method is not without its flaws. It is not transformation invari-
ant: we cannot obtain the confidence interval endpoints for h(β̂) by simply plugging the endpoints of the β̂ confidence intervals into the function h. This is a familiar problem in statistics (described in an econometrics context by Gregory and Veall 1985),
but leaves open the possibility that different ways of specifying restrictions or pa-
rameters can lead to markedly different substantive results. A solution proposed by
Tibshirani (1988) is to implement the percentile-t method with a variance-stabilizing
transform.
While the BCa method is transformation invariant in the single-parameter
case, it is not in the multiparameter case if the transformation h involves the pa-
rameters: the confidence interval for β1β2 - 1 is not the same as the confidence interval for β1 - 1/β2. While this seems quite natural, if the idea is to perform a test for the null hypothesis that β1β2 - 1 = 0 (equivalent to seeing whether the confidence interval of β1β2 - 1 covers zero), it is disconcerting that it matters if we instead test the algebraically equivalent null hypothesis β1 - 1/β2 = 0.
where β0 is the value of β under the null hypothesis and t** are the t-values gener-
ated by a bootstrap simulation using the estimates that are generated with the null
hypothesis. In the case, for example, when there are no other parameters to estimate,
a confidence interval calculated in this manner (or the corresponding test) is exact in
finite samples. This result extends if there are parameters to estimate even under the
null hypothesis, provided those other parameters do not enter the null distribution
of the parameter under test. (An example would be in the (static) linear regression
case, where the null distribution of each coefficient does not depend on the other
coefficients.) However, if this property does not hold, such as in many dynamic or
nonlinear regression contexts, the property of exactness is lost due to the estimation
of the other "nuisance" parameters: see Dufour (1995), who incidentally attributes the initial idea behind this procedure to Dwass (1957) rather than to Barnard. Theil and his associates provide examples of the Barnard method (Rosalsky, Finke, and Theil 1984, Theil, Shonkwiler, and Taylor 1985, Taylor, Shonkwiler, and Theil 1986,
Shonkwiler and Theil 1986, Theil and Shonkwiler 1986), some of which will be dis-
cussed subsequently. Van Giersbergen and Kiviet (1993) promote the approach in
the context of a dynamic regression model. The method of “calibration” now com-
monly used in macroeconomics, in which the distribution of statistics is generated
under a null hypothesis imposed under an elaborate computer model called an “ar-
tificial economy,” can be seen as an extension of this basic idea. See Gregory and
Smith (1993) for a survey.
While it is sometimes called a bootstrap correction, what we have called here the
Barnard approach can be used as a size correction for almost any test. Horowitz
(1994), for example, uses this approach to correct the size of the information matrix
test for heteroskedasticity, Fan and Li (1995) use it to correct the J-test for testing
a nonnested hypothesis, Theil and Shonkwiler (1986) use it to study tests for se-
rial correlation, and Davidson and MacKinnon (1996) use both the J-test and serial
correlation test examples in a more general discussion of this type of bootstrap test-
ing. Horowitz argues that 100 bootstraps are sufficient for many cases where α is not too small (so that t*^{.05} would be the fifth smallest value of t* and t*^{.95} would be the 96th smallest, that is, the fifth largest) (see also Marriot 1979, Hall 1986). While
Horowitz’s results are supported by his Monte Carlo simulation results which check
for empirical size, size is not the only criterion of interest, and it is possible that the
use of such a small number of bootstraps may affect power.
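In generic form, the size-correction idea amounts to replacing tabulated critical values with quantiles of statistics simulated under the null. A hedged sketch follows; the user must supply a simulator appropriate to the test at hand (for example, regenerating data with the parameters estimated under the null and recomputing the J-test or information matrix statistic).

```python
import numpy as np

def bootstrap_test(observed_stat, simulate_stat_under_null, B=99, alpha=0.05, seed=0):
    """Barnard-type bootstrap test: reject H0 when the observed statistic exceeds
    the (1 - alpha) quantile of B statistics simulated with the null imposed.
    `simulate_stat_under_null(rng)` must return one simulated test statistic."""
    rng = np.random.default_rng(seed)
    sims = np.array([simulate_stat_under_null(rng) for _ in range(B)])
    critical_value = np.quantile(sims, 1 - alpha)
    return observed_stat > critical_value, critical_value
```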
where w_{it} is the expenditure share of good i at time t, x_t' is a row vector consisting of a 1 (corresponding to a constant), total expenditure, and all m prices, and u_{it} represents a disturbance term which has no correlation with any disturbance term at any other time but which may have a contemporaneous correlation across commodities. The homogeneity property from consumer theory suggests that the last m + 1 elements of each coefficient vector should sum to zero, implying that if all prices and total expenditure were changed in the same proportion there would be no change in expenditure shares and, hence, in physical quantities purchased. Because the right-hand-side variables are the same in each equation, if the disturbances are normally distributed, OLS is maximum likelihood.
Laitinen noted that before his paper this kind of proposition was commonly
tested with a Wald test based on the OLS estimates of each share equation and a
cross-equation variance-covariance matrix estimated from the OLS residuals. The
resulting test is asymptotically distributed as chi-square, with n - 1 degrees of free-
dom. Checking the asymptotic distribution using a simulation experiment based on data from Theil (1975), with 14 commodities, Laitinen found that a true (by construction) null hypothesis was rejected by the Wald test at a nominal 5% level 87 times out of 100, rather than the expected 5. He argued that this is one reason
there are typically so many rejections of homogeneity in actual applications. One
intuition is that the variance-covariance matrix is badly misestimated by maximum
likelihood methods because there is no adjustment for degrees of freedom lost due to
parameter estimation, particularly as the number of estimated parameters increases
directly with the number of equations.
As it turns out, for the special case of the homogeneity test, Laitinen finds an
exact test using the Hotelling T² distribution. But his exact solution does not apply
in other contexts, for example the closely related problem of testing for the property
of symmetry in demand systems. Meisner (1979) and Bera, Byron, and Jarque (1981)
use simulation methods to examine the test for symmetry and find very inaccurate
test sizes and poor power as well.
Theil, Shonkwiler, and Taylor (1985) and Taylor, Shonkwiler, and Theil (1986)
apply the Barnard method directly to the demand homogeneity and symmetry Wald
tests; Shonkwiler and Theil (1986) use the same method to develop critical values
for alternative, non-Wald tests that they show can have superior power. Raj and Tay-
lor (1989) apply a Barnard-type bootstrap to testing within-equation restrictions in
demand systems.
Other researchers generate ordinary bootstrap standard errors and base tests
on these. Korajczyk (1985) and Freedman and Peters (1984a, 1984b) have been dis-
cussed. Atkinson and Wilson (1992) have a different reading of Freedman and Peters
than ours and believe Freedman and Peters are arguing for direct application of the
bootstrap standard errors with no adjustment for bias. Given our discussion above
that ordinary bootstrap quantities are not theoretically more accurate than their ana-
lytic counterparts, it is not surprising that Atkinson and Wilson find in Monte Carlo
analysis based on relatively small systems that SUR standard errors (calculated en-
tirely using OLS residuals) may be no more accurate than ordinary bootstrap stan-
dard errors. Rilstone and Veall (1996a), emphasizing that the purpose of estimat-
ing the variance-covariance matrix and standard errors is for inference, find that
percentile-t confidence intervals are considerably more accurate than those based
on OLS/SUR, although the performance of the BCa confidence intervals is only fair. Ril-
stone (1993) has similar findings for percentile-t (see also Rayner 1991) and BCa
intervals in a single-equation regression context with AR(1) errors.
C. Nonlinearities
While discussing the lack of invariance of percentile-t methods, it was pointed out
that sometimes test results and confidence intervals are sensitive to nonlinearities.
In demand systems of the kind just described, for example, even if we ignore the
SUR problem there are potential difficulties in inferences because the desired esti-
mates of the price and income elasticities are nonlinear in the estimated parameters.
Green, Rocke, and Hahn (1987) find that the bootstrap estimates of the standard er-
rors of the estimated price elasticities are not much different from their asymptotic
counterparts, but there are substantial differences for the income elasticity. Krinsky
and Robb (1986, 1990, 1991) find no large differences between analytic and boot-
strap alternatives for a different data set and a translog system of equations, nor do
McManus and Rosalsky (1985) in a nonlinear earnings equation. George, Oksanen,
and Veall (1995)find some differences in a context where desired stock is estimated
as a nonlinear quantity, as do Veall and Zimmermann (1993) in another dynamic,
nonlinear context where they also use simulation to estimate power.
The point of these examples is that sometimes nonlinearities seem to matter
and sometimes they do not. If we return to the simple example of Gregory and Veall
(1985), it is easy to see what can cause the problem. If we compare the quantities
β1β2 - 1 and β1 - 1/β2, for example, it is immediately clear that the second is much more nonlinear as β2 approaches zero. Rilstone and Veall (1996b) examine the use
of percentile-t methods and BCa methods and find that neither works well at all in
this simple nonlinear example. While it may be simply that any approximation will
break down with enough nonlinearity (e.g., for small enough β2 above), nonetheless
we must conclude that the bootstrap is not yet a complete answer to the problems
associated with finite-sample inference involving nonlinear quantities.
An important role of the bootstrap is to provide standard errors and other tools of
inference when there are no other available methods. As discussed, such methods
are no more accurate, at least in a theoretical sense, than analytic methods based
on asymptotic approximations, but in the typical case where the use of the bootstrap
can be justified asymptotically, bootstrap standard errors cautiously used may be
valuable as “better than nothing” when analytic alternatives are not available. (This
is particularly true when the appropriateness of the bootstrap is itself checked by
simulation as discussed in the final paragraph.)
However, this survey has raised some questions with respect to whether boot-
strap methods are necessarily more accurate than analytic, asymptotic methods.
Only bootstrap refinements, such as the BCa method with its analytic bias correc-
tion based on an estimated “acceleration” coefficient or the percentile-t method, are
more accurate in the Edgeworth sense. Yet even these have their flaws: it is possible
for BCa (1 - 2α) confidence intervals to narrow as α increases; percentile-t confi-
dence intervals are not transformation invariant. These flaws seem to be reflected in
actual performance problems, for example the severe shortcomings of the bootstrap
in the simple nonlinear case just discussed. Only some applications of the Barnard
method, involving simulation under a null hypothesis which does not depend on nui-
sance parameters, yield bootstrap-based tests which are exact in finite samples.
Fortunately, while simulation reveals the problem it may also provide the an-
swer. The first point to emphasize is that analytic, asymptotic methods often have
bad finite-sample performance. However, the quality of finite-sample performance
of such methods is usually based on speculation unless a simulation experiment is
done. If a simulation is done which tends to verify the accuracy of asymptotic meth-
ods, further simulations are not a priority. But based on current understanding of the
bootstrap, if the bootstrap and analytic, asymptotic methods differ it may be that the
bootstrap results are slightly to be preferred (especially refined bootstrap methods),
but what really needs to be done is further simulation study of the bootstrap itself.
(See Beran 1988, 1990 or Beran and Ducharme 1991 for approaches along these
lines, in which, for example, the percentile-t bootstrap is performed using bootstrap
standard errors.) While this “bootstrapping the bootstrap” approach seems compu-
tationally tedious, computer time is the one thing that is getting cheaper and, except
in the very rare case of exact tests, we now see that many kinds of results cannot be
relied upon in finite samples unless they can be confirmed in a simulation experi-
ment. In some sense, our answer to the student’s question in the introduction as to
how big a sample is big enough to use asymptotics reliably is, “It depends, but I can
tell you a way that may help to find out for any given problem.” Hence, while adding
layers of simulation to our standard econometric practice may seem difficult, it is
comforting to know that there is at least one feasible method to check the asymptotic
approximations that are so widespread in econometrics.
ACKNOWLEDGMENTS
The author acknowledges the research assistance of Deb Fretz, the useful comments
of a referee, and the financial support of the Social Sciences and Humanities Re-
search Council of Canada.
REFERENCES
Arcones, M. and E. Giné (1992), On the Bootstrap of M-estimators and Other Statistical Functionals, in P. E. LePage and L. Billard (eds.), Exploring the Limits of the Bootstrap, Wiley, New York, 13-47.
Fan, Y. and Q. Li (1995), Bootstrapping J-type Tests for Non-nested Regression Models, Economics Letters, 48, 107-112.
Feldstein, M. (1971), The Error and Forecast in Econometric Models When the Forecast-
Period Exogenous Variables Are Stochastic, Econometrica, 39, 55-60.
Freedman, D. A., W. Navidi, and S. C. Peters (1988), On the Impact of Variable Selection
in Fitting Regression Equations, in T. K. Dijkstra (ed.), On Model Uncertainty and Its
Statistical Implications, Springer, Berlin, 1-16.
Freedman, D. A. and S. C. Peters (1984a), Bootstrapping a Regression Equation: Some Empirical Results, Journal of the American Statistical Association, 79, 97-106.
Freedman, D. A. and S. C. Peters (1984b), Bootstrapping an Econometric Model: Some Empirical Results, Journal of Business and Economic Statistics, 2, 150-158.
George, P. J., E. H. Oksanen, and M. R. Veall (1995), Analytic and Bootstrap Approaches to
Testing a Market Saturation Hypothesis, Mathematics and Computers in Statistics, 29, 311-315.
Geweke, J. (1986), Exact Inference in the Inequality Constrained Normal Linear Regression
Model, Journal of Applied Econometrics, 1, 127-141.
Green, R., D. Rocke, and W. Hahn (1987), Standard Errors for Elasticities: A Comparison of
Bootstrap and Asymptotic Standard Errors, Journal of Business and Economic Statistics, 4, 145-149.
Gregory, A. W. and G. W. Smith (1993), Statistical Aspects of Calibration in Macroeconomics, in G. S. Maddala, C. R. Rao, and H. D. Vinod (eds.), Handbook of Statistics, Vol. 11,
Elsevier, Amsterdam, 703-719.
Gregory, A. W. and M. R. Veall (1985), On Formulating Wald Tests of Nonlinear Restrictions,
Econometrica, 53, 1465-1468.
Hahn, J. (1996), A Note on Bootstrapping Generalized Method of Moments Estimators, Econo-
metric Theory, 12, 187-197.
Hall, P. (1986), On the Number of Bootstrap Simulations Required to Construct a Confidence
Interval, Annals of Statistics, 14, 1453-1462.
Hall, P. (1988), Theoretical Comparison of Bootstrap Confidence Intervals (with discussion),
Annals of Statistics, 16, 927-952.
Hall, P. (1992), The Bootstrap and Edgeworth Expansions, Springer, Berlin.
Horowitz, J. L. (1994), Bootstrap-Based Critical Values for the Information Matrix Test, Jour-
nal of Econometrics, 61, 395-411.
Jeong, J. and G. S. Maddala (1993), A Perspective on Application of Bootstrap Methods, in
G. S. Maddala, C. R. Rao, and H. D. Vinod (eds.), Handbook of Statistics, Vol. 11,
Elsevier, Amsterdam, 573-610.
Kiviet, J. F. (1984), Bootstrap Inference in Lagged Dependent Variable Models, University of
Amsterdam Working Paper.
Korajczyk, R. A. (1985), The Pricing of Forward Contracts for Foreign Exchange, Journal of
Political Economy, 93, 346-368.
Krinsky, I. and A. L. Robb (1986), On Approximating the Statistical Properties of Elasticities,
Review of Economics and Statistics, 68, 715-719.
Krinsky, I. and A. L. Robb (1990), On Approximating the Statistical Properties of Elasticities:
A Correction, Review of Economics and Statistics, 72, 189-190.
Krinsky, I. and A. L. Robb (1991), Three Methods for Calculating the Statistical Properties of
Elasticities: A Comparison, Empirical Economics, 16, 199-209.
Theil, H., J. S. Shonkwiler, and T. G. Taylor (1985), A Monte Carlo Test of Slutsky Symmetry,
Economics Letters, 19, 331-332.
Tibshirani, R. (1988), Variance Stabilization and the Bootstrap, Biometrika, 75, 433-444.
van Giersbergen, N. P. A. and J. F. Kiviet (1993), How to Implement Bootstrap Hypothesis
Testing in Regression Models, University of Amsterdam Working Paper.
Veall, M. R. (1987a), Bootstrapping the Probability Distribution of Peak Electricity Demand, International Economic Review, 28, 203-212.
Veall, M. R. (1987b), Bootstrapping and Forecast Uncertainty: A Monte Carlo Analysis, in
I. B. MacNeill and G. J. Umphry (eds.), Time Series and Econometric Modelling, Reidel,
Dordrecht, 373-384.
Veall, M. R. (1989), Applications of Computationally-Intensive Methods to Econometrics,
Bulletin of the International Statistical Institute, Proceedings of the 47th Session, 3,
75-78.
Veall, M. R. (1992), Bootstrapping the Process of Model Selection: An Econometric Example,
Journal of Applied Econometrics, 7, 93-99.
Veall, M. R. and K. F. Zimmermann (1993), The Size and Power of Integrability Tests in Dy-
namic Demand Systems, Computational Statistics, 8, 127-139.
Vinod, H. D. (1993), Bootstrap Methods: Applications in Econometrics, in G. S. Maddala,
C. R. Rao, and H. D. Vinod (eds.), Handbook of Statistics, Vol. 11, Elsevier, Amsterdam,
629-661.
Vinod, H. D. and B. Raj (1988), Economic Issues in Bell System Divestiture: A Bootstrap
Application, Applied Statistics, 37, 251-261.
Detection of Unusual Observations
in Regression and Multivariate Data
Ali S. Hadi
Cornell University, Ithaca, New York
Mun S. Son
University of Vermont, Burlington, Vermont
Regression and multivariate analysis techniques are commonly used to analyze data
from many fields of study including economic and social sciences. These data often
contain unusual observations. Unusual observations, usually referred to as outliers,
are observations that do not conform to the pattern (model) suggested by the major-
ity of the observations in a data set. If they exist in the data, outliers can distort the
analysis of data, and the conclusions based on the analysis. For example, outliers can
distort parameter estimation, invalidate test statistics, and lead to incorrect statisti-
cal inference. We illustrate this point and the methods presented in this chapter by
the following data set.
Example I : Financial Data. In this chapter we make a repeated use of the fol-
lowing data set, which we refer to as the financial data. The data set was collected
and thoroughly analyzed by Jeff M. Semanscin (a student in an applied regression methods class taught by one of the authors) using Standard & Poor's Compustat PC Plus. The
purpose of using the data here is only to illustrate the methods presented in this
chapter. There are several variables in the data set, but for illustrative purposes we
consider only a subset consisting of the following three variables:
respectively. When the outliers are deleted, the mean and covariance matrix become

[mean vector and covariance matrix, displays (1) and (2); the numerical entries are not legible in this copy]

respectively. Note the dramatic effects of outliers on the estimated variances and covariances. To illustrate how the confidence regions can change substantially because of outliers, let us examine the bivariate scatter plot of X1 versus X2. The scatter plot,
Figure 1 Financial data: Trivariate scatter plot of X1, X2, and X3 with the outliers indicated by their numbers.
together with two ellipses expected to contain 95% of the observations, are shown
in Figure 2. The larger ellipse is based on the mean and covariance matrix of the
full data (all 26 observations) and the smaller ellipse is based on the mean and co-
variance matrix of the data without the outliers (indicated by their numbers on the
scatter plot). Observe the huge difference between the two ellipses in terms of their
Figure 2 Financial data: Bivariate scatter plot of X1 versus X2 with two ellipses (expected
to contain 95% of the observations). The larger ellipse is based on the mean and covariance
matrix of the full data (all 26 observations), and the smaller ellipse is based on the mean and
covariance matrix of the data without the outliers (indicated by their numbers).
Figure 3 Financial data: Scatter plot of X1 versus X2 with two superimposed regression lines.
The solid line is the least-squares regression line obtained using the full data. The dotted line
is obtained when the outliers (1 and 26) are deleted.
sizes, orientations, and shapes. Note also how the larger ellipse is affected by the
outliers. The larger ellipse detects only one observation as an outlier, whereas the
smaller ellipse declares four observations as outliers.
Let us now think of the data as regression data, where distinction between
dependent and independent variables has to be made. Consider, for example, the
simple regression of X1 on X2. The scatter plot of X1 versus X2, with two super-
imposed least-squares regression lines, is shown in Figure 3. The solid line is the
least-squares regression line obtained using the full data. The dotted line is obtained
when the two outlying observations 1 and 26 are deleted. Again, we obtain two sub-
stantially different lines.
The above example shows that outliers can lead to misleading or erroneous
conclusions. It is therefore important for data analysts to first identify and examine
outliers if they exist in the data, before drawing conclusions based on the data.
Before we proceed any further, we wish to make an important point. After read-
ing the literature on outlier detection, some people are left with the incorrect impres-
sion that once outliers are identified, they should be deleted from the data and the
analysis continues. We do not advocate automatic deletion (or even automatic down-
weighing) of outliers because outliers are not necessarily bad observations. On the
contrary, if they are correct, they may be the most informative points in the data. For
example, they may indicate that the data did not come from a normally distributed
population, as commonly assumed by almost all multivariate analysis techniques; or
they may indicate that the model is not linear. For this reason the outliers should be
Figure 4 A scatter plot of population size, Y, versus time, X. The curve is obtained by fit-
ting an exponential function to the full data. The straight line is the least-squares line when
observations 22 and 23 are deleted.
called the unusual or even the interesting observations. To emphasize that outliers
can be the most informative points in the data, we use the exponential growth data
described in the following example.
Example 2: Exponential Growth Data. Figure 4 is the scatter plot of two vari-
ables, the size of a certain population, Y, and time, X. As can be seen from the
scatter of points, the majority of the points resemble a linear relationship between
population size and time as indicated by the straight line in Figure 4. According to
this model, points 22 and 23 in the upper right corner are outliers. If these points,
however, are correct, they are the only observations in the data set that indicate that
the data follow a nonlinear (e.g., exponential) model, such as the one shown in the
graph. Think of this as a population of bacteria which increases very slowly over a
period of time, then once somebody sneezes the population size explodes.
What do we do with outliers once they are identified? Because outliers can
be the most informative observations in the data set, they should not be automati-
cally discarded without justification. Instead, they should be examined to determine
why they are outlying. Based on this examination, appropriate corrective actions can
then be taken. These corrective actions include correction of errors in the data, dele-
tion or downweighting of outliers, transforming the data, considering a different model,
redesigning the experiment or the sample survey, collecting more data, etc.
Outliers in multivariate data are intrinsically more difficult to detect than out-
liers in univariate and bivariate data. For example, in simple regression and in uni-
variate, bivariate, and trivariate data, outliers can be detected easily by graphing the
data. In higher than three dimensions and in the presence of multiple outliers, it is
difficult to detect outliers because
Figure 5 A scatter plot of weight versus height with the box plots on the margin of each
variable. The two bivariate outliers cannot be detected by examining only univariate graphs
such as box plots.
Example 3: The Weight-Height Data. The two bivariate outliers which appear
in the scatter plot of weight versus height in Figure 5 cannot be detected by exam-
ining only univariate graphs (e.g., box plots) of weight and height separately. Note,
however, that methods that work for higher-dimensional data will continue to work
for lower-dimensional data, but the converse is not generally true.
2. If the data contain a single outlier, the problem of identifying the out-
lier is simple, but if the data contain more than one outlier the problem
of identifying them becomes difficult due to the masking and swamping
problems. Masking occurs when a method fails to detect outlying observa-
tions (false negative decisions). Swamping occurs when a method declares
good points as outliers (false positive decisions). Masking and swamping
are serious problems and are the cause of the failure of many outlier de-
tection methods. Note that methods that work in the presence of multiple
outliers will continue to work if the data contain a single outlier or no
outliers at all, but the converse is not generally true.
In this chapter we discuss some methods for the detection of outliers in re-
gression and multivariate data. We concentrate our attention on methods that have
been recently developed and that require reasonable computational effort. Rele-
vant outlier detection methods are found, for example, in Rohlf (1975), Hawkins
(1980), Schwager and Margolin (1982), Beckman and Cook (1983), Barnett and
Lewis (1984), Hampel et al. (1986), Bacon-Shone and Fung (1987), Rousseeuw and
Leroy (1987), Fung (1988), Rasmussen (1988), and Caroni and Prescott (1992). The
rest of the chapter is organized as follows. Section II describes a unified framework for the detection of outliers in both multivariate and regression data. Sections III and IV discuss the specifics of this unified framework for the detection of outliers in multivariate and regression data, respectively. Section V deals with the problem of outlier detection in very large data sets.
In regression analysis situations the data set contains a response variable y, consisting of n observations, and a matrix X, consisting of n rows (observations) and p columns (covariates). In multivariate analysis situations there is no y. The data set contains only X, but we think of X as a random sample generated from a multivariate elliptically symmetric distribution such as a multivariate normal or a multivariate t-distribution. Our objective is to identify outliers if they exist in the data set in each of these two situations.
It has been long recognized that classical methods, such as least-squares resid-
uals or Mahalanobis distances, are not effective in the detection of outliers because
they are not robust; that is, they are affected by the outliers that they are supposed to
detect. One way out of this problem is to replace classical methods by robust meth-
ods, which produce estimates that are resistant to the presence of outliers and/or to
violations of distributional assumptions. Indeed, several books have been devoted
either entirely or in large part to robust methods and/or outlier detection techniques;
see, for example, Barnett and Lewis (1984), Hawkins (1980), Huber (1981), Hampel
et al. (1986), Rousseeuw and Leroy (1987), and Chatterjee and Hadi (1988). Other relevant articles include Maronna (1976), Campbell (1980), Rousseeuw and Yohai (1984), and Lopuhaä (1989).
Robust methods have been suggested for many years now, but they have not
yet been widely used in practice because they involve extensive computations. The
most widely known of the robust methods are the least median of squares (for regres-
sion data) and minimum-volume ellipsoids (for multivariate data) estimators. These
methods are highly effective because they are not affected by outliers, but they are
computationally prohibitive, like other robust estimation methods.
Another way out of the problem has been recently developed by Hadi (1992a, 1994) and Hadi and Simonoff (1993, 1994, 1997). The main idea of these methods is to first form a basic subset of about half of the data which is presumably free of outliers, then add observations that are consistent with the basic subset. If all the observations are added to the basic subset, the data set is declared to be free of outliers; otherwise the observations that are not consistent with the basic subset are declared to be outliers.
The basic questions are thus how the initial basic subset is formed, how it is enlarged, and how the consistency of an observation with the basic subset is measured. The answers to these questions depend on whether we are dealing with multivariate or regression data.
The elliptical distance d_i(c, V) measures the distance between the ith observation, x_i, and a location (center) estimator c, relative to a measure of dispersion, V. The classical choices of c and V are c = x̄ (the sample mean) and V = S (the sample covariance matrix), respectively. This choice of c and V gives

d_i(x̄, S) = [(x_i - x̄)' S^{-1} (x_i - x̄)]^{1/2},  i = 1, 2, . . . , n

which is known as the Mahalanobis distance. If the data come from a p-variate normal distribution, then d_i^2(x̄, S) follows approximately a χ² distribution with p degrees of freedom. Thus, using an α level of significance, values of d_i(x̄, S) larger than [χ²(α; p)]^{1/2} are declared to be outliers.
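To make the classical rule concrete, the following Python sketch (illustrative only; the function names and the simulated data are not taken from the chapter) computes the Mahalanobis distances and flags the observations that exceed the chi-square cutoff.

```python
import numpy as np
from scipy import stats

def mahalanobis_distances(X):
    """Classical Mahalanobis distance of each row of X from the sample mean."""
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                       # sample covariance matrix
    S_inv = np.linalg.inv(S)
    diff = X - xbar
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)  # (x_i - xbar)' S^{-1} (x_i - xbar)
    return np.sqrt(d2)

def classical_outlier_flags(X, alpha=0.05):
    """Flag observations whose distance exceeds sqrt(chi2(alpha; p))."""
    n, p = X.shape
    d = mahalanobis_distances(X)
    cutoff = np.sqrt(stats.chi2.ppf(1 - alpha, df=p))
    return d, cutoff, d > cutoff

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(25, 3))
    X[:2] += 6.0                                      # plant two gross outliers
    d, cutoff, flags = classical_outlier_flags(X)
    print(cutoff, np.where(flags)[0])
```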
Unfortunately, d_i(x̄, S) is affected by outliers. For example, some of the outliers may still have small values of d_i(x̄, S) (masking) and some of the observations which are not outliers may have large values of d_i(x̄, S) (swamping). This is illustrated by the following example.
Table 2 Financial Data: The Mahalanobis Distances for the Trivariate and
Bivariate Data Sets Graphed in Figures 1 and 2
Figure 6 Financial data: Index plot of the Mahalanobis distance for (a) the trivariate data {X1, X2, X3} and (b) the bivariate data {X1, X2}.
By comparison with the trivariate and bivariate scatter plots in Figures 1 and 2, we see that the Mahalanobis distance failed to identify observations 5, 11, and 26 in the trivariate case and observations 10, 23, and 26 in the bivariate case.
The Mahalanobis distance fails to detect the outliers because it depends on the
sample mean and covariance matrix, which are known to be sensitive to the presence
of outliers. One way to solve this problem is to replace the mean and the covariance
matrix by more robust estimators of the location and scale. There are many robust
estimators for multivariate data. One problem with robust methods, however, is that
they are computationally intensive and at times practically infeasible. This may explain why robust statistics have not been widely implemented in statistical packages, even though they have appeared in the literature for many years. Alternatively, using the mean and the covariance matrix of the basic subset, we obtain

d_i(x̄_b, S_b) = [(x_i - x̄_b)' S_b^{-1} (x_i - x̄_b)]^{1/2},  i = 1, 2, . . . , n
The two stages of the method in the multivariate case are given in Algorithms 1 and 2 (Hadi 1994). In these algorithms, x̄_b and S_b are the mean and covariance matrix of the observations in the current basic subset X_b.
Algorithm 1: Finding the Basic Subset
Output: A basic subset of size h observations that are likely to be free from outliers.
Step 0: Compute d_i(m, A). Let x̄_b and S_b be the mean and the covariance matrix of the observations with the h smallest values of d_i(m, A). Compute d_i(x̄_b, S_b). Rearrange the observations in ascending order according to d_i(x̄_b, S_b). Divide the observations into two initial subsets: a basic subset containing the first p + 1 observations and a nonbasic subset containing the remaining n - p - 1 observations.
Step 1: If the basic subset is of full column rank, compute d_i(x̄_b, S_b), where x̄_b and S_b are the mean and covariance matrix of the observations in the current basic subset. If the basic subset is not of full rank, increase the basic subset by as many observations as needed for it to become of full rank. If needed, the observations are added according to their ranked order.
Step 2: Rearrange the n observations in ascending order according to d_i(x̄_b, S_b). Let s be the number of observations in the current basic subset. Divide the observations into two subsets: a basic subset containing the first s + 1 observations and a nonbasic subset containing the remaining n - s - 1 observations.
Step 3: Repeat Steps 1 and 2 until the basic subset contains h observations.
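The forward-search structure of Algorithms 1 and 2 can be sketched in a few lines of Python. The fragment below grows the basic subset one observation at a time using the basic-subset distances d_i(x̄_b, S_b); the initial ordering, the choice of h, and the final testing stage are simplified, so this is an illustration of the idea rather than a faithful implementation of Hadi (1994).

```python
import numpy as np

def basic_subset_distances(X, idx):
    """Distances d_i(xbar_b, S_b) of all observations from the basic subset idx."""
    xb = X[idx].mean(axis=0)
    Sb = np.cov(X[idx], rowvar=False)
    Sb_inv = np.linalg.pinv(Sb)              # pseudo-inverse guards against rank problems
    diff = X - xb
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, Sb_inv, diff))

def grow_basic_subset(X, h=None):
    """Forward search: start from the p+1 'most central' points, grow until size h."""
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2                  # about half of the data
    # crude initial ordering: distance from the coordinatewise median
    d0 = np.sqrt(((X - np.median(X, axis=0)) ** 2).sum(axis=1))
    basic = np.argsort(d0)[: p + 1]
    while basic.size < h:
        d = basic_subset_distances(X, basic)
        basic = np.argsort(d)[: basic.size + 1]   # keep the s+1 closest points
    return basic, basic_subset_distances(X, basic)
```

The final distances returned by such a search can then be compared with a chi-square cutoff, in the spirit of Algorithm 2, to declare the remaining observations outliers.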
If desired, the final basic subset obtained in Algorithm 2 can be used to com-
pute the final distance for each of the observations in the data set. The method pro-
posed here is easy to compute. It is also effective in identifying the outliers when
tried on real as well as simulated data. This method has been implemented in some
commercially available statistical packages (e.g., Stata; see Gould and Hadi 1993).
Example 5: Financial Data. Consider again the financial data described in Example 1. The d_i(x̄_b, S_b) for the trivariate and bivariate data sets graphed in Figures 1 and 2 are given in Table 3. The corresponding index plots are shown in Figures 7a and 7b, respectively. In both cases, the Mahalanobis distance declares only observation 1 as an outlier (the distances for observation 1 of 3.97 and 3.96 are slightly larger than the cutoff point of 3.86). By comparison with the trivariate and bivariate scatter plots in Figures 1 and 2, we see that d_i(x̄_b, S_b) identifies all outliers in both data sets.

Table 3 Financial Data: The d_i(x̄_b, S_b) for the Trivariate and Bivariate Data Sets Graphed in Figures 1 and 2
Figure 7 Financial data: Index plots of d_i(x̄_b, S_b) for (a) the trivariate data and (b) the bivariate data.
A. Detection of Outliers
As we mentioned, outliers are observations that do not conform to the pattern (model)
suggested by the majority of the observations in a data set. To detect the outliers in X
we need to measure the distance between the ith observation xi and the fitted model.
The classical choice of a distance here is the least-squares standardized residual

r_i(σ) = e_i / (σ √(1 - p_ii)),  i = 1, . . . , n

where e_i is the ith element of the residual vector e = (I_n - P)y and p_ii is the ith diagonal element of the projection matrix

P = X(X'X)^{-1}X'   (9)

Replacing σ by σ̂ = [e'e/(n - p)]^{1/2} or by σ̂_(i), we obtain

r_i = r_i(σ̂)  and  r_i* = r_i(σ̂_(i)),  i = 1, . . . , n   (10)

where σ̂_(i) is the estimate of σ when the ith observation is deleted. For simplicity of notation, we write r_i and r_i* instead of r_i(σ̂) and r_i(σ̂_(i)), respectively. Note that r_i and r_i* are related by

r_i* = r_i [(n - p - 1)/(n - p - r_i²)]^{1/2}
Figure 8 Financial data: Scatter plot of X1 versus X3. The solid line is the regression line based on all 26 observations and the dotted line is obtained when observations 5, 11, and 26 are deleted.
Outliers tend to have large absolute values of r_i or r_i*, but unfortunately this is not always the case. Outliers can have small residuals (masking), and observations which are not outliers can have large residuals (swamping). The reason for this, again, is that r_i and r_i* are affected by outliers. For example, the two points in the lower right corner of the scatter plot of X1 versus X3 in Figure 8 have very small residuals because they are close to the solid regression line, which is based on all 26 observations, yet they are very far from the dotted regression line, which is obtained when the outliers 5, 11, and 26 are deleted. Thus, using the r_i or r_i*, observations 5 and 11 will be masked. On the other hand, observation 14, which is close to the dotted regression line, is far from the solid regression line (swamping).
To deal with masking and swamping problems, we replace r_i and r_i* by a more robust residual. One alternative is to use the least median of squares (LMS) residual. However, the LMS is computationally intensive. Another alternative is to define the residual with respect to the fitted model based only on a basic subset which is likely to be free from the outliers. Let y_b and X_b be the observations in the basic subset. Let β̂_b and σ̂_b² be the least-squares estimates of β and σ² based on the observations in the basic subset, respectively. A robust version of the residual (which can actually
be thought of as the scaled prediction error) can be defined as

d_i = (y_i - x_i'β̂_b) / (σ̂_b [1 + x_i'(X_b'X_b)^{-1}x_i]^{1/2}),  i = 1, . . . , n
The two stages of the method in the regression case are given in Algorithms 3 and 4 (Hadi and Simonoff 1993). In these algorithms, β̂_b and σ̂_b² are the least-squares estimates of β and σ² based on the current basic subset y_b and X_b.
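As with the multivariate case, the regression version of the forward search is easy to sketch. The Python fragment below is illustrative only; the initial subset and the stopping rule of Algorithms 3 and 4 are simplified, and the scaled residual uses the form assumed above.

```python
import numpy as np

def scaled_residuals(y, X, idx):
    """Residuals of all observations, scaled by the fit on the basic subset idx."""
    Xb, yb = X[idx], y[idx]
    beta_b, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
    resid_b = yb - Xb @ beta_b
    sigma_b = np.sqrt(resid_b @ resid_b / (len(idx) - X.shape[1]))
    XtX_inv = np.linalg.pinv(Xb.T @ Xb)
    lev = np.einsum("ij,jk,ik->i", X, XtX_inv, X)     # x_i'(Xb'Xb)^{-1}x_i
    return (y - X @ beta_b) / (sigma_b * np.sqrt(1.0 + lev))

def grow_regression_subset(y, X, h=None):
    """Grow a basic subset by absolute scaled residual, one observation at a time."""
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2
    # crude start: the p+1 observations with the smallest OLS residuals on the full data
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    basic = np.argsort(np.abs(y - X @ beta))[: p + 1]
    while basic.size < h:
        d = np.abs(scaled_residuals(y, X, basic))
        basic = np.argsort(d)[: basic.size + 1]
    return basic, scaled_residuals(y, X, basic)
```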
Table 4 Financial Data: The Final d_i Obtained by Algorithm 4 for the Simple Regression of X1 on X2

Number    d_i       Number    d_i
  1      -3.70        14     -1.25
  2       0.07        15     -0.02
  3       0.16        16      1.01
  4       2.24        17      1.15
  5      -1.06        18      1.75
  6       0.50        19      0.30
  7       0.09        20     -0.55
  8      -0.64        21     -0.35
  9      -0.40        22     -0.70
 10      -1.84        23     -1.09
 11       0.67        24     -1.51
 12       0.00        25     -0.41
 13       1.43        26      4.29
If desired, the final basic subset obtained in Algorithm 4 can be used to com-
pute the final residual for each of the observations in the data set.
Notice the similarity between Algorithms 1 and 2 for the detection of outliers in multivariate data and Algorithms 3 and 4 for the detection of outliers in regression data. They are special cases of the unified framework discussed in Section II.
Example 6: Financial Data. Consider again the financial data described in Example 1. The final d_i obtained by Algorithm 4 for the simple regression of X1 on X2 is given in Table 4. The corresponding index plot is shown in Figure 9. Comparing
these results with the scatter plot in Figure 3, we see that the method identified the
two outliers marked on the graph.
Figure 9 Financial data: Index plot of the final d_i obtained by Algorithm 4 for the simple regression of X1 on X2. Two observations (1 and 26) are declared outliers.

The ith fitted value ŷ_i can be written as ŷ_i = Σ_{j=1}^n p_ij y_j, where p_ij is the ijth element of the matrix P in (9); thus p_ii is the weight or leverage attached to y_i in determining the ith fitted value ŷ_i. Two important and interesting properties of p_ii are
0 ≤ p_ii ≤ 1

and

p_ii + e_i²/(e'e) ≤ 1
For proofs of these and many other properties of the matrix P and its elements, see Chatterjee and Hadi (1988, Chapter 2). Consequently, the larger the p_ii, the higher the leverage of y_i in determining ŷ_i. For example, in the extreme case where p_ii = 1, we have ŷ_i = y_i and e_i = 0; that is, y_i completely determines ŷ_i. Thus, observations with large values of p_ii are called high-leverage points and the p_ii's are called leverage values.
The presence of high-leverage points, individually or in groups, makes it very
difficult to identify the outliers. Therefore, the X data should be examined for the
presence of high-leverage points. High-leverage points tend to have large values of
pii. Unfortunately, high-leverage points may not always have large leverage values
because a group of points can collaborate together and collectively induce high lever-
age, although their individual leverage values are not high. In other words, while all
points with large leverage values are high-leverage points, some observations with
small leverage values may be collectively a high-leverage group. Such a group of
high-leverage points can be identified by exploiting the relationship between the
concept of high-leverage and outliers in the multivariate X-space. One can think
of high-leverage points simply as outliers in the X-space. Thus, to identify high-
leverage points, one can think of X as multivariate data and apply Algorithms 1 and
2 to identify the outliers in X (see also Rousseeuw and van Zomeren 1990, 1991). In the context of regression, these outliers are called high-leverage points.
where β̂_(i) is the least-squares estimate of β when the ith observation is deleted. A comparison with (3) shows that C_i is the squared elliptical distance between β̂ and β̂_(i). Thus, a large value of C_i indicates that the ith observation is influential on β̂. After some algebraic manipulations, one can show that C_i is proportional to r_i² p_ii/(1 - p_ii), from which it follows that C_i is a multiplicative function of the residual and leverage values. Although a large value of C_i indicates that the ith observation is influential on β̂, a small value of C_i does not necessarily indicate that the ith observation is not influential. This can be seen from the multiplicative form because a high-leverage point tends to have a small residual, hence a small value of C_i. From the same decomposition it can be seen that an observation will be influential on β̂ if it is an outlier (large value of |r_i|), a high-leverage point (large value of p_ii), or both. Hadi (1992b) utilizes this idea and develops the additive influence measure H_i, where d_i² = e_i²/(e'e) is the square of the ith normalized residual. The first term is a
leverage term which measures outlyingness in the X-space. The function p_ii/(1 - p_ii) is known as the potential function. The second term in H_i is a residual term which measures outlyingness of the observation in the y-direction. Since H_i is an additive function of the residual and potential functions, it will be large if the observation is an outlier in either the X-space, the y-space, or both. To determine which is the case, Hadi (1992b) suggests plotting the potential function versus the residual function.
This plot is referred to as the potential-residual (P-R) plot. In the P-R plot, high-
leverage points are located in the upper area of the plot and observations with large
prediction error are located in the area to the right. Both Hi and the P-R plot have
been implemented in commercially available statistics packages such as Data Desk
and Stata.
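The leverage, residual, and influence quantities discussed above are easy to compute directly from the hat matrix. The following Python sketch computes p_ii, the internally studentized residuals r_i, a Cook-type multiplicative measure, and the potential and normalized-residual terms used in a P-R-type plot. The exact scaling of Hadi's H_i is not reproduced here, so the additive measure shown is only an illustration of the idea.

```python
import numpy as np

def regression_diagnostics(y, X):
    """Leverage, studentized residuals, and simple influence summaries for OLS."""
    n, p = X.shape
    P = X @ np.linalg.inv(X.T @ X) @ X.T           # projection (hat) matrix
    p_ii = np.diag(P)                               # leverage values, 0 <= p_ii <= 1
    e = y - P @ y                                   # least-squares residuals
    sigma2 = e @ e / (n - p)
    r = e / np.sqrt(sigma2 * (1.0 - p_ii))          # internally studentized residuals
    cook_like = r**2 * p_ii / (p * (1.0 - p_ii))    # multiplicative (Cook-type) measure
    potential = p_ii / (1.0 - p_ii)                 # potential function
    d2 = e**2 / (e @ e)                             # squared normalized residuals
    residual_term = d2 / (1.0 - d2)                 # residual term for an additive measure
    additive_like = potential + residual_term       # illustrative additive influence measure
    return p_ii, r, cook_like, potential, residual_term, additive_like
```

A P-R-type plot is then simply a scatter of residual_term against potential for all observations.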
The methods presented in Sections III and IV have been shown to perform well on many real-life and simulated data sets. They produce results in a reasonable amount of time for small to medium data sets. But for large data sets, increasing the basic subset one observation at a time can be time consuming. Hadi and Velleman (1997) adapt these methods to large data sets as follows:
Figure 10 Financial data: Index plots of (a) r_i*, (b) p_ii, (c) C_i, and (d) H_i obtained when X1 is regressed on X2.
1. In some applications the data analyst may have some reason to believe that a certain subset of the data is free from outliers. In these applications, this subset can be used as the first basic subset instead of the one obtained in Step 0 of Algorithms 1 and 3. Actually, this suggestion is applicable to data sets of all sizes, but the computational savings that result from eliminating Step 0 of Algorithms 1 and 3 increase as the size of the data set increases. Additional computational savings can also be achieved if the size of the chosen basic subset is larger than the initial size of p + 1.
2. In Step 2 of Algorithms 1 and 3, the basic subset size is increased by one observation at a time. Computational savings can be realized here by adding to the basic subset all observations that are below a certain cutoff point. In this way the subset size grows more rapidly than in the original algorithms.
Figure 11 Financial data: The potential-residual plot obtained when X1 is regressed on X2.
3. In Algorithms 2 and 4, testing starts only after the size of the basic subset
reaches h. Computational time can be saved by starting testing as soon as
the basic subset stabilizes or includes a certain prespecified number of
observations per parameter (e.g., six or more observations per parameter).
Note that in Algorithms 1-4, the nonbasic subset can contain at most n - h observations. These observations, which constitute less than 50% of the data, are interpreted as outliers. The above modifications imply that the nonbasic subset can be as large as n - s, where s is the size of the initial basic subset chosen by the data analyst. In this case, the basic and nonbasic subsets are regarded as two distinct subgroups in the data set. Each of these subgroups can be further divided into two smaller subsets by applying the above modified algorithms as many times as desired. In this way, the modified algorithms can be thought of as methods for finding homogeneous groups, rather than finding outliers, in data sets.
REFERENCES

Bacon-Shone, J. and W. K. Fung (1987), A New Graphical Method for Detecting Single and Multiple Outliers in Univariate and Multivariate Data, Journal of the Royal Statistical Society (C), 36, 153-162.
Barnett, V. and T. Lewis (1984), Outliers in Statistical Data, 2nd ed., Wiley, New York.
Beckman, R. and R. D. Cook (1983), Outlier...s, Technometrics, 25, 119-149.
Campbell, N. A. (1980), Robust Procedures in Multivariate Analysis. I: Robust Covariance Estimation, Applied Statistics, 29, 231-237.
Caroni, C. and P. Prescott (1992), Sequential Application of Wilks's Multivariate Outlier Test, Applied Statistics, 41, 355-364.
Chatterjee, S. and A. S. Hadi (1986), Influential Observations, High Leverage Points, and Outliers in Linear Regression (with discussion), Statistical Science, 1, 379-416.
Chatterjee, S. and A. S. Hadi (1988), Sensitivity Analysis in Linear Regression, Wiley, New York.
Cook, R. D. (1977), Detection of Influential Observations in Linear Regression, Technometrics, 19, 15-18.
Fung, W. K. (1988), Critical Values for Testing in Multivariate Statistical Outliers, Journal of Statistical Computation and Simulation, 30, 195-212.
Gould, W. and A. S. Hadi (1993), Identifying Multivariate Outliers, Stata Technical Bulletin, 11, 2-5.
Hadi, A. S. (1992a), Identifying Multiple Outliers in Multivariate Data, Journal of the Royal Statistical Society, Series B, 54, 761-771.
Hadi, A. S. (1992b), A New Measure of Overall Potential Influence in Linear Regression, Computational Statistics and Data Analysis, 14, 1-27.
Hadi, A. S. (1994), A Modification of a Method for the Detection of Outliers in Multivariate Samples, Journal of the Royal Statistical Society, Series B, 56, 393-396.
Hadi, A. S. and J. S. Simonoff (1993), Procedures for the Identification of Multiple Outliers in Linear Models, Journal of the American Statistical Association, 88, 1264-1272.
Hadi, A. S. and J. S. Simonoff (1994), Improving the Estimation and Outlier Identification Properties of the Least Median of Squares and Minimum Volume Ellipsoid Estimators, Parisankhyan Sammikkha, 1, 61-70.
Hadi, A. S. and J. S. Simonoff (1997), A More Robust Outlier Identifier for Regression Data, Bulletin of the International Statistical Institute, 281-282.
Hadi, A. S. and P. F. Velleman (1997), Computationally Efficient Adaptive Methods for the Identification of Outliers and Homogeneous Groups in Large Data Sets, Proceedings of the Statistical Computing Section, American Statistical Association (in press).
Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel (1986), Robust Statistics: The Approach Based on Influence Functions, Wiley, New York.
Hawkins, D. M. (1980), Identification of Outliers, Chapman and Hall, London.
Huber, P. (1981), Robust Statistics, Wiley, New York.
Lopuhaä, H. P. (1989), On the Relation between S-Estimators and M-Estimators of Multivariate Location and Covariance, Annals of Statistics, 17, 1662-1683.
Maronna, R. A. (1976), Robust M-Estimators of Multivariate Location and Scatter, Annals of Statistics, 4, 51-67.
Rasmussen, J. L. (1988), Evaluating Outlier Identification Tests: Mahalanobis D Squared and Comrey Dk, Multivariate Behavioral Research, 23, 189-202.
Rohlf, F. J. (1975), Generalization of the Gap Test for the Detection of Multivariate Outliers, Biometrics, 31, 93-101.
Rousseeuw, P. J. and A. M. Leroy (1987), Robust Regression and Outlier Detection, Wiley, New York.
Rousseeuw, P. J. and B. C. van Zomeren (1990), Unmasking Multivariate Outliers and Leverage Points (with discussion), Journal of the American Statistical Association, 85, 633-651.
Rousseeuw, P. J. and B. C. van Zomeren (1991), Robust Distances: Simulations and Cutoff Values, in Directions in Robust Statistics and Diagnostics: Part II (W. Stahel and S. Weisberg, eds.), Springer-Verlag, New York, 195-203.
Rousseeuw, P. J. and V. J. Yohai (1984), Robust Regression by Means of S Estimators, in Robust and Nonlinear Time Series Analysis, Lecture Notes in Statistics, Vol. 26, Springer-Verlag, New York, 256-272.
Schwager, S. J. and B. H. Margolin (1982), Detection of Multivariate Normal Outliers, Annals of Statistics, 10, 943-954.
14
Union-Intersection and Sample-Split Methods in Econometrics with Applications to MA and SURE Models

Jean-Marie Dufour
University of Montreal, Montreal, Quebec, Canada

Olivier Torrès
Université de Lille, Villeneuve d'Ascq, France

I. INTRODUCTION
For example, take m = 2. A model of type 1 could express the relation between the log of the wage and a variable measuring the level of education for two individuals. The coefficient β is then interpreted as the return to education (Ashenfelter and Krueger 1992), and we may wish to test whether this return is the same for individuals 1 and 2. In models of type 2 we may wish to know whether the parameter linking variable y to variable x is the same over the whole period of observation. An example of a type 3 model could be two equations where y_{1,t} and y_{2,t} represent the consumption of two different goods and x_{1,t} and x_{2,t} are different vectors of explanatory variables. Model 3 is composed of two distinct relationships, but for some reason we want to test the equality of the two coefficients. An important example of a type 4 model is a linear regression model with errors that follow a moving-average (MA) process of order 1, where the first equation contains the odd-numbered observations and the second equation the even-numbered observations.
The most common practice in such situations is to rely on asymptotic infer-
ence procedures. The lack of reliability of such methods is well documented in the
literature. This feature of asymptotic tests has been established by Park and Mitchell
(1980), Miyazaki and Griffiths (1984), Nankervis and Savin (1987), and DeJong et al.
(1992) in the context of AR(1) models. Burnside and Eichenbaum (1994) provide evidence on the poor performance of GMM-based Wald test statistics. For more general theoretical results on the inaccuracy of asymptotic methods, the reader may consult Dufour (1997); see also Nelson, Startz, and Zivot (1996), Savin and Würtz (1996),
and Wang and Zivot (1996). Furthermore, there are situations where usual asymp-
totic procedures do not apply. For instance, consider a model for panel data with
time-dependent errors: if no assumption is made on the dependence structure, it is
not at all clear what should be done.
The main characteristic of model (1) is that the vector of dependent variables y = (y_1', y_2', . . . , y_m')' is in some way divided into m subsamples (different individuals and/or different subperiods), whose relationship is unknown. Because the joint distribution of the vector of errors u = (u_1', u_2', . . . , u_m')' is not specified, usual inference methods based on the whole sample y are not applicable. This chapter develops inference procedures which are valid in such contexts.
The general issues we consider can be described as follows. Given several data sets whose stochastic relationship is not specified (or difficult to model), but on which we can make inferences separately, we study the following problems: (I) how to combine separate tests for an hypothesis of interest bearing on the different data sets (more precisely, how to test the intersection of several related hypotheses pertaining to different data sets); for example, in model (1), we may wish to test whether the linear restrictions C_i β_i = γ_{i0}, i = 1, 2, . . . , m, hold jointly; (II) how to test cross-restrictions between the separated models (such as β_1 = β_2 = · · · = β_m, when k_i = k, i = 1, 2, . . . , m), which involves testing the union of a large (possibly infinite) number of hypotheses of the preceding type (e.g., β_i = β_0, i = 1, 2, . . . , m, for some β_0); (III) how to combine confidence sets (e.g., confidence intervals or confidence ellipsoids) for a common parameter of interest based on different data sets in order to obtain more accurate confidence sets. All these problems require procedures for pooling information obtained from separate, possibly nonindependent, samples and for making comparisons between them.
Besides being applicable to situations where the stochastic relationship be-
tween the different samples is completely unknown, the methods proposed will also
be useful for inference on various models in which the distribution of a standard
statistic based on the complete sample is quite difficult to establish (e.g., because
of nuisance parameters), while the distributional properties of test statistics can be
considerably simplified by looking at properly chosen subsamples. This is the case,
for example, in seemingly unrelated regressions (SURE) and linear regressions with
MA errors.
The methods proposed here rely on a systematic exploitation of Boole-Bonfer-
roni inequalities (Alt 1982) which allow one to bound the probability of the union (or
intersection) of a finite set of events from their marginal probabilities, without any
knowledge of their dependence structure. Although such techniques have been used
in the simultaneous inference literature to build simultaneous confidence intervals,
especially in standard linear regressions (Miller 1981, Savin 1984), it does not ap-
pear they have been exploited for the class of problems studied here. In particular,
for general problems of type I, we discuss the use of induced tests based on rejecting
the null hypothesis when at least one of the several separate hypotheses is rejected
by one of several separate tests, with the overall level of the procedure being con-
trolled by Boole-Bonferroni inequalities. For problems of type II, we propose using
empty intersection tests which reject the null hypothesis when the intersection of a
number of separate confidence sets (or intervals) is empty. In the case of confidence
intervals, this leads to simple rules that reject the null hypothesis when the distance
between two parameter estimates based on separate samples is greater than the sum
of the corresponding critical points. We also discuss how one can perform empty in-
tersection tests based on confidence ellipsoids and confidence boxes. For problems of type III, we propose using the intersection of several separate confidence sets as
a way of pooling the information in the different samples to gain efficiency. These
common characteristics have led us to use the terminology union-intersection (UI)
methods.
The techniques discussed in this chapter for type I problems are akin to pro-
cedures proposed for combining test statistics (Folks 1984) and for meta-analysis
(Hedges and Olkin 1985). Meta-analysis tries to combine the evidence reported in
different studies and articles on particular scientific questions; it has often been used
to synthesize medical studies. However, these studies have concentrated on situa-
tions where the separate samples can be treated as independent and do not deal
with econometric problems. Conversely, these methods are practically ignored in
the econometric literature. Note also that the techniques we propose for problems of types II and III can be viewed as extensions of the union-intersection method
proposed by Roy (1953) (Arnold 1981, pp. 363-364) for testing linear hypotheses
in multivariate linear regressions, in the sense that an infinity of relatively simple
hypothesis tests are explicitly considered and combined. A central difference here
comes from the fact that the “simple” null hypotheses we consider are themselves
tested via induced tests (because we study quite distinct setups) and from the differ-
ent nature of the models studied.
As pointed out, our methods have the advantage of being versatile and straight-
forward to implement, even when important pieces of information are missing. These
also turn out to be easily applicable in various problems where the distributional
properties of test statistics can be considerably simplified by looking at appropri-
ately selected subsamples. We show in particular that this is the case for several
inference problems in SURE models and linear regressions with MA errors. This
provides original and rather striking examples of “sample-split techniques” for sim-
plifying distributional properties. For other recent illustrations of this general idea,
the reader may consult Angrist and Krueger (1994), Dufour and Jasiak (1995), and Staiger and Stock (1997). In the first reference, the authors propose a sample-split
technique to obtain IV estimators with improved properties, while the two other pa-
pers suggest similar methods to obtain more reliable tests in structural models.
The paper is organized as follows. Section II presents the general theory: in the context of a general statistical model, we derive procedures for testing null hypotheses of types I and II. In Section III, we consider the problem of pooling confidence sets obtained from different data sets (type III problems). In Section IV, we apply our
results to test the equality of linear combinations of parameters of different equa-
tions in a SURE model, an interesting setup where standard tests and confidence
sets only have an asymptotic justification (for a review, see Srivastava and Giles
1987). In particular, we impose no restrictions on the contemporaneous covariance
matrix, allowing for different variances and instantaneous cross-correlation. In Sec-
tion V, we study inference for linear regression models with MA(q) errors. We show
that our inference technique is very well suited for testing hypotheses on regression
coefficients in the presence of MA errors. We study in detail the case of an MA(1)
process and consider the problem of testing an hypothesis about the mean. We com-
pare our procedure with some alternative tests. It appears much easier to implement
than other commonly used procedures, since it does not require estimation of MA
parameters. We also study the performance of our method by simulation. The re-
sults show that sample-split combined test procedures are reliable from the point of
view of level control and enjoy surprisingly good power properties. We conclude in
Section VI.
for any θ ∈ Θ_0. Therefore, if we want the induced test to have level α, we only need to choose the α_γ's so that they sum to α (or less).
To our knowledge, there is no criterion for choosing the α_γ's in an optimal manner. Without such a rule, in most of our applications we will give the null hypotheses H_{0γ} the same degree of protection against an erroneous rejection by taking α_γ = α_0 = α/r, ∀γ ∈ Γ. However, there may exist situations where we wish to weight the H_{0γ}'s in a different way. In particular, if for some reason we know that one of the decisions d_{γ'} (say, accepting or rejecting H_{0γ'}) is less reliable than the other decisions, we are naturally led to give d_{γ'} less impact on the final decision concerning the acceptance or rejection of H_0. In other words, we will choose α_{γ'} < α_γ, ∀γ ≠ γ'.
In the case where we choose α_γ = α_0 = α/r, ∀γ ∈ Γ, we reject H_{0γ} at level α_0 when y is in W_γ(α_0). Assuming F_{γ,θ}(x) is continuous in x, this region of Y can be reexpressed as
If we assume that the statistics T_γ are identically distributed under the null hypothesis, then F_{γ,θ} = F_θ, ∀θ ∈ Θ_0, and t_γ(α_0) = t(α_0), ∀γ ∈ Γ, hence (with probability 1)

W(α) = {y ∈ Y : max_{γ∈Γ} T_γ(y) ≥ t(α_0)}
*On admissibility of decision rules, see Lehmann (1986, Section 1.8, p. 17).
S(x_1*, x_2*, . . . , x_r*) ≤ S(x_1, x_2, . . . , x_r). In our case, S(A_1, A_2, . . . , A_r) = min{A_1, A_2, . . . , A_r} is clearly nondecreasing. For further discussion of admissibility issues in such contexts, see Folks (1984).
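In practice, the induced test of problem I amounts to performing each separate test at level α/r and rejecting the joint null as soon as one of them rejects. The Python sketch below illustrates this Bonferroni-type combination for r subsample t-statistics; the data-splitting scheme and the test statistics are placeholders and not the specific models treated later in this chapter.

```python
import numpy as np
from scipy import stats

def induced_test(subsamples, alpha=0.05, null_mean=0.0):
    """Bonferroni-type induced test of H0: every subsample mean equals null_mean.

    Each separate hypothesis is tested at level alpha/r with a two-sided t-test;
    the joint H0 is rejected if at least one separate test rejects.
    """
    r = len(subsamples)
    reject_any = False
    details = []
    for y in subsamples:
        t_stat, p_val = stats.ttest_1samp(y, popmean=null_mean)
        reject = p_val <= alpha / r            # separate test at level alpha/r
        reject_any = reject_any or reject
        details.append((t_stat, p_val, reject))
    return reject_any, details

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.normal(loc=0.3, scale=1.0, size=50)
    # split into two subsamples whose dependence we do not want to model
    print(induced_test([y[0::2], y[1::2]], alpha=0.05))
```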
which provides a lower bound for the probability of making a type I error. Of course,
this type of bound is of no use since we try to bound from above the probability of
an erroneous rejection of Ho. Appropriate upper bounds for the probability of an
intersection are difficult to obtain. Second, when Γ is infinite, it is impossible to build W_γ(α_γ) for every γ ∈ Γ.
It is however interesting to note that some null hypotheses can be written as the union of several hypotheses (possibly an infinite number of such hypotheses). It is then natural to construct an overall rejection region which is equivalent to the infinite intersection ∩_{γ∈Γ} W_γ(α_γ). For example, consider the hypothesis H_0 : θ_1 = θ_2 = · · · = θ_m, where θ_i is a q × 1 subvector of the initial parameter vector θ. We note that H_0 is true if and only if there exists θ_0 ∈ R^q such that θ_1 = θ_2 = · · · = θ_m = θ_0, where θ_0 is the unknown true value of θ_i under the null. Defining Θ_0(θ_0) ≡ {θ ∈ Θ : θ_1 = θ_2 = · · · = θ_m = θ_0}, we have Θ_0 = ∪_{θ_0∈R^q} Θ_0(θ_0). H_0 can be expressed as an infinite union of subhypotheses H_0(θ_0) : θ ∈ Θ_0(θ_0). Therefore H_0 is true if and only if any one of the H_0(θ_0)'s is true.
Obviously, it is impossible to test every H_0(θ_0). Instead, we propose the following procedure. For each i ∈ {1, 2, . . . , m}, we build a confidence region C_i(y_i, α_i) for θ_i with level 1 - α_i using the sample y_i, where the α_i's are chosen so that Σ_{i=1}^m α_i = α. This region satisfies

P_θ[A_i(θ_i, α_i)] ≥ 1 - α_i,  ∀θ ∈ Θ

where A_i(θ_i, α_i) = {y ∈ Y : C_i(y_i, α_i) ∋ θ_i}, i = 1, 2, . . . , m, and G ∋ x means that the set G contains x. In particular, if θ_0 is the true value of θ_i, we have

P_θ[A_i(θ_0, α_i)] ≥ 1 - α_i,  ∀θ ∈ Θ_0
By the Boole-Bonferroni inequality, P_θ[∩_{i=1}^m A_i(θ_0, α_i)] ≥ 1 - Σ_{i=1}^m α_i ≥ 1 - α for all θ ∈ Θ_0, so that under H_0 the probability that the intersection ∩_{i=1}^m C_i(y_i, α_i) is empty is at most α. The critical region

W(α, m) = {y ∈ Y : ∩_{i=1}^m C_i(y_i, α_i) = ∅}

therefore has level α for H_0.
We shall call a critical region of the form of W(α, m) an empty intersection test. In our notation, W(α, m) does not depend directly upon α, but on how the α_i's are chosen to satisfy the constraint Σ_{i=1}^m α_i ≤ α. For this procedure to be applicable, we need to have confidence regions C_i(y_i, α_i) with levels 1 - α_i. This is of course possible in model (1) as long as Ω_i = σ_i² I_{N_i}, i ∈ {1, 2, . . . , m}. We describe three interesting special cases for which the procedure takes a simple and appealing form.
For confidence intervals C_i(y_i, α_i) = [θ̂_i - c_i(y_i, α_i), θ̂_i + c_i(y_i, α_i)], the intersection is empty if and only if

min{θ̂_i + c_i(y_i, α_i) : i = 1, 2, . . . , m} < max{θ̂_i - c_i(y_i, α_i) : i = 1, 2, . . . , m}
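As a concrete illustration, the following Python sketch applies the interval version of the empty intersection test: each subsample yields a t-based confidence interval of level 1 - α/m for the common parameter, and the null of equality is rejected when the intervals have no common point. The one-parameter location setup is a placeholder for the more general linear restrictions of model (1).

```python
import numpy as np
from scipy import stats

def empty_intersection_test(samples, alpha=0.05):
    """Reject H0 (common mean across subsamples) if the 1 - alpha/m confidence
    intervals built from the m subsamples have an empty intersection."""
    m = len(samples)
    lowers, uppers = [], []
    for y in samples:
        n = len(y)
        half = stats.t.ppf(1 - alpha / (2 * m), df=n - 1) * y.std(ddof=1) / np.sqrt(n)
        lowers.append(y.mean() - half)
        uppers.append(y.mean() + half)
    # empty intersection  <=>  min of upper bounds < max of lower bounds
    return min(uppers) < max(lowers), (lowers, uppers)

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    y1 = rng.normal(0.0, 1.0, size=40)
    y2 = rng.normal(1.0, 1.0, size=40)
    print(empty_intersection_test([y1, y2]))
```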
For confidence ellipsoids, one solves the minimization problem (2) and checks whether there is at least one element of E_2*(α_2) lying in E_1(α_1), in which case the two confidence ellipsoids have a nonempty intersection and H_0 is accepted at level α_1 + α_2. This is justified by the following lemma.
Lemma 2. Let E_2*(α_2) ⊂ E_2(α_2) be the set of the solutions of (2). Then E_1(α_1) ∩ E_2(α_2) ≠ ∅ if and only if E_1(α_1) ∩ E_2*(α_2) ≠ ∅.
Since E_2*(α_2) is not empty, it follows that min_{y∈E_2(α_2)} ||y - μ̂_1||² > c_1(α_1), a contradiction. Thus we must have E_1(α_1) ∩ E_2*(α_2) ≠ ∅.
To be more specific, step 2 simply requires one to find the vector ŷ which minimizes ||y - μ̂_1||² subject to the restriction (y - μ̂_2)'D(y - μ̂_2) = c_2(α_2), and then to reject H_0 when ||ŷ - μ̂_1||² > c_1(α_1).
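Under the setup above (a spherical region E_1 centered at μ̂_1 with radius-squared c_1, and an ellipsoid E_2 defined by μ̂_2, D, and c_2; all names here are illustrative), the minimization can be carried out numerically, for example with scipy.optimize:

```python
import numpy as np
from scipy.optimize import minimize

def ellipsoids_intersect(mu1, c1, mu2, D, c2):
    """Check whether {y: ||y-mu1||^2 <= c1} and {y: (y-mu2)'D(y-mu2) <= c2} intersect.

    The sets intersect iff the minimum of ||y - mu1||^2 over the second ellipsoid is
    at most c1; minimizing over the boundary, as in the text, gives the same decision
    whenever mu1 lies outside E_2.
    """
    objective = lambda y: float(np.sum((y - mu1) ** 2))
    cons = [{"type": "ineq", "fun": lambda y: c2 - (y - mu2) @ D @ (y - mu2)}]
    res = minimize(objective, x0=np.asarray(mu2, dtype=float),
                   method="SLSQP", constraints=cons)
    return res.fun <= c1, res.fun

if __name__ == "__main__":
    mu1, c1 = np.array([0.0, 0.0]), 1.0
    mu2, c2 = np.array([3.0, 0.0]), 1.0
    D = np.diag([1.0, 4.0])
    print(ellipsoids_intersect(mu1, c1, mu2, D, c2))   # empty intersection -> reject H0
```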
Consider now two regressions of the form (4), where u_i ~ N(0, σ_i² I_{N_i}) and β_i is a k × 1 vector of unknown parameters. We wish to compare the two parameter vectors β_1 and β_2. However, only the standard errors of the coefficients are known (or reported), not their covariance matrices. Then it is not possible to use the previous procedure. But it is possible to use simultaneous inference techniques and build simultaneous confidence boxes (hyperrectangles). Various methods for building such confidence sets are described in Miller (1981) and Savin (1984). More precisely, let us build for each of the two regressions in (4) k simultaneous confidence intervals, denoted by C_i^j(y_i, α_i^j), for the component β_i^j of β_i, j = 1, 2, . . . , k, i = 1, 2, such that
Then, choosing the α_i's so that α_1 + α_2 = α, and applying the results of Proposition 1, we reject H_0 : β_1 = β_2 at level α when the intersection of the two hyperrectangles is empty.
Checking whether the intersection of the two boxes is empty is especially simple because one only needs to see whether the confidence intervals for each component of β_i have an empty intersection (as in Section II.B.1). Furthermore, it is straightforward to extend this technique in order to compare more than two regressions. Similarly, although we proposed a test for the null hypothesis that all the parameter vectors β_i are equal (which imposes that each equation has the same number of parameters), it is easy to extend this procedure in order to test the equality of linear transformations of β_i, i = 1, 2, . . . , m. Indeed, the method relies only on the ability to derive confidence regions for parameters which are restricted to be equal under the null. This is clearly possible whenever the parameters of interest are of the form R_i β_i. The procedure is actually applicable to any function h(θ) of the parameter, provided we are able to build a confidence region for h(θ).
We will consider two versions of (5), depending on whether we make the assumption (A1): u = (u_1', u_2', . . . , u_m')' ~ N(0, σ² I_N), where N = Σ_{i=1}^m N_i. Under (A1), there exists an optimal test of H_0^{(1)}, given by the critical region associated with the Fisher F-statistic based on the stacked model y = Xβ + u, where y = (y_1', y_2', . . . , y_m')', β = (β_1', . . . , β_m')', X = diag(X_i)_{i=1,2,...,m}, and

F = (δ̂ - δ_0)'[s² R(X'X)^{-1}R']^{-1}(δ̂ - δ_0) / q

with δ = (δ_1', δ_2', . . . , δ_m')', δ̂ = (δ̂_1', δ̂_2', . . . , δ̂_m')', δ_0 = (δ_{01}', δ_{02}', . . . , δ_{0m}')', δ̂_i = R_i β̂_i, β̂_i = (X_i'X_i)^{-1}X_i'y_i, i = 1, 2, . . . , m, R = diag(R_i)_{i=1,2,...,m}, and s² the usual unbiased estimator of σ² from the stacked model.
Without this assumption, one builds for each i the usual F-based confidence ellipsoid C_i(y_i, α_i) for δ_i in the δ_i space, with critical value q_i F(α_i; q_i, N_i - k_i), and rejects H_0 whenever ∩_{i=1}^m C_i(y_i, α_i) = ∅, with Σ_{i=1}^m α_i = α.
Note that, under assumption (A1), the induced procedure for a test of H_0^{(1)} can be improved by taking into account the independence of the regressions. In Section II, we showed that the rejection region associated with an induced test of H_0^{(1)} is ∪_{i=1}^m K_i(α_i), where K_i(α_i) is the critical region for a test of β_i = β_0 at level α_i. Under (A1), we have

P_θ[∪_{i=1}^m K_i(α_i)] = 1 - P_θ[∩_{i=1}^m (Y \ K_i(α_i))] = 1 - Π_{i=1}^m P_θ[Y \ K_i(α_i)]

Under H_0^{(1)} we have P_θ[Y \ K_i(α_i)] = 1 - α_i. Thus, by choosing the α_i's so that Π_{i=1}^m (1 - α_i) = 1 - α, the induced test has level exactly α.
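As a small worked example of this choice (with equal α_i's): solving Π_{i=1}^m (1 - α_i) = 1 - α gives α_i = 1 - (1 - α)^{1/m}; for α = 0.05 and m = 2, this is α_i ≈ 0.0253, slightly larger than the Bonferroni choice α/m = 0.025.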
B. Some Examples
We now present some illustrations of the procedure described in the previous sec-
tions.
If the producer has a cost-minimizing strategy, it can be shown that the demands for factors K and L are given by simple linear relations. A stochastic version of this model would consist in the two-equation SURE model

k_t = a_k + b_k p_t^k + u_t^k
l_t = a_l + b_l p_t^l + u_t^l
where u^k and u^l are two Gaussian random vectors with zero mean and covariance matrices σ_k² I_N and σ_l² I_N, respectively, and N = 25 is the sample size for each variable of the model. A restriction imposed by the theory is b_k = b_l, which will be our null hypothesis. To test H_0, the procedure described in Section II.B.1 is particularly well suited since we have no a priori information on the relation between the
Figure 1 97.5% confidence ellipsoids and intervals in the Berndt example. Confidence ellipsoid for (a_k, b_k)': solid line; confidence ellipsoid for (a_l, b_l)': dashed line; confidence intervals for b_k and b_l appear on the left and right vertical axes, respectively.
random variables u_t^k and u_t^l. Using the data provided in Berndt (1991), which are described in Berndt and Wood (1975), we performed separate tests of the null hypotheses H_0 : b_k = b_l and H_0^* : (a_k, b_k)' = (a_l, b_l)'. The estimated equation for l_t is

l_t = -0.0... + 0.28295 p_t^l + û_t^l
       (.001621)  (.002350)

where the standard errors are given in parentheses. In Figure 1, we show the two 97.5% level confidence ellipsoids required for testing H_0^*. It is straightforward to see that we can reject both null hypotheses at level 5% because none of the regions intersect. Similarly, the 97.5% level confidence intervals for b_k and b_l are respectively (-0.01869, 0.02539) and (0.1659, 0.3992), and so do not intersect.
Since no information on the joint distribution of u_t^k and u_t^l is available, usual GLS procedures cannot be applied in this context. However, suppose that we assume that (u_1^k, u_2^k, . . . , u_25^k, u_1^l, u_2^l, . . . , u_25^l)' is a Gaussian random vector whose variance matrix allows contemporaneous correlation across the two equations, as is usually done in SURE models. Using standard GLS techniques, the estimate of (a_k, b_k, a_l, b_l)' is (0.05100, 0.00235, -0.04886, 0.28804)' and the F-statistics for testing H_0 and H_0^* are 27.61 and 938.37, respectively. Since the corresponding 5% asymptotic critical values are 4.05175 and 3.19958, the null hypotheses are both rejected. However, one may prefer the empty intersection test procedure, because it makes a weaker assumption on the error distribution. Moreover, GLS-based tests only have an asymptotic justification.
where y_ij and x_ij represent the log wage and the schooling of the ith brother in the jth family. These equations are the reduced form of a structural model which expresses the relationship between the wage and years of schooling, where F_j is a family-specific component. We must have θ_i = β_i + λ_i, i = 1, 2. The structural model has been estimated over a sample of 143 pairs of brothers. The estimates reported by Ashenfelter and Zimmerman (1993, Table 3), with standard errors in parentheses, are used below.
A natural hypothesis to test here is H_0 : (β_1, λ_1)' = (β_2, λ_2)'. This can easily be tested from the estimated structural model, since H_0 is equivalent to H_0^* : (θ_1, λ_1)' = (θ_2, λ_2)'. Here, we will use the hyperrectangle technique, because Ashenfelter and Zimmerman (1993) do not provide the full estimated covariance matrix for each regression. We first find a confidence interval with level 1 - α/4 for each one of the mean parameters in the structural model, and check whether the two rectangles so obtained overlap, in which case we accept the null hypothesis. This is done for α = 5%. Each event [0.0140, 0.0900] ∋ θ_1, [-0.0326, 0.0686] ∋ λ_1, [0.0199, 0.1161] ∋ θ_2, [-0.0320, 0.0440] ∋ λ_2 occurs with probability 0.9875. We accept the null hypothesis at level 5%, since the two boxes [0.0140, 0.0900] × [-0.0326, 0.0686] and [0.0199, 0.1161] × [-0.0320, 0.0440] have a nonempty intersection, which is [0.0199, 0.0900] × [-0.0320, 0.0440].
In this section, we show that the procedures developed in Section II can be useful for inference in some dynamic models.
y_t = m_t + u_t,  u_t = Ψ(B)ε_t,  t ∈ T = {1, 2, . . . , T}
ε ≡ (ε_{1-q}, . . . , ε_0, ε_1, . . . , ε_T)' ~ N(0, σ² I_{T+q})     (6)

where Ψ(z) = ψ_0 + ψ_1 z + ψ_2 z² + · · · + ψ_q z^q, ψ_0 = 1, m_t = Σ_{k=1}^K x_{tk} b_k = x_t'b, b = (b_1, b_2, . . . , b_K)' is a vector of unknown coefficients, and x_t = (x_{t1}, x_{t2}, . . . , x_{tK})', t = 1, 2, . . . , T, are vectors of fixed (or strictly exogenous) variables. In model (6), y ~ N(m, Ω), where m = (m_1, m_2, . . . , m_T)' and Ω = (ω_{t,s})_{t,s=1,2,...,T}.
(7) shows the key feature of model (6): observations distant by more than q periods from each other are mutually independent. Then, we are naturally led to consider model (6) for subsamples obtained as follows. Define subsets of T, J_i = {i, i + (q + 1), i + 2(q + 1), . . . , i + n_i(q + 1)}, where n_i ≡ I[(T - i)/(q + 1)] (I[x] denotes the integer part of x), i = 1, 2, . . . , q + 1, and consider the q + 1 equations
Equation (8) belongs to the class of model (1). In each equation, the error term satisfies the assumptions of the linear regression model, so that it is possible to apply usual inference procedures to test restrictions on b, H_0 : b ∈ Φ. This null hypothesis can be seen as the intersection of q + 1 hypotheses H_{0,i}, each of which restricts the mean of the ith subsample to be in Φ, i = 1, 2, . . . , q + 1. The methods presented in Sections II and III are perfectly suited to such situations. We build q + 1 critical regions with level α/(q + 1) to test each one of the hypotheses H_{0,i}, and reject the
null hypothesis at level α if the vector of observations belongs to the union of these regions. Note that we did not make any assumption on the roots of Ψ(z). In particular, we did not restrict the MA process {Ψ(B)ε_t : t ∈ T} to be invertible.
In the next subsection we apply the procedure to an MA(1) process with a constant and provide comparisons with some alternative procedures such as asymptotic tests and bounds tests.
y_t = β + ε_t + ψε_{t-1},  ε_t ~ N(0, σ²) independently,  t ∈ T     (9)

The vector of parameters is θ = (β, ψ, σ²)'. The null hypothesis we consider is H_0 : θ ∈ Θ_0, Θ_0 = {θ ∈ Θ : β = 0}. According to our procedure, assuming T is even, we form two subsamples of size T/2, (y_t, t ∈ J_i), where J_1 = {1, 3, 5, . . . , T - 1} and J_2 = {2, 4, 6, . . . , T}. For each subsample, we make inference on β from the regression equation (10), where β̂_i is the OLS estimator of β and V̂(β̂_i) the usual unbiased estimator of the variance of β̂_i from regression (10) using sample (y_t : t ∈ J_i); t(T/2 - 1; α/4) is the upper 1 - α/4 percentile of Student's t distribution with T/2 - 1 degrees of freedom. We reject H_0 : β = 0 at level α if y ∈ W_1(α/2) ∪ W_2(α/2).
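The sample-split procedure for the MA(1) case is simple enough to state in a few lines of code. The Python sketch below forms the odd- and even-indexed subsamples, computes the two t-statistics for the mean, and applies the level-α induced test (each subtest at level α/2, i.e., critical value t(T/2 - 1; α/4)); the function names and the simulated data are illustrative.

```python
import numpy as np
from scipy import stats

def sample_split_ma1_test(y, alpha=0.05):
    """Induced test of H0: beta = 0 for y_t = beta + eps_t + psi*eps_{t-1}.

    The series is split into odd- and even-indexed observations, which are
    mutually independent under an MA(1) error; each subsample t-test is run
    at level alpha/2 and H0 is rejected if either subtest rejects.
    """
    y = np.asarray(y, dtype=float)
    reject = False
    details = []
    for sub in (y[0::2], y[1::2]):
        n = len(sub)
        t_stat = sub.mean() / (sub.std(ddof=1) / np.sqrt(n))
        crit = stats.t.ppf(1 - alpha / 4, df=n - 1)   # upper 1 - alpha/4 percentile
        reject = reject or (abs(t_stat) > crit)
        details.append((t_stat, crit))
    return reject, details

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    T, beta, psi = 50, 0.5, 0.6
    eps = rng.normal(0.0, 1.0, size=T + 1)
    y = beta + eps[1:] + psi * eps[:-1]               # MA(1) disturbances
    print(sample_split_ma1_test(y))
```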
2. Alternative Procedures
We compared this procedure with two alternatives. The first one consists in testing H_0 using bounds proposed by Hillier and King (1987), Zinde-Walsh and Ullah (1987), Vinod (1976), and Kiviet (1980); see also Vinod and Ullah (1981, Chap. 4). The latter are based on standard least-squares test statistics for testing β = 0 obtained from the complete sample, such as the t-statistic or its absolute value. Since the distributions of the latter depend on the unknown value of the moving-average parameter ψ, one finds instead bounds t^ℓ(α) and t^u(α) which do not depend on the parameter vector θ and such that
for all θ ∈ Θ_0, α ∈ (0, 1). Then the decision rule that consists in rejecting H_0 when T(y) > t^u(α) and accepting H_0 when T(y) < t^ℓ(α) has level α. An inconvenient feature of such procedures is that they may be inconclusive (when T(y) ∈ [t^ℓ(α), t^u(α)]). Obviously, to avoid losses of power, the bounds should be as tight as possible.
In all the above references on bounds tests, the bounds are derived assuming that the MA parameter is known, so that they depend on it, even under the null hypothesis. Therefore we will denote by t_ψ^ℓ(α) and t_ψ^u(α) the lower and upper bounds on t_θ(α). But as ψ is unknown, we have to find the supremum t^u(α) of t_ψ^u(α) over the admissible values of ψ, to make sure that the test based on the corresponding rejection region has level α. Since the moving-average parameter is not restricted by H_0, the set of admissible values for ψ is R. The upper bound is then likely to be quite large.
In the context of model (9), T(y) is typically the usual t-statistic, its square, or its absolute value. Since under H_0 its distribution only depends on ψ (and the sample size), we write t_ψ, t_ψ^ℓ, and t_ψ^u instead of t_θ, t_θ^ℓ, and t_θ^u, respectively.
Here, we only use the bounds of Zinde-Walsh and Ullah (1987) and Kiviet (1980), denoted by t^u_{Z,ψ}(α) and t^u_{K,ψ}(α), because they are respectively tighter than those of Hillier and King (1987) and Vinod (1976). The supremum t^u_K(α) of t^u_{K,ψ}(α) for ψ ∈ R is difficult to establish, but Kiviet (1980, Table 6, p. 357) gives the values of the bounds for ψ ∈ {.2, .3, .5, .9}, and it can be seen that t^u_{K,.9}(α) ≥ t^u_{K,ψ}(α) for ψ ∈ {.2, .3, .5, .9}. We note that these bounds increase with ψ, and we suspect that the supremum is arbitrarily large, possibly infinite. Nevertheless, we will use t^u_{K,.9}(α) as the relevant upper bound in our simulations. Zinde-Walsh and Ullah (1987) derived bounds on the Fisher statistic (or on the square of the t-statistic in our case). t^u_{Z,ψ}(α) is proportional to the ratio λ_max(ψ)/λ_min(ψ) of the highest and lowest eigenvalues of the covariance matrix of y.
We need to make here a remark about the accuracy of Zinde-Walsh and Ullah's bound. Their test rejects H_0 at level α when |T(y)|² > sup_{ψ∈R} t^u_{Z,ψ}(α) ≡ t^u_Z(α). The critical value t^u_Z(α) is not easy to determine analytically, so instead of finding the maximum of t^u_{Z,ψ}(α) on R, we computed t^u_{Z,ψ}(0.05) for some values of ψ in the interval [-1, 2]. We found a maximum at ψ = 1 and a minimum at ψ = -1, for every sample size we considered. Although t^u_{Z,1}(0.05) ≤ t^u_Z(0.05), we used this
value as the upper bound. Doing so gives more power to the Zinde-Walsh-Ullah test than it really has, because it may reject H_0 more often than it would if we used t^u_Z(0.05). Despite this fact, t^u_{Z,1}(0.05) is so large (see Table 1) that the power of the test is zero everywhere on the set of alternatives we considered, for any sample size and for any ψ (see Section V.B.3).
The second alternative consists of using asymptotic tests. In this category, we considered three commonly used tests. The first category includes tests based on a GLS estimation of (9). In the first step, one finds a consistent estimator Ω̂ of Ω and a matrix P such that P'P = Ω̂^{-1}. In the second step, we multiply both sides of (9) by P and apply ordinary least squares (OLS) to that transformed model. In the last step, we test H_0 using the standard F-test. We examine two estimation procedures that lead to a consistent estimator of β, resulting in two test statistics. The first one is detailed in Fomby, Hill, and Johnson (1984, pp. 220-221). We denote it by GLS-MM because in the first step of GLS, we estimate the MA parameter ψ by the method of moments: ψ is estimated by minimizing the distance (in the sense of the Euclidean norm on R) between the sample and true first-order autocorrelations. The second estimation procedure uses exact maximum likelihood in the first step of GLS and will be denoted by GLS-ML.*
The third test we consider is motivated by a central limit theorem (Brockwell and Davis 1991, p. 219) which establishes the following property: if a process, with mean β, has an infinite-order MA representation with IID error terms and MA coefficients ψ_i, i = . . . , -2, -1, 0, 1, 2, . . . , satisfying the conditions

Σ_{i=-∞}^{∞} |ψ_i| < ∞  and  Σ_{i=-∞}^{∞} ψ_i ≠ 0

then the sample mean of the process is asymptotically normally distributed, with mean β and variance T^{-1} Σ_{k=-∞}^{∞} γ(k), where γ(k) is the autocovariance at lag k.
Note that the last condition on the ψ_i's is not satisfied for the MA(1) process (9) with ψ = -1, but as ψ is unknown, we might not be aware of this fact or ignore it. Then a
*For further discussion of ML estimation in this context, see Tunnicliffe Wilson (1989) and Laskar and King (1995).
natural way of testing H_0 is to estimate β by the sample mean ȳ_T and the asymptotic variance by the consistent estimator proposed in Newey and West (1987).
*Of course, the list of the methods considered in the present simulation is not exhaustive. For example, possible variants of the NW method include the covariance matrix estimators proposed by Wooldridge (1989). Bayesian methods (Kennedy and Simons 1991) and marginal likelihood methods (King 1996) could also be used in this context. But space and time limitations have precluded us from including all proposed methods in our simulations.
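For completeness, here is a hedged sketch of this third asymptotic procedure: estimate β by the sample mean and its variance by a Newey-West-type long-run variance estimator, then form the usual asymptotic t-ratio. The truncation-lag rule below is an arbitrary illustrative choice, not necessarily the one used in the chapter's simulations.

```python
import numpy as np

def newey_west_lrv(y, lags=None):
    """Bartlett-kernel (Newey-West) estimator of the long-run variance of y."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if lags is None:
        lags = int(np.floor(4 * (T / 100.0) ** (2.0 / 9.0)))  # a common rule of thumb
    e = y - y.mean()
    lrv = e @ e / T                                   # gamma_hat(0)
    for k in range(1, lags + 1):
        gamma_k = e[k:] @ e[:-k] / T
        lrv += 2.0 * (1.0 - k / (lags + 1.0)) * gamma_k
    return lrv

def nw_t_statistic(y):
    """Asymptotic t-ratio for H0: E[y_t] = 0 using the Newey-West variance."""
    y = np.asarray(y, dtype=float)
    return y.mean() / np.sqrt(newey_west_lrv(y) / len(y))
```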
Table 2 reports the estimated size, the 5% asymptotic critical value (ACV), and the 5% corrected critical value (CCV), for each sample size T and each of the three asymptotic procedures.
3. Simulations
In our simulations, we proceeded as follows. For ψ ∈ {-1, -.5, 0, .5, 1} and T ∈ {25, 50, 75, 100}, we considered a grid of β values around β = 0. In each case, 1000 independent samples (y_t, t = 1, 2, . . . , T) were generated and the following statistics were computed: (1) the t-statistic based on the whole sample; (2) the t-statistics based on the two subsamples (y_t : t ∈ J_1) and (y_t : t ∈ J_2) containing the odd and even observations, respectively; (3) the GLS-MM and GLS-ML based F-statistics; (4) the NW-based t-statistic. Using these statistics, the following tests were implemented at level 5% and the corresponding rejection frequencies were computed: (a) Zinde-Walsh and Ullah's bounds test; (b) Kiviet's bounds test;* (c) GLS-
*Because Kiviet (1980) does not provide the upper bound for T = 75 and T = 100, we did not investigate the behaviour of Kiviet's test for these values of T.
MM asymptotic test (corrected and uncorrected for size); (d) GLS-ML asymptotic test (corrected and uncorrected for size); (e) NW asymptotic test (corrected and uncorrected for size); (f) the induced test which combines the standard t-tests based on the two subsamples (y_t : t ∈ J_1) and (y_t : t ∈ J_2); (g) the separate tests based on the subsamples (y_t : t ∈ J_1) and (y_t : t ∈ J_2). The results are presented in Figures 2 to 6 and Tables 3 to 7.
As became clear in the description of the induced test, when applying such a procedure to model (9), one is led to split the sample in two and make two tests at level α/2. At first sight, the procedure displays features that may seem quite unattractive. First, it splits the available sample in two, and second, it combines two tests whose levels are only α/2 (instead of α). From these two remarks, one may expect the procedure to lack power. But we should keep in mind that, since the two "subtests" have level α/2, the resulting induced test has level certainly greater than α/2 (although not greater than α). Furthermore, this test actually uses the information contained in the whole sample. Then it becomes less clear whether the induced test procedure automatically leads to a loss of power relative to other alternative
tests. Two questions arise from these remarks: (1) Is combining preferable to not combining? That is, should our decision at level α rely on an induced test procedure or on a test based on one of the subsamples only? (2) How does the induced test compare with the procedures mentioned in Section V.B.2?
Figures 2 to 6 answer the first question. They show that the power of the induced test (solid line) is generally higher than that of an α-level test based on one of the two subsamples (dashed lines). In other words, combining is preferable to not combining. When this is not the case (when the true value of the MA parameter is unity, ψ = 1; see Figures 2a to 2d), the power loss from using the induced test is very small, so that one would usually prefer the sample-split procedure that uses all the observations.
Tables 3 to 7 report the estimated probability of a rejection of H0 : β = 0 for different sample sizes (T ∈ {25, 50, 75, 100}) and true values of β (β ∈ {-1, -.8, -.5, -.2, 0, .2, .5, .8, 1}), for each one of the test procedures of Section V.B.2. If we first consider bounds tests, we note that the Kiviet test is dominated by the induced test, except for ψ = .5 and ψ = 1. We already mentioned in Section
V.B.2 that the bound which has been used here, namely Kiviet's 5% upper bound, is not appropriate because we do not know whether this value satisfies the level constraint. In other words, a critical region based on Kiviet's bounds has an unknown level. Moreover, what makes the induced test more attractive relative to Kiviet's test is that it avoids the calculation of a bound that changes with the sample size. Finally, because Zinde-Walsh and Ullah's upper bounds are so large (see Table 1), the power of their test is zero for all ψ; these results are not reported in Tables 3-7.
The most surprising result which emerges from our Monte Carlo study can be seen in Tables 3, 4, and 5. Once the asymptotic critical values used for the GLS-MM and GLS-ML tests have been corrected so that the corresponding critical regions have the desired level, our procedure becomes more powerful than these alternatives for many plausible values of ψ. The difference between estimated power functions grows as ψ increases, but diminishes when the sample size T gets larger. The GLS-MM method seems to be the worst of all the asymptotic procedures studied here,
-1.0 0.80 3.40 56.00 66.70 59.80 97.70 0.70 99.10 98.80
-0.8 0.20 1.10 37.80 46.60 40.00 84.90 0.10 91.50 89.70
-0.5 0.00 0.20 17.00 20.10 16.60 40.00 0.00 52.00 48.00
-0.2 0.00 0.00 6.10 6.20 5.20 6.60 0.00 14.30 9.90
-0.1 0.00 0.00 4.50 5.00 3.50 2.20 0.00 7.00 4.40
0.0 0.00 0.00 4.30 4.30 3.50 1.10 0.00 4.30 2.70
0.1 0.00 0.00 5.20 5.10 4.30 2.10 0.00 6.50 3.80
0.2 0.00 0.00 6.50 7.60 5.30 5.80 0.00 11.80 9.80
0.5 0.10 0.60 18.10 21.20 18.10 38.60 0.00 55.20 48.90
0.8 0.10 1.80 38.80 46.40 40.50 83.60 0.20 91.10 89.40
1.0 0.30 3.90 55.00 64.90 59.60 96.70 0.80 98.90 98.80
Sample size T = 50 Sample size T = 100
-1.0 46.00 0.20 92.40 93.30 90.70 99.50 25.60 99.80 99.70
-0.8 21.50 0.00 77.30 78.30 72.30 97.00 6.70 97.70 96.90
-0.5 3.50 0.00 40.10 40.20 32.90 61.30 0.20 69.10 63.20
-0.2 0.20 0.00 9.60 10.00 6.80 11.90 0.00 17.40 12.20
-0.1 0.00 0.00 4.90 4.50 3.50 4.60 0.00 7.90 5.00
0.0 0.10 0.00 3.90 3.90 2.40 3.40 0.00 4.40 2.80
0.1 0.10 0.00 4.30 4.70 3.50 5.60 0.00 7.40 3.90
0.2 0.20 0.00 8.20 9.20 6.60 11.60 0.00 15.10 10.30
0.5 3.70 0.00 34.90 40.60 33.70 62.80 0.30 67.00 60.40
0.8 21.00 0.10 74.00 78.60 72.70 96.50 4.60 97.30 96.60
1.0 45.10 0.40 89.70 95.00 90.30 99.90 22.90 99.90 99.80
-1.0 83.00 2.40 99.30 99.80 99.40 100.00 65.10 100.00 100.00
-0.8 50.80 0.20 93.80 94.30 94.00 99.80 26.80 99.90 99.90
-0.5 10.80 0.00 58.30 59.20 55.80 86.80 1.10 90.90 89.00
-0.2 0.40 0.00 12.70 11.50 11.90 20.50 0.00 24.70 22.50
-0.1 0.10 0.00 6.50 4.10 5.60 6.00 0.00 10.90 7.90
0.0 0.10 0.00 3.30 2.90 3.60 3.40 0.00 4.30 3.50
0.1 0.20 0.00 5.70 4.30 5.20 7.40 0.00 8.70 6.70
0.2 0.40 0.00 13.00 12.20 11.90 19.10 0.00 22.30 20.70
0.5 10.40 0.00 59.70 59.30 57.40 88.10 1.00 89.70 88.10
0.8 50.60 0.50 94.80 96.00 94.70 100.00 25.20 100.00 99.90
1.0 82.10 3.50 99.40 99.70 99.50 100.00 63.30 100.00 100.00
-1.0 35.30 45.80 97.80 96.20 97.00 100.00 70.30 100.00 100.00
-0.8 19.30 23.10 89.30 81.70 85.20 100.00 28.20 100.00 100.00
-0.5 6.20 5.10 48.40 31.70 45.30 96.90 0.80 98.50 94.30
-0.2 2.50 0.90 8.90 2.80 9.90 25.80 0.00 33.70 28.20
-0.1 1.20 0.30 3.30 0.90 6.60 7.20 0.00 11.80 10.40
0.0 1.00 0.20 3.10 0.40 4.70 1.40 0.00 2.70 4.50
0.1 1.40 0.60 5.10 0.80 7.00 6.90 0.00 10.60 9.50
0.2 1.90 0.70 9.40 3.00 11.00 23.40 0.00 32.10 27.10
0.5 7.00 5.40 48.30 31.80 45.70 96.10 0.80 98.00 94.30
0.8 20.80 23.80 87.50 80.50 85.90 100.00 26.70 100.00 100.00
1.0 37.70 45.30 98.00 97.20 97.00 100.00 70.70 100.00 100.00
Sample size T = 50 Sample size T = 100
-1.0 99.70 40.80 100.00 100.00 100.00 100.00 99.80 100.00 100.00
-0.8 95.00 13.20 100.00 99.90 99.60 100.00 93.20 100.00 100.00
-0.5 45.10 0.50 88.60 73.30 80.20 99.80 20.80 99.60 98.30
-0.2 2.70 0.00 21.40 6.90 19.40 42.50 0.00 47.50 36.30
-0.1 0.50 0.00 7.50 1.40 6.50 12.30 0.00 14.60 12.50
0.0 0.30 0.00 2.40 0.10 4.70 3.90 0.00 3.30 5.20
0.1 0.90 0.00 6.90 1.60 8.00 12.60 0.00 12.70 11.50
0.2 2.60 0.00 20.80 7.10 18.30 43.50 0.20 43.60 35.40
0.5 43.80 0.60 89.00 72.90 80.20 100.00 17.40 99.60 98.60
0.8 95.10 13.30 99.90 99.70 99.60 100.00 94.00 100.00 100.00
1.0 99.60 41.30 100.00 100.00 100.00 100.00 99.90 100.00 100.00
β GLS-ML GLS-MM NW Kiv. Ind. test GLS-ML GLS-MM NW Ind. test
-1.0 93.50 91.70 100.00 97.50 97.20 100.00 100.00 100.00 100.00
-0.8 82.80 76.30 99.80 77.30 84.90 100.00 98.50 100.00 100.00
-0.5 49.80 41.70 82.70 10.10 40.30 100.00 56.70 100.00 96.90
-0.2 18.00 18.20 6.30 0.00 10.80 82.90 14.70 65.10 24.60
-0.1 8.00 8.60 1.40 0.00 6.40 29.60 9.30 10.80 9.90
0.0 4.50 4.30 0.10 0.00 4.40 3.60 5.30 0.40 4.00
0.1 10.60 9.70 1.90 0.00 6.20 29.20 9.60 12.20 9.40
0.2 18.90 19.50 8.70 0.00 11.10 81.50 15.30 67.10 23.80
0.5 50.00 41.80 81.40 10.50 42.80 100.00 58.00 100.00 97.60
0.8 82.10 76.00 99.80 76.60 84.40 100.00 98.10 100.00 100.00
1.0 93.10 92.40 100.00 97.90 97.90 100.00 100.00 100.00 100.00
Sample size T = 50 Sample size T = 100
-1.0 100.00 97.90 100.00 100.00 100.00 100.00 100.00 100.00 100.00
-0.8 100.00 86.90 100.00 100.00 100.00 100.00 100.00 100.00 100.00
-0.5 98.60 38.90 100.00 66.10 81.80 100.00 97.00 100.00 99.90
-0.2 33.50 16.80 39.70 0.10 18.30 94.90 20.60 84.60 32.30
-0.1 10.30 10.30 4.60 0.00 7.00 45.50 9.40 18.60 11.00
0.0 2.90 5.00 0.10 0.00 4.40 5.00 4.40 0.40 4.30
0.1 9.50 9.10 5.60 0.00 7.10 44.40 9.30 16.80 10.10
0.2 33.90 15.80 37.20 0.00 17.30 95.30 19.70 82.70 32.00
0.5 99.00 41.30 99.90 67.00 80.20 100.00 97.40 100.00 99.80
0.8 100.00 87.70 100.00 99.90 99.90 100.00 100.00 100.00 100.00
1.0 100.00 98.10 100.00 100.00 100.00 100.00 100.00 100.00 100.00
-1.0 99.50 93.10 100.00 79.90 85.20 100.00 100.00 100.00 100.00
-0.8 98.20 80.10 100.00 34.80 64.00 100.00 99.20 100.00 100.00
-0.5 89.80 58.30 89.10 0.00 28.70 100.00 82.30 100.00 83.60
-0.2 49.90 40.80 0.10 0.00 9.00 100.00 51.60 84.60 17.10
-0.1 2.20 4.30 0.00 0.00 5.90 98.80 47.00 0.10 8.10
0.0 0.00 0.00 0.00 0.00 4.80 0.70 0.00 0.00 3.80
0.1 2.30 2.80 0.00 0.00 6.10 99.10 47.00 0.10 8.10
0.2 52.40 41.40 0.40 0.00 9.30 100.00 51.40 87.90 17.90
0.5 90.70 59.40 88.10 0.40 28.50 100.00 82.00 100.00 85.50
0.8 98.10 81.30 100.00 34.30 64.70 100.00 99.30 100.00 100.00
1.0 99.60 93.10 100.00 79.40 87.20 100.00 100.00 100.00 100.00
Sample size T = 50 Sample size T = 100
-1.0 100.00 98.30 100.00 100.00 100.00 100.00 100.00 100.00 100.00
-0.8 100.00 92.60 100.00 99.20 99.00 100.00 100.00 100.00 100.00
-0.5 100.00 69.30 100.00 13.10 58.20 100.00 99.20 100.00 98.30
-0.2 96.60 46.00 29.10 0.00 12.10 100.00 61.20 99.70 21.70
-0.1 75.00 42.10 0.00 0.00 6.20 100.00 48.60 0.50 8.50
0.0 0.00 0.00 0.00 0.00 3.50 1.10 0.00 0.00 3.80
0.1 76.00 41.90 0.00 0.00 6.50 100.00 48.90 0.40 8.70
0.2 96.40 45.80 28.20 0.00 10.80 100.00 61.90 99.50 20.70
0.5 100.00 69.30 100.00 12.10 59.20 100.00 99.00 100.00 98.50
0.8 100.00 92.30 100.00 99.50 98.70 100.00 100.00 100.00 100.00
1.0 100.00 98.30 100.00 100.00 100.00 100.00 100.00 100.00 100.00
whereas GLS-ML appears to benefit from the asymptotic efficiency property of maximum likelihood estimators. But for nonnegative values of ψ, the sample size has to be T = 100 for the GLS-ML test to have a probability of correctly rejecting the null as high as the induced test. The GLS-MM test is still dominated for some negative values of ψ (ψ = -.5), irrespective of the sample size. Only when ψ is close to -1 does this procedure become admissible.
While the two commonly used asymptotic inference procedures, GLS-MM and GLS-ML, cannot be recommended on the ground of our Monte Carlo study, the conclusion is less negative for the NW method. Except for small sample sizes (T = 25) and large values of the MA parameter (ψ = 1, .5), it does better than the induced test procedure. This result is somewhat unexpected because the Newey-West estimator of V(ȳ_T) does not take into account the autocovariance structure of the process. However, although the induced test is conservative, it is more powerful than the NW test for alternatives close to the null hypothesis when ψ is negative. Furthermore, it is important to remember that the NW test suffers from level distortions (see Table 2) that are not easy to correct in practice.
We now apply our procedure to test the nullity of the mean of a process that has an MA(1) representation. Our series is the first difference of the Canadian per capita GDP, denominated in real 1980 Purchasing Power Parity-adjusted US dollars, observed yearly from 1901 to 1987. It is taken from Bernard and Durlauf (1995). Figure 7 plots the series. Using standard Box-Jenkins procedures (autocorrelation and partial autocorrelation functions), we identified an MA(1) process for the series (see Table 8).
We then consider a model like (9). ML estimation of (9) gives β̂ = 136.1810 and ψ̂ = 0.4211, with estimated variances 945.1919 and 0.0095, respectively. The estimated cov(β̂, ψ̂) is 0.0834, and the sample variance of the residuals is 40117.5725.
To implement an induced test for the nullity of the mean parameter, β, at level 5%, we split the sample in two parts, {y_t : t ∈ J_i}, i = 1, 2, which contain respectively the odd and the even observations, and perform two 2.5% tests of β = 0, using the statistics t_i = √(n_i) ȳ_i / s_i, where ȳ_i = Σ_{t∈J_i} y_t / n_i, s_i² = Σ_{t∈J_i}(y_t − ȳ_i)² / (n_i − 1), and n_i is the size of subsample i, i = 1, 2. We reject the null hypothesis when |t_1| > t(α/4, ν_1) or |t_2| > t(α/4, ν_2), where t(α/4, ν) is the 1 − (α/4) percentile of Student's t distribution with ν degrees of freedom. We also perform both GLS-MM and GLS-ML asymptotic tests. Our results are reported in Table 9. β̃ is the two-step estimator of β, ψ̃ the estimator of ψ obtained in the first step to estimate the error covariance matrix, and t the test statistic, whose distribution will be approximated by a Student t-distribution with 86 degrees of freedom. Both subtests
Figure 7 First differences of the Canadian per capita GDP. (From Bernard and Durlauf 1994.)
reject the null hypothesis at level 2.5%. Hence the induced test rejects the nullity
of the mean at level 5%. The two asymptotic tests also reject the null hypothesis, if
we admit that the asymptotic critical value is a good approximation when the sample
size is 87. Our findings are consistent with the results of the Monte Carlo study of
Section V.B.3. For similar sample sizes ( T = 75 or T = 100) we found that the
GLS-MM test produces larger values of the test statistic than the GLS-ML test does.
This is what we have here with T = 87.
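To make the sample-split procedure concrete, the following sketch implements the two-subsample t-tests and the Bonferroni combination described above; the data array, the function name, and the SciPy call used for the Student t quantile are illustrative assumptions of this sketch, not part of the original computations.

import numpy as np
from scipy import stats

def induced_mean_test(y, alpha=0.05):
    """Bonferroni-based induced test of H0: beta = 0 for a mean with MA(1) errors.

    The sample is split into odd- and even-numbered observations, a two-sided
    t-test at level alpha/2 is performed on each subsample, and H0 is rejected
    if at least one subtest rejects.
    """
    y = np.asarray(y, dtype=float)
    subsamples = (y[0::2], y[1::2])           # odd- and even-indexed observations
    results = []
    for ys in subsamples:
        n = ys.size
        t_stat = np.sqrt(n) * ys.mean() / ys.std(ddof=1)
        crit = stats.t.ppf(1.0 - alpha / 4.0, df=n - 1)   # 1 - alpha/4 percentile
        results.append((t_stat, crit, abs(t_stat) > crit))
    reject = any(r[2] for r in results)
    return reject, results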
If we decide to include a linear trend in the mean of the MA(1) process, our induced test procedure still applies. The per capita GDP series now admits the representation
Lag                    1    2    3    4    5    6    7    8    9    10   11   12
Autocorrelation        .41  .19  .10  -.04 .05  .07  .12  .04  -.04 .09  .08  .20
Standard error         .11  .12  .13  .13  .13  .13  .13  .13  .13  .13  .13  .13
Ljung-Box Q-statistic  15.4 18.8 19.8 19.9 20.2 20.6 22.1 22.3 22.5 23.3 24.0 28.3
500 DUFOUR
AND TORR~S
                 Subsample 1    Subsample 2    GLS-MM         GLS-ML
Estimate of β    127.6836       125.4406       122.2522       123.6574
t (d.f.)         4.1892 (43)    3.5076 (42)    7.5112 (86)    6.9345 (86)
p-value          0.00014        0.00109        0.00000        0.00000
ψ̃                -              -              0.5298         0.4221
For each hypothesis, we perform the induced test as well as the asymptotic test. Results appear in Table 10. We note that only one of the subtests rejects the presence of a linear trend. However, according to our decision rule, this is enough to reject H_0^{(1)}. Both GLS-MM and GLS-ML unambiguously reject this hypothesis. But we know from our simulations that the asymptotic tests tend to reject the null too often when it is true. For the parameters β_j, j = 1, 2, we also report two confidence intervals I_1^j and I_2^j, each with level 97.5%, based on the two subsamples (y_t : t ∈ J_1) and (y_t : t ∈ J_2). The intersection I_1^j ∩ I_2^j gives the set of values γ ∈ R such that the hypothesis H_0^j(γ) : β_j = γ is not rejected at level 5% by the induced test. These intervals are
t_j, j = 1, 2, and F denote Student's t- and Fisher's F-statistics used for testing H_0^{(j)}, j = 1, 2, and H_0, respectively.
In this chapter we proposed a set of inference methods for comparing and pooling
information obtained from different data sets, which simply use separate tests (or
confidence sets) based on the different data sets. The methods described are based
on a systematic exploitation of Boole-Bonferroni inequalities and can yield exact tests and confidence sets, even with small sample sizes, without the need to specify the relationship between the data sets at all. As a result, they are quite versatile and
usually easy to implement. The general problems studied include (1) combining sep-
arate tests based on different data sets for an hypothesis of interest (more precisely,
for the intersection of similar hypotheses), to obtain more powerful tests; (2) compar-
ing parameters estimated from the different data sets (e.g., to test their equality); (3)
combining confidence regions based on different samples to obtain a more accurate
confidence set. For problem 1, we were led to consider Bonferroni-type induced tests;
for problem 2, we proposed empty intersection tests; and for problem 3, we suggested
taking the intersection of separate confidence sets with appropriate levels.
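For problem 3, the recombination step can be illustrated by the minimal sketch below, which intersects two confidence intervals of level 1 - α/2 each; by the Boole-Bonferroni inequality the intersection has level at least 1 - α. The numerical intervals in the usage line are hypothetical.

def intersect_intervals(interval1, interval2):
    """Intersect two confidence intervals (lo, hi), each of level 1 - alpha/2.

    By the Boole-Bonferroni inequality the intersection has level >= 1 - alpha.
    Returns None when the intervals are disjoint, in which case the hypothesis
    of equal parameters across the two data sets would be rejected.
    """
    lo = max(interval1[0], interval2[0])
    hi = min(interval1[1], interval2[1])
    return (lo, hi) if lo <= hi else None

# Hypothetical 97.5% intervals from two subsamples:
print(intersect_intervals((101.2, 156.4), (88.7, 149.9)))   # -> (101.2, 149.9)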
We also showed that the methods proposed can be quite useful in various mod-
els where usual inference procedures based on a complete sample involve difficult
distributional problems (e.g., because of nuisance parameters), but for which distri-
butional properties of test statistics computed on appropriately selected subsamples
are simpler. This leads to an interesting form of sample-split (SS) method. One first
splits the sample into several subsets of observations from which separate inferences
(tests or confidence sets) are obtained. Then these results are recombined, using the
general union-intersection (UI) methods already described, to obtain a single infer-
ence which uses the full sample. The way the data is split depends on the model
considered. In some situations the structure naturally suggests the division. This
is for example true when the model contains several equations. In other cases, the
division is based on more elaborate arguments, as in moving average models.
The UI/SS methods proposed can be applied to a wide spectrum of econometric
situations and models. We discussed and illustrated their applications in two cases
where only asymptotic methods are typically available, namely inference in SURE
models and linear regressions with MA errors. In the latter case, we also presented
an extensive Monte Carlo study comparing the UI-SS method for testing an hypoth-
esis about a mean with other available approaches. Two main conclusions emerged
from these results: first, they provided further evidence on the size distortions asso-
ciated with usual asymptotic procedures; second, they showed that UI-SS tests not
only have the predicted levels, but also enjoy good power properties. Since these methods involve splitting the sample and lead to conservative procedures, one would expect a power loss, so this finding is quite remarkable. Our results show that the Bonferroni-based recombination of the evidence obtained from the different subsamples apparently makes up for the loss. For another application of the UI-SS approach to autoregressive and other types of dynamic models, the reader may consult Dufour and Torrès (1995).
Before closing, it is worthwhile noting a few other points. First, the UI (or UI-
SS) procedures are often simpler to implement than usual asymptotic procedures.
For SURE models and linear regressions with MA errors, they only require critical
values from standard distributions. For MA models, they avoid the task of estimating
MA coefficients. Second, they offer some extra robustness to model specification, as
illustrated by SURE where no assumption on the relationship between the different
equations is needed. Third, although we stressed here the derivation of finite sample
methods, there is nothing that forbids the application of UI (or UI-SS) methods to
situations where only asymptotically justified tests or confidence sets are available
from the separate data sets. In such cases, the methods are applied in exactly the
same way. This feature may be especially attractive for gaining robustness to model
specification. We think all these properties make this UI-SS approach an attractive
and potentially quite useful addition to the methods available to applied econome-
tricians.
ACKNOWLEDGMENTS
We thank Eugene Savin, the editor, David Giles, and an anonymous referee for sev-
eral useful comments. This work was supported by grants from the Social Sciences and Humanities Research Council of Canada, the Natural Sciences and Engineering Research Council of Canada,
and the Government of Québec (Fonds FCAR).
REFERENCES
Dufour, J.-M. (1990), Exact Tests and Confidence Sets in Linear Regressions with Autocorre-
lated Errors, Econometrica, 58,479-494.
Dufour, J.-M. (1997), Some Impossibility Theorems in Econometrics, with Applications to
Structural and Dynamic Models, Econometrica, forthcoming.
Dufour, J.-M. and J. Jasiak (1993), Finite Sample Inference Methods for Simultaneous Equations and Models with Unobserved and Generated Regressors, Discussion Paper, CRDE, Université de Montréal.
Dufour, J.-M. and O. Torrès (1995), Two-sided Autoregressions and Exact Inference for Stationary and Nonstationary Autoregressive Processes, Discussion Paper, CRDE, Université de Montréal.
Folks, J. L. (1984), Combination of Independent Tests, in Handbook of Statistics, Volume 4,
Nonparametric Methods (P. R. Krishnaiah and P. K. Sen, eds.), North Holland, Amster-
dam, pp. 113-121.
Fomby, T. B., R. C. Hill and S. R. Johnson (1984), Advanced Econometric Methods, Springer-Verlag, New York.
Gouriéroux, C. and A. Monfort (1989), Statistique et Modèles Économétriques, vol. 2, Economica, Paris.
Hedges, L. V. and I. Olkin (1985), Statistical Methods for Meta-Analysis, Academic Press, San Diego.
Hillier, G. H. and M. L. King (1987), Linear Regression with Correlated Errors: Bounds on Coefficient Estimates and t-values, in Specification Analysis in the Linear Model (M. L. King and D. E. A. Giles, eds.), Routledge & Kegan Paul, London, Ch. 4.
Kennedy, P. and D. Simons (1991), Fighting the Teflon Factor: Comparing Classical and Bayesian
Estimators for Autocorrelated Errors, Journal of Econometrics, 48, 15-27.
King, M. L. (1996), Hypothesis Testing in the Presence of Nuisance Parameters, Journal of
Statistical Planning and Inference, 50, 103-120.
Kiviet, J. F. (1980), Effects of ARMA Errors on the Significance Tests for Regression Coeffi-
cients: Comments on Vinod’s Article; Improved and Additional Results, Journal ofthe
American Statistical Association, 75, 353-358.
Laskar, M. R. and M. L. King (1995), Parameter Orthogonality and Likelihood Functions, in
Proceedings of the 1995 Econometrics Conference at Monash, Melbourne (Australia),
(C. S. Forbes, P. Kofman, and T. R. L. Fry, eds.), Monash University, 253-289.
Lehmann, E. L. (1986), Testing Statistical Hypotheses, Wiley, New York.
McCabe, B. P. M. (1988), A Multiple Decision Theory Analysis of Structural Change in Re-
gression, Econometric Theory, 4, 499-508.
Miller, R. G. Jr. (1981), Simultaneous Statistical Inference, Springer-Verlag, New York.
Miyazaki, S. and W. E. Griffiths (1984), The Properties of Some Covariance Matrix Estimators in Linear Models with AR(1) Errors, Economics Letters, 14, 351-356.
Nankervis, J. C. and N. E. Savin (1987), Finite Sample Distributions of t and F Statistics in an AR(1) Model with an Exogenous Variable, Econometric Theory, 3, 387-408.
Nelson, C. R., R. Startz, and E. Zivot (1996), Valid Confidence Intervals and Inference in the
Presence of Weak Instruments, Technical Report, Department of Economics, University
of Washington.
Newey, W. K. and K. D. West (1987), A Simple, Positive Semi-Definite, Heteroskedasticity
and Autocorrelation Consistent Covariance Matrix, Econometrica, 55, 703-708.
UNION-INTERSECTION AND SAMPLE-SPLIT METHODS 505
Park, R. E. and B. M. Mitchell (1980), Estimating the Autocorrelated Error Model with Trended Data, Journal of Econometrics, 13, 185-201.
Phillips, G. D. A. and B. P. M. McCabe (1988), Some Applications of Basu's Independence
Theorems in Econometrics, Statistica Neerlandica, 42, 37-46
Phillips, G. D. A. and B. P. M. McCabe (1989), A Sequential Approach to Testing Econometric
Models, Empirical Economics, 14, 151-165.
Roy, S.N. (1953), On a Heuristic Method of Test Construction and Its Use in Multivariate
Analysis, Annals of Mathematical Statistics, 24,220-238.
Savin, N. E. (1984), Multiple Hypothesis Testing, in Handbook of Econometrics (Z. Griliches
and M. D. Intriligator, eds.), North-Holland, Amsterdam, Chap. 14.
Savin, N. E. and A. Wurtz (1996), The Effect of Nuisance Parameters on the Power of LM Tests in Logit and Probit Models, Technical Report, Department of Economics, University of Iowa, Iowa City, IA.
Srivastava, V. K. and D. E. Giles (1987), Seemingly Unrelated Regression Models, Marcel
Dekker, New York.
Staiger, D. and J. H. Stock (1997), Instrumental Variable Regression with Weak Instruments,
Econometrica, 65, 557-586.
Tippett, L. H. C. (1931), The Methods of Statistics, Williams & Norgate, London.
Tunnicliffe Wilson, G. (1989), On the Use of Marginal Likelihood in Time Series Model Esti-
mation, Journal of the Royal Statistical Society, Series B , 51, 15-27.
Vinod, H. D. (1976), Effects of ARMA Errors on the Significance Tests for Regression Coef-
ficients, Journal of the American Statistical Association, 71,929-933.
Vinod, H. D. and A. Ullah (1981), Recent Advances in Regression Methods, Marcel Dekker,
New York.
Wang, J. and E. Zivot (1996), Inference on a Structural Parameter in Instrumental Variables Regression with Correlated Instruments, Technical Report, Department of Economics, University of Washington.
Wooldridge, J. M. (1989), A Computationally Simple Heteroskedasticity and Serial Correla-
tion Robust Standard Error for the Linear Regression Model, Economics Letters, 31,
239-243.
Zinde-Walsh, V. and A. Ullah (1987), On the Robustness of Tests of Linear Restrictions in
Regression Models with Elliptical Error Distributions, in Time Series and Econometric
Modelling (I. B. MacNeill and G. J. Umphrey, eds.), Reidel, Dordrecht.
Modeling Economic Relationships
with Smooth Transition Regressions
Timo Terasvirta
Stockholm School of Economics, Stockholm, Sweden
I. INTRODUCTION
see Goldfeld and Quandt (1972, Section 9.2). Piecewise regression models, see for
example Ertel and Fowlkes (1976) and the references therein, belong to this cate-
gory. The general idea forms a basis for testing parameter constancy against a struc-
tural break at an unknown point as in Quandt (1960) and Andrews and Ploberger
(1994). Goldfeld and Quandt (1973) first discussed the case in which the sequence
of unobserved switching variables is an irreducible first-order Markov chain, and
Lindgren (1978) derived the maximum likelihood estimators of parameters of such
hidden Markov models.
Switching regression models may be generalized in such a way that the tran-
sition from one extreme regime to the other is not discrete but smooth. Such gen-
eralizations are the topic of this chapter. Instead of a (usually small) finite number of regimes, these generalizations contain a continuum of them. Bacon and
Watts (1971) first suggested such a model and coined the term “smooth transition”
to illustrate how a locally linear equation changes from the one extreme linear pa-
rameterization to the other as a function of the continuous transition variable. In the
econometrics literature, Goldfeld and Quandt (1972, pp. 263-264) presented a sim-
ilar idea. Suppose one has the following switching regression model of two variables
y_t and x_t:
Goldfeld and Quandt pointed out that the estimation of the parameters of (1), including c in (2), is complicated and that the problem can be simplified by using a feasible approximation of (2). Their suggestion was to define (3), i.e., to assume that the transition function is the cumulative distribution function of the normal (c, σ²) variable and that σ_1² = σ_2² in (1). Replacing (2) by (3) in the switching regression (1) defines a smooth transition regression. In the time series literature, Chan and Tong (1986) made a similar suggestion in order to generalize a univariate switching autoregressive or "threshold autoregressive" model, albeit not for computational reasons. Tong (1990) contains a detailed account of threshold autoregressive models. Maddala (1977, p. 396) proposed that (3) be replaced by the logistic function

F(z_t) = (1 + exp{δ_1 + δ_2 z_t})^{-1}     (4)
which is one of the alternatives to be considered in this chapter; see also Granger
and Terasvirta (1993) and Terasvirta (1994).
A smooth transition between two extreme regimes may be an attractive param-
eterization because the resulting smooth transition regression (STR) model is locally
linear and thus often allows easy interpretation. Also from the point of view of the-
ory, the assumption of a small number (usually two) of regimes may sometimes be
too restrictive compared to the STR alternative. For instance, instead of assuming
that an economy just has two discrete states, expansion and contraction, say, it may
be more convenient and realistic to assume a continuum of states between the two
extremes. Another argument is that economic agents may not all act promptly and
uniformly at the same moment; their response to news requiring action may con-
tain delays. Nevertheless, these two viewpoints are not competitors. As is already
obvious from Goldfeld and Quandt (1972, pp. 263-264), the two-regime switching
regression model is a special case of an STR model and can therefore be treated in
that framework.
Furthermore, an STR model may be used in the same way as the switching
regression to serve as an alternative against which to test parameter constancy in
a linear model. The alternative to parameter constancy in this framework is a con-
tinuous change in parameters, which often is statistically a more convenient case to
handle than just a single structural break. Lin and Terasvirta (1994) discussed this possibility, which requires defining z_t = t in (4), while the STR model has the form
model. They first obtained the joint marginal posterior distribution for the parameters of the transition function; for example, in (3) those are c and 1/σ². Then they estimated the remaining parameters conditionally on the mean of this joint posterior distribution. Later, Tsurumi (1982) applied the same idea to his "gradual switching" simultaneous-equation model. More recent Bayesian treatments have been mainly concerned with switching regression or threshold autoregressive models. Péguin-Feissolle (1994), who discussed the smooth transition autoregressive model, constitutes an exception. Published work on threshold autoregressive models includes Pole and Smith (1985), Geweke and Terui (1993), Pfann, Schotman, and Tschernig (1996), and Cook and Broemeling (1996).
This chapter is organized as follows. The STR model is defined and its properties and potential in applications are discussed in Section II. Section III considers statistical inference in STR models. This includes testing linearity, which should precede any nonlinear modeling. The modeling cycle consisting of specification, estimation, and evaluation of STR models is described in Section IV. Section V contains an application to a U.K. house price equation considered in Hendry (1984), and Section VI concludes.
II. SMOOTH TRANSITION REGRESSION MODEL
The STR model considered in this chapter is

y_t = φ'x_t + (θ'x_t) G(γ, c; s_t) + u_t     (6)

where x_t = (1, x_{1t}, ..., x_{pt})' = (1, y_{t-1}, ..., y_{t-k}; z_{1t}, ..., z_{mt})' with p = k + m is the vector of explanatory variables, φ = (φ_0, φ_1, ..., φ_p)' and θ = (θ_0, θ_1, ..., θ_p)' are parameter vectors, and {u_t} is a sequence of independent, identically distributed errors. Some of the parameters φ_i and θ_i may be zero a priori, or the restriction φ_i = -θ_i may hold for some i. In (6), G is a bounded continuous transition function; it is customary to bound G between zero and unity, and s_t is the transition variable. It may be a single stochastic variable, for example an element of x_t, a linear combination of stochastic variables, or a deterministic variable such as a linear time trend. By writing (6) as

y_t = (φ + θ G(γ, c; s_t))'x_t + u_t     (7)

it is seen that the model is locally linear in x_t and that the combined parameter vector φ + θG is a function of the transition variable s_t. If G is bounded between 0 and 1, the combined parameters fluctuate between φ and φ + θ. In dynamic modeling, this property makes it possible, for example, to characterize an economy with dynamic properties in expansion being different from those in contraction. Terasvirta and
ECONOMIC RELATIONSHIPS 5 I I
MODELING
Anderson (1992) contained univariate examples of this kind. Model (6) is an example
of the smooth transition regression (STR) model discussed in the Introduction.
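To make the locally linear structure in (6) and (7) concrete, the sketch below evaluates an STR prediction with a logistic transition function; the particular logistic form, the function names, and any parameter values passed in are assumptions of this illustration and not estimates or definitions taken from the chapter.

import numpy as np

def logistic_transition(s, gamma, c):
    # One common choice of transition function, bounded between 0 and 1 (assumed form).
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

def str_predict(x, s, phi, theta, gamma, c):
    """STR prediction: y_hat_t = (phi + theta * G(gamma, c; s_t))' x_t.

    x     : (T, p+1) matrix of regressors including the intercept column
    s     : (T,) transition variable
    phi   : (p+1,) linear parameter vector
    theta : (p+1,) nonlinear parameter vector
    """
    G = logistic_transition(s, gamma, c)                        # (T,)
    combined = phi[None, :] + G[:, None] * theta[None, :]       # combined parameters
    return np.sum(combined * x, axis=1)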
The practical applicability of (6) depends on how G is defined. A few definitions have been suggested in the literature; see, for example, Granger and Terasvirta (1993, Chap. 7). If G has the form
where the restrictions on γ, c_1, and c_2 are identifying restrictions. This transition function is symmetric about (c_1 + c_2)/2, and G(γ, c; s_t) → 1 as s_t → ±∞. The minimum value of G lies between 0 and 1/2; the upper limit holds for c_1 = c_2. On the other hand, when γ → ∞, G(γ, c; s_t) → 0 for c_1 ≤ s_t ≤ c_2; for other values G(γ, c; s_t) → 1. This is a special case of a three-regime switching regression model in which the two outer regimes are equal. The STR model (6) with transition function (9) is called the LSTR2 model.
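Since the explicit formulas for (9) and (10) are not written out here, the sketch below assumes the LSTR2 form G(γ, c; s_t) = (1 + exp{-γ(s_t - c_1)(s_t - c_2)})^{-1} with γ > 0 and c_1 ≤ c_2, and the ESTR form G(γ, c; s_t) = 1 - exp{-γ(s_t - c)²}; both are reconstructions consistent with the properties just described rather than quotations from the chapter.

import numpy as np

def lstr2_transition(s, gamma, c1, c2):
    # Assumed LSTR2 transition: symmetric about (c1 + c2)/2, tends to 1 as |s| grows,
    # and collapses toward a three-regime switch with equal outer regimes as gamma -> infinity.
    return 1.0 / (1.0 + np.exp(-gamma * (s - c1) * (s - c2)))

def estr_transition(s, gamma, c):
    # Assumed exponential (ESTR) transition function.
    return 1.0 - np.exp(-gamma * (s - c) ** 2)

s = np.linspace(-3.0, 3.0, 7)
print(lstr2_transition(s, gamma=2.0, c1=-1.0, c2=1.0))
print(estr_transition(s, gamma=2.0, c=0.0))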
Jansen and Terasvirta (1996) suggested (9) as a generalization of the expo-
nential STR (ESTR) model discussed in the literature; see Granger and Terasvirta
(1993, Chap. 7). The transition function of an ESTR model is defined as
Defining s_t = t yields an important special case of the STR model. Then (7) becomes (11). Model (11) can be interpreted as a linear model whose parameters change over time. It contains as a special case the presence of a single structural break, which has been the most popular alternative to parameter constancy in econometric work. This special case is obtained by completing (6) by (8) with s_t = t and letting γ → ∞ in (8). Lin and Terasvirta (1994) defined another nonmonotonic transition function (see also Jansen and Terasvirta 1996), given in (12), where γ > 0, c_1 ≤ c_2 ≤ c_3. In fact, Lin and Terasvirta (1994) defined the exponent of (12) directly as a third-order polynomial without requiring the roots to be real. As we shall see, this does not make any difference as far as testing parameter constancy is concerned. On the other hand, if an STR model with (12) is to be estimated, restricting the roots to be real alleviates the potential problem of very high correlation between the estimator of θ on the one hand and that of γ and possibly c_1, c_2, and c_3 on the other. At the same time one does not give up too much generality, in the sense that (12) still allows quite a lot of flexibility in the transition function.
Many parameter constancy tests explicitly or implicitly assume the alternative
to parameter constancy to be a single structural break. If this null hypothesis is re-
jected it is often not obvious what to do next. More information can be obtained by
using recursive tests: see, for example, Hendry (1995, Chap. 16). An advantage of
testing parameter constancy in the STR framework is that any rejection of the null
hypothesis is a rejection against a parametric alternative. In case of a rejection the
parameters of the alternative can be estimated, which helps obtain information about
where and how parameter constancy breaks down if it does. This information in turn
is helpful in deciding how the specification of the model should be improved in order
to obtain a model with constant parameters.
It is of course possible to define (12) also for transition variables other than s_t = t. However, macroeconomic time series are usually not overly long. When modeling with such series it may be advisable to restrict the order of the exponent of the transition function to two, to avoid excessive difficulties in parameter estimation, unless there is economic theory suggesting a higher order. This is done here except in the case s_t = t. In that case experience has shown that there are fewer problems; a heuristic explanation is that for s_t = t, superconsistency makes the estimation of the parameters in the exponent easier in small samples than when s_t is stationary.
(although φ and θ are changed, the previous notation is retained). In order to derive the test statistic, assume that u_t ~ nid(0, σ²). The conditional log-likelihood function of the model is

L = c - (T/2) log σ² - (1/(2σ²)) Σ_{t=1}^{T} u_t²     (14)
The nuisance parameters θ and c cannot be estimated under H_0 because their values do not affect the value of the likelihood. For this reason, the likelihood ratio statistic does not have its standard asymptotic χ² distribution under the null hypothesis. The two other classical tests, the Lagrange multiplier and the Wald tests, share the same property.
Davies (1977, 1987) first discussed solutions to this problem; for later contributions in the econometrics literature, see for example Shively (1988), King and Shively (1993), Lee, White, and Granger (1993), Andrews and Ploberger (1994), and Hansen (1996). It occurs in connection with many nonlinear models which nest a linear model, such as the STR model, the switching regression model with an unknown switchpoint, or the hidden Markov model of Goldfeld and Quandt (1973) and Lindgren (1978). A common feature in much of the literature is that under the null
hypothesis there is a single nuisance parameter in the model. Where the STR model is concerned, there are at least two such parameters, whichever way one formulates
the null hypothesis. A way of solving the identification problem by circumventing
it is discussed for example in Granger and Terasvirta (1993, Chap. 6) and is also
considered here. It is based on the work of Saikkonen and Luukkonen (1988) and
Luukkonen, Saikkonen, and Terasvirta (1988a).
To discuss this idea, take the logistic transition function G* = G_1 - 1/2 and its Taylor series approximation with the null hypothesis γ = 0 as the expansion point. The latter can be written as

T_1 = δ_0 + δ_1 s_t + R_1(γ, c; s_t)     (15)

where R_1 is the remainder and δ_0 and δ_1 are constants. Substituting T_1 for G* in (13) yields
Here σ̂² = (1/T) Σ_{t=1}^{T} û_t², û_t is the residual estimated under the null hypothesis, z_t = x_t, and w_t = x_t s_t. The resulting LM-type statistic (17) has an asymptotic χ² distribution with p + 1 degrees of freedom when the moments and cross-moments implied by (17) exist. For detailed derivations, see, for example, Luukkonen, Saikkonen, and Terasvirta (1988a, 1988b) or Granger and Terasvirta (1993, Chap. 6).
The above notation requires that s_t is not an element of x_t. If it is, then the auxiliary regression becomes

y_t = x_t'β_0 + (x̃_t s_t)'β_1 + (x̃_t s_t²)'β_2 + (x̃_t s_t³)'β_3 + u_t*     (20)

where u_t* = u_t + (x_t'θ)R_3(γ, c; s_t) and β_j = γ b_j, j = 1, 2, 3. The LM-type test of H_0': β_j = 0, j = 1, 2, 3, against H_1': "at least one β_j ≠ 0" can be constructed as before. The test statistic is (17) with w_t = (x̃_t's_t, x̃_t's_t², x̃_t's_t³)' and the number of degrees of freedom in the asymptotic χ² distribution under H_0 is 3p. This result requires the existence of all the moments implied by w_t and (17).
When x_t has a large number of elements, the auxiliary null hypothesis will sometimes be large compared to the sample size. In that case the asymptotic χ² distribution is likely to be a poor approximation to the actual small-sample distribution. It has been found (see Granger and Terasvirta 1993, Chap. 7) that an F-approximation to (17) works much better: the empirical size of the test remains close to the nominal size while power is good. The test can be carried out in stages.
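A sketch of one standard regression-based way to carry out such a staged F-test is given below: the model is first estimated under the null, the residuals are then regressed on the original regressors together with the auxiliary regressors, and the two residual sums of squares are compared. The function name, the NumPy/SciPy calls, and the exact contents of the auxiliary regressor matrix W (which depend on which of the statistics above is being computed) are assumptions of the illustration.

import numpy as np
from scipy import stats

def lm_linearity_f_test(y, X, W):
    """F-version of an LM-type linearity test (a sketch, assuming the usual staged form).

    Stage 1: regress y on X (the linear model), keep residuals u_hat and SSR0.
    Stage 2: regress u_hat on (X, W), where W holds the auxiliary regressors
             (e.g. elements of x_t multiplied by powers of the transition variable),
             keep SSR1.
    Stage 3: F = ((SSR0 - SSR1)/q) / (SSR1/(T - k - q)).
    """
    T = y.size
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta0
    ssr0 = u_hat @ u_hat

    Z = np.column_stack([X, W])
    beta1, *_ = np.linalg.lstsq(Z, u_hat, rcond=None)
    resid1 = u_hat - Z @ beta1
    ssr1 = resid1 @ resid1

    q = W.shape[1]
    k = X.shape[1]
    F = ((ssr0 - ssr1) / q) / (ssr1 / (T - k - q))
    p_value = 1.0 - stats.f.cdf(F, q, T - k - q)
    return F, p_value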
choice is between the elements of x̃_t. Then one can define the linear combination a'x̃_t, where a = (0, ..., 1, 0, 0, ..., 0)' is a p × 1 vector with the only unit element corresponding to the true but unknown transition variable, and substitute it for s_t in (8). Proceeding as in Luukkonen, Saikkonen, and Terasvirta (1988a) leads to the auxiliary regression
The number of degrees of freedom in the test based on (24) is p(p + 1)/2 + p, the null hypothesis of linearity being H_0': β_{1ij} = 0, i = 1, 2, ..., j; j = i, i + 1, ..., p; β_{3j} = 0, j = 1, 2, ..., p. Of course, the choice of potential transition variables may be restricted to only a subset of the variables in x̃_t. In that case, the null hypothesis must be modified accordingly and the relevant coefficients in (23) or (24) set equal to zero a priori.
The above statistical theory has the advantage that the asymptotic null dis-
tributions are standard and the tests can be carried out just by using ordinary least
squares. Although they are designed against STR they are also sensitive to other
types of nonlinearity. After rejecting linearity it may therefore not be clear what to
do next. However, one may have decided to consider STR models if linearity is re-
jected. In that case, tests based on the above auxiliary regression may be used not only for testing linearity but also, in case of a rejection, for the specification of an STR model. This argument is elaborated in Section IV.B.
The LM-type test statistics continue to have reasonable power in small samples when γ → ∞, at least when the alternative is an LSTR1 model; see Luukkonen, Saikkonen, and Terasvirta (1988a) and Hansen (1996). But then, if the alternative is a switching regression model (it is assumed a priori that γ is infinite), the above theory does not work. Hansen (1996) recently considered a general framework for hypothesis testing when the model is only identified under the alternative. His results have a bearing on switching regression models as well. Let ν ∈ N be the vector of nuisance
L = c - (T/2) log σ² - (1/(2σ²)) Σ_{t=1}^{T} u_t²     (26)
The information matrix related to (26) is block diagonal such that the element corresponding to the second derivative of (26) with respect to σ² forms its own block. The variance σ² can thus be treated as a fixed constant in (26) when deriving the test statistic. The first partial derivatives of the log-likelihood are taken with respect to α and ψ. The exponential terms in (29) and (30), which appear in (28), are bounded for γ < ∞. It follows that the existence of the necessary fourth moments is required for consistent estimation of the parameters.
1'. Estimate the LSTR1 model by NLS under the assumption of uncorrelated errors. Regress the residuals û_t on ẑ_t and compute the residual sum of squares SSR_0 = Σ_{t=1}^{T} ũ_t².
Step 1' guarantees Σ_{t=1}^{T} ẑ_t ũ_t = 0 and prevents size distortion of the test. This extension of step 1 can also be recommended for the other two tests discussed in this section.
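A compact sketch of such a residual-based autocorrelation check is given below, following the staged logic just described: the residuals are first regressed on the gradient regressors ẑ_t (step 1'), then on ẑ_t augmented with q lagged residuals, and the two residual sums of squares are compared with an F-statistic. The function name, the zero-padding of the initial lags, and the degrees of freedom chosen here are assumptions of the illustration, not prescriptions from the chapter.

import numpy as np
from scipy import stats

def serial_correlation_f_test(u_hat, Z_hat, q):
    """F-test of no error autocorrelation up to order q in an estimated STR model (a sketch).

    u_hat : residuals from the NLS-estimated model
    Z_hat : matrix of partial derivatives of the model with respect to its
            parameters, evaluated at the estimates (the step-1' regressors)
    """
    T = u_hat.size
    # Step 1': regress residuals on Z_hat so the new residuals are orthogonal to Z_hat.
    b0, *_ = np.linalg.lstsq(Z_hat, u_hat, rcond=None)
    v = u_hat - Z_hat @ b0
    ssr0 = v @ v
    # Step 2: add q lagged residuals (initial lags set to zero) and re-estimate.
    lagged = np.zeros((T, q))
    for j in range(1, q + 1):
        lagged[j:, j - 1] = u_hat[:-j]
    Zq = np.column_stack([Z_hat, lagged])
    b1, *_ = np.linalg.lstsq(Zq, v, rcond=None)
    e = v - Zq @ b1
    ssr1 = e @ e
    # Step 3: F-statistic with q and T - k - q degrees of freedom.
    k = Z_hat.shape[1]
    F = ((ssr0 - ssr1) / q) / (ssr1 / (T - k - q))
    p_value = 1.0 - stats.f.cdf(F, q, T - k - q)
    return F, p_value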
This STR model has two additive nonlinear components, and the transition function H, where r_t is assumed to be an element of x_t, may be defined analogously to (8) or (9). Since testing the presence of the additional component is discussed, assume H(0, c_2; r_t) = 0 for notational simplicity. When the adequacy of the standard STR model is the issue, it can be investigated by testing H_0: γ_2 = 0 in (31). Because of the parameterization, the additive STR model (31) is not identified under this null hypothesis. This problem may be solved (Eitrheim and Terasvirta 1996) in the same way as in Section III.A. This means that the transition function H is replaced by its third-order Taylor approximation

T_3(γ_2, c_2; r_t) = δ_0 + δ_1 r_t + δ_2 r_t² + δ_3 r_t³ + R_3(γ_2, c_2; r_t)

where β_j = γ_2 b_j, j = 1, 2, 3, and u_t* = u_t + (x_t'ψ)R_3(γ_2, c_2; r_t). The null hypothesis is H_0': β_1 = β_2 = β_3 = 0 and when it holds, u_t* = u_t. Deriving the appropriate test statistic with the asymptotic χ² null distribution is straightforward. In practice, the test can be carried out in the three stages described above. In the present case, ŵ_t = (x̃_t'r_t, x̃_t'r_t², x̃_t'r_t³)' in stage 2, whereas ẑ_t is the same as before. The degrees of freedom in the F-statistic are 3p and T - 4p - 1, respectively. If G ≡ 0 in (32), the test collapses into the linearity test discussed in Section III.A.
In the above it is assumed that all elements of ψ in (30) are nonzero. This is not necessary, and the elements of x_t included in the second nonlinear component may be selected freely. On the other hand, write x_t'ψ = x_{1t}'ψ_1 + x_{2t}'ψ_2 in (13) and consider the case in which the parameters have been estimated under the restriction ψ_2 = 0. Then (32) can be written as
If the STR model does not adequately characterize the nonlinearity in the data, then K(x_t) is a nonlinear function. To investigate that possibility, assume that K(x_t) is at least three times continuously differentiable with respect to x_t and expand K(x_t) into a Taylor series about the expansion point x_t = x_e. The third-order expansion has the form of an auxiliary regression with error term u_t* = u_t + R_K(x_t). The null hypothesis of no remaining nonlinearity is H_0': κ_{ij} = 0, i = 1, ..., p; j = 1, ..., p; κ_{ijl} = 0, i = 1, ..., p; j = i, ..., p; l = j, ..., p. The test can be carried out as before as an F-test if all the necessary (sixth) moments of x_t exist. Because the maintained model is very general, the null hypothesis is large. As a result, the test is not likely to have very good power in small samples if p is not small. Note that if p = 1, the test is equivalent to the corresponding test against STR based on the auxiliary regression (33) with r_t = x_{1t} and β_3 = 0.
3. Parameter Constancy
Parameter constancy is one of the key assumptions of an STR model. Testing it is therefore as important as it is in linear models. In this chapter the alternative to parameter constancy is a set of smoothly changing parameters. Following Eitrheim and Terasvirta (1996), the definition of parameter change is based on the idea of smooth transition, and the developments in this section are just a generalization of results in Lin and Terasvirta (1994). Consider the STR model (36). Again, an F-version of the test is preferred to the asymptotic theory. The degrees of freedom of the F-statistic are 3(p_0 + p_1) and T - 4(p_0 + p_1), respectively.
If the alternative to parameter constancy is characterized by H_2, then β_3 = 0 and β_7 = 0 in (41), and the null hypothesis is reduced accordingly. If only monotonic change is considered to be the alternative to stable parameters, then the transition function is the logistic function (37) and one has β_2 = β_3 = 0 and β_6 = β_7 = 0 in (41). Even this test may be restricted to cover only certain parameters of the model while the remaining ones are constant. This is done by rewriting (36) as
4. Additional Remarks
The tests discussed in this and the preceding subsection, as well as the linearity tests of Section III.A, have been designed against parametric alternatives. There exists a wide selection of other tests that are used in connection with nonlinear modeling and testing for structural change; see for instance Granger and Terasvirta (1993, Chap. 6) for an account. Many of those tests do not have a specific alternative hypothesis, although some of them, such as RESET (Ramsey 1969), may be interpreted as LM tests of linearity against a parametric nonlinear alternative. They have rather been intended as general tests of either linearity or parameter constancy in linear models. Nonparametric tests surveyed in Tjøstheim (1994) form a large subset of this class of tests. Because the focus in this chapter is on econometric modeling with STR models, these other tests have not received the attention that they otherwise would deserve.
should be the transition variable. Third, after estimating an STR model, the ade-
quacy of the model has to be checked. In order to deal with these issues, Terasvirta
(1994) proposed a modeling cycle for univariate STAR models consisting of specifi-
cation, estimation, and evaluation stages. This was an application of ideas in Box and
Jenkins (1970), who developed such an approach to constructing ARIMA models.
Granger and Terasvirta (1993, Chap. 7) extended the STAR modeling cycle to STR
models. This cycle is the topic of the present section. The misspecification tests of
estimated STR models introduced in Eitrheim and Terasvirta (1996) and discussed in the previous section became available only recently and are thus new compared to previous presentations of the STAR or STR modeling cycle. The three
main stages of the cycle will be considered separately. The use of the encompassing
principle to compare an STR model with its rivals explaining the same phenomenon
is another addition not discussed before.
carry out the tests and apply the following decision rule. If the rejection of H_03 is the strongest one, choose an LSTR2 (or an ESTR) model; otherwise select an LSTR1 model. The coefficient vectors β_j, j = 1, 2, 3, are functions of the parameters of the original STR model and they depend on the type of the model. The selection rule is based on this fact; for details see Granger and Terasvirta (1993, Chap. 7) or Terasvirta (1994).
There exists another, computationally slightly heavier but still very practicable strategy. Consider the original STR model (6). Giving fixed values to the parameters in the transition function makes (6) linear in parameters. These parameters can be estimated by OLS. Construct a two-dimensional (LSTR1) and three-dimensional (LSTR2) grid of γ and c and estimate the other parameters for these combinations of γ and c. In order to be able to choose a meaningful set of values of γ, the exponent of the transition function should be standardized; see the discussion in the next subsection. A reasonable set of values of c may be selected between the observed minimum and maximum values of the transition variable. Estimate the models for both LSTR1 and LSTR2 and select between these alternatives after comparing the fit of the best-fitting LSTR1 and LSTR2 models. This procedure can also be used to obtain initial estimates for the NLS estimation and to reduce the size of the model by imposing exclusion restrictions. An illustration can be found in the empirical example of Section V. Finally, the choice between LSTR2 and ESTR can be made after estimating an LSTR2 model, by testing c_1 = c_2 within that model.
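A minimal sketch of the grid-search strategy just described is given below for the LSTR1 case: for each (γ, c) pair on the grid the transition function is held fixed, the remaining parameters are estimated by OLS, and the pair with the smallest residual sum of squares is retained as a starting value for NLS. The standardization of the exponent by the sample standard deviation of the transition variable is one common choice and is an assumption of this sketch, as are the function name and grid sizes.

import numpy as np

def lstr1_grid_search(y, X, s, gammas, n_c=30):
    """Grid search over (gamma, c) for an LSTR1 model, with OLS for the rest.

    For fixed (gamma, c) the model y = phi'x + (theta'x) G(gamma, c; s) + u
    is linear in (phi, theta), so it can be fitted by least squares.
    The exponent is standardized by the sample standard deviation of s so
    that the gamma grid is roughly scale-free (an assumed convention).
    """
    sd_s = s.std(ddof=1)
    c_grid = np.linspace(s.min(), s.max(), n_c)
    best = None
    for gamma in gammas:
        for c in c_grid:
            G = 1.0 / (1.0 + np.exp(-gamma * (s - c) / sd_s))
            Z = np.column_stack([X, X * G[:, None]])   # [x_t, x_t * G_t]
            coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
            resid = y - Z @ coef
            ssr = resid @ resid
            if best is None or ssr < best[0]:
                best = (ssr, gamma, c, coef)
    return best   # (SSR, gamma, c, stacked phi and theta estimates)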
ĉ_2 (LSTR2), far outside the observed range of the transition variable is often a sign of convergence to an infeasible local minimum and thus an indication of an inadequate model. In that case, however, the model usually also fails at least some of the misspecification tests discussed above. In the case of an LSTR2 model, if either ĉ_1 or ĉ_2 lies far outside the observed range of the transition variable while γ̂ is not small, it may also indicate that an LSTR1 model is a better choice than an LSTR2 one.
If the model fails the test of no error autocorrelation, respecification seems the
only feasible solution. This is probably the most common route to follow also when
the model badly fails the tests of no additional nonlinearity. When the STR model
does not have constant parameters, respecification is an obvious solution as well.
But there exists at least one special case in which one might actually want to have
another STR component to accommodate and parameterize such nonconstancy. This
is when the model contains seasonal dummies and their coefficients seem to change
over time. There may not be economic reasons for such a change but seasonality
may change slowly anyway because of changing institutions. This is not an uncom-
mon situation in macroeconometric models. It has not been frequently accounted for
in practice, perhaps because econometricians have generally seen a single structural break as the most interesting alternative to parameter constancy. The paper by Farley, Hinich, and McGuire (1975) was an early exception to this rule. The STR framework offers a possibility of parameterizing such a continuous change; see the example of the next section. For other macroeconometric applications of this idea, see for example Jansen and Terasvirta (1996) and Lütkepohl, Terasvirta, and Wolters (1995). Gradual changes in institutions affecting things other than seasonality may also be modelled using the STR approach, as in Heinesen (1996).
the rival model is also an STR model then the MNM may be an additive STR model. The MNM trivially encompasses the two original models because they are nested in it; see Hendry (1995, p. 511). If an MNM can be constructed, one can apply a simplification encompassing test (Mizon and Richard 1986) to see if the STR model parsimoniously encompasses the MNM. If the rival model is linear then the test consists of testing the null hypothesis that the corresponding linear component does not additively enter the nonlinear MNM model. Such a test was discussed in Section III.B.2 when the idea was to test exclusion restrictions on coefficients of the linear component of an STR model. Accepting this hypothesis is equivalent to accepting that the STR model encompasses the MNM. Finding out if the linear model parsimoniously encompasses the MNM is tantamount to testing linearity within an MNM of STR type. This test has to be carried out using the techniques discussed in Section III.A because an STR-type MNM is not identified under the null hypothesis. Suppose
that this is done and the null hypothesis rejected. Then the conclusion is that the lin-
ear model does not encompass the MNM. If the STR model does, then the transitivity
property of encompassing implies that the STR model encompasses its linear rival.
The use of this testing procedure requires that both the STR equation and its single-
equation rival are valid models in that they can be analyzed without knowledge of the
rest of the system. An example of simplification encompassing tests can be found in
Section V.
V. APPLICATION
A. Background
This section contains an example of the modeling cycle consisting of specification,
estimation and evaluation of STR models. It is based on the data set and results in
Hendry (1984), who modeled house prices in the United Kingdom in 1960-1981
using error-correction models. For another econometric analysis of this data set, see
Richard and Zhang (1996). The purpose here is not to present a new model for UK house prices: in order to do that, the first thing would be to extend the time series as close to the present time as possible. The main objective is instead to use the equation for house price expectations Hendry (1984) specified and estimated as a benchmark and to see if the STR approach based on that equation and the same observation period leads to any new insight or yields an improved specification. This provides an opportunity to show how the STR modeling strategy works in practice.
The period Hendry (1984) considered was eventful. The nominal house prices
increased 12-fold and the real prices by over 50%. The nominal prices were also
clearly more volatile than the ordinary retail price index. These and other features of
the observed time series are discussed in Hendry (1984). The paper also contains a
description of the way the housing market functioned in the United Kingdom during
the observation period. The theoretical model in the paper is based on the assumption that the housing stock H_t evolves intertemporally according to

H_t = (1 - δ_t) H_{t-1} + C_t

where δ_t is the depreciation rate and C_t denotes net additions. At any time, C_t is very small relative to H_{t-1}, so that H_{t-1} is taken to be the fixed supply of housing in the short run. As a result, the fluctuations in demand translate into fluctuations in the price of housing, Ph_t. Thus (see Hendry 1984, pp. 224-225 for a more complete argument), the demand equation is the one determining the price of housing. Its general form is postulated as
H^D = f(Ph/P, Y, ρ, R, Rm, M, T, N, F), with the signs of the partial effects being (-, +, -, -, -, +, ?, +, ?) respectively,
where P is the general price level, Y the real income, ρ the real rental rate, R the market interest rate, and Rm the mortgage rate of interest, which for institutional
reasons may differ substantially from the market interest rate. Furthermore, M is the
stock of mortgages, T the tax rate, N the size of the population, and F the average
family size. Changes in the real price of housing, the real rental rate and the interest
rates have a negative effect on demand. Changes in the real income, the stock of
mortgages, and the size of the population have the opposite effect. The tax rate and the
average family size also affect the demand for housing, but the sign is indeterminate.
In the following the focus will be solely on modeling price expectations, which Hendry (1984) discussed in detail. The expectations are assumed unbiased:

Δph_t = Δph_t^e + u_t

where Δ is the difference operator, ph_t the logarithmic nominal price of housing (lowercase letters denote logarithms), ph_t^e the corresponding expectation, and u_t an innovation with respect to the information used in predicting Δph_t: E u_t = 0, var(u_t) = σ_u², cov(u_t, u_τ) = 0, τ ≠ t. Equation (11) in Hendry (1984, p. 228) gives the expectations the following parametric form:

Δph_t^e = Σ_{i=1}^{2} c_i Δph_{t-i} - c_3 (R^p - Δ_4 p)_{t-1} + c_4 Δm_{t-1} - c_5 (ph + h - p - y - c_a)_{t-1} + c_6 (m - ph - h - c_b)_{t-1}

where R^p is the after-tax interest rate, c_a and c_b are constants, and c_i, i = 1, ..., 6, are unknown parameters. The expectation equation contains two error correction terms. The first one requires the nominal value of housing to stand in constant proportion to nominal income in the long run. The second error correction term implies a constant long-run ratio of mortgages (m) to own equity (ph + h); h is the logarithm of the housing stock.
*The estimated equation is not exactly the same as Eq. (18) in Hendry (1984). The differences are due to the fact that the data set Hendry (1984) used was no longer available in its original form. The data set used in this chapter is as close to the original data as possible.
[Table: test p-values by maximum lag q = 1, 2, 3, 4, 6.]
Table 2  p-Values of Tests of Linearity of the Linear (Δph_{t-1} not cubed) U.K. Housing Price Equation against STR, Transition Variable Assumed Known

Parameter constancy tests, null hypotheses (1)-(4):
              (1)      (2)      (3)     (4)
              0.090    0.026    0.85    0.42
              0.19     0.056    0.54    0.50
              0.37     0.24     0.78    0.53

(1) H0: "All parameters except the coefficients of ... are constant."
(2) H0: "Intercept and the coefficients of seasonal dummy variables are constant."
(3) H0: "Coefficients of Δph_{t-1} and Δph_{t-2} are constant."
(4) H0: "Coefficients of 'exogenous' variables are constant."
Notes: (1) The parameters not under test are assumed constant also under the alternative. (2) Test 4 is a test against an STR model with transition function H_j, j = 1, 2, 3; see Section III.B.3 and definitions (23)-(25).
[Table of parameter estimates with columns "coefficient of", "Estimate", "Standard deviation", and "t-Value", split into linear and nonlinear parts; only the values 9.5, 0.07, 0.87, and 0.104 are recoverable.]
where σ̂(Δph_{t-1}) is the sample standard deviation of Δph_{t-1} and σ̂_L² the residual variance of the corresponding linear model. The residual variance of (47) is only about 70% of that of the corresponding linear model. Results of the LM test of no error autocorrelation in Table 5 do not indicate autocorrelation, nor is there any evidence of ARCH. Large skewness and excess kurtosis estimates are mainly due to a large positive residual in 1964(1); see Figure 1.
Figure 1  Residuals from the linear model with a cubic lag (46) and from the STR model (47) for the first differences of the logarithmic U.K. housing price index, 1959(1)-1982(2).
price turbulence in 1973. The linear equation strengthened by the cubic lag does not explain the features of the housing price boom and its immediate aftermath as well as (47) does. Apart from that period, both models have almost identical fits. It seems obvious that the nonlinear specification is mainly required to model the exceptional increase in house prices in 1973. This is also seen from Figure 2, which shows that the transition function takes values close to zero most of the time. A comparison between (46) and the linear part of (47) indicates that they are quite similar. For those periods, (46) and (47) may thus be expected to have rather similar residuals.
Figure 2  Values of the transition function of the STR model (47), 1959(1)-1982(2).
It is instructive to find out how the nonlinear error correction (nec_t) works in 1973. From (47),
$$nec_t = 0.38 + 3.0\,G_1(\hat{\gamma}, \hat{c}; \Delta ph_{t-1}) + \{0.13 + 0.60\,G_1(\hat{\gamma}, \hat{c}; \Delta ph_{t-1})\}\,(m - ph - h)_{t-1} \qquad (48)$$
where $G_1(\hat{\gamma}, \hat{c}; \Delta ph_{t-1}) = (1 + \exp\{-\hat{\gamma}(\Delta ph_{t-1} - \hat{c})\})^{-1}$ is the estimated logistic transition function of (47).
The graph of (48) over the observation period appears in Figure 3. When a large price shock arrives (Δph_t takes a large value), it causes a sharp increase in (48) one period later through the combined intercept 0.38 + 3.0G_1. The error-correcting combination (m - ph - h) has negative values throughout, so that an increase in the value of the transition function initially weakens the error correction. This initial effect is soon offset by a large change in (m - ph - h)_{t-1}, as the value of the mortgage stock does not follow the rapid increase in the value of the housing stock. Because of the large positive nonlinear coefficient (0.60) of (m - ph - h)_{t-1} in (48), the pull toward the equilibrium increases dramatically and eventually suppresses the price boom.
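To make the behavior of (48) concrete, the following sketch (in Python) evaluates the combined intercept and the error-correction coefficient over a grid of values of Δph_{t-1}. The logistic form of G_1 and the slope and location values used here (gamma, c) are illustrative assumptions rather than the estimates reported in the chapter.

```python
import numpy as np

def G1(s, gamma, c):
    """Logistic (LSTR1-type) transition function, bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-gamma * (s - c)))

gamma, c = 9.5, 0.07                 # hypothetical transition-function parameters
dph_lag = np.linspace(-0.05, 0.25, 7)  # hypothetical values of Δph_{t-1}
g = G1(dph_lag, gamma, c)

intercept = 0.38 + 3.0 * g           # combined intercept in (48)
ec_coef = 0.13 + 0.60 * g            # coefficient of (m - ph - h)_{t-1} in (48)

for s, gval, a, b in zip(dph_lag, g, intercept, ec_coef):
    print(f"dph(t-1)={s:6.3f}  G1={gval:.3f}  intercept={a:.3f}  EC coefficient={b:.3f}")
```

The printout shows that a large positive price change drives G_1 toward one, which raises both the intercept and the strength of the error correction, as described above.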
The STR model (47) seems to explain the dynamics of the unusually large increase in housing prices, but does it explain all nonlinearity in the data? Table 6 contains the results of the tests of no additive nonlinearity against an additive STR model considered in Section III.B.2. Note that Δph_{t-1} is included in the second nonlinear component although it only appears in the transition function of (47). When the test is carried out with Δph_{t-1} as the transition variable, the p-value of the test equals 0.11. This indicates that the nonlinearity causing a very low p-value for the
Table 5  p-Values of Tests of No Additive Nonlinearity in the LSTR1 Model (47) for a Set of Transition Variables

                          Null hypothesis
Parameter constancy test  (1)      (2)      (3)     (4)       (5)
F1                        0.022    0.017    0.61    0.0078    0.71
F2                        0.19     0.15     0.84    0.038     0.87
F3                        0.30     0.19     0.96    0.12      0.71
corresponding linearity test (Table 3) has been dealt with in a satisfactory manner. Another result pointing in the same direction is that the test with (m - p)_{t-1} as the transition variable has a p-value of 0.17, whereas the corresponding linearity test had a low value. On the other hand, the test with Δ₂(Δy_t) as the transition variable now has a p-value close to 0.01, but this order of magnitude is considerably higher than that of the lowest p-values in the linearity tests. As at the same time all the other tests have p-values exceeding 0.1, this result does not cause too much concern. As a whole, it can be concluded that the STR model (47) explains most of the nonlinearity present in the data.
Parameter constancy tests of the linear model indicated some nonconstancy, although the result could also have been interpreted as an effect of neglected nonlinearity on these tests. Table 7 contains results of the parameter constancy tests described in Section III.B. The results suggest that despite careful parametrization of the nonlinearity, parameter nonconstancy is still a problem. It seems obvious that seasonality in U.K. house prices has been changing over time. Furthermore, the change seems to have been monotonic during the observation period, because F1 is the test with the strongest rejection of the null hypothesis, just as it was in the linear model. In the extended model (49), the second transition function has time as the transition variable. A specification search indicated that the second and the third quarters have to be included in the additional nonlinear component of the model. The estimated equation has the form (note that the second transition variable t/T is standardized between 0 and 1)
Figure 4  Values of the transition function of seasonal dummy variables in the STR model (49), 1959(1)-1982(2).
Figure 5  Values of the second and third quarter time-varying seasonal effects (second quarter = solid line, third quarter = dashed line) according to Eq. (49), 1959(1)-1982(2).
In order to find out whether or not the large uncertainty in the estimates of the seasonal parameters is mainly due to overparameterization, the second logistic transition function in (49) is replaced by the linear approximation (t/T) and the parameters are reestimated. This yields Eq. (50). The estimates of parameters other than the seasonal dummies remain practically unchanged. The seasonality indeed seems to be changing over time. The amount of uncertainty in the parameter estimates is considerably less than in (49) and the AIC is smaller. The seasonals in the linear part of (50) could even be removed altogether; that is, in the beginning of the period there has been little or no seasonality in house prices. For illustration, however, those variables have been retained in the model.
The test results for models (49) and (50) are very close to each other. Those for (50) are reported. Results of testing the hypothesis of no remaining nonlinearity against the same alternative as previously can be found in Table 8. They are rather similar to those for model (47). The only conspicuous difference is that the test against LSTR with (m - ph - h)_{t-1} as the transition variable now has a relatively small p-value. Because most of the other tests have clearly higher p-values (even the one with Δ₂y_t as the transition variable), the current specification is tentatively accepted.
Results of the parameter constancy tests can be found in Table 9. The null of constancy is not rejected in any of the tests. The parameterization of changing
Table 8  p-Values of Tests of No Additive Nonlinearity in the LSTR1 Model (50) for a Set of Transition Variables

                          Null hypothesis
Parameter constancy test  (1)   (2)   (3)   (4)   (4a)   (5)
[p-values not recovered]

(1): H0: "All parameters except the coefficients of D_{2t} and D_{3t} are constant."
(2): H0: "All parameters in the linear part of the model except the coefficients of D_{2t} and D_{3t} are constant."
(3): H0: "All parameters in the nonlinear part of the model are constant."
(4): H0: "Intercepts and coefficients of the seasonal dummy variables are constant."
(4a): H0: "Linear intercept and coefficients of the seasonal dummy variables are constant."
(5): H0: "All parameters in the linear part of the model except the coefficients of the dummy variables are constant."
(a): Test not computed due to near-singularity of the moment matrix.
Notes: (1) The parameters not under test are assumed constant also under the alternative. (2) Test F_j is a test against an STR model with transition function H_j, j = 1, 2, 3; see Section III.B.3 and definitions (23)-(25).
seasonality has removed the variation in the coefficients of seasonal dummy variables. According to (50), U.K. housing price changes now have a trending seasonal component. However, it is not very realistic to extend the conclusion far outside the sampling period. In fact, model (49) should be preferred to (50) as far as the interpretation of the "trend" is concerned. The change one observes just happens to lie within an interval for which the logistic transition function is almost linear. Of course, even the interpretation that Eq. (49) offers may turn out to be incorrect in the light of any new data, but at any rate, the assumption of a long-run linear trend in seasonality is hardly a plausible one.
E. Encompassing Tests
An objective of this application is to find out whether the specification of the price expectations equation (46) can be improved by applying STR models. Equation (50) is an STR model, and the question is whether or not it can be viewed as an improvement over (46). This can be investigated by encompassing tests, as discussed in Section IV.E. The task is to investigate whether the STR model (50) encompasses (46) or not, and vice versa. To find out if (50) encompasses (46), the first step is to construct an MNM. This is done by augmenting (50) linearly by the cubic lag of the price change, (Δph_{t-1})³. The augmented model trivially encompasses (50) because the latter model is nested in it. In order to see whether (50) parsimoniously encompasses the MNM, one estimates the parameters of the MNM and computes the likelihood ratio statistic.
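The mechanics of this parsimonious-encompassing comparison can be sketched as follows; the log-likelihood values, the number of restrictions, and the use of the asymptotic χ² reference distribution are hypothetical placeholders rather than the figures of this application.

```python
from scipy import stats

def lr_encompassing_test(loglik_restricted, loglik_mnm, df):
    """Likelihood ratio statistic for testing whether the restricted model
    (here the STR model (50)) parsimoniously encompasses the minimal nesting
    model (MNM); df is the number of extra parameters in the MNM."""
    lr = 2.0 * (loglik_mnm - loglik_restricted)
    p_value = stats.chi2.sf(lr, df)
    return lr, p_value

# Hypothetical log-likelihood values for illustration only.
lr, p = lr_encompassing_test(loglik_restricted=250.1, loglik_mnm=250.9, df=1)
print(f"LR = {lr:.2f}, p-value = {p:.3f}")
```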
In this chapter the emphasis is on showing how STR models can be applied to modeling problems in time series econometrics. It is demonstrated how the actual modeling is carried out in a systematic fashion through a modeling cycle. This cycle can be repeated until an adequate model passing the available diagnostics is found. Alternatively, the cycle may be terminated by concluding that the family of STR models is not an appropriate one for empirical modeling of the economic relationship in question. The central role of hypothesis testing in STR modeling becomes clear from the text. First, it is important to single out the linear cases from the nonlinear ones. But testing is also an essential part of model specification and evaluation. After the parameters of the model have been estimated, the validity of the assumptions underlying the STR model is investigated by tests, as is the question whether or not an estimated STR model is an improvement over previous models in the literature.
The STR model can be used, as in the example of Section V, for modeling economic relationships between variables. Another role of the STR model is that it constitutes a feasible alternative to important null hypotheses concerning linear models. The null hypothesis of parameter constancy is one: in that case the alternative to constant parameters is continuously changing parameters. As Section V shows, this is a useful alternative, for example, in testing the constancy of the pattern of seasonal fluctuations. Furthermore, although this possibility has not been discussed in this chapter in any detail, the STR model provides a convenient framework for joint testing of weak exogeneity and a restricted form of invariance, that is, for testing superexogeneity. Finally, the additive STR model may be used for testing for Granger causality in the presence of STAR-type nonlinearity.
The STR model considered in this chapter is a single-equation model. In theory, the idea of smooth transition can be extended to systems of equations. This can be done in many ways, and the ones with practical significance still need to be sorted out. Anderson and Vahid (1995) recently searched for common nonlinearities between variables, using a vector STAR model. In general, however, there is as yet little empirical experience with such systems, and more is needed. In the meantime, additional applications of the single-equation STR model are also necessary to learn more about how the proposed modeling strategy works in practice and to find ways of improving and developing it further.
ACKNOWLEDGMENTS
REFERENCES
Albæk, K. and H. Hansen (1995), Estimating Aggregate Labour Market Relations, mimeo., Department of Economics, University of Copenhagen.
Anderson, H. M. (1997), Transaction Costs and Nonlinear Adjustment towards Equilibrium in the US Treasury Bill Market, Oxford Bulletin of Economics and Statistics, forthcoming.
Anderson, H. M. and F. Vahid (1995), Testing Multiple Equation Systems for Common Nonlinear Components, mimeo., Texas A & M University.
Andrews, D. W. K. and W. Ploberger (1994), Optimal Tests When a Nuisance Parameter Is Present Only under the Alternative, Econometrica, 62, 1383-1414.
Bacon, D. W. and D. G. Watts (1971), Estimating the Transition Between Two Intersecting Straight Lines, Biometrika, 58, 525-534.
Banerjee, A., J. Dolado, J. W. Galbraith, and D. F. Hendry (1993), Co-integration, Error-Correction, and the Econometric Analysis of Non-stationary Data, Oxford University Press, Oxford.
Bates, D. M. and D. G. Watts (1988), Nonlinear Regression Analysis and Its Applications, Wiley, New York.
Bell, D., J. Kay, and J. Malley (1996), A Nonparametric Approach to Nonlinear Causality Testing, Economics Letters, 51, 7-18.
Box, G. E. P. and G. M. Jenkins (1970), Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco.
Chan, K. S. and H. Tong (1986), On Estimating Thresholds in Autoregressive Models, Journal of Time Series Analysis, 7, 178-190.
Chiarella, C., W. Semmler, and L. Koçkesen (1996), The Specification and Estimation of a Nonlinear Model of Real and Stock Market Interaction, mimeo., Department of Economics, New School for Social Research, New York.
Cook, P. and L. D. Broemeling (1996), Analyzing Threshold Autoregressions with a Bayesian Approach, in T. Fomby (ed.), Advances in Econometrics, Vol. 11, Part B: Bayesian Methods Applied to Time Series Data, JAI Press, Greenwich, CT, 89-107.
Coutts, J. A., T. C. Mills, and J. Roberts (1995), Parameter Stability in the Market Model: Tests and Time Varying Parameter Estimation with U.K. Data, mimeo., Sheffield University Management School.
Davies, R. B. (1977), Hypothesis Testing When a Nuisance Parameter Is Present Only under the Alternative, Biometrika, 64, 247-254.
Davies, R. B. (1987), Hypothesis Testing When a Nuisance Parameter Is Present Only under the Alternative, Biometrika, 74, 33-44.
Eitrheim, Ø. and T. Terasvirta (1996), Testing the Adequacy of Smooth Transition Autoregressive Models, Journal of Econometrics, 74, 59-75.
Engle, R. F. (1982), Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of U.K. Inflation, Econometrica, 50, 987-1008.
Ericsson, N. R. (1992), Cointegration, Exogeneity and Policy Analysis: An Overview, Journal of Policy Modeling, 14, 251-280.
Ertel, J. E. and E. B. Fowlkes (1976), Some Algorithms for Linear Spline and Piecewise Multiple Linear Regression, Journal of the American Statistical Association, 71, 640-648.
Escribano, A. and S. Mira (1995), Nonlinear Time Series Models: Consistency and Asymptotic Normality of NLS under New Conditions, Universidad Carlos III de Madrid, Statistics and Econometrics Series 14, Working Paper 95-42.
Fair, R. C. and D. M. Jaffee (1972), Methods of Estimation for Markets in Disequilibrium, Econometrica, 40, 497-514.
Farley, J. U., M. Hinich, and T. W. McGuire (1975), Some Comparisons of Tests for a Shift in the Slopes of Multivariate Linear Time Series Model, Journal of Econometrics, 3, 297-318.
Geweke, J. (1984), Inference and Causality in Economic Time Series Models, in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, Vol. 2, North-Holland, Amsterdam, 1101-1144.
Geweke, J. and N. Terui (1993), Bayesian Threshold Autoregressive Models of Nonlinear Time Series, Journal of Time Series Analysis, 14, 441-454.
Goldfeld, S. M. and R. E. Quandt (1972), Nonlinear Methods in Econometrics, North-Holland, Amsterdam.
Goldfeld, S. M. and R. E. Quandt (1973), A Markov Model for Switching Regression, Journal of Econometrics, 1, 3-16.
Granger, C. W. J. (1969), Investigating Causal Relations by Econometric Models and Cross-Spectral Methods, Econometrica, 37, 424-438.
Granger, C. W. J. (1981), Some Properties of Time Series Data and Their Use in Econometric Model Specification, Journal of Econometrics, 16, 121-130.
Granger, C. W. J. and T. Terasvirta (1993), Modelling Nonlinear Economic Relationships, Oxford University Press, Oxford.
Granger, C. W. J., T. Terasvirta, and H. Anderson (1993), Modeling Nonlinearity over the Business Cycle, in J. H. Stock and M. W. Watson (eds.), Business Cycles, Indicators, and Forecasting, University of Chicago Press, Chicago, 311-325.
Hansen, B. E. (1996), Inference When a Nuisance Parameter Is Not Identified under the Null Hypothesis, Econometrica, 64, 413-430.
Hatanaka, M. (1996), Time-Series-Based Econometrics: Unit Roots and Co-integration, Oxford University Press, Oxford.
Heinesen, E. (1996), The Tax Wedge and the Household Demand for Services, mimeo., Institute of Local Government Studies, Copenhagen.
Hendry, D. F. (1984), Econometric Modelling of House Prices in the United Kingdom, in D. F. Hendry and K. F. Wallis (eds.), Econometrics and Quantitative Economics, Blackwell, Oxford, 211-252.
Hendry, D. F. (1995), Dynamic Econometrics, Oxford University Press, Oxford.
Jansen, E. S. and T. Terasvirta (1996), Testing Parameter Constancy and Super Exogeneity in Econometric Equations, Oxford Bulletin of Economics and Statistics, 58, 735-763.
Jarque, C. M. and A. K. Bera (1980), Efficient Tests for Normality, Homoscedasticity, and Serial Independence of Regression Residuals, Economics Letters, 6, 255-259.
Johansen, S. (1995), Likelihood-Based Inference in Cointegrated Vector Autoregressive Models, Oxford University Press, Oxford.
Judge, G. G., W. E. Griffiths, R. C. Hill, H. Lütkepohl, and T.-C. Lee (1985), The Theory and Practice of Econometrics, 2nd ed., Wiley, New York.
King, M. L. and T. S. Shively (1993), Locally Optimal Testing When a Nuisance Parameter Is Present Only under the Alternative, Review of Economics and Statistics, 75, 1-7.
Lee, T.-H., H. White, and C. W. J. Granger (1993), Testing for Neglected Nonlinearity in Time Series Models: A Comparison of Neural Network Methods and Alternative Tests, Journal of Econometrics, 56, 269-290.
Lin, C.-F. and T. Terasvirta (1994), Testing the Constancy of Regression Parameters Against Continuous Structural Change, Journal of Econometrics, 62, 211-228.
Lindgren, G. (1978), Markov Regime Models for Mixed Distributions and Switching Regressions, Scandinavian Journal of Statistics, 5, 81-91.
Lomnicki, Z. A. (1961), Test for Departure from Normality in the Case of Linear Stochastic Processes, Metrika, 4, 37-42.
Lütkepohl, H., T. Terasvirta, and J. Wolters (1995), Investigating Stability and Linearity of a German M1 Money Demand Function, Stockholm School of Economics, Working Paper Series in Economics and Finance No. 64.
Luukkonen, R., P. Saikkonen, and T. Terasvirta (1988a), Testing Linearity against Smooth Transition Autoregression, Biometrika, 75, 491-499.
Luukkonen, R., P. Saikkonen, and T. Terasvirta (1988b), Testing Linearity in Univariate Time Series, Scandinavian Journal of Statistics, 15, 161-175.
Maddala, G. S. (1977), Econometrics, McGraw-Hill, New York.
Maddala, G. S. (1986), Disequilibrium, Self-Selection, and Switching Models, in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, Vol. 3, North-Holland, Amsterdam, 1633-1688.
McLeod, A. I. and W. K. Li (1983), Diagnostic Checking ARMA Time Series Models Using Squared Residuals, Journal of Time Series Analysis, 4, 269-273.
Mizon, G. E. and J.-F. Richard (1986), The Encompassing Principle and Its Application to Non-nested Hypothesis Tests, Econometrica, 54, 657-678.
Peguin-Feissolle, A. (1994), Bayesian Estimation and Forecasting in Nonlinear Models: Application to an LSTAR Model, Economics Letters, 46, 187-194.
Petruccelli, J. D. (1990), A Comparison of Tests for SETAR-Type Non-linearity in Time Series, Journal of Forecasting, 9, 25-36.
Pfann, G. A., P. C. Schotman, and R. Tschernig (1996), Nonlinear Interest Rate Dynamics and Implications for the Term Structure, Journal of Econometrics, 74, 149-176.
Pole, A. M. and A. F. M. Smith (1985), A Bayesian Analysis of Some Threshold Switching Models, Journal of Econometrics, 29, 97-119.
Quandt, R. E. (1960), Tests of the Hypothesis That a Linear Regression System Obeys Two Separate Regimes, Journal of the American Statistical Association, 55, 324-330.
Quandt, R. E. (1984), Computational Problems and Methods, in Z. Griliches and M. D. Intriligator (eds.), Handbook of Econometrics, Vol. 1, North-Holland, Amsterdam, 699-764.
Ramsey, J. B. (1969), Tests for Specification Errors in Classical Linear Least-Squares Regression Analysis, Journal of the Royal Statistical Society B, 31, 350-371.
Richard, J.-F. and W. Zhang (1996), Econometric Modeling of UK House Prices Using Accelerated Importance Sampling, Oxford Bulletin of Economics and Statistics, 58, 601-613.
Saikkonen, P. and R. Luukkonen (1988), Lagrange Multiplier Tests for Testing Non-linearities in Time Series Models, Scandinavian Journal of Statistics, 15, 55-68.
Seber, G. A. F. and C. J. Wild (1989), Nonlinear Regression, Wiley, New York.
I. INTRODUCTION
This chapter surveys issues concerning seasonality in economic time series. An elab-
orate discussion on a formal definition of seasonality is given in, e.g., Hylleberg (1986, 1992). Here I loosely refer to seasonality as the variation in time-series data
that displays a certain regularity corresponding with the measurement interval. For
example, for quarterly data one may consider the annually recurring positive or neg-
ative peaks in certain quarters as seasonal fluctuations. Furthermore, the observa-
tion that stock returns on Mondays seem more volatile than those on other weekdays
concerns seasonality too, that is, seasonality in variance.
In many cases, seasonality in economic time series is due to weather or insti-
tutional factors. An example of the latter is that school holidays are fixed by local
governments, and hence one may expect tourism spending to be high in the cor-
responding season. Another example is that the deadline for companies to publish
their annual reports can be dictated by law. Right after that deadline, one may expect
more volatility in stock markets in case the news differs from the expectations, and
one may also observe changes in key macroeconomic figures such as consumer confi-
dence indicators. Hence, part of the seasonal variation may be roughly constant over
time, since for example it is unlikely that Christmas will move to other months, but
another part of seasonality may change because of changes in institutional factors.
Finally, seasonal patterns can also change because economic agents start to behave in a different way. In fact, if Mondays always displayed high volatility, one would be able to make money through derivatives. Hence, because of the so-called weekend effect, one may expect high volatility on Mondays, but this feature is unlikely
to be constant over time; see, e.g., Franses and Paap (1995). Another example is that seasonal labor supply may make the unemployment rate display more seasonality in the expansion stage.
In this chapter I will focus on statistical models that can describe and forecast
economic time series with seasonal variation which changes over time. The running
examples in this chapter to be used for illustration are taken from macroeconomics,
tourism, marketing, and finance. A dominant approach in macroeconomics is to sea-
sonally adjust the data prior to analysis; that is, one assumes that seasonality is not an
interesting data feature and should be removed. In most fields of economics, however,
seasonality is considered important since it can convey information on, for example,
the behavior of economic agents (Ghysels 1994a). In this chapter I concur with this
view and I will therefore not consider seasonal adjustment, and confine myself to
models that explicitly incorporate descriptions of seasonality. I refer the reader interested in seasonal adjustment to the surveys in Hylleberg (1992), Bell and Hillmer (1984), and Maravall (1995), inter alia.
The outline of this chapter is as follows. Section II gives some summary statistics for four sample series. These statistics mainly show that seasonality does not appear constant over time, and that any changes do not occur quickly. Hence, seasonality seems to change slowly over time. Section III reviews the two approaches which are nowadays commonly used in many applications, i.e., univariate and multivariate models that incorporate seasonal unit roots and seasonal parameter variation.
Due to space limitations, I only highlight some of the key features of these models
and refer the interested reader to the surveys in Hylleberg (1994), Franses (1996a,
b), and to the specific studies mentioned here. In Section IV, I discuss further topics
of research. Section V concludes with some remarks.
In this section I discuss some time series features of four sample series. The first time
series is real consumption nondurables in the United Kingdom, which is observed
quarterly for 1955.1-1988.4. The source of these data is described in Osborn (1990).
The graph of this series is displayed in Figure 1. It is clear that the data display an
upward-moving trend, which seems sometimes hampered by shocks, especially those
around 1974 and 1979. Furthermore, seasonal variation seems a dominant source of
variation. As is usual for economic time series, these data are transformed by taking
natural logarithms.
The second quarterly time series is depicted in Figure 2 and it concerns the
unemployment rate in Federal Germany for 1962.1 to 1991.4. The source of these
data is the OECD Main Economic Indicators. The graph in Figure 2 shows that unemployment seems to increase rapidly around the recession periods 1967, 1974, and 1980-1982. The decline in unemployment occurs more slowly, and hence this series
When the Δ₁y_t data are regressed on seasonal dummies (i.e., when the ACF is computed for Δ₁y_t − Σ_s μ̂_s D_{s,t}), some of the large values at S and 2S reduce to small and insignificant values. This shows that one should always account for seasonal constants, since neglecting these may lead to spuriously large ACF values at seasonal lags. The ACFs of the Δ_S y_t series seem to die out reasonably quickly, and one may want to estimate low-order autoregressive (AR) models for these transformed series. In the last row of each panel, one can observe (and this is common to many economic series) that the ACF of Δ₁Δ_S y_t has significant values at lags 1, S − 1, S, and S + 1.
The latter feature of the Δ₁Δ_S transformed series is also recognized in Box and Jenkins (1970), who propose to analyze the so-called airline model
$$\Delta_1\Delta_S y_t = (1 - \theta_1 B)(1 - \theta_S B^S)\,\varepsilon_t \qquad (1)$$
where ε_t is a standard white noise series. Clearly, the theoretical ACF of (1) corresponds with nonzero values for the ACF at lags 1, S − 1, S, and S + 1. In the case where θ₁ = θ_S = 1, which amounts to overdifferencing since then (1 − B)(1 − B^S) cancels from both sides of (1), the corresponding ACF values are −0.5, 0.25, −0.5, and 0.25, respectively. Most textbooks in time series advocate the use of (1). Furthermore, for many different example series model (1) fits well. Its key advantage is that it only contains two moving-average (MA) parameters to be estimated.
As an example, for the consumption nondurables data it appears that the following model passes the usual LM-type diagnostic checks for residual autocorrelation at lag 1 and at lags 1-to-S:
$$\Delta_1\Delta_4 y_t = \underset{(0.0010)}{0.0002} + \hat{\varepsilon}_t - \underset{(0.069)}{0.634}\,\hat{\varepsilon}_{t-4} \qquad (2)$$
where the parameters are estimated using Micro TSP routines, and where standard errors are reported in parentheses. Notice that (2) effectively contains only one parameter, which (together with the double-differencing filter) appears sufficient to remove the autocorrelation in y_t.
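As a rough illustration of how an airline-type model such as (2) can be estimated with current software, the following sketch fits a (0,1,1)×(0,1,1)₄ specification with statsmodels to a simulated quarterly series; the data, and the handling of the constant, are not those of the Micro TSP estimates reported above.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)

# Simulate a quarterly series whose double differences Δ1Δ4 y_t follow
# ε_t - 0.6 ε_{t-4}, loosely mimicking Eq. (2); the data are artificial.
n = 160
eps = rng.normal(scale=0.01, size=n)
d = eps.copy()
d[4:] -= 0.6 * eps[:-4]

y = np.zeros(n)
for t in range(5, n):
    # invert the Δ1Δ4 filter: y_t = y_{t-1} + y_{t-4} - y_{t-5} + d_t
    y[t] = y[t - 1] + y[t - 4] - y[t - 5] + d[t]

# Airline model (0,1,1)x(0,1,1)_4 with a constant, as in Eqs. (1)-(2).
res = SARIMAX(y, order=(0, 1, 1), seasonal_order=(0, 1, 1, 4), trend="c").fit(disp=False)
print(res.params)   # constant, MA(1), seasonal MA(1), error variance
```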
As another example, for the four-weekly advertising data, the following model is found to pass the diagnostic checks for residual autocorrelation:
$$\Delta_1\Delta_{13} y_t = \underset{(0.0277)}{0.0012} + \hat{\varepsilon}_t - \underset{(0.077)}{0.394}\,\hat{\varepsilon}_{t-1} - \underset{(0.072)}{0.388}\,\hat{\varepsilon}_{t-2} - \underset{(0.069)}{0.496}\,\hat{\varepsilon}_{t-13} - \underset{(0.076)}{0.304}\,\hat{\varepsilon}_{t-14} \qquad (3)$$
Due to the disaggregation level of these data, one should expect the need for several parameters to whiten the errors. For the MA part of the model, the solutions to its characteristic polynomial are six pairs of complex roots with absolute values 0.981, 0.963, 0.943, 0.930, 0.928, and 0.921, and two real roots 0.993 and 0.612. Hence, given that 13 of the 14 solutions are close to the unit circle, it may be that the Δ₁₃ filter on the left-hand side of (3) is redundant. Given the fact that MA parameters are typically estimated away from the unity boundary, one may now be tempted to conclude that the Δ₁ filter for advertising should be sufficient. In fact, if so, with seasonality being approximately constant, Bell (1987) shows that the solutions to the MA polynomial for (1) approach the unit circle.
It is worthwhile to note that the solutions to 1 − z^S = 0 are z_k = cos(2πk/S) + i sin(2πk/S) (where i² = −1) for k = 0, 1, …, S − 1, yielding S different solutions which all lie on the unit circle. In other words, the Δ₁Δ_S filter in (1) to (3) assumes S + 1 unit roots, i.e., S + 1 independent sources of stochastic and nonstationary variation. Even though the MA component in models such as (1) seems to "repair" in some sense the possibly overestimated number of unit roots, one may question the notion that economic time series are governed by such a large number of trends.
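The unit-circle location of these roots is easy to verify numerically; the following sketch simply solves 1 − z^S = 0 for S = 4.

```python
import numpy as np

S = 4
# Solve 1 - z**S = 0, i.e. z**S = 1: the S roots of unity.
roots = np.roots([1.0] + [0.0] * (S - 1) + [-1.0])   # polynomial z^S - 1

for z in roots:
    print(f"z = {z:.3f}, |z| = {abs(z):.3f}")
# All moduli equal 1: applying Δ_S = 1 - B^S therefore imposes one
# nonseasonal unit root (z = 1) and S - 1 seasonal unit roots.
```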
Consider for example the so-called seasonal random walk process Δ_S y_t = ε_t, and also consider the S time series Y_{s,T}, which are the annually observed data on y_t in season s = 1, 2, …, S. Given the seasonal random walk process, it is clear that the individual Y_{s,T} series are annually observed random walks which are independent. The observations in the various seasons are not tied together somehow, and they wander through time without any restriction. Strictly speaking, this means that "summer" can become "winter." Of course, the MA component in (1) will prevent such changes from occurring rapidly, but in principle it is possible. Given the graphs in Figures 1 through 4, it seems that for economic data the seasons are somehow
1957.3-1 973-4 2,4,5,8,9 -0.054 (81) 0.041 (82) 0.042
1974.1-1988.4 2,4,5,8,9 -0.078 (81) 0.039 (84) 0.055
Unemployment rate (S = 4)
1963.3-1976.4 1,475 -0.679 (82) 0.590 (81) 0.548
1977.1-1991.4 1,495 -0.451 (82) 0.288 (81) 0.322
Tourism and travel receipts (S = 12)
1979.07-1986.12 1,2,3,4,5 -0.459 (812) 0.569 ( 8 8 ) 0.375
1987.01-1993.12 1,2,3,4,5 -0.438 (812) 0.434 (8,) 0.299
Advertising expenditures, radio (S = 13)
1980.02-1988.13 1,2, 3, 4, 5, 13 -0.114 (81) 0.086 ( 8 6 ) 0.059
1989.01-1994.13 1,2, 3, 4, 5, 13 -0.355 (8,) 0.297 (82) 0.179
~ ~~~~
where the number of lags is based on LM tests for residual autocorrelation at lags 1 and 1-to-S.
SD is the standard deviation of the estimated 8,. Note that a formal test for the equality of the
8, across the two subsamples is only valid when this regression model is valid. For example,
unemployment may not be linear, and perhaps filters as As are needed instead of A I , see
Sections I11 and IV.
tied together. In other words, although the seasonal fluctuations seem to change over time, they do not do so quickly.
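A small simulation makes the point concrete. The sketch below generates a seasonal random walk and compares the ranking of the quarters at the beginning and at the end of the sample; the sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
S, n_years = 4, 40

# Seasonal random walk: Δ_S y_t = ε_t, so each season follows its own
# independent random walk and the seasonal pattern can wander freely.
eps = rng.normal(size=(n_years, S))
Y = np.cumsum(eps, axis=0)            # columns are independent annual random walks

print("quarter ranking in year 1 :", np.argsort(-Y[0]) + 1)
print("quarter ranking in year 40:", np.argsort(-Y[-1]) + 1)
# After enough years the ordering of the quarters typically changes:
# in this extreme model "summer" can indeed become "winter".
```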
To obtain a tentative impression of how seasonality can change over time, I consider the estimates of δ_s, s = 1, 2, …, S, from the auxiliary regression
$$\Delta_1 y_t = \sum_{s=1}^{S}\delta_s D_{s,t} + \sum_{i}\phi_i\,\Delta_1 y_{t-i} + \varepsilon_t \qquad (4)$$
where the values of i are set such that there is no autocorrelation in ε_t. In Table 2, I report the minimum and maximum values of δ̂_s and the standard deviation of the S estimated δ̂_s parameters for two subsamples of roughly equal size. For consumption nondurables one can observe that the maximum value for δ̂_s is obtained for the second quarter in the first sample, while it corresponds with the fourth quarter in the second sample. For unemployment, these extreme δ̂_s values correspond to the same seasons, although the standard deviation falls to about 60% of its first-subsample value. For the Spanish tourism data, August loses its importance to July toward the end of the sample. For illustrative purposes, I depict the estimates for the twelve δ_s parameters in the two subsamples in Figure 5. One can observe that tourism in Spain seems to shift slightly from the summer months toward the winter months January to March, suggesting structural shifts in the behavior of tourists.
Figure 5 Estimates of monthly dummy parameters for 1979.01-1986.12 (Part 1) and for
1987.01-1993.12 (Part 2).
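A regression of the type in (4) is straightforward to compute by OLS. The sketch below (with simulated data, a single lag, and an arbitrary assignment of observations to seasons) returns the S seasonal dummy estimates for two subsamples; it is meant only to illustrate the mechanics behind Table 2, not to reproduce its figures.

```python
import numpy as np

def seasonal_dummy_estimates(y, S=4, lags=(1,)):
    """OLS estimates of delta_s in a regression of Δ1 y_t on S seasonal
    dummies and lagged Δ1 y_t (a sketch of Eq. (4))."""
    dy = np.diff(y)
    maxlag = max(lags)
    rows = []
    for t in range(maxlag, len(dy)):
        dummies = np.eye(S)[t % S]             # arbitrary season labelling
        lagged = [dy[t - i] for i in lags]
        rows.append(np.r_[dummies, lagged])
    beta, *_ = np.linalg.lstsq(np.asarray(rows), dy[maxlag:], rcond=None)
    return beta[:S]                            # the S seasonal dummy coefficients

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=160)) + np.tile([0.0, 0.5, -0.2, 0.1], 40)
half = len(y) // 2
print("first  subsample:", np.round(seasonal_dummy_estimates(y[:half]), 3))
print("second subsample:", np.round(seasonal_dummy_estimates(y[half:]), 3))
```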
III. MODELING STRATEGIES
Broadly speaking, there are two general strategies to model seasonal economic time series which display slowly changing seasonality. The first is to impose (depending on formal test results) one or a few seasonal unit roots on the AR part of a model, and the second is to allow the AR parameters to vary across the seasons. For brevity, I call the first approach "seasonality in dynamics" and discuss it in Section III.A, and call the second approach "seasonality in parameters" (Section III.B). Recent studies in Boswijk, Franses, and Haldrup (1997) and Ghysels, Hall, and Lee (1996) seem to suggest that combinations of the two approaches do not obtain much empirical support. This section only highlights the main issues, and for further, more detailed overviews the reader is referred to, for example, Franses (1996a, b).
A. Seasonality in Dynamics
Consider the AR(p) model for a time series with seasonal frequency S:
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t \qquad (5)$$
where typically p exceeds S. First, I focus on univariate y_t processes, while the second part of this section allows y_t to be an m × 1 vector process.
$$y_t = -y_{t-1} + \varepsilon_t \qquad (7)$$
which in case S = 4 is called a process with a seasonal unit root at the biannual frequency; see Hylleberg et al. (1990). Substituting lagged y_t in (7) results in
$$y_t = \varepsilon_t - \varepsilon_{t-1} + \varepsilon_{t-2} - \varepsilon_{t-3} + \cdots \qquad (8)$$
and hence the variance of y_t at t equals tσ², which is the same as that for a standard random walk. Shocks ε_t in (7) cause permanent changes to the pattern of y_t because their effect does not die out, as seen from (8), and because of the seasonal unit root the seasonal pattern in y_t changes permanently.
Given the conceptual impact of the assumption of seasonal unit roots, and given that unit roots lead to wider forecasting intervals and to more involved modeling strategies for multivariate time series in the next step, it is important to test for the number of seasonal and nonseasonal unit roots in a univariate time series. The two commonly applied test procedures are proposed in Osborn et al. (1988) and in Hylleberg et al. (1990) [HEGY]. The OCSB method investigates whether the φ_p(B) polynomial contains the components 1 − B, 1 − B^S, both, or none of these. The test regression is of the form
$$\phi(B)\,\Delta_1\Delta_S y_t = \mu_t + \beta_1\,\Delta_S y_{t-1} + \beta_2\,\Delta_1 y_{t-S} + \varepsilon_t \qquad (9)$$
values are given in brackets. Other choices for μ_t do not change the conclusion that y_t appears to contain the nonseasonal unit root 1, since β̂₁ is insignificant, and that unemployment does not need the double-differencing filter.
An application to radio advertising expenditures, where now a trend is included in (9) and φ(B) = 1, results in t(β̂₁) = −4.913 [−2.78] and t(β̂₂) = −5.506 [−5.81]. Hence, for this four-weekly series it is found that the Δ_S filter may be needed, since t(β̂₂) does not appear significant. In sum, for both series the Δ₁Δ_S filter assumes too many unit roots. It should be mentioned that the empirical work in Osborn (1990) convincingly shows that there are only very few quarterly time series for which the Δ₁Δ₄ filter is required to obtain stationarity.
Since the Δ_S filter assumes S − 1 seasonal unit roots, it is now relevant to test how many of these roots are present for a certain time series y_t. A commonly applied method for this purpose is (a variant of) the HEGY approach. For S = 4 this method concerns an auxiliary regression (11) for Δ₄y_t in which the regressors are transformations of y_t constructed with M(B) = 1 + B + B² + B³ and A(B) = −(1 − B)(1 + B²). HEGY give asymptotics and critical values for S = 4 and μ_t with δ_s = 0 (s = 1, …, S − 1), Smith and Taylor (1995) allow for seasonally varying trends in case S = 4, and Franses and Hobijn (1997) consider additional cases. The key focus is on the π₁ to π₄ parameters in (11), since π₁ = 0 implies 1 − B, π₂ = 0 implies 1 + B, and π₃ = π₄ = 0 implies 1 + B², where the latter two seasonal unit roots ±i correspond with the annual frequency. Typically, one considers t-tests for π₁ and π₂ and a joint F-test for π₃ and π₄. One may also use the joint F-tests for π₁ to π₄ or π₂ to π₄, where the latter concerns all seasonal unit roots (Ghysels, Lee, and Noh 1994).
An application of the HEGY method to the German unemployment data (where φ(B) in (11) includes two lags, and μ_t does not include any trends) results in t(π̂₁) = 1.292 [−2.83], t(π̂₂) = −1.914 [−2.83], and F(π̂₃, π̂₄) = 8.380 [6.62], where again 5% critical values are given in brackets. Hence, to remove the nonstationarities in this series, one needs the (1 − B)(1 + B) = 1 − B² filter. In other words, one may describe the changing seasonal variation in unemployment using a seasonal unit root at the biannual frequency.
For the Spanish tourism data, the application of an extension of the HEGY regression (11), with β₀ in (10) unequal to zero and φ(B) = 1, results in t(π̂₁) = −1.161 [−3.29], t(π̂₂) = −4.799 [−2.76], and F(π̂₃, …, π̂₁₂) = 18.835 [4.46]. Hence, for this monthly series only the Δ₁ filter is required. This implies that the results in Table 1 and Figure 5 can be interpreted safely, since the auxiliary regression model cannot be rejected by the data.
An application of the HEGY method to many quarterly U.K. time series (also including consumption nondurables) in Osborn (1990) yields that often only one or two seasonal unit roots are present. This finding, which appears typical across empirical applications of the HEGY method (see, for example, Hylleberg, Jørgensen, and Sørensen 1993, Ghysels, Lee, and Siklos 1993, and Lee and Siklos 1993), implies that the double-differencing filter Δ₁Δ_S as assumed in (1) imposes too many unit roots on the AR polynomial in (5). However, as shown in Ghysels, Lee, and Noh (1994), in case (1) is the true data-generating process in simulation exercises, the empirical size of the HEGY test largely exceeds the nominal size, implying that one is inclined to find too few unit roots. Furthermore, it can be shown that inappropriate lag augmentation in regressions as (11) can have a large impact on the empirical outcomes. Finally, if p in (5) is smaller than S, the HEGY and OCSB approaches may be difficult to apply. Additional research is needed to fully understand the theoretical and empirical properties of both test methods. For example, Ghysels, Lee, and Siklos (1993) examine the effect of lag selection on HEGY test results.
B. Seasonality in Parameters
The second currently dominant approach in modeling and forecasting economic time
series with seasonality concerns so-called periodic models. Since the studies in Os-
born (1988) and Osborn and Smith (1989), these models have become increasingly
popular in economics. This section surveys some key aspects of periodic models for
univariate and multivariate time series. A full account of the literature and of recent
developments in periodic models for nonstationary seasonal time series is given in
Franses (1996b). For ease of notation, I confine most of the discussion to quarterly
time series (S = 4).
Consider the quarterly model
$$y_t = \sum_{s=1}^{4}\mu_s D_{s,t} + \sum_{s=1}^{4}\phi_s D_{s,t}\,y_{t-1} + \varepsilon_t \qquad (12)$$
where μ_s are seasonal intercepts and the φ_s denote AR parameters that are allowed to vary with the season. Model (12) represents a periodic AR process (PAR) of order 1. This PAR(1) is the simplest case of the more general PAR(p) class of models, but for the present discussion expression (12) suffices to highlight some of the properties of periodic AR models. Franses (1996b) focuses at great length on more elaborate models.
For the periodic process in (12) with φ_s ≠ φ for all s = 1, 2, 3, 4, it is clear that the observations in each of the four seasons are described by a different model. Denoting Y_{s,T} as the observation on y_t in season s in year T, (12) implies that Y_{4,T} = μ₄ + φ₄Y_{3,T} + ε_{4,T} and, for example, Y_{1,T} = μ₁ + φ₁Y_{4,T−1} + ε_{1,T}. These expressions show that the models for the annual time series Y_{s,T} have constant parameters. In fact, model (12) can be written as
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ -\phi_2 & 1 & 0 & 0 \\ 0 & -\phi_3 & 1 & 0 \\ 0 & 0 & -\phi_4 & 1 \end{bmatrix} Y_T = \mu + \begin{bmatrix} 0 & 0 & 0 & \phi_1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} Y_{T-1} + \varepsilon_T \qquad (13)$$
where μ = (μ₁, μ₂, μ₃, μ₄)′ and BY_{s,T} = Y_{s,T−1}. The model for the 4 × 1 vector process Y_T containing Y_{1,T} to Y_{4,T} is convenient to analyze the unit root properties of y_t, since (13) represents the
same time-series data as (12) does. The characteristic equation of (13) is 1 − φ₁φ₂φ₃φ₄z = 0, so that Y_T has a unit root when φ₁φ₂φ₃φ₄ = 1. This occurs, for example, when φ_s = 1 for all s (an ordinary random walk), or when φ_s = −1 for all s, in which case y_t has the seasonal unit root −1, and the Y_T process again has one unit root. Hence, seasonal unit roots in y_t correspond with regular unit roots in the Y_T process. Given this correspondence, it seems natural for a PAR process first to test whether φ₁φ₂φ₃φ₄ = 1 and next to test restrictions on the φ_s parameters. Boswijk and Franses (1996) show that the first test follows a Dickey-Fuller distribution, and that the second step involves just χ² asymptotics. Franses and Paap (1996a) show through simulation and through forecasting empirical series that this two-step method yields useful results. Boswijk, Franses, and Haldrup (1997) extend this method to allow for more general structures involving I(2) processes and more seasonal unit roots. Ghysels, Hall, and Lee (1996) focus on testing for restrictions as φ_s = 1 in (12) in one step.
In case φ₁φ₂φ₃φ₄ = 1 in (12) and φ_s ≠ 1 or −1 for all s = 1, 2, 3, 4, the y_t process is said to be periodically integrated (Osborn 1988, Franses 1996b). Otherwise formulated, y_t requires a periodic differencing filter 1 − φ_s B with φ₁φ₂φ₃φ₄ = 1 to remove the stochastic trend. Notice that these φ_s parameters have to be estimated from the data, and that some φ_s values will exceed 1. Typical values for φ_s for quarterly data are within the range of 0.8 to 1.2. For example, the estimation results for a PAR(1) with the restriction φ₁φ₂φ₃φ₄ = 1 for the U.K. consumption nondurables sample series are φ̂₁ = 1.003 (0.008), φ̂₂ = 0.932 (0.007), φ̂₃ = 1.030 (0.008), and φ̂₄ = 1.039 (0.008), with estimated standard errors in parentheses. Note that these standard errors underestimate the true standard errors since the φ_s parameters are estimated superconsistently; see Boswijk and Franses (1996). An F-test for the restriction φ_s = 1 obtains the value of 31.617, and this is clearly significant at the 5% level. Hence, U.K. consumption nondurables appears to be a periodically integrated process; see also Osborn (1988), where it is shown that this finding is consistent with a modified economic theory, and Franses (1996b) for some results on forecasting such series.
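The unrestricted PAR(1) in (12) can be estimated by OLS on seasonal dummies and dummy-lag interactions. The sketch below simulates a periodically integrated series and checks whether the product of the estimated φ_s is close to one; imposing the restriction φ₁φ₂φ₃φ₄ = 1 exactly, as in the estimates reported above, would require nonlinear estimation and is not attempted here.

```python
import numpy as np

def estimate_par1(y, S=4):
    """OLS estimates of a PAR(1) as in (12): seasonal intercepts mu_s and
    seasonal AR parameters phi_s (unrestricted)."""
    rows, lhs = [], []
    for t in range(1, len(y)):
        dummies = np.eye(S)[t % S]                 # season of observation t
        rows.append(np.r_[dummies, dummies * y[t - 1]])
        lhs.append(y[t])
    beta, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(lhs), rcond=None)
    return beta[:S], beta[S:]                      # mu_s, phi_s

# Simulate a periodically integrated PAR(1): the phi_s differ but multiply to one.
rng = np.random.default_rng(5)
phi_true = np.array([1.02, 0.95, 1.05, 1.0 / (1.02 * 0.95 * 1.05)])
y = np.zeros(200)
for t in range(1, len(y)):
    y[t] = phi_true[t % 4] * y[t - 1] + rng.normal(scale=0.1)

mu_hat, phi_hat = estimate_par1(y)
print("phi_s estimates:", np.round(phi_hat, 3), " product:", round(float(np.prod(phi_hat)), 3))
```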
For many sample series considered in Franses (1996b) it is found that the periodic differencing filter 1 − φ_s B is appropriate to remove the unit root. This finding appears robust to data transformations and structural breaks. One of the main implications of periodic integration is that seasonality changes, which can be illustrated by writing (12) without the μ_s as
$$y_t = y_{t-4} + \sum_{s=1}^{4} D_{s,t}\,\varepsilon_t + \sum_{s=1}^{4}\phi_s D_{s,t}\,\varepsilon_{t-1} + \sum_{s=1}^{4}\phi_s\phi_{s-1} D_{s,t}\,\varepsilon_{t-2} + \sum_{s=1}^{4}\phi_s\phi_{s-1}\phi_{s-2} D_{s,t}\,\varepsilon_{t-3}$$
which after taking first differences and with φ_s values close to unity can be approximated by
When model (15) is estimated under the assumption that θ_s = θ for all s, it is very similar to the airline model in (1). Hence, periodically integrated time series may seem adequately described by the airline model. Note that this does not hold the other way around. In fact, for the consumption series, such a model is estimated in (2). In general, it can be shown that neglecting periodic parameter variation increases the lag order (Osborn 1991, Tiao and Grupe 1980); i.e., there appears to be a trade-off between lags in nonperiodic models and the number of intrayear parameters in periodic time-series models. Furthermore, neglecting periodicity reduces the power of nonseasonal unit root tests (Franses 1996b), and it may lead to the finding of spurious seasonal unit roots (Boswijk and Franses 1996).
The application of periodic integration is not necessarily restricted to periodic models as (12). In fact, Bollerslev and Ghysels (1996) introduce the stationary periodic GARCH model to describe seasonally observed financial time series. Franses and Paap (1995) extend this model to allow for persistence of volatility shocks in order to describe the stylized fact that for many daily financial time series the variance appears to change slowly over time. Franses and Paap (1995) fit the following PAR(p)-PIGARCH(1, 1) model to daily returns on the Dow-Jones index (for about 4000 daily observations):
$$y_t = \mu_s + \sum_{i=1}^{p}\phi_{is}\,y_{t-i} + \varepsilon_t$$
$$\varepsilon_t \sim N(0, \sigma_t^2)$$
$$\sigma_t^2 = \omega_s + \alpha_s\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2 \qquad (s = 1, 2, 3, 4, 5)$$
under the restriction that
$$\prod_{s=1}^{5}(\alpha_s + \beta) = 1 \quad\text{with } \alpha_s \neq \alpha \ (s = 1, 2, 3, 4, 5) \qquad (19)$$
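A simulation sketch of a variance recursion of this PIGARCH type is given below. The day-of-week parameters are hypothetical and are only chosen so that the restriction (19) holds; this is not the estimation procedure of Franses and Paap (1995).

```python
import numpy as np

rng = np.random.default_rng(6)

# Build day-of-week parameters satisfying the PIGARCH restriction (19):
# the product of (alpha_s + beta) over the five trading days equals one.
beta = 0.85
factors = np.array([1.05, 0.98, 1.02, 0.97, 1.0 / (1.05 * 0.98 * 1.02 * 0.97)])
alpha = factors - beta
omega = 0.01 * np.ones(5)                      # hypothetical day-specific intercepts

n = 2000
eps = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = 1.0
for t in range(1, n):
    s = t % 5                                  # trading day of the week
    sigma2[t] = omega[s] + alpha[s] * eps[t - 1] ** 2 + beta * sigma2[t - 1]
    eps[t] = np.sqrt(sigma2[t]) * rng.normal()

print("product of (alpha_s + beta):", round(float(np.prod(alpha + beta)), 6))
print("sample variance per weekday:", np.round([eps[s::5].var() for s in range(5)], 3))
```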
across the seasons; see Franses (1996c) for more details. Hence, seasonal adjustment methods such as Census X-11, which treat all seasons in an equal fashion, do not remove the intrinsic periodicity in a time series. As shown in Franses (1996b), it is possible to fit periodic models to seasonally adjusted data if the underlying time series shows dynamic periodicity. Strictly speaking, it does not make sense to seasonally adjust periodic time series, since the key assumption for seasonal correction is that one can isolate the seasonal from the nonseasonal component. As a possible consequence of this conceptual problem, there is evidence that the NBER peaks
and troughs display seasonality, even though these dates are set using seasonally adjusted data (Ghysels 1994b). In fact, Franses (1996b) shows that seasonally adjusted periodically integrated time series can generate such features. Additional evidence for the apparent link between seasonal fluctuations and the trend/cycle is presented in Barsky and Miron (1989), Beaulieu, MacKie-Mason, and Miron (1992), Canova and Ghysels (1994), Franses (1995), and Miron (1996).
Finally, a feature of periodic integration in connection with the intercept parameters μ_s in (12) is that it allows a description of variables with increasing seasonal variation without taking logs. This may be useful for such data as the trade balance, and other economic data that may take negative values. A key drawback of periodic AR models, however, is that the number of parameters increases quite rapidly if S and p increase. For example, a PAR(2) model as (12) for monthly data involves the estimation of 24 parameters. Hence it may be useful to impose certain parameter restrictions (Anderson and Vecchia 1993).
where the equilibrium and adjustment parameters are allowed to vary with the sea-
son s = 1, 2, 3, 4. Boswijk and Franses (1995) propose an empirical strategy for
this so-called periodic cointegration model. Franses (1996b) illustrates that models
such as (20) can also generate time series for which the trend/cycle and the seasonal
fluctuations are dependent.
At present, the literature on multivariate periodic models is not extensive, and
much more research is needed into the properties of such models and into the design
of useful empirical methods for more general cases than (20).
IV. FURTHER TOPICS OF RESEARCH

The issue of investigating seasonal variation in economic data has gained much interest in the last few years, and this has resulted in the two approaches discussed in the previous section. There are also several studies in which seasonality is incorporated explicitly into economic theory, e.g., Osborn (1988), Hansen and Sargent (1993), Todd (1990), and Miron and Zeldes (1988). There are, however, many issues for current and future research, and in this section I will highlight only a few of these.
A. Structural Breaks
The time-series models in the previous section assume that seasonality changes because of shocks; i.e., the changes are stochastic. It may, however, be that the changes are deterministic. Institutional changes may lead economic agents to start behaving differently in certain seasons. For example, allowing country regions to fix their own school holiday periods may reduce variation in tourism spending. Another example is the introduction of a new TV or radio broadcasting channel that can change the structure of the market for advertising expenditures. Such a new radio channel was introduced in The Netherlands in 1989, around observation 154 in Figure 4. It is clear from this graph that seasonality in advertising expenditures changes dramatically. The source of this change appears to be a new pricing policy, which in turn is due to changing pricing policies for TV. This change in seasonality can be said to be deterministic.
Following the arguments in Perron and Vogelsang (1992), it can be expected that neglecting changing parameters in deterministic seasonal dummies biases seasonal unit root tests toward nonrejection; see Franses and Vogelsang (1997) for formal results. Furthermore, such changes bias tests for periodicity toward the alternative; i.e., too much periodicity is found, as shown in Franses (1996b). As an example, for the advertising series, when the OCSB regression in (9) is enlarged with 13 seasonal dummies for the period from observation 154 onward, the t-tests obtain the values t(β̂₁) = −2.497 [−2.86] and t(β̂₂) = −11.358 [−7.50], where the 5% critical values are from Franses and Hobijn (1997). Hence, the previous conclusion that radio advertising needs a Δ₁₃ filter changes to the necessity of only the Δ₁ filter. All 12 seasonal unit roots disappear when one allows for deterministic shifts.
For many economic time series the location of a break that affects tests for seasonal unit roots is unknown. If one suspects such breaks, one may then use the extreme values of the various t- and F-tests to search for a break date. Asymptotic theory for this approach is presented in Franses and Vogelsang (1997). If one does not like the idea of searching for possibly inconveniently located breaks, one may still wish to use this method in order to investigate the robustness of the outcomes of seasonal unit root tests. In fact, Paap, Franses, and Hoek (1997) show through simulation experiments that making mistakes in either direction yields highly imprecise forecasts.
B. Time-Varying Parameters
Allowing for structural mean shifts can be helpful to detect exactly when seasonality changes, if it does. Seasonal unit root and periodic integration models may not be very helpful to decide in which part of the sample the seasonal variation changes. Hence, in order to be able to understand more clearly what economic behavior causes such changes, one may consider models that are in between models with seasonal unit roots and deterministic shifts. For example, it may be useful to consider
$$y_t = \sum_{s=1}^{4}\delta_{s,t} D_{s,t} + u_t \qquad (21)$$
where u_t is some ARMA process and the δ_{s,t} are time-varying parameters for the seasonal dummies. The δ_{s,t} can be made functions of time, of lagged δ_{s,t}, or of economic variables. Consequently, one can extend the HEGY approach to allow for more flexible structures in the μ_t term in (10).
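One simple special case of (21) lets the dummy parameters be linear functions of standardized time, δ_{s,t} = a_s + b_s(t/T), which can be estimated by OLS. The sketch below uses simulated data and ignores any ARMA structure in u_t.

```python
import numpy as np

def trending_seasonals(y, S=4):
    """Sketch of a version of (21) with delta_{s,t} = a_s + b_s * (t/T):
    regress y_t on D_{s,t} and D_{s,t} * (t/T)."""
    T = len(y)
    rows = []
    for t in range(T):
        dummies = np.eye(S)[t % S]
        rows.append(np.r_[dummies, dummies * (t / T)])
    beta, *_ = np.linalg.lstsq(np.asarray(rows), y, rcond=None)
    return beta[:S], beta[S:]                 # intercepts a_s and trend loadings b_s

# Simulated example: the second-quarter effect grows over the sample.
rng = np.random.default_rng(7)
T = 160
y = rng.normal(scale=0.2, size=T)
for t in range(T):
    if t % 4 == 1:
        y[t] += 0.1 + 0.8 * (t / T)

a_hat, b_hat = trending_seasonals(y)
print("a_s:", np.round(a_hat, 2), " b_s:", np.round(b_hat, 2))
```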
Recent examples of flexible structures for seasonal variation are given in Andersen and Bollerslev (1994), Harvey and Scott (1994), and Canova and Hansen (1995). Hylleberg and Pagan (1997) put forward the so-called evolving seasonals model, which amounts to an intermediate case between the models in (4) and the seasonal unit root models. This model appears useful to explain the simulation results in Hylleberg (1995), where the HEGY test appears better in some cases and the Canova-Hansen test in others. The two null models in these tests are both special cases of the evolving seasonals model. Further research is needed to investigate the practical usefulness of more flexible structures with respect to those in Section III.
C. Nonlinear Modeling
It may be possible and important to introduce even more flexible structures by allowing y_t to be described by nonlinear time-series models, while taking care of seasonality. For example, Ghysels (1994b) finds that regime shifts in the business cycle tend to occur more frequently in some seasons than in others. Additionally, Canova and Ghysels (1994) and Franses (1995) find that some macroeconomic variables show
seasonal variation that depends on the stage of the business cycle. A flexible way to capture such features is a model with a linear AR part and a neural network component, such as
$$y_t = \mu_t + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \sum_{j=1}^{q}\beta_j\,G(\cdot) + \varepsilon_t \qquad (22)$$
where G is the logistic activation function; see Kuan and White (1994) for an overview of neural network models. Although the parameters in the nonlinear component cannot be interpreted, Franses and Draisma (1997) use the β_j G(·) components (for each of the seasons) to investigate the contribution of the hidden layer components. For example, consider again the unemployment data for Germany. When μ_t in (22) is as in (10) with δ₂ to δ₄ set to zero and p in (22) equals 4, the Schwarz criteria for q = 0, 1, 2, 3 are −756.3, −790.8, −809.5, and −789.0, respectively. Hence the neural network model with q = 2 hidden layer units is selected, and German unemployment shows nonlinear features.
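The kind of model comparison reported above can be mimicked with standard software, as in the sketch below, which compares hidden-layer sizes q for an artificial series using an approximate Schwarz criterion. This uses scikit-learn's MLPRegressor rather than the exact specification (22) and the estimation method of Franses and Draisma (1997), and the parameter count is only a rough approximation.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(8)

# Artificial quarterly series with a nonlinear (regime-like) component.
n = 200
y = np.zeros(n)
for t in range(4, n):
    y[t] = 0.6 * y[t - 1] + 0.3 * np.tanh(3 * y[t - 4]) + rng.normal(scale=0.2)

p = 4
X = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)])   # lags 1..p
target = y[p:]

for q in range(0, 4):
    if q == 0:
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)        # linear AR benchmark
        resid = target - X @ beta
        k = p
    else:
        net = MLPRegressor(hidden_layer_sizes=(q,), activation="logistic",
                           solver="lbfgs", max_iter=5000, random_state=0).fit(X, target)
        resid = target - net.predict(X)
        k = p + q * (p + 2)                   # rough parameter count including hidden layer
    nobs = len(target)
    sbc = nobs * np.log(resid.var()) + k * np.log(nobs)
    print(f"q = {q}: approximate Schwarz criterion = {sbc:.1f}")
```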
In Figure 6, I depict the impact of the two hidden layer units (H1 and H2) in each of the four quarters. The dotted line is the time series in each of the seasons (Y_{s,T}), and the straight line is the contribution of the relevant hidden layer output to the final output y_t. It is clear that H1 is active only in quarters 2 and 4 (Q2 and Q4), where for Q2 this activity ends around 1975 and for Q4 it ends around 1985. Hidden layer 2 appears active only in quarters 1 and 4. These results show that the time series in quarter 3 seems linear and that changes in certain seasons occur around 1975 and 1985.
It is now interesting to see whether the nonlinear structures in these German unemployment data are robust to seasonal adjustment. For this purpose, I estimate the neural network model in (22) for the officially adjusted data. For q = 0, 1, 2, 3, I obtain Schwarz criteria values of −918.7, −892.1, −867.7, and −855.9, respectively. Hence, seasonal adjustment appears to affect the apparent nonlinear features of this time series, at least when represented by the highly parameterized neural network model in (22). See Granger and Terasvirta (1993) for more parsimonious nonlinear models. Ghysels, Granger, and Siklos (1996) document that Census X-11 seasonal adjustment appears to introduce nonlinear features in otherwise linear data. Finally, Franses and Paap (1996b) show that seasonal adjustment may leave nonlinearity intact but that it changes some key parameters. Obviously, it seems useful to study the properties of seasonal adjustment with respect to nonlinearity. Also, it seems of great importance to consider models that explicitly describe nonlinearity and seasonality at the same time, possibly along the lines of Lewis and Ray (1996).
D. Economics
A fourth important topic for further research is how one can design economic models that describe the behavior of economic agents which causes macroeconomic aggregates to show slowly changing seasonality over time. As mentioned, several studies incorporate seasonality somehow, but to my knowledge there are no studies that deal with the question of why seasonality changes; i.e., why do economic agents endogenously change their seasonal behavior? Furthermore, can (seasonal or periodic) unit root models generate time series that really mimic economic behavior?
One possible route to follow may be to focus on consumer expectations. An em-
pirical analysis of confidence indicators in Ghysels and Nerlove (1988) and Franses
(1996b) shows that, even when agents are asked to remove seasonality by focusing on
annual trends, the indicators display marked seasonality. In fact, for most countries
one can nowadays obtain only seasonally adjusted consumer confidence indicators,
which somehow may seem counterintuitive.
Another research strategy, which may involve game-theoretic aspects, is to
investigate if seasonality changes because economic agents mistake a set of large
shocks as a precursor of "new" seasonality, and hence start to behave accordingly.
Many important changes for, e.g., Germany, occurred in the fourth quarter (1966.4:
economic crisis; 1973.4 and 1979.4: dramatic increases in oil prices; 1989.4: German
unification). Hence, theoretical models where the arrival rate of shocks can have
an impact on economic behavior may be useful.
V. CONCLUDING REMARKS
This chapter surveys recent and possible future research issues in modeling eco-
nomic time series with seasonality. There is a growing interest in modeling season-
ality rather than removing it through some seasonal adjustment method. In areas such as
marketing and tourism, seasonality itself is the focus of investigation. In macroeco-
nomics there is a tendency to apply Census X-11 type methods to remove seasonal-
ity. However, recent empirical studies have documented a large number of drawbacks of
seasonally adjusted data. Furthermore, other studies have shown that it is not that
difficult to model seasonality explicitly.
From a statistical point of view, there appears to be a consensus that not too many
stochastic trends can be found in economic data. In other words, although seasonal
variation changes over time, it changes quite slowly. The tools for analyzing seasonal
time series can be refined in the direction of nonlinear models or alternative flexible
structures. From an economic point of view, and especially in the case of macroeco-
nomics, there is a need to understand why such seasonal patterns change over time.
ACKNOWLEDGMENTS
I thank the Royal Netherlands Academy of Arts and Sciences for its financial sup-
port. Dick van Dijk, Richard Paap, and an anonymous referee provided helpful com-
ments.
REFERENCES
17
Econometrics of Panel Data

1. INTRODUCTION
Longitudinal or panel data refers to data where we have observations on the same
cross section of individuals, households, industries, etc., over multiple periods of
time. Often the panel data is short in the sense that the cross-sectional units are
available for a period of 2 to 10 years. Some panel data sets are rotating panels where
a proportion of cross-sectional units is kept for revisits, while the remainder is re-
placed by new cross-sectional units. Among the most analyzed panel data are the
Michigan Panel Study of Income Dynamics (PSID) and the National Longitudinal
Surveys of Labor Market Experience (NLS) in the United States, the International
Crops Research Institute for the Semi-Arid Tropics Village Level Studies (ICRISAT
VLS) in Hyderabad, India, and the Living Standards Surveys (LSS) in Côte d'Ivoire.
Recent years have witnessed a significant growth in the availability of the panel
data (Borus 1982, Ashenfelter and Solon 1982, Deaton 1994). The drive behind this
growth stems from the fact that panel data help to study the dynamics of the indi-
vidual cross-sectional units. It is useful for studying intertemporal and intergenera-
tional behavior of the cross-sectional units. From the inference point of view panel
data leads to efficiency gains in the econometric estimators. For details on advan-
tages and problems with the panel data, see Hsiao (1985), Klevmarken (1989), and
Deaton (1994).
An important difference between panel data econometric models and either
cross-sectional or time-series models is that the former allow for cross-sectional
and/or time heterogeneity. Within this framework two types of model are mostly esti-
mated: one is the fixed-effects model, where one makes inferences conditional on the
cross-sectional units in the sample, while the other is the random-effects model, which
is used if we want to make inferences about the population generating the cross-
sectional units. Both models control for heterogeneity. There is no agreement
in the literature as to which one should be used in applied work; see Maddala
(1987) for a good set of arguments on the fixed- versus random-effects models. The
econometrics of the fixed-effects model was initiated in the works of Mundlak (1961)
and Hoch (1962), while the work on the random effects model was introduced by
Balestra and Nerlove (1966) and developed further by Wallace and Hussain (1969),
Maddala (1971), Nerlove (1971), and Fuller and Battese (1973). Since then a volumi-
nous econometric literature has developed; for details see the excellent monographs
by Heckman and Singer (1985), Hsiao (1986), Dielman (1989), and Baltagi (1995);
surveys by Chamberlain (1984), Maddala (1987), and Baltagi (1996); a journal vol-
ume by Raj and Baltagi (1992); a recent handbook of approximately 900 pages by
Mátyás and Sevestre (1996); and the recent work on dynamic panel data models by
Pesaran and Smith (1995) and Harris and Mátyás (1996).
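The distinction between the two approaches can be made concrete with a small illustration. This is not taken from the chapter: the balanced-panel layout (an `ids` vector identifying the cross-sectional unit of each observation), the simple moment-based variance components, and all names are assumptions.

```python
# Minimal sketch of the linear FE (within) estimator and the RE (quasi-demeaning GLS)
# estimator for a balanced panel with n units observed T times.
import numpy as np

def within_estimator(y, X, ids):
    """FE estimator: demean y and X within each cross-sectional unit, then OLS."""
    yd = np.asarray(y, float).copy()
    Xd = np.asarray(X, float).copy()
    for i in np.unique(ids):
        sel = ids == i
        yd[sel] -= yd[sel].mean()
        Xd[sel] -= Xd[sel].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xd, yd, rcond=None)
    return beta

def re_gls_estimator(y, X, ids, T):
    """RE estimator via quasi-demeaning; variance components by simple moments
    (textbook versions apply degrees-of-freedom corrections)."""
    y, X = np.asarray(y, float), np.asarray(X, float)
    resid = y - X @ within_estimator(y, X, ids)
    grp_mean = np.array([resid[ids == i].mean() for i in ids])
    sigma_u2 = np.var(resid - grp_mean)
    sigma_a2 = max(np.var([resid[ids == i].mean() for i in np.unique(ids)]) - sigma_u2 / T, 0.0)
    theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_a2))
    yq = y - theta * np.array([y[ids == i].mean() for i in ids])
    Xq = X - theta * np.vstack([X[ids == i].mean(axis=0) for i in ids])
    beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(yq)), Xq]), yq, rcond=None)
    return beta[1:]                                    # drop the intercept, keep slopes
```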
The motivation for this chapter is based on a simple observation that this volu-
minous literature has been largely confined to the linear parametric models; although
see Mátyás and Sevestre (1996) and Baltagi (1996) for the references on some of the
recent works on the nonlinear parametric regression and the latent variable models.
It is, however, well known that the misspecified linear or nonlinear parametric mod-
els may lead to inconsistent and inefficient estimates and suboptimal test statistics.
With this in view, the modest aim of this chapter is to systematically develop the
nonparametric estimation of both the fixed- and random-effects panel models which
are robust to the misspecification in the functional forms. Some new estimators are
proposed. Further, the estimation of semiparametric models is also considered. For
the nonparametric regression analysis based on either cross-sectional or time-series
data and the usefulness of their application, see Härdle (1990) and Pagan and Ul-
lah (1996). Müller (1988) has considered nonparametric longitudinal models
with fixed-design regressors and has discussed their applications to growth and other
biomedical models. Generally, econometric regressors do not have a fixed-design
structure, and his work does not consider the random- and fixed-effects models. Only
the static models are considered here, and it is hoped that in the future the results
could be extended to various other nonparametric econometric models such as the
dynamic models, limited dependent variable models, and duration models.
Another objective of this chapter is to explore the application of the nonpara-
metric panel models to study the calorie-income relationship based on the ICRISAT
VLS panel data. The usefulness of this application stems from the issue of possi-
ble nonlinearity in the calorie-income relation raised in Ravallion (1990). There has
been an ongoing debate in the calorie-income literature in the context of develop-
ing countries on the magnitude of the income elasticity of calorie intake; see Strauss
and Thomas (1990), Bhargava (1991), Bouis and Haddad (1992), and Subramanian
and Deaton (1996), among others. All these authors have considered the per capita
β̂_R = (1/nT) Σ_{i=1}^{n} Σ_{t=1}^{T} β̂(x_it),        (5)

where β̂(x_it) is obtained from (4). For the Härdle and Stoker estimator we assume the
density f(x) vanishes at the boundary of its support, and using integration by parts
we write β = Eβ(x) = −E[y f^(1)(x)/f(x)], where f^(1)(x) is the first derivative
of f(x). Then the estimator β̂_HS is
β̂_HS = −(1/nT) Σ_{i=1}^{n} Σ_{t=1}^{T} y_it f̂^(1)(x_it)/f̂(x_it),        (6)

where f̂(x) is the kernel density estimator and f̂^(1)(x) its first derivative. In the
corresponding asymptotic results, s represents the sth derivative and σ_u²(x) = V(u | x);
for s = 0, m̂^(0)(x) = m̂(x), and for s = 1, m̂^(1)(x) = β̂(x). In practice σ_u²(x) can be
replaced by its consistent estimator σ̂_u²(x) = (ι′_{nT} K(x) ι_{nT})^{−1} ι′_{nT} K(x) û, where û is the vector of
the squared nonparametric residuals û_it = (y_it − m̂(x_it))². We can then use (7)
to calculate the confidence intervals. The conditions for the asymptotic normality
of β̂_R and β̂_HS are very similar to the conditions for the asymptotic normality of the
pointwise β̂(x). It follows from Rilstone (1991) and Härdle and Stoker (1989) that,
as n → ∞,
In practice, for the confidence intervals and the hypothesis testing, an estimator of
Σ can be obtained by replacing σ_u²(x), f^(1)(x), and β(x) by their kernel estimators
σ̂_u²(x), f̂^(1)(x), and β̂(x) and then taking the sample averages. Then for q = 1,
and, up to O(1/nh^q),
where μ_2 = ∫ ψ²K(ψ) dψ < ∞. We note that the expressions for the condi-
tional bias and the conditional variance do not depend on x_it. Thus the unconditional
bias, E(m̃(x)) − m(x), up to O(h²), and the unconditional variance, V(m̃(x)), up to
O(1/nh^q), are also the same as given in (12) and (13), respectively. Similarly, we can
show that (Pagan and Ullah 1996)
and
or

y = Z(x)δ(x) + u,

where Z(x) is an nT × (q + 1) matrix with itth element [1  x_it − x] and δ(x) =
[m(x)  β′(x)]′ is a (q + 1) × 1 parameter vector. Again, minimizing u′K(x)u gives the
local linear estimator δ̂(x) = (Z′(x)K(x)Z(x))^{−1} Z′(x)K(x)y. Averaging this over the
sample points gives

δ̄ = (1/nT) Σ_{i=1}^{n} Σ_{t=1}^{T} δ̂(x_it),

whose asymptotic normality has been established. This result also provides the asymptotic
normality of the average derivative (22),

β̄_1 = (1/nT) Σ_{i=1}^{n} Σ_{t=1}^{T} β̂_1(x_it),   where β̂_1(x_it) = (0  I_q) δ̂(x_it).
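The local linear calculation and the averaging of the pointwise derivative can be illustrated with a short sketch. This is not the authors' code; the Gaussian kernel, the bandwidth rule, and the simulated single-regressor data are assumptions.

```python
# Minimal sketch of the pointwise local linear estimator delta_hat(x) = (m_hat, beta_1_hat)
# and the average derivative formed by averaging beta_1_hat(x_it) over the pooled sample.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def local_linear(x0, x, y, h):
    """Weighted LS of y on [1, x - x0] with kernel weights; returns (m_hat, beta_1_hat)."""
    z = np.column_stack([np.ones_like(x), x - x0])
    w = gaussian_kernel((x - x0) / h)
    zw = z * w[:, None]
    delta = np.linalg.solve(z.T @ zw, zw.T @ y)
    return delta[0], delta[1]

def average_derivative(x, y, h):
    """(1/nT) * sum of beta_1_hat(x_it) over all pooled observations."""
    return np.mean([local_linear(x0, x, y, h)[1] for x0 in x])

# usage with simulated pooled data (single regressor, q = 1)
rng = np.random.default_rng(0)
x = rng.normal(size=400)
y = np.sin(x) + 0.1 * rng.normal(size=400)
h = 1.06 * x.std() * len(x) ** (-2.0 / 7.0)            # h proportional to n^(-2/7)
print(average_derivative(x, y, h))                     # roughly the average of cos(x)
```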
To see the behavior of the above estimators more explicitly, we consider first
the asymptotic results. From Kneisner and Li (1996), as n → ∞,
In practice one can replace σ_u²(x) and f(x) by their consistent estimators σ̂_u²(x) and
f̂(x). The above results indicate that while the asymptotic variances of m̂(x) and
β̂_1(x) are the same as those of the Nadaraya–Watson estimator m̃(x) and the Ril-
stone–Ullah estimator β̂(x), respectively, the asymptotic variance of the estimator
β̃(x) is different from those of β̂_1(x) and β̂(x). For the standard normal kernel, how-
ever, the asymptotic variance of β̃(x) is the same as that of β̂_1(x) and β̂(x). This is
because μ_2² ∫ ψ²K²(ψ) dψ = ∫ (K^(1)(ψ))² dψ = 1/(4√π).
Regarding the average derivative in (22), we note from Li, Lu, and Ullah
(1996) that, as n → ∞, √(nT)(β̄_1 − β) → N(0, Σ), where Σ is as in (9). Thus, the local
linear average derivative estimator has the same asymptotic variance as β̂_R and β̂_HS
in (8). The Monte Carlo analysis in Li, Lu, and Ullah (1996), however, indicates that
in small samples β̄_1 performs better than both β̂_R and β̂_HS in terms of the MSE.
Further, the MSE of β̄_1 is minimum when h ∝ n^{−2/7}.
We now turn to the small-sample behavior of m̃(x) and β̂(x) compared to m̂(x)
and β̂_1(x). Conditional on x_it,
where μ_4 is the fourth moment of the kernel around zero. Using (29) it can also be
verified that
Again, these conditional bias results are the same as the unconditional results. Fur-
ther, the variance of m̂(x), up to O(1/nh^q), and the variances of β̂(x) and β̂_1(x), up to
O(1/nh^{q+2}), are the same as their asymptotic variances given above. It follows from
these results that the optimal h's which minimize the integrated MSE of m̂(x), β̂(x),
and β̂_1(x) are the same as in (16) except for the proportionality constants.
Comparing the bias of the Nadaraya–Watson estimator m̃(x) with that of the local
linear estimator m̂(x), we note that while the bias of m̂(x) depends on the curvature
behavior m^(2), the bias of m̃(x) depends on m^(2) as well as on m^(1) f^(1)/f due to the
local constant fit. When |m^(1)| is large or when f^(1)/f is large, especially in highly
clustered data, the bias of m̃(x) is large. Even when the true regression is linear, m̃
is biased but m̂ is unbiased. The asymptotic variances of m̃ and m̂ are the same.
Thus, one might expect m̂ to perform better than m̃. Fan (1992, 1993) reports good
finite-sample MSE performance of m̂ compared to m̃. He shows that m̂ is the best
among all linear smoothers, including orthogonal series and splines. Further, m̂ has
100% efficiency among all linear smoothers in a minimax sense and a high minimax
efficiency among all possible estimators. Fan and Gijbels (1992) have reported better
performance of m̂(x) near the boundary of the support of f, although see a word of
caution by Ruppert and Wand (1994).
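The point that the local constant (Nadaraya–Watson) fit is biased, while the local linear fit is essentially unbiased, when the true regression is linear and the design is clustered can be checked with a small Monte Carlo sketch. The design below is hypothetical and only meant to illustrate the statement; it is not from the chapter.

```python
# Small Monte Carlo: linear truth with a clustered (exponential) design.  The NW
# estimate at x0 is biased (through m'(x) f'(x)/f(x)); the local linear estimate is not.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def nw(x0, x, y, h):
    w = gaussian_kernel((x - x0) / h)
    return np.sum(w * y) / np.sum(w)

def local_linear(x0, x, y, h):
    z = np.column_stack([np.ones_like(x), x - x0])
    w = gaussian_kernel((x - x0) / h)
    zw = z * w[:, None]
    return np.linalg.solve(z.T @ zw, zw.T @ y)[0]

rng = np.random.default_rng(1)
x0, h, reps = 1.5, 0.4, 500
bias_nw, bias_ll = [], []
for _ in range(reps):
    x = rng.exponential(scale=1.0, size=300)           # highly clustered near zero
    y = 1.0 + 2.0 * x + 0.2 * rng.normal(size=300)     # linear truth: m(x) = 1 + 2x
    bias_nw.append(nw(x0, x, y, h) - (1.0 + 2.0 * x0))
    bias_ll.append(local_linear(x0, x, y, h) - (1.0 + 2.0 * x0))
print(np.mean(bias_nw), np.mean(bias_ll))              # NW mean bias clearly larger
```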
The comparison of the bias of β̃(x) with the bias of β̂(x) is, however, not so
clear. Both are seriously affected by m^(2), m^(3), f^(1), and f. In contrast, the bias of
our proposed estimator β̂_1(x) is much simpler and smaller compared to those of β̃(x)
and β̂(x). In fact, when the true m is linear, both β̃(x) and β̂(x) are biased but β̂_1(x) is
unbiased. Though not studied here, it is conjectured that the MSE performance of β̂_1
will be much superior to that of both β̃ and β̂.
The estimator δ̃(x) in (19) can also be obtained by the local nonparametric
estimation of the linear parametric model (32), in which α = m(μ) − μβ(μ). Thus,
while the local linear model (17) is based on the small-h approximation of m, the
model (32) may be interpreted as the small-σ approximation of m.
y_it = m(x_it) + α_i + u_it,   i = 1, …, n;  t = 1, …, T,        (34)

where α_i are the individual-specific fixed parameters, and u_it is i.i.d. with mean zero
and constant variance σ_u². A more general specification of (34) is discussed in Section
II.D. When m(x_it) = x_it β, model (34) is the well-known linear parametric FE model
studied in the literature very extensively. An important reason for the popularity of
the linear parametric model is that there exists a class of transformations of (34)
which eliminate α_i, so that β can be estimated by either a simple LS or a generalized
least-squares (GLS) estimator. It is not straightforward to get transformations which
will remove α_i from (34) when m(x_it) is of unknown form. However, if we reformulate
the problem of estimating m and its derivative in terms of local linear estimation of
(17), then it is possible to implement some of the existing transformations. These are
proposed below. First, following (17),
which gives the local linear FE estimator of β(x), and for q ≥ 1 its variance takes the
sandwich form

V(β̂_FE(x)) = σ_u² (X′M_D K(x)M_D X)^{−1} X′M_D K²(x)M_D X (X′M_D K(x)M_D X)^{−1},

provided σ_u²(x) = σ_u². For a feasible version of the variance of β̂_FE(x) one needs
to replace σ_u² by s² = Σ_{i}Σ_{t} û_it²/nT, where û_it is the residual from the regression
with, for given x_it, Δ the matrix which transforms y into the vector of first differences
y_it − y_{i,t−1}. It is not clear, unlike in the linear parametric regression, whether this
first-difference version of the FE estimator is the same as β̂_FE for T = 2 or 3. This
seems true even if we had estimated β by the weighted GLS procedure, which takes
into account the moving-average nature of the error term.
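The first-differencing route can be illustrated with a rough sketch. This is only one possible reading of the idea above, not the authors' exact estimator: the localization at the midpoint of each differenced pair and the simulated design are assumptions.

```python
# Rough sketch: remove alpha_i by first differencing, then estimate the slope of m
# at x0 by a kernel-weighted regression of delta-y on delta-x for pairs near x0.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def fe_local_slope(x0, x, y, h):
    """x, y: n x T arrays for a balanced panel; returns a slope estimate at x0."""
    dx = np.diff(x, axis=1).ravel()                    # x_it - x_i,t-1
    dy = np.diff(y, axis=1).ravel()                    # y_it - y_i,t-1 (alpha_i drops out)
    xm = 0.5 * (x[:, 1:] + x[:, :-1]).ravel()          # midpoints used for localization
    w = gaussian_kernel((xm - x0) / h)
    return np.sum(w * dx * dy) / np.sum(w * dx ** 2)   # weighted LS slope, no intercept

# usage on simulated data: m(x) = x**2 with individual effects alpha_i
rng = np.random.default_rng(2)
n, T = 200, 3
x = rng.normal(size=(n, T))
alpha = rng.normal(size=(n, 1))
y = x ** 2 + alpha + 0.1 * rng.normal(size=(n, T))
print(fe_local_slope(0.5, x, y, h=0.3))                # roughly m'(0.5) = 1.0
```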
An alternative way to estimate β(x) is to write the model with α_i entering the
regression function; expanding m around the mean values x_it = μ and α_i = 0 gives
model (41), where m(μ) − μβ(μ) = α, ν_i = α_i β_1(μ), and β and β_1 are the derivatives
with respect to x_it and α_i, respectively. The model (41) is the RE version of the local
linear model in (33) based on the small-σ expansion. The RE version of the local
linear model in (17) can also be written as (42) with μ replaced by x.
We consider u_it to be i.i.d. as in Section II.C, and assume ν_i also to be i.i.d. with
mean zero and variance σ_ν².
The local nonparametric RE estimator of m and β in (43) can be obtained by
minimizing a weighted sum of squares; this gives

β̂_RE(x) = (Z*′(x)K(x)Z*(x))^{−1} Z*′(x)K(x)y*.        (44)

When h = ∞, K(x) = K(0) and we get the well-known parametric RE estimator
given in econometric texts (Baltagi 1995, Hsiao 1986).
A feasible estimator of β̂_RE(x) is obtained by replacing λ with its estimator
λ̂ = σ̂_u²/(σ̂_u² + Tσ̂_ν²), where σ̂_u² is obtained from the residuals summed over
i = 1, …, n and t = 1, …, T.
As described in Section II.B, the estimators of α and β in (42) are also given
by the estimator in (44). For given x_it,

V(β̂_RE(x)) = σ_u² (Z*′(x)K(x)Z*(x))^{−1} Z*′(x)K²(x)Z*(x) (Z*′(x)K(x)Z*(x))^{−1}.        (46)

The detailed study of the asymptotic and finite-sample properties of β̂_RE(x) and
its comparison with the FE estimators will be the subject of future research.
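The quasi-demeaning idea behind the RE estimator can be sketched as follows. Because the precise construction of Z*(x) in (44) is not fully recoverable from the text, this is only one plausible reading, with the variance components treated as known and all names hypothetical.

```python
# Hedged sketch: transform y and x with lambda = sigma_u^2/(sigma_u^2 + T*sigma_v^2),
# then run a kernel-weighted local linear regression on the transformed data.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def re_local_linear(x0, x, y, h, lam):
    """x, y: n x T balanced panel; lam in (0, 1]; returns (m_hat(x0), beta_hat(x0))."""
    theta = 1.0 - np.sqrt(lam)                         # quasi-demeaning weight
    ys = (y - theta * y.mean(axis=1, keepdims=True)).ravel()
    xs = (x - theta * x.mean(axis=1, keepdims=True)).ravel()
    z = np.column_stack([np.ones_like(xs), xs - x0])
    w = gaussian_kernel((x.ravel() - x0) / h)          # localize at the original x's
    zw = z * w[:, None]
    delta = np.linalg.solve(z.T @ zw, zw.T @ ys)
    return delta[0], delta[1]
```

Setting lam = 1 (theta = 0) reproduces the pooled local linear fit, while small lam moves the transformation toward full demeaning, mirroring the parametric RE/FE relationship noted above.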
E. Semiparametric Models

Consider the semiparametric model

y_it = x_it β + m(z_it) + ν_i + u_it,        (47)

where z_it is a vector of p regressors. For ν_i = 0, the model is considered by Li and
Stengos (1995), who propose √(nT)-consistent estimation of β by Robinson's (1988)
procedure. This is given by transforming (47) with ν_i = 0 as

R^y_it = R^x_it β + u_it,        (48)

where R^y_it = y_it − E(y_it | z_it) and R^x_it = x_it − E(x_it | z_it), and then applying the LS
method. This gives

β̂ = (Σ_{i=1}^{n} Σ_{t=1}^{T} R̂^x_it R̂^x_it′)^{−1} Σ_{i=1}^{n} Σ_{t=1}^{T} R̂^x_it R̂^y_it.

When ν_i is treated as a fixed effect, the corresponding local (within-transformed) estimator is

β̂_SFE(z) = (Z′M_D K(z)M_D Z)^{−1} Z′M_D K(z)M_D y*.

It is important to be able to estimate the derivative since that is the parameter of
interest in most economic applications.
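Robinson's double-residual construction just described can be illustrated with a short sketch (illustrative only; leave-one-out kernel fits, the bandwidth, and the simulated design are assumptions).

```python
# Sketch of Robinson's (1988) double-residual estimator: kernel-estimate E(y|z) and
# E(x|z), form the residuals, and regress one set of residuals on the other by LS.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def loo_fit(z, v, h):
    """Leave-one-out Nadaraya-Watson estimates of E(v | z) at each sample point."""
    out = np.empty(len(z))
    for i in range(len(z)):
        w = gaussian_kernel((z - z[i]) / h)
        w[i] = 0.0
        out[i] = np.sum(w * v) / np.sum(w)
    return out

def robinson_beta(y, x, z, h):
    ry = y - loo_fit(z, y, h)
    rx = x - loo_fit(z, x, h)
    return np.sum(rx * ry) / np.sum(rx ** 2)

# usage: y = x*beta + m(z) + u with beta = 1.5 and m(z) = sin(2z)
rng = np.random.default_rng(3)
z = rng.uniform(-2, 2, size=600)
x = z + rng.normal(size=600)
y = 1.5 * x + np.sin(2 * z) + 0.2 * rng.normal(size=600)
print(robinson_beta(y, x, z, h=0.2))                   # close to 1.5
```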
When ν_i is random, the model becomes the semiparametric RE (SRE) model
considered in Li and Ullah (1996), who propose a √(nT)-consistent GLS estimator of
β constructed from β̂_FE, the FE estimator from the model R^y_it = R^x_it β + ν_i + u_it,
and estimated variance components. The estimator σ̂_ν² is given by σ̂_1² − σ̂_u²/T, where
σ̂_1² = Σ_i (R̄^y_i − β̂_b R̄^x_i)²/n and β̂_b is the between estimator. One can also consider
V(u_it | x_it) = σ²(x_it); Li and Stengos (1994) suggest a two-step GLS estimator of β for
such a model. One can also consider V(α_i | x_i·) = σ²(x_i·) and develop a two-step GLS
estimator. The situation where both σ²(x_i·) and σ²(x_it) are present has not been con-
sidered in the literature. In a recent paper Horowitz and Markatou (1996) consider
the nonparametric kernel density estimation of α_i and u_it and then propose the max-
imum likelihood estimation of β in (47) with no m(z). The issues of unit roots and
serial correlation in the errors remain the subjects of future research.
One disadvantage of the semiparametric model in (47) with an unknown func-
tion of regressors, or the purely nonparametric model in Section II.A, is the “curse
of dimensionality.” This refers to the fact that the rate of convergence of the non-
parametric estimator of m ( z ) decreases drastically with the increase in the number
of regressors. One solution explored in the literature is to use the generalized ad-
ditive models of Hastie and Tibshirani (1990) and Berhane and Tibshirani (1993),
which estimate the p-dimensional m(z) at the convergence rate of a one-dimensional
nonparametric estimator. Essentially, the model (47) is written as (assuming β = 0 and
ν_i = 0) y_it = Σ_{j=1}^{p} m_j(z_jit) + u_it, and each m_j(z_jit) is estimated by a nonparametric
method; see Linton and Nielsen (1996) for the kernel method of estimation.
There is an extensive semiparametric literature on the estimation of limited
dependent variable models such as single-index models, censored models, and se-
lection models (Melenberg and van Soest 1993, Pagan and Ullah 1996). Essentially these
semiparametric models can be considered as special cases of (47) with ν_i = 0, where,
for example, in the single-index case x_it β + m(z_it) is a function of a single index, say,
z_it δ; that is, m(z_it δ). In the censored case m(z_it) in (47) is m(z_it δ), which becomes
the inverse Mills ratio under the assumption of the normality of u_it. When u_it is
nonnormal, ν_i ≠ 0 but is fixed, and m(z_it) = 0, the estimation of (47) by the least
absolute deviation (LAD) method has been discussed in Honoré (1992) and Keane
(1993), among others.
F. Specification Testing
There is an extensive literature on various specification testing in the parametric FE
and RE models (Hsiao 1986, Baltagi 1995, 1996, Greene 1993). Here we mainly
look into the recent work in the context of nonparametric panel models.
Considering the nonparametric pooled model in Section II.A, we note that the
pointwise hypothesis testing for the linear restrictions on the derivatives can be done
by using the asymptotic normality results for the local linear or Nadaraya–Watson
estimators given there (Ullah 1988, Robinson 1989a, Lewbel 1995, and Pagan and Ullah
1996 give more details and references). In fact, the pointwise asymptotic tests for various
misspecifications in the local linear model y_it = m(x) + (x_it − x)β(x) + u_it may
follow from the corresponding tests in the linear parametric models. The global tests
based on comparing the restricted residual sum of squares (RRSS) with the unre-
stricted RRSS or based on the conditional moments are developed, among others, in
Fan and Li (1996), Li and Wang (1994), and Bierens (1990). Their results cover the
tests for linearity, exclusion of regressors, and semiparametric specification, and can
be extended for y_it = m(x_it) + u_it, where x_it and u_it are i.i.d. across i and t.
Frees (1995) considered the problem of testing cross-sectional correlation in
the parametric panel model and noted that the Breusch and Pagan (1980) measure
does not possess desirable asymptotic properties for the practical situation where n
is large but T is small. In fact, he showed that the asymptotic distribution depends
on the parent population even under the hypothesis of no cross-sectional correlation.
In view of this, he introduced a distribution-free statistic which does not have this
problem. An extension of this to nonparametric and semiparametric models will be
useful. Li and Hsiao (1996) have considered the semiparametric model (47) and
developed an LM-type test for the null hypothesis that u_it is white noise against the
alternative that uit has the RE specification.
An important question in the nonparametric panel data analysis is whether to
pool the data. A conditional moment test for this problem, H0 : m_1(x_i1) = m_2(x_i2)
against H1 : m_1(x_i1) ≠ m_2(x_i2), assuming T = 2 here for simplicity, is proposed
in Baltagi et al. (1996). If H0 is accepted, then one can pool the data and use the
results of Section II.A. If H0 is rejected, then the estimates for the two different periods
can be pooled to obtain a more precise estimate (Pinkse and Robinson 1996).
III. AN APPLICATION
Here we present an empirical example based on the methodology discussed and de-
veloped in the previous sections. For a long time now there has been a debate in
the nutrition-income literature in developing countries on the response of nutrition,
more specifically calorie intake, resulting from a rise in income. Some of the recent
articles that have engaged in this debate are by Behrman and Deolalikar (1990),
Strauss and Thomas (1990), Bhargava (1991), Bouis and Haddad (1992), Subrama-
nian and Deaton (1996), and Grimard (1995). For them, estimating the income elas-
ticity is important because it has serious policy implications for how best to reduce
malnutrition. If the elasticity turns out to be close to zero, the implication is that
improvement in the income of the poor will have little impact on the extent of mal-
nutrition. Then the development policies aimed at improving nutrition will have to
use policy instruments which attack malnutrition directly rather than relying solely
on rising income.
Behrman and Deolalikar (1990) used individual level ICRISAT VLS panel
data for two years from three villages in south central India and estimated a linear
parametric FE model. In this section we consider both the standard parametric panel
models and the nonparametric panel models discussed in Section I1 to study the
calorie-income relationship based on the data set used by Behrman and Deolalikar
(1990). For details on ICRISAT VLS data, see Binswanger and Jodha (1978), Ryan
et al. (1984), and Walker and Ryan (1989). Our contribution to the existing literature
on the calorie-income relationship is that we are able to take into account both the func-
tional form and the heterogeneity while modeling the relationship, and this we are
able to do using the results in Section II. Note that previous works
have considered one or the other (i.e., the heterogeneity or the functional form) but
never both together. While we recognize the fact that there are other variables which
influence individual calorie intake, we choose to use income as our only regressor
since it is undoubtedly the most influential factor in individuals’ consumption de-
cisions and some other authors in this literature have done the same. For example,
Subramanian and Deaton (1996) studied the regression of calorie intake on expen-
diture. A nonparametric regression analysis of calorie intake with other variables
besides income included as regressors will be the subject of a future study. We think
that in the multivariate case, the semiparametric method described in Section II.E,
rather than a pure nonparametric analysis, may be a better way to study the calorie-
income relationship.
We consider three types of nonparametric models: constant intercept, fixed-
effects, and random-effects models. These correspond to model (34) with α_i equal to
a constant, α_i as an individual fixed effect, and α_i as a random effect, respectively.
Similarly we consider the same three types of models with linear parametric speci-
fication m(x_it) = x_it β. The dependent variable, y_it, in all the models represents the
logarithm of individual calorie intake for the ith individual in the tth time period, the
explanatory variable, xit, represents the logarithm of per capita real income, and ai
represents the combined effects of unobserved individual characteristics, household
characteristics, etc., which can be considered to be fixed or random, as may be the
case.
The results are all based on a total number of observations of 730, that is, 365
individuals each observed over two years. For the nonparametric regression analysis
the kernel used is the normal kernel given as K(ψ) = (2π)^{−1/2} exp(−ψ²/2).
will affect calorie intake but only very slightly. What the parametric analysis does
not tell us is whether the elasticity is significant at all levels of income, and if so what
is its magnitude. Of course, one can do parametric regression analysis by percentile
groups (for example, the elasticity for the bottom and the top deciles, say), but still
one cannot get the elasticity at each income level. This question can be answered
from the nonparametric regression analysis.
Table 2 gives the nonparametric elasticity estimates at the mean value of the
regressor to make it somewhat comparable with the parametric elasticity estimates
from Table 1. The results from the nonparametric models are similar to our paramet-
ric model results as can be seen from Table 2.
Given that the nonparametric specification gives us elasticity estimates at dif-
ferent income levels, we also report in Table 3 the mean, the minimum, and the max-
imum values of the elasticity for the different models. The mean elasticities were
calculated by using h = c s n^{−2/7}, as indicated in Section II.A. Note, however, that
h ∝ n^{−2/7} is known to be optimal for the constant-intercept model only (Li and Ul-
lah 1996), but the optimal h values for the FE and the RE models are not yet known.
It can be seen from the table that the elasticity can be quite different for different
income levels, and looking at just the mean elasticity estimate can be misleading.
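The construction of pointwise elasticity estimates of this kind can be sketched as follows. Because both variables are in logarithms, the local linear slope at each observed income level is itself an elasticity; the file names, the bandwidth constant c = 1, and the placeholder data are assumptions, not the ICRISAT data.

```python
# Schematic sketch: pointwise income elasticities of calorie intake from a local
# linear fit of log calories on log income, summarized by mean, min, and max.
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def local_linear_slope(x0, x, y, h):
    z = np.column_stack([np.ones_like(x), x - x0])
    w = gaussian_kernel((x - x0) / h)
    zw = z * w[:, None]
    return np.linalg.solve(z.T @ zw, zw.T @ y)[1]

# x: log per capita real income, y: log calorie intake (pooled, nT = 730)
# x, y = np.loadtxt("income.txt"), np.loadtxt("calories.txt")   # hypothetical files
rng = np.random.default_rng(4)                          # placeholder data for illustration
x = rng.normal(6.0, 0.8, size=730)
y = 7.0 + 0.3 * x - 0.02 * x ** 2 + 0.2 * rng.normal(size=730)

h = 1.0 * x.std() * len(x) ** (-2.0 / 7.0)              # h = c*s*n^(-2/7) with c = 1 assumed
elasticities = np.array([local_linear_slope(x0, x, y, h) for x0 in x])
print(elasticities.mean(), elasticities.min(), elasticities.max())
```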
We find that for all three models (constant intercept, FE, and RE), on average,
the elasticity is higher for poorer households compared to richer households (Fig-
ures 1, 2, and 3). For our constant-intercept nonparametric model the elasticity is
Figure 1 Elasticity of calorie intake with respect to per capita real income from pooled non-
parametric model using local linear estimation method.
Figure 2 Elasticity of calorie intake with respect to per capita real income from a nonpara-
metric FE model.
Figure 3 Elasticity of calorie intake with respect to per capita real income from a nonpara-
metric RE model.
significant everywhere except at the tails, but the tail behavior of nonparametric
estimators is generally not very good. For the FE and RE models, the elasticities are
significant everywhere except at the upper tail.
Thus our results suggest that the income elasticity of calorie intake is signif-
icant but is rather low. The small magnitude of elasticity, however, does not neces-
sarily mean that income is an ineffective policy instrument to reduce undernutrition,
as has been effectively demonstrated in Ravallion (1990). Both parametric and non-
parametric regression analyses give us a similar result, except that the nonparamet-
ric one gives us the added information that the elasticity gradually declines as one
moves up the per capita income distribution.
ACKNOWLEDGMENTS
We are grateful to B. Baltagi, Q. Li, L. Mátyás, B. Raj, and M. Ravallion for their
useful comments and suggestions. We are also thankful to the seminar participants
at McGill University and UCR for their comments. We are especially grateful to Anil
Deolalikar for the discussions on the subject matter of this chapter and for providing
the ICRISAT VLS data. Any remaining errors are ours. The first author gratefully
acknowledges research support from the Academic Senate, UCR.
REFERENCES
Ashenfelter, O. and G. Solon (1982), Longitudinal Labor Market Data: Sources, Uses and Lim-
itations, in What's Happening to American Labor Force and Productivity Measurements?
Proceedings of a June 17, 1982, conference sponsored by the National Council on Em-
ployment Policy, W. E. Upjohn Institute for Employment Research.
Balestra, P. and M. Nerlove (1966), Pooling Cross-Section and Time-Series Data in the Estima-
tion of a Dynamic Model: The Demand for Natural Gas, Econometrica, 34, 585-612.
Baltagi, B. H. (1995), Econometric Analysis of Panel Data, Wiley, New York.
Baltagi, B. H. (1997), Panel Data Methods, in Handbook of Applied Economic Statistics (A.
Ullah and D. E. A. Giles, eds.), Marcel Dekker, New York.
Baltagi, B. H., J. Hidalgo, and Q. Li (1996), A Nonparametric Test for Poolability Using Panel
Data, Journal of Econometrics, 75(2), 345-367.
Behrman, J. R. and A. B. Deolalikar (1990), The Intrahousehold Demand for Nutrients in Rural
South India: Individual Estimates, Fixed Effects and Permanent Income, Journal of
Human Resources, 25(4), 665-696.
Berhane, K. and R. J. Tibshirani (1993), Generalized Additive Models for Longitudinal Data,
Manuscript, University of Toronto.
Bhargava, A. (1991), Estimating Short and Long Run Income Elasticities of Foods and Nutri-
ents for Rural South India, Journal of the Royal Statistical Society, 154, 157-174.
Bierens, H. J. (1990), Consistent Conditional Moment Tests of Functional Form, Econo-
metrica, 58, 1443-1458.
Stone, C. J. (1982), Optimal Global Rates of Convergence for Nonparametric Regression, An-
nals of Statistics, 10, 1040-1053.
Strauss, J. and D. Thomas (1990), The Shape of the Calorie Expenditure Curve, Economic
Growth Centre Discussion Paper no. 595, Yale University.
Subramanian, S. and A. Deaton (1996), The Demand for Food and Calories: Further Evidence
from India, Journal of Political Economy, 104(1), 133-162.
Ullah, A. (1988), Semiparametric and Nonparametric Econometrics, Physica-Verlag, Heidel-
berg.
Vinod, H. D. and A. Ullah (1988), Flexible Production Estimation by Nonparametric Kernel
Estimators, in Advances in Econometrics: Robust and Nonparametric Statistical Infer-
ence (T. B. Fomby and C. F. Rhodes, eds.), JAI Press, Greenwich.
Walker, T. S. and J. G. Ryan (1989), Village and Household Economies in India’s Semiarid
Tropics, The Johns Hopkins University Press, Baltimore.
Wallace, T. D. and A. Hussain (1969), The Use of Error Components Models in Combining
Cross-Section with Time-Series Data, Econometrica, 37, 55-72.
Watson, G. S. (1964), Smooth Regression Analysis, Sankhyā, Series A, 26, 359-372.
18
On Calibration
1. INTRODUCTION
One issue that arises is how all the authors listed above can regard themselves as in-
volved in calibration: do they mean the same thing when describing what they do
in this way? To get to the heart of this we really need some definition of the term.
That is not easy to come by. Definitions, such as “. . . calibration . . . is not estima-
tion” (Kydland and Prescott 1996, p. 74) are negative rather than positive, while
alternative explanations, such as Kydland and Prescott (1991), tend to be rather too
diffuse. Nevertheless, I would agree with the negative statement above, in the sense
that the primary focus of calibrators is not really estimation, even though they may
perform estimation as part of their task. My definition is
There are three key words or phrases in the definition: "process," "data em-
ployed," and "measure specified characteristics," and we need to dwell a little on
the last two elements, concentrating upon how one would carry out the process dis-
tinguished in the definition. The employment of data distinguishes calibration from
exercises with models in which unknown parameters are just replaced with some
values in order that they might be simulated, which is what King (1995) refers to as
quantitative theory. The characteristics that investigators wish to measure are manifold;
examples drawn from the literature include
This leaves us with the topic of parameter assignment. There is no one way of doing
estimation that is common to all who describe themselves as calibrators. Instead,
the whole gamut of estimation procedures is represented, ranging from the somewhat
vague (and possibly inconsistent) prescriptions of Kydland and Prescott:
Thus data are used to calibrate the model economy so that it mimics the world as
closely as possible along a limited, but clearly specified, number of dimensions.
(Kydland and Prescott 1996, p. 74)
It is important to emphasize that the parameter values selected are not the ones
that provide the best fit in some statistical sense. (Kydland and Prescott 1996,
p. 74)
to maximum likelihood (e.g., McGrattan 1994), GMM (e.g., Christiano and Eichen-
baum 1992, Fève and Langot 1994), and indirect estimation (e.g., Smith 1993, Bansal
et al. 1995). Indirect estimation is an interesting approach in that it brings together
those who are primarily interested in fitting statistical models to data with those con-
cerned with having a theoretical model as the way of organizing the facts. In indirect
estimation, as set out in Gourieroux et al. (1993) and Gallant and Tauchen (1996),
the parameters of the theoretical model are derived from the estimated parameters
of the statistical model. The method works from the observation that, if the theoret-
ical model is correct, then, from the principles of encompassing, one can predict
what the parameters of the statistical model should be. Hence, if we reverse the nor-
mal encompassing methodology, we can recover estimates of the parameters of the
theoretical model from those of the statistical model.
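The mechanics can be made concrete with a stylized sketch. This is a toy example, not the Ingram model used below: the structural mapping, the AR(1) auxiliary model, and all names are assumptions, and only numpy and scipy are used.

```python
# Stylized indirect estimation: simulate the structural model for a candidate theta,
# fit an auxiliary statistical model (an AR(1) slope) to simulated and actual data,
# and choose theta so that the two auxiliary estimates coincide.
import numpy as np
from scipy.optimize import minimize_scalar

def simulate_structural(theta, T, rng):
    """Hypothetical theoretical model: y_t = (theta/(1+theta)) * y_{t-1} + e_t."""
    lam = theta / (1.0 + theta)
    y = np.zeros(T)
    e = rng.normal(size=T)
    for t in range(1, T):
        y[t] = lam * y[t - 1] + e[t]
    return y

def ar1_slope(y):
    """Auxiliary statistical model: OLS slope of y_t on y_{t-1}."""
    return np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)

def indirect_estimate(y_data, T_sim=20000, seed=0):
    b_data = ar1_slope(y_data)
    def distance(theta):
        sim_rng = np.random.default_rng(seed)          # common random numbers across theta
        return (ar1_slope(simulate_structural(theta, T_sim, sim_rng)) - b_data) ** 2
    return minimize_scalar(distance, bounds=(0.01, 10.0), method="bounded").x

# usage: data generated with theta = 2 (so the AR coefficient is 2/3)
rng = np.random.default_rng(5)
y_obs = simulate_structural(2.0, 1000, rng)
print(indirect_estimate(y_obs))                        # should be near 2
```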
It is worth asking why one does this rather than fit the theoretical model di-
rectly to the data, as would be the practice with those doing MLE, i.e., “direct” esti-
mation. It turns out that there may be some gains to doing indirect rather than direct
estimation. To see this we look at a small stochastic equilibrium model set out in In-
gram (1995). In this model the system consists of equations describing the evolution
of the log of the capital stock, k_t, productivity, a_t, and the real interest rate, r_t, of the
form (Ingram 1995, p. 20),
where λ_1 is a function of the discount factor β and the cost-of-adjustment coefficient
δ, and the equation describing the evolution of the capital stock comes from solving
the Euler equations.
Now let us play some games in which we describe the results from being a cali-
brator, who is either performing direct estimation or employing an indirect estimator
as a way of measuring any unknown parameters. We will assume that the “theorist,”
who provides the model to be estimated, makes some errors, and we ask how robust
the estimation methods are to these mistakes. The parameters to be estimated will be
ρ_a, δ, β, and ρ_r. One can always get consistent estimators of ρ_a and ρ_r from the sec-
ond and third equations of the system, allowing us to concentrate on the estimation
of the first equation as that relevant to producing estimates of δ and β. The statistical
model we choose for indirect estimation is
Indirect estimator
The idea behind indirect estimation is to find what values of δ, β are implied by
b_1, b_2, and b_3. In the statistical model b_1, b_2, and b_3 are all consistently estimated;
i.e., λ_1 + ρ_a and −ρ_a λ_1 are consistently estimated. Hence, the estimation of β
and δ will be consistent. Thus the use of the general statistical model as the way
of inferring estimates of the parameters of the theoretical model has protected us
against a misspecification that comes from making an incorrect assumption about
the parameters of the latter.
and this still involves a specification error in that there are incorrect restrictions
imposed between the coefficients of r_{t−1}, k_{t−1}, and k_{t−2}.
*Of course we cannot recover both β and δ from a single parameter λ_1, but the variance of the error term
also contains β and δ and it can be consistently estimated if λ_1 can be.
Indirect estimators
The indirect estimator will also be inconsistent for the same reason as the direct es-
timator, i.e., even though b_1, b_2, and b_3 are consistent, the theoretical model imposes
an incorrect relation between them.
Even though indirect estimation might be more robust than direct estimation, I
do not feel that one should oversell this idea, and my presumption would be that there
is likely to be little gain from doing indirect estimation, at least in regard to avoiding
the consequences of specification error. It does seem though that, in many instances,
indirect estimation may be an easier way to do estimation, in that information on
good statistical models of data is plentiful, and it is frequently easy to simulate from
theoretical models, which is the modus operandi of indirect estimation. An example
would be models of exchange rates. These can become very complex when allowance
is made for intervention points, etc., and direct estimation may be very difficult.
There is an extensive literature on the type of GARCH models that fit such data, so
it makes sense to use these models to estimate the parameters of some underlying
theoretical model of exchange rates. One might even argue that it is a philosophy
that is ideally suited to calibration endeavors in that it provides the rationale for
a division of labor between those designing good statistical models to fit the data
and those interested in generating economic models. It is likely to be rare that any
individual has skills in both of these areas and the indirect estimation principle
therefore provides a way to reap the benefits of specialization when estimating the
parameters of economic models.
With so much agreement, one might wonder what the argument is about. I think that
there are three major areas in which calibrators have a distinctive stance, and these
revolve around
A. Preeminence of Theory
The following quotations provide what might be regarded as the polar cases in atti-
tudes toward theory. At one level is the “LSE approach” to econometrics. It is not the
case that such a methodology eschews theory, but it sees theory as just one element
in modelling, as witnessed by the following statements.
. . . there is nothing that endows an economic theory with veracity a priori, and so
coherence of an econometric model with an economic theory is neither necessary
nor sufficient for it to be a good model. (Mizon 1995, pp. 115-116)
I think it is important to emphasize that the issue is not theory versus “no
theory,” even though many representatives of the calibrationist approach do seem
to make such a stark contrast. Empirical work in most methodological traditions to-
day is sensitive to the need for theory. Indeed, even in the “systems of equations”
approach most despised by Kydland and Prescott (1991) there are few models nowa-
days that do not have a strong theoretical core; see Hall (1995, pp. 980-983) for
a brief review of this fact and Murphy (1988) and Powell and Murphy (1995) for a
working model. The issue is more “how much theory” or “what type of theory” rather
than “no theory.”
In contrast to the position just advanced is a statement by Kydland and Prescott
about the role of theory that seems to be shared, to different degrees, by most of those
who see themselves as calibrators. It is my belief that it is this belief in the preemi-
nence of theory that distinguishes a calibrator from a noncalibrator.
The degree of confidence in the answer depends on the confidence that is placed
in the economic theory being used. (Kydland and Prescott 1991, p. 171)
A belief in the preeminence of theory carries with it the stance that consistency
with data is of secondary importance; i.e., “strong” data consistency is not necessary
when working with models. One might ask if such a stance is reasonable. I think it
is if all we are doing with the model is demonstrating the feasibility of a particular
outcome, i.e., we are doing quantitative theory. A good example of this would be
the debate over the validity of uncovered interest parity as an essential element of
many macroeconometric models. Defining the log of the spot exchange rate as S_t, its
expected rate one period in the future as S^e_{t+1}, and the forward rate as F_t, covered
interest parity yields
Invoking rational expectations, so that S_{t+1} = S^e_{t+1} + e_{t+1}, uncovered interest
parity eventuates as
Of course, this raises the issue of the credibility of a model when used for a range
of issues rather than just the replication of a single fact. It is rare to see a calibrated
model whose originator is not aiming to say something about the real world, from
statements that business cycles are largely due to supply side shocks to the idea that
cycles reduce welfare by very small amounts. I really find it impossible to believe
that anyone can take such conclusions or prescriptions seriously if they derive from
models whose credentials have not been established by measuring them against the
data. It is therefore fascinating, and troubling, to look at the first of the two polar
attitudes that I discern in the literature and which are reproduced below.
If the theory is strong and the measurements good, we have confidence that the
answer for the model economy will be essentially the same as for the actual
economy. (Kydland and Prescott 1996, p. 83)
The statement by Kydland and Prescott comes very close to blaming the data if the
calibrator’s model fails to fit. It is breathtaking because of our lack of strong theory.
We have theory, but to think it is this “strong” is truly amazing. The idea that a model
should be used just because the “theory is strong,” without a demonstration that it
provides a fit to an actual economy, is mind-boggling.
One might argue that few calibration exercises fail to include some evidence on
their fit to data. There are however two defects currently in such presentations which
hamper my acceptance of the proposition that the credibility of the maintained mod-
els has been established. One of these is the selective nature of the facts upon which
fit is to be judged. In some exercises this seems to come down to a single index, e.g.,
the correlation between hours and productivity. In others (e.g., Burnside et al. 1993),
the attempts at model evaluation are far more respectable, in that quite a number of
features are examined for their correspondence with the data. Nevertheless, it is the
case that few of these attempts are holistic. What is to be regarded as holistic de-
pends upon the nature of the problem, but when the models involve restrictions upon
a VAR, as is typical of most RBC and monetary models, it seems appropriate to test
all the restrictions, and not just a subset of them.*
*Anderson (1991) makes the same point in commenting on Kydland and Prescott's (1991) paper.
this same point, although they prefer the comparison of impulse responses. As ex-
plained in Pagan (1995), I do not think impulse response comparisons are as good
a method of assessing fit as a VAR, albeit it may be that any discrepancy between
model and data might be usefully expressed in terms of a discrepancy between the
model- and data-based impulse responses. In Canova et al. (1994) this philosophy
was put into action with respect to the model of Burnside et al. (1993). That model
looks good when judged by a limited number of features, but very poor when it is
forced to address all aspects of the VAR.
Even if one abstracts from the proper way to evaluate a model, one is still
left with the question of how we are to assess the magnitude of any inconsistency
between data and model. Early on in calibration studies, “eyeball” tests seemed to
predominate as measures of the size of the deviation between the model and real-
ity. Consequently, the chosen metric was very fuzzy, leaving one to despair at any
agreement being reached over whether a model is satisfactory. Just as beauty is in
the eye of the beholder, some of the judgments rendered concerning fit seemed quite
remarkable; e.g., a glance at the woeful (to me) match between model predictions
and data in either Figure 5 of Hansen and Prescott (1993) or Figure 4 in Jovanovic
and MacDonald (1994) leaves one with a sense of wonderment when reading that
the authors described the graphical evidence as supportive of the model. Given the
tendency for these same authors to pull out the measuring stick of predictive perfor-
mance when judging other methodologies, e.g., “one reason for its demise was the
spectacular predictive failure of the approach" (Kydland and Prescott 1991, p. 166),
a legitimate question would seem to be why such a benchmark should not be uni-
versal rather than particular. Fortunately, some discipline has begun to emerge in
this literature, mostly through variants of statistical hypothesis testing (e.g., Burn-
side et al. 1993), and it is therefore time to turn to the question of the role of statistics
within the calibration agenda.
C. Role of Statistics
In many papers written by calibrationists there is a clear hostility to the use of purely
statistical models of data. This is most apparent in Kydland and Prescott’s (1991)
paper where the statistical models are identified with the "systems of equations" ap-
proach. In that paper, Frisch's name is invoked as someone who heartily disap-
proved of this type of work. Actually, a closer reading of the paper they cite does not
support that interpretation. Frisch was certainly worried about how much informa-
tion there was in time-series data, and he was very much in favor of investigators
“going down coal mines” in order to collect and understand the workings of the in-
stitutions they were studying, but the whole paper that is quoted so approvingly by
Kydland and Prescott, is directed against the use of theoretical models that are not
closely connected with modeling features seen in the real world-what he terms
614 PAGAN
"playometrics."* His stance on this so vividly reminds one of some parts of the calibra-
tionist school of modelling as to be worth recording:
In too many cases the procedure followed resembles too much the escapist pro-
cedure of the man who was facing the problem of multiplying 13 by 27. He was
not very good at multiplication but very proficient in the art of adding figures,
so he thought he would try to add these figures. He did and got the answer 40,
which mathematically speaking was the absolutely correct answer to the prob-
lem as he had formulated it. But how well does the figure 40 tell us about the
size of the figure 351?
It is pointless to test all the strong restrictions implied by this simple model: it
is known to be wrong in its details, and formal statistical rejections of the null
would tell us no more than we already know. The more interesting question is,
How wrong is it? (Rosen et al. 1994, p. 482)
If one takes this proposition seriously then it calls into question our ability to easily
decide whether a model fits the data based on some specified metric. In particular
the type of analysis described below becomes problematic.
. . . first a set of statistics that summarizes relevant aspects of the behaviour of
the actual economy is selected. Then the computational experiment is used to
generate many independent realizations of the equilibrium process for the model
economy. In this way, the sampling distribution of this set of statistics can be
determined to any degree of accuracy for the model economy and compared
with values of the set of statistics for the actual economy. (Kydland and Prescott
1996, p. 75)
To see what the problem is, assume that the model described by the calibrator
has the form
where z*_t are the "latent" (unobserved) variables of the theoretical model, e_t are
shocks that drive the theoretical model, and θ are the parameters of the model. The
random variable whose realizations are the data will be z_t, and it will be definitional
*It is therefore rather ironic to read Kydland and Prescott's (1996) comment that "here by theory we do
not mean a set of assertions about the actual economy" (p. 72).
that z_t = z*_t + ν_t, where the properties of the "observation errors" ν_t are unknown.
This is simply a formal description of a misspecified model. To complete the analysis
we suppose that the calibrationist computes some quantity, g(z_t, θ); e.g., this could
be the sample variance of (say) output. Kydland and Prescott's proposal is to treat
g(z_t, θ) as fixed and to study the distribution of g(z*_t, θ), locating where g(z_t, θ)
lies in this distribution. In order to find the distribution of g(z*_t, θ) one only needs to
be able to simulate from the theoretical economy. Despite its seductive appeal, the
procedure is an invalid one, unless it is assumed that ν_t = −z*_t, as only then is it
true that z_t would remain constant as different realizations of z*_t are made. Otherwise,
we need to make some assumptions about how ν_t varies with z*_t. By far the simplest
solution, used by most investigators, is to make z_t = z*_t. Then the quantity of interest
will be taken to be a function of θ alone and g(z_t, θ) can be compared to the value
predicted by the model. However, this presumes that the model is correctly specified
and contradicts Kydland and Prescott's fundamental premise about such models.*
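The computational experiment quoted above, under the common simplification z_t = z*_t, can be sketched compactly. The "model economy" below is a hypothetical AR(1) stand-in and the data value of the statistic is assumed, purely for illustration.

```python
# Sketch of the computational experiment: simulate many realizations of a model
# economy, build the sampling distribution of a chosen statistic g (here the sample
# variance), and see where the value computed from the actual data falls in it.
import numpy as np

def simulate_model(theta, T, rng):
    """Hypothetical model economy: an AR(1) with coefficient theta and unit shocks."""
    y = np.zeros(T)
    e = rng.normal(size=T)
    for t in range(1, T):
        y[t] = theta * y[t - 1] + e[t]
    return y

def g(z):
    return np.var(z)                                   # the statistic of interest

rng = np.random.default_rng(6)
T, theta = 160, 0.9
g_data = 18.0                                          # value from the actual data (assumed)
g_sims = np.array([g(simulate_model(theta, T, rng)) for _ in range(2000)])
location = np.mean(g_sims <= g_data)
print(f"data value sits at the {100 * location:.1f}th percentile of the model distribution")
```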
What does one do in the face of this problem? My answer, described in more
detail in Pagan (1994, pp. 8-9; 1995, p. 51), would be to take the theory of misspec-
ification of econometric models seriously when judging the significance of a value
of g(z_t, θ). The theory for completing this task is now very well developed (e.g., see
White 1994), as are the computational methods. Perhaps the major obstacle to im-
plementing the theory is to arrive at a description of how the data is generated that is
independent of the theoretical model. In some cases, such as the analysis of macro-
economic data, a VAR would seem to be appropriate, but in other instances one may
need to fit quite complex models, e.g., as in Bansal et al. (1995). This is how I see
both statistical and theoretical models being used in a way that benefits from spe-
cialization. Unlike indirect estimation, which assumes the validity of the theoretical
model and then estimates its parameters from a statistical model after generating re-
alizations from the theoretical one, the scheme above reverses the steps, simulating
data from the statistical model to be used for studying estimators and statistics that
are associated with the theoretical model. Diebold et al. (1995) apply such a scheme
when evaluating the quality of the “cattle cycle” model in Rosen et al. (1994).
ACKNOWLEDGMENTS
This chapter was the basis of my comments made in the Calibration Symposium at
the 7th World Congress of the Econometric Society in Tokyo, August 1995. Some of
the ideas are drawn from Canova et al. (1994), Kim and Pagan (1995), Pagan (1994),
and Pagan (1995).
*This point is also relevant to those proposals for a Bayesian rather than classical assessment of the quality
of the model, e.g., De Jong et al. (1996).
REFERENCES
Hansen, G. D. and E. C. Prescott (1993), Did Technology Shocks Cause the 1990-1991 Re-
cession?, American Economic Review Papers and Proceedings, 83, 280-286.
Ingram, B. F. (1995), Recent Advances in Solving and Estimating Dynamic Macroeconomic
Models, in K. D. Hoover (ed.), Macroeconometrics: Developments, Tensions and Pros-
pects, Kluwer Academic Publishers, Boston, 15-46.
Ingram, B. F. and C. H. Whiteman (1994), Towards a New Minnesota Prior: Forecasting Macro-
economic Series Using Real Business Cycle Model Priors, Journal of Monetary Eco-
nomics, 47, 497-510.
Jovanovic, B. and G. MacDonald (1994), The Life Cycle of a Competitive Industry, Journal of
Political Economy, 102, 322-347.
Kim, K. and A. R. Pagan (1995), The Econometric Analysis of Calibrated Macroeconomic
Models, in M. H. Pesaran and M. R. Wickens (eds.), Handbook of Applied Econometrics,
Blackwell, Oxford, 356-390.
King, R. G. (1995), Quantitative Theory and Econometrics, Federal Reserve Bank of Rich-
mond Economic Quarterly, 81/3, 53-105.
King, R. G., C. I. Plosser, and S. T. Rebelo (1988), Production, Growth and Business Cycles.
I: The Basic Neoclassical Growth Model, Journal of Monetary Economics, 21, 195-
232.
King, R. G. and M. W. Watson (1995), On the Econometrics of Comparative Dynamics, mimeo,
University of Virginia.
Kydland, F. E. and E. C. Prescott (1991), The Econometrics of the General Equilibrium Ap-
proach to Business Cycles, Scandinavian Journal of Economics, 93, 161-178.
Kydland, F. E. and E. C. Prescott (1996), The Computational Experiment: An Econometric
Tool, Journal of Economic Perspectives, 10, 69-85.
McCallum, B. T. (1994), A Reconsideration of the Uncovered Interest Parity Relationship,
Journal of Monetary Economics, 33, 105-132.
McGrattan, E. R. (1994), The Macroeconomic Effects of Distortionary Taxation, Journal of
Monetary Economics, 33, 573-601.
McKibbin, W. J. and J. D. Sachs (1991), Global Linkages, Brookings Institution, Washington,
DC.
McKibbin, W. J. and P. J. Wilcoxen (1993), G-Cubed: A Dynamic Multi-sector General Equi-
librium Growth Model of the Global Economy, Brookings Discussion Paper in Interna-
tional Economics No. 98, Brookings Institution, Washington, DC.
Mizon, G. E. (1995), Progressive Modelling of Macroeconomic Time Series: The LSE Method-
ology, in K. D. Hoover (ed.), Macroeconometrics: Developments, Tensions and Prospects,
Kluwer, Boston, 107-170.
Murphy, C. W. (1988), An Overview of the Murphy Model, in M. Burns and C. W. Murphy
(eds.), Macroeconomic Modelling in Australia (supplementary conference issue of Aus-
tralian Economic Papers), 61-68.
Murphy, C. W. (1995), A Model of the New Zealand Economy, New Zealand Treasury, Welling-
ton.
Nason, J. M. and T. Cogley (1994), Testing the Implications of Long-Run Neutrality for Mon-
etary Business Cycle Models, Journal of Applied Econometrics, S37-S70.
Pagan, A. R. (1994), Calibration and Economic Research: An Overview, Journal of Applied
Econometrics, 9, S1-S10.
Pagan, A. R. (1995), Some Observations on the Solution, Estimation and Use of Modern
Macroeconometric Models, in K. D. Hoover (ed.), Macroeconometrics: Developments,
Tensions and Prospects, Kluwer, Boston, 47-55.
Powell, A. A. and C. W. Murphy (1995), Inside a Modern Macroeconometric Model: A Guide
to the Murphy Model, Springer-Verlag, Berlin and New York.
Rosen, S., K. M. Murphy, and J. A. Scheinkman (1994), Cattle Cycles, Journal of Political
Economy, 102, 468-492.
Rudebusch, G. D. (1995), Federal Reserve Interest Rate Targeting, Rational Expectations,
and the Term Structure, Journal of Monetary Economics, 35, 245-274.
Smith, A. A. (1993), Estimating Non-linear Time-Series Models Using Simulated Vector Au-
toregressions, Journal of Applied Econometrics, 8, S63-S84.
White, H. (1994), Estimation, Inference and Specification Analysis, Cambridge University
Press, Cambridge.
INDEX

Aggregation
  consistency, 125
  exact, 180, 199-208
    condition, 201
  Gorman, 183-191
    additive aggregation, 184-185
    affine homotheticity, 197
    polar form (GPF), 179, 187
    quasi-homotheticity, 187
    Stone-Geary structure, 187-188
  joint, 219
    for commodities and agents, 227
    Klein, 220-222
    Klein-Nataf structure, 182, 220-222
    multiple fixed inputs, 225-226
    of fixed and variable inputs, 219-220, 226-227
    vintage models, 225-226
  Muellbauer, 179, 191-196, 199
Almost ideal demand system (AIDS), 179, 196
  quadratic, 179
Analysis of variance (ANOVA), 293, 298, 303, 305
Approximate slope analysis, 394
Atkinson-Bourguignon condition, 131
Atkinson's index, 6, 8, 9, 47, 48, 140
Autocorrelation (see also Serial correlation)
  function (ACF), 556
  of errors, 394, 413, 434, 528-529, 559, 561
  spatial (see Spatial autocorrelation)
Autoregressive (AR) process, 299, 386-387, 566
  periodic, 566
  spatial moving average (SARMA), 251
Bayesian analysis, 368, 379, 427, 605
Best linear unbiased predictor (BLUP), 294
BFGS algorithm, 528
Binary response model, 75
Boole-Bonferroni inequalities (see Union intersection)
Bootstrap, 145, 164, 419, 436
  for confidence intervals, 424, 425
  percentile-t, 436
  refinements, 428, 429-432
  simple, 421
  size correction, 433
Bounds test
  of Kiviet, 489-491
  of Zinde-Walsh and Ullah, 486, 489, 492
Calibration, 605
  definition, 606
  direct estimator, 609
  indirect estimator, 609
Calorie-income relationship, 595
Calorie intake, 595, 596, 598
Cambridge controversy, 181
Census X-11, 568, 573, 574
Central limit theorem, 23, 253
Cluster fixed effects (see Panel data, fixed effects)
Cluster sampling (see Sampling, cluster)
Coefficient of variation, 7, 9, 11, 124, 333, 335
Cointegration, 507
  seasonal, 565
Collinearity, 374, 378, 597
Conditional heteroskedasticity (see Autoregressive conditional heteroskedasticity)
Confidence set estimation (see Union intersection)
Cook's distance, 458
Covariates, 29-32
Cramér-Rao lower bound, 294
Critical point, 474
Cross-entropy principle, 368
Debreu-Sonnenschein-Mantel (DSM), 124, 125, 127
Diagnostic testing, 239, 383-414
  for GARCH effect, 390, 610
  Hausman test, 384
  higher moments, 389
  information matrix test, 384-385
  J-test, 387, 433
  multiple, 390
  of conditional mean, 386
  variable addition, 384
Dimensionality, curse of, 594
Disequilibrium model, 507
Durbin's h test, 277
Durbin-Watson test, 265, 267, 387
Error component model, 293, 299-302, 305
Error correction model, 530-531, 539
Estimator, direct, 609
First difference two-stage least squares (FD-2SLS), 303, 304
Fixed effects model (see also Panel data), 293-295, 298, 299, 301, 303, 305, 306, 308-310
Fourier series, 398-400
Full information maximum likelihood (FIML) estimator (see Simultaneous equations models)
Galtonian model, 139
Gastwirth bounds, 53, 55
Gauss-Newton regression, 384
Generalized additive models, 594
Generalized entropy indices, 6, 9, 10, 16, 179, 313
Generalized least squares (GLS), 259, 293-295, 302, 313, 331, 337, 341, 347, 350, 486, 492, 499
Generalized method of moments (GMM), 82, 258-260, 295, 303, 307, 310, 311, 374, 467
Geographic effects, 74, 82
Geographic model, 67
Gibbs phenomenon, 399
Gibrat's law, 139
Giffen paradox, 181
Gini coefficient, 8, 10, 11, 17, 45, 46, 49, 53-56