Measurement in Economics
A Handbook
Edited by
Marcel Boumans
University of Amsterdam, The Netherlands
ISBN: 978-0-12-370489-4
Preface
This volume owes its existence to a brave initiative of J. Scott Bentley, Exec-
utive Editor of Elsevier Inc. Based upon my article ‘Economics, Strategies in
Social Sciences’ in the Encyclopedia of Social Measurement (Elsevier 2004),
Bentley expressed in August 2004 his interest in publishing a handbook on mea-
surement in economics, and asked me whether I would be interested in serving
as the editor-in-chief. I did not need much time to think about this challeng-
ing invitation. The Amsterdam Research Group in History and Methodology
of Economics had just concluded a project on Measurement in Economics, di-
rected by Mary Morgan. Morgan had successfully linked this Amsterdam project
to a project ‘Measurement in Physics and Economics’, at the Centre for Philos-
ophy of Natural and Social Science of the London School of Economics and
Political Science, which ran from 1996 to 2001, and was co-directed by Nancy
Cartwright, Hasok Chang, and Carl Hoefer. Another event that had an impor-
tant influence on the ultimate structure of this book was the 10th IMEKO TC7
International Symposium on Advances of Measurement Science, held in St. Pe-
tersburg, Russia, June 30–July 2, 2004. There, a number of different perspectives
on measurement employed in sciences other than engineering were examined.
For that reason, Joel Michell was invited to give his account on measurement
in psychology, Luca Mari to discuss the logical and philosophical aspects of
measurement in measurement science, and I was invited to give my account on
measurement in economics. This multi-disciplinary exchange, which also included Ludwik Finkelstein and Roman Z. Morawski, was very fruitful for developing the framework that has shaped this volume.
A volume on Measurement in Economics with contributions from all the peo-
ple I had met when developing my own ideas about Measurement Outside the
Laboratory would be a perfect way to conclude this research project. In fact,
this volume is a very nice representation of the achievements of the many peo-
ple that were involved. From the beginning we attached importance to the aim of
having contributions from a broad range of backgrounds. We welcomed contributions from practitioners as well as scholars, from disciplines including economics, econometrics, history of science, metrology, and philosophy of science, with the expectation that an intensive exchange among these different backgrounds would in the end provide a deeper understanding of measurement in economics. Thanks to all the contributors, I think we have attained this goal.
An important step towards the completion of this volume was an Author
Review Workshop that took place in April 2006, in Amsterdam, through the
generous financial support of Netherlands Organisation for Scientific Research
(NWO), Tinbergen Institute and Elsevier. At this workshop, the contributors pre-
sented their work to each other, which, together with the subsequent profound
discussions, improved the coherence of the volume considerably.
There are many scholars who made a significant contribution to the project
but whose work is not represented in the volume: Bert M. Balk, Hasok Chang,
Francesco Guala, Michael Heidelberger, Kevin D. Hoover, Harro Maas, and Pe-
ter Rodenburg.
I would also like to thank Elsevier's anonymous referees, who helped me improve the structure of the volume, and the editors at Elsevier: J. Scott Bentley (Executive Editor), Kristi Anderson (Editorial Coordinator), Valerie Teng-Broug (Publishing Editor), Mark Newson and Shamus O'Reilly (Development Editors), and Betsy Lightfoot (Production Editor), as well as Tomas Martišius of VTEX, who saw the book through production.
Marcel Boumans
May 2007, Amsterdam
List of Contributors
Numbers in parentheses indicate the pages where the authors’ contributions can
be found.
Roger E. Backhouse (135) University of Birmingham and London School of
Economics, UK. E-mail: R.E.Backhouse@bham.ac.uk.
Marcel Boumans (3, 231) Department of Economics, University of Amster-
dam, Roetersstraat 11, Amsterdam 1018 WB, The Netherlands. E-mail:
m.j.boumans@uva.nl.
Hsiang-Ke Chao (271) Department of Economics, National Tsing Hua Uni-
versity, 101, Section 2, Kuang Fu Road, Hsinchu 300, Taiwan. E-mail:
hkchao@mx.nthu.edu.tw.
Frank A.G. den Butter (189) Vrije Universiteit, Department of Economics,
De Boelelaan 1105, NL-1081 HV Amsterdam, The Netherlands. E-mail:
fbutter@feweb.vu.nl.
Dennis Fixler (413) Bureau of Economic Analysis, 1441 L Street NW, Wash-
ington, DC 20230, USA. E-mail: dennis.fixler@bea.gov.
Christopher L. Gilbert (251) Dipartimento di Economia, Università degli Studi
di Trento, Italy. E-mail: cgilbert@economia.unitn.it.
Glenn W. Harrison (79) Department of Economics, College of Business Ad-
ministration, University of Central Florida, Orlando FL 32816-1400, USA.
E-mail: gharrison@research.bus.ucf.edu.
Eric Johnson (79) Department of Economics, Kent State University, Kent, Ohio
44242, USA. E-mail: ejohnson@bsa3.kent.edu.
Alessandra Luati (377) Dip. Scienze Statistiche, University of Bologna, Italy.
E-mail: luati@stat.unibo.it.
Jan R. Magnus (295) Department of Econometrics and Operations Research,
Tilburg University, The Netherlands. E-mail: magnus@uvt.nl.
Luca Mari (41) Università Cattaneo – Liuc – Italy. E-mail: lmari@liuc.it.
Thomas Mayer (321) University of California, Davis, CA 94708, USA. E-mail:
tommayer@lmi.net.
Melayne M. McInnes (79) Department of Economics, Moore School of Business, University of South Carolina, USA. E-mail: mcinnes@moore.sc.edu.
Joel Michell (19) School of Psychology, University of Sydney, Sydney NSW
2006, Australia. E-mail: joelm@psych.usyd.edu.au.
Contents

Preface v
List of Contributors vii
Part I: General 1
Chapter 1. Introduction 3
Marcel Boumans
Chapter 3. Measurability 41
Luca Mari
Chapter 15. Optimal Experimental Design in Models of Decision and Choice 357
Peter G. Moffatt
Part I: General
CHAPTER 1
Introduction
Marcel Boumans
Department of Economics, University of Amsterdam, Amsterdam, The Netherlands
E-mail address: m.j.boumans@uva.nl
Abstract
Measurement in Economics: a Handbook aims to serve as a source, reference,
and teaching supplement for quantitative empirical economics, inside and out-
side the laboratory. Covering an extensive range of fields in economics (econometrics, actuarial science, experimental economics, and economic forecasting), it is the first book that takes measurement in economics as its central focus. It shows how different and sometimes distinct fields share the same kind of measurement problems, and so how the treatment of these problems in one field can serve as guidance in other fields. This volume provides comprehensive and
up-to-date surveys of recent developments in economic measurement, written at
a level intended for professional use by economists, econometricians, statisti-
cians and social scientists.
The organization of this Handbook follows the framework that is given in this
introductory chapter. It consists of four major parts: General, Representation in
Economics, Representation in Econometrics, and Precision.
1.1. Introduction
yi = F (x) + εi . (1.1)
The observed quantity y can only provide information about the system variable,
x, when this variable does influence the behavior of y. In general, however, not only will x influence y, but there will also be many other influences, B. To express more explicitly how x and other possible
1 ‘True value’ is an idealized concept, and is unknowable. Even according to the Classical Approach, as expressed in VIM (1993), it is admitted that ‘true values are by nature indeterminate’ (p. 16). In current evaluations of measurement results this term is avoided. The Guide to the Expression of Uncertainty in Measurement (GUM, 1993), influential in metrology, recommends expressing the quality of measurement results in terms of ‘uncertainty’; see the section on Precision below and Mari in this volume.
factors (B) influence the behavior of the observed quantities, the relationship is
transformed into the following equation:
Δy = Δf(x, B) = (∂f/∂x) Δx + (∂f/∂B) ΔB    (1.3)
where ∂f/∂x and ∂f/∂B denote how much y will change proportionally due to
changes in x and B, respectively.
To achieve reliable measurement results, the following problems have to be
dealt with:
1. Invariance problem: ∂f/∂x is the element of Eq. (1.3) that expresses the re-
lation between the observed quantity y and the measurand x. This element
should be, as much as possible, invariant – that is to say, it has to remain
stable or unchanged for, and to be independent of, two kinds of changes:
variations over a wide range of the system variable, x, and variations over
a wide range of background conditions, B.
2. Noise reduction: To make the observations as informative as possible, in other words as accurate and precise as possible, we have to reduce the influences of the other factors B. In a laboratory, where we can control the environment, this can be achieved by imposing ceteris paribus conditions: ΔB = 0. For example, by designing experiments as optimally as possible (discussed by Moffatt in Chapter 15) one can gain precision.
3. Outside the laboratory, where we cannot control the environment, accuracy
and precision have to be obtained by modeling in a specific way. To mea-
sure x, a model, denoted by M, has to be specified, for which the observations
yi function as input and x̂, the estimate of x, functions as output:
x̂ = M[yi ; α] (1.4)
where α denotes the parameters of the model. The term ‘model’ is used here in a very general sense; it includes econometric models, filters, and index numbers (see also Chapter 6, in which Backhouse discusses representations other than those usually understood to be useful as models in economics).
Substitution of the observation equation (1.1) into model M (Eq. (1.4)) shows
what should be modeled (assuming that M is a linear operator):
x̂ = M[f(x) + εi; α] = Mx[x; α] + Mε[ε; α].    (1.5)
The measurement error of this estimate is

ε̂ = x̂ − x = Mε[ε, α] + (Mx[x, α] − x).    (1.6)

To explore how this measurement error is dealt with, it may be helpful to compare it with the ‘mean-squared error’ of an estimator as defined in statistics:
E[ε̂²] = E[(x̂ − x)²] = Var(ε̂) + (x − E[x̂])².    (1.7)
The first term on the right-hand side of Eq. (1.7) is a measure of precision and the second term is the square of the bias of the estimator (see also Proietti and Luati’s Section 5.1 in this volume). Comparing expression (1.6) with expression (1.7), one can see that the error term Mε[ε, α] is reduced, as much as possible, by reducing the spread of the errors, that is, by aiming at precision. The second error term (Mx[x, α] − x) is reduced by finding a representation of x that is as accurate as possible.
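A small simulation can make this decomposition concrete. The sketch below is my own illustration, not taken from the Handbook: the ‘true’ measurand value, the noise level, and the two models M (the plain arithmetic mean and a deliberately shrunk mean) are hypothetical choices, used only to show that the mean-squared error splits into a variance (precision) term and a squared-bias (accuracy) term, as in Eq. (1.7).

import numpy as np

rng = np.random.default_rng(0)
x_true = 2.0                  # hypothetical true value of the measurand
n_obs, n_rep = 25, 20000      # observations per sample, number of repeated samples

# Repeated samples of noisy observations y_i = x + eps_i
y = x_true + rng.normal(0.0, 1.0, size=(n_rep, n_obs))

models = {
    "arithmetic mean": y.mean(axis=1),        # an unbiased model M
    "shrunk mean":     0.8 * y.mean(axis=1),  # a deliberately biased model M
}

for name, x_hat in models.items():
    mse = np.mean((x_hat - x_true) ** 2)
    var = np.var(x_hat)                       # precision term of Eq. (1.7)
    bias2 = (x_true - x_hat.mean()) ** 2      # squared-bias (accuracy) term
    # Up to simulation noise, mse is approximately var + bias2
    print(f"{name:16s}  MSE={mse:.4f}  Var={var:.4f}  bias^2={bias2:.4f}")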
This splitting of the error term into two parts, and the strategies developed to deal with each part, explain the partitioning of this volume. After a General Part, in which general and introductory issues with respect to measurement in economics are discussed, there are two parts in which the problem of obtaining accurate representations in economics and in econometrics is looked at in turn.
The division between economics and econometrics is made because of the differ-
ences between strategies for obtaining accuracy developed in the two disciplines.
While there is an obviously stronger influence from economic theory in eco-
nomics, one can see that econometrics is more deeply influenced by statistical
theories. The last part of this volume deals with the first error term, namely Pre-
cision.
1.2. General
2 See also Chao in this volume. For surveys of model accounts, see Morgan (1998) and Morrison
and Morgan (1999).
3 See for examples Morgan’s history of the measurement of the velocity of money in Chapter 5,
Backhouse’s chapter on Representation in Economics, and den Butter’s chapter on national accounts
and indicators.
4 See Boumans and Morgan (2001) for a detailed discussion of this spectrum.
∂f/∂x = Δy/Δx.    (1.8)
At the opposite end of this spectrum of experiments are the so-called ‘natural
experiments’, where one has no control at all, and one is fully dependent on
observations only passively obtained:
Δy = (∂f/∂x) Δx + (∂f/∂z1) Δz1 + (∂f/∂z2) Δz2 + (∂f/∂z3) Δz3 + · · ·    (1.9)
where the zi ’s represent all kinds of known, inexactly known and even unknown
influencing factors.
To discuss these latter kinds of experiment and to chart the kind of knowledge
gained from them, it is helpful to use a distinction between ‘potential influences’
and ‘factual influences’, introduced by Haavelmo in his important 1944 paper
‘The probability approach in econometrics’. A factor z has potential influence
when ∂f/∂z is significantly5 different from zero. Factor z has factual influence
when ∂f/∂z · Δz is significantly different from zero. In practice, most of the possible factors will have no or only negligible potential influence: ∂f/∂zi ≈ 0 for i > n. So the change in y is determined by a finite number (n) of non-negligible potential influencing factors, which are not all known yet:
Δy = (∂f/∂x) Δx + (∂f/∂z1) Δz1 + (∂f/∂z2) Δz2 + · · · + (∂f/∂zn) Δzn.    (1.10)
To find out which factors have potential influence, when one can only passively observe the economic system, we depend on their revealed factual influence. Whether they display factual influence (∂f/∂z · Δz), however, depends not only on their potential influence (∂f/∂z) but also on whether they have varied sufficiently for the data set at hand (Δz). When a factor has not varied enough (Δz ≈ 0), it will not reveal its potential influence. This is the so-called problem of passive observation. This problem is tackled by taking as many different data sets as possible into account, or by modeling as many factors suggested by theory as possible.
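The problem of passive observation can be illustrated with a small regression exercise. The sketch below is hypothetical (the coefficients, sample size, and noise level are my own choices): the factor z has a genuine potential influence, but when the available data set contains almost no variation in z (Δz ≈ 0), the regression cannot reveal it; with ample variation it can.

import numpy as np

rng = np.random.default_rng(1)
n = 200
beta_x, beta_z = 1.0, 2.0               # potential influences of x and z
noise_sd = 0.5

def estimate_beta_z(z_spread):
    """OLS on a passively observed data set in which z varies by z_spread."""
    x = rng.normal(0.0, 1.0, n)
    z = rng.normal(0.0, z_spread, n)    # z_spread close to 0 mimics Δz ≈ 0
    y = beta_x * x + beta_z * z + rng.normal(0.0, noise_sd, n)
    X = np.column_stack([x, z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - 2)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef[1], se[1]

for spread in (0.01, 1.0):
    b, se = estimate_beta_z(spread)
    print(f"variation of z = {spread:>4}:  estimated beta_z = {b:6.2f}  (s.e. {se:.2f})")
# With negligible variation the estimate of beta_z is essentially uninformative;
# only when z has varied does its factual influence reveal its potential influence.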
Generally, to obtain empirical knowledge about which factor has potential
influence without being able to control, econometric techniques (e.g. regression
analysis) are applied to find information about their variations:
5 Whenever this term is used in this chapter, it refers to passing a statistical test of significance.
Which statistical test is applied depends on the case under consideration.
Δy = (∂f/∂x) Δx + (∂f/∂c1) Δc1 + · · · + (∂f/∂cm) Δcm + (∂f/∂z1) Δz1 + · · · + (∂f/∂zk) Δzk    (1.12)
where, besides the stressor x, the ci's indicate the influencing factors that are also controlled by the experimenter. The experimenter intervenes by varying these control factors in a specific way, according to certain instructions or tasks, Δci = Ii, where Ii represents a specific institutional rule assumed to exist in the real world or a rule which is correlated with naturally occurring behavior. Knowledge about these rules is achieved by other experiments or econometric studies. Harrison et al. (Chapter 4) investigate how knowledge about one of these experimental controls might influence the measurement results. This knowledge depends on previous experiments or is based on theoretical assumptions. Misspecification of these controls may lead to inaccurate measurement results.
Each economic measuring instrument can be understood as involving three
elements, namely, of principle, of technique and of judgment. A particular strat-
egy constrains the choices and the combinations of the three elements, and these
elements in turn shape the way individual measuring instruments are constructed
and so the measurements that are made (see also Morgan, 2001). Morgan (Chap-
ter 5) provides examples of different measurement strategies; these different
strategies all have the common aim of measuring the ‘velocity of money’. Interestingly, she observes in her history of the measurement of the velocity of money a trajectory that may be a general feature of the history of economic measurement: a trajectory from measuring some economic quantity by direct means (measurement of observables), to indirect measurement (measurement of unobservables), to model-based measurement (measurement of idealized entities).6 The latter
kind of measurement involves a model that mediates between theory and obser-
vations and which defines the measurand.
6 A similar study has been carried out by Peter Rodenburg (2006), which compares different
strategies of measuring unemployment.
simplified product of the core theory. “It sees how much mileage it can get out of
that model. Only then does it add any complicating and more realistic feature”
(p. 29). Mayer (Chapter 13) discusses a similar ‘disagreement’ between these
two strategies.
Structure is one of the key concepts of measurement theory, including econo-
metrics. Surveying the literature on this subject, Chao (Chapter 11) observes that
structure has two connotations: one is that it refers to a system of invariant re-
lations and the other to a deeper layer of reality than its observed surface. They
are, however, connected. The latter connotation implies that we can only assess that layer indirectly; we need theory to connect the surface with the layers below.
The connection between the observations yi and the measurand x, denoted by F
in Eq. (1.1) is made by theory. As we have seen, in order to let the observations
y be informative about x, this relation must be stable across a broad range of
variations in both x and background conditions.
Magnus (Chapter 12) shows that one should use not only diagnostic tests to assess the validity of the model specification, but also sensitivity analysis. Morgan (Chapter 5) raises the issue that observations and measurements are always taken from a certain position – an observation post. This implies that the view of
the nearest environment of this position is quite sharp, detailed and complete, but
the view of the environment farther away becomes more vague, less detailed, in-
complete, and even incorrect. The question is whether this is a problem. Is it
necessary for reliable measurement results to have an accurate representation
of the whole measuring system plus environment? No, not necessarily, as Mag-
nus argues, but a reliability report should include an account of the scope of the
measurement: a sensitivity report.
Models contain two sets of parameters: focus parameters (α) and nuisance parameters (θ). The unrestricted estimator α̂(θ̃) is based on the full model, in which the estimate of the nuisance parameters is denoted by θ̃, and the restricted estimator α̂(0) is estimated under the restriction θ = 0. Magnus shows that the first-order approximation of their difference can be expressed as:
α̂(θ̃) − α̂(0) ≈ [∂α̂(θ)/∂θ]|θ=0 θ̃ ≡ S θ̃    (1.13)
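The following sketch illustrates the idea behind Eq. (1.13) in the simplest linear-regression setting; it is my own illustrative setup, not necessarily the one used by Magnus. With focus parameters α on regressors X and a single nuisance parameter θ on a regressor Z, the estimator of α for a fixed θ is α̂(θ) = (X′X)⁻¹X′(y − Zθ), so the sensitivity is S = ∂α̂(θ)/∂θ = −(X′X)⁻¹X′Z, and in this linear case α̂(θ̃) − α̂(0) = Sθ̃ holds exactly.

import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # focus regressors
Z = rng.normal(size=(n, 1))                              # nuisance regressor
y = X @ np.array([1.0, 2.0]) + Z @ np.array([0.5]) + rng.normal(0.0, 1.0, n)

XtX_inv = np.linalg.inv(X.T @ X)

def alpha_hat(theta):
    """Estimator of the focus parameters alpha for a fixed nuisance value theta."""
    return XtX_inv @ X.T @ (y - Z @ theta)

# Unrestricted estimate theta_tilde from the full regression of y on [X, Z]
full_coef, *_ = np.linalg.lstsq(np.hstack([X, Z]), y, rcond=None)
theta_tilde = full_coef[-1:]

S = -XtX_inv @ X.T @ Z                                   # sensitivity of alpha_hat to theta
print(alpha_hat(theta_tilde) - alpha_hat(np.zeros(1)))   # difference of estimators
print((S @ theta_tilde).ravel())                         # S * theta_tilde: the same numbers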
the problem of data mining: the repetition of operations until the desired results
are obtained. The problem is to validate the accuracy of these results. An im-
portant way to assess the results’ accuracy is to see whether these results can
be reproduced. Reproduction is the opposite of data mining. In VIM (1993)
reproducibility is defined as: “closeness of the agreement between the results
of measurement of the same measurand carried out under changed conditions
of measurement” (p. 24). The changed conditions might include: principle of
measurement, method of measurement, observer, measuring instrument, ref-
erence standard, location, conditions of use, and time. A similar strategy in
non-laboratory sciences is ‘triangulation’, see also Chapter 6. The term ‘trian-
gulation’ is often used to indicate that more than one method is used in a study
with a view to multiple checking results. The idea is that we can be more con-
fident about the accuracy of a result if different methods lead to the same result
(see e.g. Jick, 1979).
1.5. Precision
with Type A uncertainty. Discussions about how to achieve accuracy are rather
similar to the discussions about assessing Type B uncertainty.
Precision or Type A uncertainty can be objectively established for any chosen metric; they are considered to be quantitative concepts. However, accuracy or
Type B uncertainty depends much more on qualitative knowledge of the mea-
surand itself and cannot be assessed in the same objective way. That objective
standards are not enough for evaluating measurement results is admitted in the
Guide to the Expression of Uncertainty in Measurement:
Although this Guide provides a framework for assessing uncertainty, it cannot substitute for
critical thinking, intellectual honesty, and professional skill. The evaluation of uncertainty is
neither a routine task nor a purely mathematical one; it depends on detailed knowledge of the
nature of the measurand and of the measurement. The quality and utility of the uncertainty
quoted for the result of a measurement therefore ultimately depend on the understanding,
critical analysis, and integrity of those who contribute to the assignment of its value (GUM,
1993, p. 8).
(1/n) Σ_{i=1}^{n} yi = F(x) + (1/n) Σ_{i=1}^{n} εi ≈ F(x).    (1.14)
This method is, of course, based on the assumption that the errors are symmetri-
cally distributed around zero. Nonetheless, it is an early example of a model of
the errors, Mε .
Taking the arithmetic mean to reduce noise also implicitly assumes that the
observations are taken under the same conditions, the assumption of repeatabil-
ity. Repeatability, however, is a quality of a laboratory. Economic observations
are rarely made under these conditions. For example, time series are sequential observations without any assurance that the background conditions have not
changed. To discuss noise reduction outside the laboratory and at the same time
yt = xt + εt.    (1.15)

x̂t = Σ_{s=−n}^{n} αs y_{t+s} = Σ_{s=−n}^{n} αs x_{t+s} + Σ_{s=−n}^{n} αs ε_{t+s}.    (1.16)
To turn the observations yt into a measurement result x̂t , one has to decide on
the values of the weighting system αs . In other words, the weights have to be
chosen such that they represent the dynamics of the phenomenon (cf. Eq. (1.5)):
Mx[x; α] = Σ_{s=−n}^{n} αs x_{t+s}    (1.17)

Mε[ε; α] = Σ_{s=−n}^{n} αs ε_{t+s}.    (1.18)
Usually a least squares method is used to reduce this latter error term. Proietti
and Luati (Chapter 16) give an overview and comparison of the various models
that are used for this purpose.
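A minimal illustration of Eq. (1.16) is a symmetric moving average. In the sketch below, the weighting system αs is chosen as uniform weights; this is only the simplest possible choice and not one of the models surveyed by Proietti and Luati, and the signal, noise level, and window length are hypothetical.

import numpy as np

rng = np.random.default_rng(3)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 50)              # hypothetical smooth signal x_t
y = x + rng.normal(0.0, 0.4, t.size)        # observations y_t = x_t + eps_t

n_half = 5                                  # the filter spans s = -5, ..., 5
alpha = np.full(2 * n_half + 1, 1.0 / (2 * n_half + 1))   # weights sum to one

# x_hat_t = sum_s alpha_s * y_{t+s}; 'valid' drops the end-points, where the
# symmetric window is not available (related to the timeliness problem below).
x_hat = np.convolve(y, alpha, mode="valid")
x_mid = x[n_half:-n_half]

print("RMSE of raw observations:", round(float(np.sqrt(np.mean((y - x) ** 2))), 3))
print("RMSE of filtered series: ", round(float(np.sqrt(np.mean((x_hat - x_mid) ** 2))), 3))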
Fixler (Chapter 17) discusses the tension between the requirements of precision and of timeliness. Equation (1.4) seems to assume immediate availability of all the data needed for a reliable estimate. In practice, however, this is often not the case. Collecting data takes time; economic estimates are produced in vintages, with later vintages incorporating data that were not previously available. This
affects the precision of early estimates.
1.6. Conclusions
Measurement theories have been mainly developed from the laboratory. In eco-
nomics, however, many if not most measurement practices are performed out-
side the laboratory: econometrics, national accounts, index numbers, etc. Taking
these theories as a starting point, this volume aims at extending them to include these measurement practices outside the laboratory. The partitioning of this volume is based
on an expression (1.6) that represents the key problems of measurement:
ε̂ = x̂ − x = Mε[ε, α] + (Mx[x, α] − x).    (1.6)
Acknowledgements
I am grateful to Harro Maas, Luca Mari, Thomas Mayer and Marshall Reinsdorf
for their valuable comments.
References
Boumans, M. (1999). Built-in justification. In: Morgan, M.S., Morrison, M. (Eds.), Models as Me-
diators. Cambridge Univ. Press, Cambridge, pp. 66–96.
Boumans, M., Morgan, M.S. (2001). Ceteris paribus conditions: Materiality and the application of
economic theories. Journal of Economic Methodology 8 (1), 11–26.
Finkelstein, L. (1975). Fundamental concepts of measurement: definitions and scales. Measurement
and Control 8, 105–110.
GUM (1993). Guide to the Expression of Uncertainty in Measurement. ISO, Geneva.
CHAPTER 2

Representational Theory of Measurement

Joel Michell

School of Psychology, University of Sydney, Sydney NSW 2006, Australia
E-mail address: joelm@psych.usyd.edu.au

2.1. Introduction
Measurement has characterised science since antiquity, and many have written
on its philosophy, but during the twentieth century an unprecedented number
of attempts were made to uncover its foundations. Such attempts generally em-
phasised one or more of three aspects: first, the processes of measuring (e.g.,
Dingle, 1950); second, the structure of measured attributes (e.g., Hölder, 1901
and Mundy, 1987); and, third, evidence that putative measurement processes
actually measure. It is to this third aspect that the representational theory of
measurement is most directly relevant.
Initially, the representational theory emerged from the philosophy of mathe-
matics, specifically, from changes in the understanding of what numbers are. In
the nineteenth century, increasingly abstract and formal theories made it difficult
to think of numbers as features of the real-world situations to which processes of
measurement apply. This raised the issue of why, if they are not real-world features, numbers appear indispensable in measurement. One proposal was that while numbers are not, themselves, features of the real world, they might serve to represent or model such features. On its own, this idea could never have energised
1. Given any magnitudes, a and b, of Q, one and only one of the following is
true:
(i) a is identical to b (i.e., a = b and b = a);
(ii) a is greater than b and b is less than a (i.e., a > b and b < a); or
(iii) b is greater than a and a is less than b (i.e., b > a and a < b).
2. For every magnitude, a, of Q, there exists a b in Q such that b < a.
3. For every pair of magnitudes, a and b, in Q, there exists a magnitude, c, in
Q such that a + b = c.
4. For every pair of magnitudes, a and b, in Q, a + b > a and a + b > b.
5. For every pair of magnitudes, a and b, in Q, if a < b, then there exist magnitudes, c and d, in Q such that a + c = b and d + a = b.
6. For every triple of magnitudes, a, b, and c, in Q, (a + b) + c = a + (b + c).
7. For every pair of classes, φ and ψ , of magnitudes of Q, such that
(i) each magnitude of Q belongs to one and only one of φ and ψ ;
(ii) neither φ nor ψ is empty; and
(iii) every magnitude in φ is less than each magnitude in ψ,
there exists a magnitude x in Q such that for every other magnitude, x′, in Q, if x′ < x, then x′ ∈ φ and if x′ > x, then x′ ∈ ψ (depending on the particular case, x may belong to either class).
For example, for length, these axioms mean: 1, that any two lengths are the
same or different and if different, one is less than the other; 2, that there is no
least length; 3, that the additive composition of any two lengths exists; 4, that all
lengths are positive; 5, that the difference between any pair of lengths constitutes
another; 6, that the additive composition of lengths is associative; and 7, that the
ordered series of lengths is continuous (i.e., any set of lengths having an upper
bound (i.e., a length not less than any in the set) has a least upper bound (i.e.,
a length not greater than any of the upper bounds)). This is what it is for length
to be an unbounded continuous quantity.
Because magnitudes were understood as attributes of things, the traditional
view entailed that numbers are intrinsic features of the situations to which the
procedures of measurement apply. Consequently, the conceptual thread binding
number, magnitude and ratio would seem to unravel if either, (i) magnitudes
were denied a structure capable of sustaining ratios or, (ii) if numbers were not
thought of as located spatiotemporally. It was the first of these that applied in
Russell’s case. He stipulated that magnitudes are merely ordered (one magni-
tude always being greater or less than another of the same kind) and denied that
they are additive (i.e., denied that one magnitude is ever a sum of others) and
thus, by implication, denied that magnitudes stand in relations of ratio, thereby
severing the thread sustaining the traditional theory. His reasons were idiosyn-
cratic (Michell, 1997) and not accepted by his fellow logicists, Gottlob Frege
(1903) or A.N. Whitehead (see Whitehead and Russell, 1913), who treated logi-
cism as compatible with the ratio theory of number. Nonetheless, Russell, for
his own reasons, gave flesh to the representational idea, and it proved attractive.
One of its first advocates was Campbell (1920 and 1928), who, applying it to a
distinction of Hermann von Helmholtz (1887), produced the concepts of funda-
mental and derived measurement. He distinguished measured quantities, such as
length, from measured qualities, as he called them, like density. Quantities, he
claimed, are like numbers in possessing additive structure, which is only iden-
tifiable, he thought, via specification of a suitable concatenation procedure. For
example, when a rigid straight rod is extended linearly by another adjoined end
to end with it, the length of these concatenated rods stands in a relation to the
lengths of the rods concatenated that has the form of numerical addition, in the
sense that it conforms to associative (a + [b + c] = [a + b] + c) and commutative
(a + b = b + a) laws, a positivity law (a + b > a), and the Euclidean law that
equals plus equals gives equals (i.e., if a = a and b = b , then a + b = a + b )
(Campbell, 1928, p. 15). Evidence that these laws are true of lengths could be
gained by observation. Therefore, thought Campbell, the hypothesis that any at-
tribute is a quantity raises empirical issues and must be considered in relation to
available evidence.
If, for any attribute, such laws obtain, then, said Campbell, numerals may
be assigned to its specific magnitudes. Magnitudes are measured fundamentally
by constructing a ‘standard series’ (1920, p. 280). This is a series of objects
manifesting multiples of a unit. If u is a unit, then a standard series displays a
set of nu, for n = 1, 2, 3, . . . , etc., for some humanly manageable values of n.
If an object is compared appropriately to a standard series, a measure of its
degree of the relevant quantity can be estimated and this estimate is taken to
represent the additive relation between that degree and the unit. The sense in
which measurements are thought to represent empirical relations is therefore
clear.
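A toy sketch may help to fix the idea of a standard series; it is my own illustration, with hypothetical objects represented simply by numbers standing in for their lengths, and a comparison function standing in for the empirical side-by-side comparison.

unit = 1.0                                            # the chosen unit u
standard_series = [n * unit for n in range(1, 21)]    # 1u, 2u, 3u, ..., 20u

def spans(a, b):
    """Empirical comparison: does object a at least match object b in length?"""
    return a >= b

def measure(obj):
    """Largest n such that the object spans the n-th member of the standard series."""
    n = 0
    for multiple in standard_series:
        if spans(obj, multiple):
            n += 1
        else:
            break
    return n

rod = 7.3                                             # an object of unknown length
print(measure(rod), "units")   # -> 7: the estimate represents the additive
                               #    relation between the object's length and the unit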
Campbell recognised that not all magnitudes are fundamentally measurable.
There is also derived measurement of qualities, which is achieved by discover-
ing constants in laws relating attributes already measured. He believed that the
discovery of such laws is a result of scientific research and must be sustained by
relevant evidence. An example is density. For each different substance, the ratio
of mass to volume is a specific constant, different say for gold as compared to
silver. The numerical order of these constants is the same as the order of degrees
of density ordered by other methods. Thus, said Campbell, these constants are
derived measurements of density, but the sense in which they represent anything
beyond mere order is not entirely clear from his exposition.
However, that the ratio of mass to volume is correlated with the kind of sub-
stance involved suggests that each different substance possesses a degree of a
general property accounting for its associated constant ratio. Because the effect
being accounted for (the constant) is quantitative, the property hypothesised to
account for it (viz., density) must likewise be quantitative, otherwise the com-
plexity of the property would not match the complexity of the effect. Although
Campbell did not reason like this and never explained how derived measurement
instantiated the representational idea, it seems that it can.
When the British Association for the Advancement of Science established
the Ferguson Committee to assess the status of psychophysical measurement,
Campbell’s empiricism confronted psychologists’ scientism head-on: he insisted
that their claims to measure intensities of sensations be justified via either funda-
mental or derived measurement. Instead of doing this, he argued, ‘having found
that individual sensations have an order, they assume that they are measurable’
(Ferguson et al., 1940, p. 347), but ‘measurement is possible only in virtue of
facts that have to be proved and not assumed’ (Ferguson et al., 1940, p. 342).
While the failure of psychologists to produce evidence for more than order
in the attributes they aspired to measure was the basis for Campbell’s critique,
there is nothing in the representational idea per se that restricts measurement to
fundamental and derived varieties. Campbell had simply tried to translate the
traditional concept of measurement into representational terms and because the
former concept is confined by the role it gives to the concept of ratio, represen-
tational theory is thereby needlessly narrowed. Morris Cohen and Ernest Nagel
served the latter better when they wrote that numbers
have at least three distinct uses: (1) as tags, or identification marks; (2) as signs to indicate
the position of the degree of a quality in a series of degrees; and (3) as signs indicating the
quantitative relations between qualities (Cohen and Nagel, 1934, p. 294).
Use (1) is not something that Russell or Campbell would have called measure-
ment, but the representational idea does not exclude it. Use (2) is the assignment
of numbers to an ordered series of degrees to represent a relation of greater than.
For this, Cohen and Nagel required that the represented order relation be shown
by observational methods to match the ordinal properties of the number series,
such as transitivity and asymmetry. They called this use measurement of ‘inten-
sive qualities.’ Use (3) covered fundamental and derived measurement and their
treatment of these added little to Campbell’s, but the popularity of their textbook
meant that the representational idea was well broadcast.
However, inclusion of ordinal structures within representational theory only
highlighted psychology’s dilemma. If no more than ordinal structure is identi-
fied for psychological attributes, then the fact that psychological measurement
does not match the physical ideal is displayed explicitly, a point laboured by
critically minded psychologists (e.g., Johnson, 1936). By then, practices called
‘psychological measurement’ occupied an important place, especially attempts
to measure intellectual abilities (Michell, 1999). Psychologists had devoted con-
siderable energy to constructing numerical assignment procedures (such as in-
telligence tests), which they marketed as measurement instruments, but without
any observational evidence that the relevant attributes possessed the sort of struc-
ture thought necessary for physical measurement.
A key concept within the representational paradigm, and one that future developments traded upon, is the concept of structure (see Chao, 2007). The sorts
of phenomena investigated in science do not consist of isolated properties or ob-
jects. Crucial to scientific investigation is the concept of relation. That a thing
Stevens had developed the sone scale to measure sensations of loudness (Stevens
and Davis, 1938), an achievement the Ferguson Committee disputed (Ferguson
et al., 1940). However, he saw Percy Bridgman’s (1927) operationalism and
Rudolf Carnap’s (1937) logical positivism as philosophical tools for deflecting
Campbell’s objections. Carnap thought that logic and mathematics are systems
of symbols, each with a syntax (i.e., rules for constructing formulas and deduc-
tions) consisting of conventions, not empirical truths. Stevens agreed, holding
that ‘mathematics is a human invention, like language, or like chess, and men
not only play the game, they also make the rules’ (1951, p. 2). How come then
that successful applications of arithmetic are so ubiquitous? He responded that
the rules for much of mathematics (but by no means all of it) have been deliberately rigged to
make the game isomorphic with common worldly experience, so that ten beans put with ten
other beans to make a pile is mirrored in the symbolics: 10 + 10 = 20 (1951, p. 2).
that for a ratio scale, operations used to ‘determine’ equal ratios ‘define’ equal
ratios. The procedures used to construct his sone scale involved instructing sub-
jects to judge loudness ratios directly. Stevens took the resulting scale ‘at its face
value’ (1936, p. 407) as a ratio scale and, cocking his snoot at Russell’s scruples, announced that if this ‘is thievery, it is certainly no petty larceny’ (1951,
p. 41), thereby deriding the issue of whether loudness intensities stand in ratios
independently of the operations supposed to identify them.
This delivered the sort of conceptual pliability needed to claim measurement
without having to interrogate established methods in the way that realist inter-
pretations of representational theory required. It allowed psychologists to claim
that they were measuring psychological attributes on scales like those used in
physics (Michell, 2002). For the majority of procedures used in psychology
there is no independent evidence that hypothesised attributes possess even or-
dinal structure, but the received wisdom since Stevens is that ‘the vast majority
of psychological tests measuring intelligence, ability, personality and motivation
. . . are interval scales’ (Kline, 2000, p. 18).
5. If the weight of evidence supports the set of axioms, at least one of the ho-
momorphisms between the empirical and numerical systems is selected as a
scale of measurement for the relevant attribute.
The following examples illustrate these ideas, but hardly scratch the surface,
given the range of possible empirical systems elaborated by proponents of this
version. (See also Reinsdorf, 2007, for an example from economics.)
Consider a set, A, of rigid, straight rods of various lengths and the relation, a spans b, holding between any pair of rods whenever the length of a at least matches that of b (symbolised as b ⪯ a). For any pair of rods, whether this relation holds can be decided empirically by, say, laying them side-by-side. This set of rods and the spanning relation constitute an empirical system, A = ⟨A, ⪯⟩.
Consider the following two axioms in relation to this system: for any rods, a, b,
and c in A,
1. If a ⪯ b and b ⪯ c, then a ⪯ c (transitivity);
2. Either a ⪯ b or b ⪯ a (connexity).
A system having this character is a weak order and any weak order is homomorphic to a numerical structure, N = ⟨N, ≤⟩ (where N is a subset of the positive real numbers and ≤ is the familiar relation of one number being less than or equal to another). That is, it can be proved (Krantz et al., 1971) that a many-to-one, real-valued function, φ, exists such that for any rods, a and b, in A,

b ⪯ a if and only if φ(b) ≤ φ(a).
That is, positive real numbers may be assigned to the rods where the magnitude of the numbers reflects the order relations between the rods’ lengths. Furthermore, if ψ is any other function mapping A into N such that b ⪯ a if and only if ψ(b) ≤ ψ(a), then ψ is an increasing monotonic transformation of φ; that is, the scale is merely ordinal.
Krantz et al. (1971) proved that if a conjoint structure satisfies these axioms
then there exist functions, φ and ψ , into the positive real numbers, such that for
any a and b in D and x and y in V ,
Furthermore, they proved that if λ and θ are any other functions satisfying this
condition for a given conjoint structure, then λ and φ are related by a positive
linear transformation, as are θ and ψ also. That is, θ and ψ are interval scales of
density and volume. They are, in fact, logarithmic transformations of the scales
normally used in physics because, of course, if mass = density × volume then
log(mass) = log(density) + log(volume). So, equally, the above conjoint struc-
ture also admits of a multiplicative numerical representation, in accordance with
conventional practice in physics.
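Returning to the rods example above, the following sketch (with hypothetical rods and comparison data of my own choosing) shows what testing the two axioms and constructing a homomorphism φ amounts to in the finite case; any increasing transformation of φ would represent the same empirical system, which is what makes the resulting scale merely ordinal.

from itertools import product

rods = ["a", "b", "c", "d"]
hidden_length = {"a": 3.0, "b": 1.5, "c": 3.0, "d": 0.7}   # unknown to the measurer
# spans[(x, y)] == True encodes the observed relation "y precedes-or-ties x"
spans = {(x, y): hidden_length[x] >= hidden_length[y] for x, y in product(rods, rods)}

transitive = all((not (spans[(x, y)] and spans[(y, z)])) or spans[(x, z)]
                 for x, y, z in product(rods, rods, rods))
connected = all(spans[(x, y)] or spans[(y, x)] for x, y in product(rods, rods))
assert transitive and connected, "not a weak order: no ordinal representation exists"

# One homomorphism phi: count how many rods each rod spans.
phi = {x: sum(spans[(x, y)] for y in rods) for x in rods}
print(phi)   # e.g. {'a': 4, 'b': 2, 'c': 4, 'd': 1}: order (and ties) are preserved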
These examples illustrate the fact that the concept of scale-type derives
from the empirical system’s intrinsic structure (Narens, 1981) and not, as
Stevens thought, from the measurer’s purposes in making numerical assign-
ments. Recognising this has allowed for progress to be made with respect to
two problems.
The first is the problem of ‘meaningfulness’ (Luce et al., 1990). This prob-
lem arises whenever the structure of the system represented falls short of the
structure of the real number system itself. Then numerically valid inferences
from assigned numbers may not correspond to logically valid deductions from
empirical relations between objects in the system represented (Michell, 1986).
Following the lead of Stevens (1946) and Suppes and Zinnes (1963), Luce et
al. (1990) and Narens (2002) have attempted to characterise the meaningfulness
of conclusions derived from measurements in terms of invariance under admis-
sible scale transformations. Non-invariant conclusions are generally held to be
not meaningful. For example, consider the question of which of two arithmetic
means of ordinal scale measures is the greater? The answer does not necessarily
remain invariant under admissible changes of scale (in this case, any increasing
monotonic transformation). Hence, such conclusions are said not to be mean-
ingful relative to ordinal scale measures.
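A toy numerical case (hypothetical scores and a rescaling of my own choosing) shows why such conclusions fail to be meaningful: an admissible transformation of an ordinal scale is any strictly increasing function, and it can reverse a comparison of arithmetic means.

group1 = [1, 1, 10]        # ordinal scores of group 1
group2 = [3, 3, 3]         # ordinal scores of group 2

def mean(xs):
    return sum(xs) / len(xs)

def rescale(v):
    """A strictly increasing, hence ordinally admissible, transformation."""
    return v if v <= 3 else 3 + (v - 3) / 100

print(mean(group1) > mean(group2))                                               # True
print(mean([rescale(v) for v in group1]) > mean([rescale(v) for v in group2]))   # False
# The conclusion "group 1 has the higher mean" does not survive an admissible
# change of scale, so it is not meaningful for merely ordinal measures.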
In so far as this issue has affected the practice of social scientists it relates to
qualms about whether the measures used qualify as interval scales or are only
ordinal, and, thus, to the meaningfulness of conclusions derived from parametric
statistics (such as t and F tests) with ordinal scale measures. Since many social
scientists believe that the existence of order in an attribute is a sign that the
attribute is really quantitative, recent thinking on the issue of meaningfulness
has considered the extent to which interval scale invariance may be captured by
conclusions derived from ordinal scale measures (Davison and Sharma, 1988
and 1990). For example, even though the concept of an arithmetic mean has,
itself, no analogue within a purely ordinal structure, calculation of means with
ordinal data may still be informative if it is assumed that the attribute measured
possesses an underlying, albeit presently unknown, interval scale structure.
Beginning with Ernest Adams’s (1966) paper, the logical empiricist version has
been subjected to numerous critiques. Many of these are based upon misun-
derstandings. For example, one common criticism concerns error. If data were
collected using objects and relations of the kinds specified in any of the above
examples, with the aim of testing the axioms involved, then as exact descrip-
tions of data, the axioms would more than likely be false. However, this is not a
problem for the representational theory because the axioms were never intended
as exact descriptions of data. From Krantz et al. (1971) to Luce (2005), it has
been repeatedly stressed that the axioms are intended as idealisations. That is,
they are intended to describe the form that data would have in various situa-
tions, were they completely free of error. In this respect, as putative empirical
laws, the axioms are no different to other laws in science. (On this issue see also
Boumans, 2007.)
It also goes without saying that its proponents are not claiming that such
axiom systems played a role in the historical development of physical measure-
ment. In so far as physics is concerned, as Suppes (1954) said at the outset, such
systems are mainly intended to display how ‘to bridge the gap between quali-
tative observations (“This rod is longer than that one”) . . . and the quantitative
surement have achieved and were social scientists serious about measurement,
they would attempt to employ this body of knowledge to test for quantitative
structure.
2.7. Résumé
References
Adams, E.W. (1966). On the nature and purpose of measurement. Synthese 16, 125–169.
Luce, R.D., Krantz, D.H., Suppes, P., Tversky, A. (1990). Foundations of Measurement, vol. 3.
Academic Press, New York.
Mari, L. (2007). Measurability. In: M. Boumans (Ed.), Measurement in Economics: A Handbook.
Elsevier, London.
Michell, J. (1986). Measurement scales and statistics: A clash of paradigms. Psychological Bulletin
100, 398–407.
Michell, J. (1993). The origins of the representational theory of measurement: Helmholtz, Hölder,
and Russell. Studies in History and Philosophy of Science 24, 185–206.
Michell, J. (1994). Numbers as quantitative relations and the traditional theory of measurement.
British Journal for Philosophy of Science 45, 389–406.
Michell, J. (1997). Bertrand Russell’s 1897 critique of the traditional theory of measurement. Syn-
these 110, 257–276.
Michell, J. (1999). Measurement in Psychology: A Critical History of a Methodological Concept.
Cambridge Univ. Press, Cambridge.
Michell, J. (2002). Stevens’s theory of scales of measurement and its place in modern psychology.
Australian Journal of Psychology 54, 99–104.
Michell, J. (2006). Psychophysics, intensive magnitudes, and the psychometricians’ fallacy. Studies
in History and Philosophy of Science 37, 414–432.
Michell, J., Ernst, C. (1996). The axioms of quantity and the theory of measurement, Part I. An
English translation of Hölder (1901), Part I. Journal of Mathematical Psychology 40, 235–252.
Michell, J., Ernst, C. (1997). The axioms of quantity and the theory of measurement, Part II. An
English translation of Hölder (1901), Part II. Journal of Mathematical Psychology 41, 345–356.
Mundy, B. (1987). The metaphysics of quantity. Philosophy Studies 51, 29–54.
Mundy, B. (1994). Quantity, representation and geometry. In: Humphreys, P. (Ed.), Patrick Suppes:
Scientific Philosopher, vol. 2. Kluwer, Dordrecht, pp. 59–102.
Nagel, E. (1932). Measurement. Erkenntnis 2, 313–333.
Narens, L. (1981). On the scales of measurement. Journal of Mathematical Psychology 24, 249–275.
Narens, L. (2002). Theories of Meaningfulness. Lawrence Erlbaum, Mahwah, NJ.
Narens, L., Luce, R.D. (1990). Three aspects of the effectiveness of mathematics in science. In:
Mickens, R.E. (Ed.), Mathematics and Science. World Scientific Press, Singapore, pp. 122–135.
Newton, I. (1967). Universal arithmetic: Or, a treatise of arithmetical composition and resolution. In:
Whiteside, D.T. (Ed.), The Mathematical Works of Isaac Newton, vol. 2. Unwin Hyman, London,
pp. 68–82.
Ramsay, J.O. (1991). Review: Suppes, Luce, et al., Foundations of Measurement, vols. 2 and 3.
Psychometrika 56, 355–358.
Reinsdorf, M.B. (2007). Axiomatic price index theory. In: Boumans, M. (Ed.), Measurement in
Economics: A Handbook. Elsevier, London.
Russell, B. (1903). Principles of Mathematics. Cambridge Univ. Press, Cambridge.
Russell, B. (1919). Introduction to Mathematical Philosophy. Routledge, London.
Stevens, S.S. (1936). A scale for the measurement of a psychological magnitude: Loudness. Psycho-
logical Review 43, 405–416.
Stevens, S.S. (1939). Psychology and the science of science. Psychological Bulletin 36, 221–263.
Stevens, S.S. (1946). On the theory of scales of measurement. Science 103, 677–680.
Stevens, S.S. (1951). Mathematics, measurement and psychophysics. In: Stevens, S.S. (Ed.), Hand-
book of Experimental Psychology. Wiley, New York, pp. 1–49.
Stevens, S.S., Davis, H. (1938). Hearing: Its Psychology and Physiology. Wiley, New York.
Suppes, P. (1954). Some remarks on problems and methods in the philosophy of science. Phil.
Science 21, 242–248.
Suppes, P., Krantz, D.H., Luce, R.D., Tversky, A. (1989). Foundations of Measurement, vol. 2.
Academic Press, New York.
Suppes, P., Zinnes, J.L. (1963). Basic measurement theory. In: Luce, R.D., Bush, R.R., Galanter, E.
(Eds.), Handbook of Mathematical Psychology, vol. 1. Wiley, New York, pp. 1–76.
Swoyer, C. (1987). The metaphysics of measurement. In: Forge, J. (Ed.), Measurement, Realism
and Objectivity: Essays on Measurement in the Social and Physical Sciences. Reidel, Dordrecht,
pp. 235–290.
von Helmholtz, H. (1887). Zählen und Messen erkenntnistheoretisch betrachtet. Philosophische Auf-
sätze Eduard Zeller zu seinem fünfzigjährigen Doktorjubiläum gewidmet. Fues’ Verlag, Leipzig.
(Translated as: Lowe, M.F. (Ed.), ‘Numbering and Measuring from an Epistemological View-
point’. Cohen and Elkana, 1977.)
Whitehead, A.N., Russell, B. (1913). Principia Mathematica, vol. 3. Cambridge Univ. Press, Cam-
bridge.
CHAPTER 3
Measurability
Luca Mari
Università Cattaneo – Liuc – Italy
E-mail address: lmari@liuc.it
Abstract
There is nothing magic in words: there are no “true words” for things, nor “true meanings” for words, and arguing about definitions is usually not so important. Measurement assumed a crucial role in physical sciences and tech-
nologies not when the Greeks stated that “man is the measure of all things”, but
when the experimental method adopted it as a basic method to acquire reliable
information on empirical phenomena/objects. What is the source of this reli-
ability? Can this reliability be assured for information related to non-physical
properties? Can non-physical properties be measured, and how? This paper is
devoted to explore these issues.
3.1. Introduction
Fig. 3.1.
Fig. 3.2.
I will add, step by step, some of the elements leading to a more complete frame-
work for understanding the concept. The basic theses of the paper are:
– measurability is a specific case of evaluability;
– the measurability of a property conceptually depends on the current state of
the knowledge of the property, and therefore it is not an “intrinsic character-
istic” of the property;
– the measurability of a property operatively depends on the availability of ex-
perimental conditions, and therefore it cannot be derived solely from formal
requirements;
– the measurement of a property is an evaluation process aimed at producing in-
tersubjective and objective information; accordingly, measurement is a fuzzy
subcategory of evaluation: the more an evaluation is/becomes intersubjective and objective, the more it is/becomes a measurement.
Although discussed to some extent in the following pages, I will assume here as
primitive the concepts of (1) property, (2) relation among objects and proper-
ties (variously expressed as “property of an object”, “object having a property”,
“object exhibiting a property”, “property applicable to an object”, etc.), and
(3) description related to a property. Objects under measurement are consid-
ered as empirical entities, and not purely linguistic/symbolic ones, and as such
the interaction with them requires an experimental process, not a purely formal
one: many of the peculiar features of measurement derive from its role of bridge
between the empirical realm, to which the object under measurement belongs,
and the linguistic/symbolic realm, to which the measurement result belongs.
I do not think that there is something magic in words: there are no “true” words for things, and arguing about definitions is usually not so important.
Accordingly, I surely admit that the same term measurement can be adopted
in different fields with (more or less) different meanings, and I do not think
that the identification of a unified concept of measurement is necessarily a
well-grounded aim for the advancement of science. On the other hand, a basic historical asymmetry can hardly be denied:
– measurement assumed a crucial role in Physics not when the Greeks stated
that “man is the measure of all things”, nor when they decided to call “mea-
sure” the ratio of a geometrical entity to a unit, but when the experimental
method adopted it as a basic method to acquire reliable information on em-
pirical phenomena/objects;
– for many centuries measurement was adopted exclusively in the evaluation of physical properties, and it is only after its impressively effective results there that it became a coveted target in the social sciences as well.
As a consequence, I will further assume that:
– a structural analysis of the measuring process for physical properties should
be able to highlight the characteristics which guarantee the intersubjectivity
and the objectivity of the information it produces;
– as far as the analysis is maintained at a purely structural level, its results
should be re-interpretable for non-physical properties.
Fig. 3.3.
CASE 1. Two subsystems, x1 and x2, are identified, and the same property p is
evaluated on them (op1 ), thus obtaining the values p(x1 ) and p(x2 ). From the
comparison of these values the inference can be drawn (op2 ) whether x1 and
x2 are mutually substitutable as far as the given property is concerned. As the
resolution of the evaluation process increases (e.g., typically by increasing the
number of the significant digits by which the values p(xi ) are expressed), the
inference result is enhanced in its quality.
dpi/dt = f(p1, . . . , pn)
sometimes called canonical representation for a dynamic system (it can be noted
that several physical laws have this form, possibly as systems of such first-order
differential equations), then the inference becomes a prediction. The diagram in
Fig. 3.4 shows the basic validation criterion in this case: the values pi (x(tfuture ))
obtained in tcurrent as inference result (op1a + op2 ) and by directly evaluating pi
in tfuture (system dynamics + op1b ) must be compatible with each other.
Fig. 3.4.
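A minimal sketch of this validation criterion, with an entirely hypothetical one-property dynamics and numbers of my own choosing: the value predicted for tfuture from the evaluation at tcurrent must be compatible, within the resolution of the evaluation, with the value obtained by directly evaluating the property at tfuture.

import math

def f(p):
    """Hypothetical dynamics dp/dt = f(p): simple exponential decay."""
    return -0.3 * p

def predict(p_current, t_span, dt=0.01):
    """Forward-Euler integration of dp/dt = f(p) over t_span (op1a + op2)."""
    p = p_current
    for _ in range(round(t_span / dt)):
        p += dt * f(p)
    return p

p_now = 10.0                                # value evaluated at t_current (op1a)
p_predicted = predict(p_now, t_span=2.0)    # inferred value for t_future
p_observed = 10.0 * math.exp(-0.3 * 2.0)    # stands in for a direct evaluation at t_future (op1b)

resolution = 0.05                           # resolution of the evaluation process
compatible = abs(p_predicted - p_observed) <= resolution
print(round(p_predicted, 3), round(p_observed, 3), compatible)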
Fig. 3.5.
Not every object has every property. Given a property p, the domain of p, D(p),
is the set of objects {xi } having the property p, so that x ∈ D(p) asserts that the
object x has the property p. For example, if p is the property “length” then phys-
ical rigid objects usually belong to D(p), in the sense that they have a length,
but social objects such as organizations do not, since they do not have a length.
For a given property p and a given object x in D(p), the descriptive information
on p of x is denoted as ν = p(x) and it is called the value ν of p of x, as in
the syntagm “the value of the length of this table”, expanded but synonymous
form of “the length of this table”. Values of properties can be simple entities as
booleans, as in the case of the property “1 m length”, or they can be, for exam-
ple, vectors of numbers, as for the property “RGB color” by which each color
is associated with a triple of positive numbers. The set V = {νi } of the possible
values for p must contain at least two elements, so that the assertion p(x) = νi
conveys a non-null quantity of information, provided that the a priori probabil-
ity of the assertions p(x) = νj , i = j , is positive, so that p(x) = νi reduces the
(objective or subjective) current state of uncertainty on the property value.1
Properties can be thus interpreted as (conceptual and operative) methods to
associate values to objects. Accordingly, the diagram:
1 This standpoint has been formalized in terms of a concept of quantity of information (Shannon,
1948). The quantity of information I (ν) conveyed by an entity ν depends inversely on the proba-
bility PR(ν) assigned to ν: as PR(ν) decreases, I (ν) increases. From a subjective standpoint, I (ν)
expresses the “degree of surprise” generated by the entity ν. The boundary conditions, PR(ν) = 1
(logical certainty) and PR(ν) = 0 (logical impossibility), correspond respectively to null and infinite quantity of information conveyed by ν. Hence, an entity ν brings a non-null quantity of information only if V contains at least a second element ν′, such that I(ν′) > 0. The formal definition, I(ν) = −log2(PR(ν)) bit, only adds a few details to this conceptualization.
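A short numerical illustration of the footnote's formula (with hypothetical probabilities): the less probable the reported value, the larger the quantity of information, i.e., the greater the "degree of surprise".

from math import log2

for prob in (1.0, 0.5, 0.25, 0.01):
    info = -log2(prob)                    # I(v) = -log2(PR(v)), in bit
    print(f"PR(v) = {prob:<5}  ->  I(v) = {info:5.2f} bit")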
can be recognized as substitutable with each other as far as the purpose of fill-
ing a given round hole is considered. This recognition requires an experimental
comparison to be performed among candidate objects, aimed at assessing their
mutual substitutability. Since it does not involve any information handling, such a process is entirely empirical, and as such it can be considered a primitive operation. Properties can be operationally interpreted in terms of this concept of mutual substitutability: if two objects, x1 and x2, are recognized as mutually substitutable, then there exists a property p such that both x1 and x2 belong to D(p), and their mutual substitutability is the empirical counterpart of p(x1) = p(x2) (this position endorses a generalized version of operationalism, whose original characterization, “the concept is synonymous with a corresponding set of operations” (Bridgman, 1927), has been acknowledged as too narrow; indeed, nothing here prevents the same property from being evaluated by different operations). As a consequence, for a given property p, an experimental comparison process cp(x1, x2) can be available:
such that two objects x1 and x2 in D(p) can be compared with respect to p. The process cp is formalized as a relation, so that cp(x1, x2) = 1 means that x1 and x2 are recognized in the comparison as substitutable with each other as far as p is concerned, the opposite case being cp(x1, x2) = 0, where 1 and 0 thus correspond to the boolean values ‘true’ and ‘false’ respectively:
The result of this measurement process can thus be expressed as “p(x) = ν in reference to s by means of cp”. Whenever distinct comparison processes regularly produce the same value, i.e., c′p(s, x) = cp(s, x) even if c′p ≠ cp, then the last specification can be removed and measurement results are expressed more customarily as “p(x) = ν in reference to s”.
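The value-assignment logic just described can be sketched in Python (anticipating the reference sets introduced below; the names, the mass example and the 0.5 g resolution are illustrative assumptions, not part of the text):

def measure(x, references, compare):
    # Return the value assigned to the first reference object recognized as
    # substitutable with x, i.e. p(x) = value(s) whenever compare(s, x) == 1.
    for s_state, s_value in references:
        if compare(s_state, x) == 1:
            return s_value
    return None   # x is not substitutable with any available reference

# Toy example: comparing masses (in grams) with a finite resolution of 0.5 g.
references = [(5.0, 5.0), (10.0, 10.0), (20.0, 20.0)]      # (reference state, assigned value)
compare_mass = lambda s, x: 1 if abs(s - x) < 0.5 else 0    # the comparison relation cp
print(measure(10.3, references, compare_mass))              # -> 10.0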
The previous diagrams assume the simplified situation in which calibration and measurement are performed synchronously, tcal = tmeas: this is seldom the case. More generally, the reference object s should then be identified by its state, s = s(t), which can change over time, so that possibly s(tcal) ≠ s(tmeas) and therefore cp(s(tcal), s(tmeas)) = 0. This highlights the inferential structure of measurement:
and explains why the basic requirement on reference objects is their stability.
We are now ready to introduce some extensions to the simple structure presented above, with the aim of characterizing the measurement process in more detail and with more realism.
The information that, relative to the measurand, the measured object is either equivalent or not equivalent to the chosen reference object can sometimes be refined, i.e., quantitatively increased. A whole set of reference objects, S = {si}, i = 1, . . . , n, called a reference set, can be chosen such that:
– the reference objects can be compared to each other with respect to the measurand, and any two distinct objects in S are not equivalent to each other, cp(si, sj) = 0 if i ≠ j, i.e., the objects in S are mutually exclusive with respect to cp:
Hence, the measurand value for x is assigned so that if cp (si , x) = 1 then p(x) =
νi , and the measurement result is therefore expressed as “p(x) = νi in reference
to S”:
The information that, relative to the measurand, the measured object is equivalent to an element of the chosen reference set can sometimes be qualitatively enhanced. The elements of the reference set S can be compared to each other with respect to an experimental, measurand-related, relation Rp. Assuming Rp to be binary for the sake of notational simplicity, such a relation is such that:
– for each pair (si, sj) of elements in S the fact that either Rp(si, sj) = 1 or Rp(si, sj) = 0 can be determined, i.e., Rp is complete on S × S;
– a relation R among the elements of the value set V is present in correspondence to Rp, such that, for each pair (si, sj) of elements in S, Rp(si, sj) = 1 implies R(p(si), p(sj)) = 1, i.e., the experimental information
3.3.3. Traceability
Measurement results must remain comparable when they are obtained in different places and at different times, in which case the same reference S (either a set or a relational system, possibly equipped with a unit) is adopted. This requires S to be available to perform the comparisons cp(si, x) which constitute the experimental component of measurement. This problem is commonly dealt with by experimentally generating some replicas Sx of S, then iteratively generating some replicas Sx,y of the replicas Sx as required, and finally disseminating these replicas to make them widely available (according to this notation, Sx,y,z is therefore the zth replica of the yth replica of the xth replica of S). The whole system of a reference S and its replicas is therefore based on the assumption that cp(S, Sx) = 1 and that, iteratively, cp(Sx, Sx,y) = 1, having suitably extended the relation cp to sets and relational systems. Hence, an unbroken chain cp(S, Sx) = 1, cp(Sx, Sx,y) = 1, cp(Sx,y, Sx,y,z) = 1, etc., makes the last term traceable to S, which thus has the role of primary reference for all the elements of the chain:
This decomposition formally justifies the initial assertion on the role of measurement as a bridge between the empirical realm and the linguistic/symbolic realm: the measurement of a property p of an object x corresponds to the empirical determination of the ≈p-equivalence class to which x belongs, followed by the symbolic assignment of a value to this class. I see this as the main merit of the Level A description, which on the other hand is unable to specify any constraint on the evaluation (note that λp is 1–1 by definition), thus leading to a far too generic description of measurement.
The available knowledge on the property p could guarantee that among the objects in X one or more relations related to p can be observed together with the ≈p-equivalence. For example, objects xi and xj which are not ≈p-equivalent to each other could satisfy an order relation <p such that, as far as p is concerned, xi is not only distinguishable from xj but also “empirically less” than it. In these cases the labeling function λp must be constrained, so as to preserve the available structural information and to allow inferring that xi <p xj from p(xi) < p(xj), just as from p(xi) = p(xj) the conclusion that xi ≈p xj can be drawn. To satisfy this further condition, p is formalized as a homomorphism: this Level B description, which clearly specializes the Level A description, indeed emphasizes the constraints that a consistent mapping p satisfies, as formalized by the concept of the scale type in which the property is evaluated. For example, an evaluation performed in an ordinal scale is defined up to a monotonic transformation, so that if p : X → {1, 2, 3, 4, 5} is ordinal then the transformed mapping p′ : X → {10, 20, 30, 40, 50} such that p′(x) = τ(p(x)), where τ(y) = 10y, conveys exactly the same information as p. Hence, each scale type corresponds
I see this link with the concept of scale type as the main merit of the Level B description, which on the other hand is unable to specify any constraint on the evaluation that guarantees its intersubjectivity and objectivity, thus leading to a description of measurement that is still too generic. The Level B description expresses the representational point of view on a theory of measurement.
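The ordinal-scale example above can be made concrete with a short Python sketch (an illustration under assumed names; the 5-point value set and the transformation τ(y) = 10y are those of the text):

def is_order_preserving(values, transform):
    # Check that transform keeps both the equivalence and the order of the values,
    # i.e. that it is an admissible transformation for an ordinal scale.
    pairs = [(a, b) for a in values for b in values]
    return all(
        (a < b) == (transform(a) < transform(b)) and
        (a == b) == (transform(a) == transform(b))
        for a, b in pairs
    )

p_values = [1, 2, 3, 4, 5]                              # an ordinal evaluation p : X -> {1,...,5}
print(is_order_preserving(p_values, lambda y: 10 * y))  # True: conveys the same information as p
print(is_order_preserving(p_values, lambda y: -y))      # False: the order is reversed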
The model of measurement that has been presented in the previous pages
corresponds to the Level C description, which characterizes measurement as a
homomorphic evaluation resulting from an empirical comparison to a reference.
Indeed, if a reference scale is available for the property p such that, for example,
an experimental order <p is defined between reference objects, then the above
specified conditions on the scale require that:
– the order <p is transferred by the comparison process cp to the objects under
measurement x, so that if si <p sj and cp (si , xi ) = 1 and cp (sj , xj ) = 1 then
also xi <p xj and therefore p(xi ) < p(xj ):
2 As a corollary of this definition, it can easily be shown that the transformation functions τ are injective, i.e., map distinct arguments to distinct values. The algebraically weakest, and therefore most general, scale type is the nominal one, for which the only preserved relation is the ≈p-equivalence, so that the only constraint on its transformation functions is injectivity. Every other scale type specializes the nominal one by adding further constraints to injectivity, for example monotonicity for the ordinal type and linearity for the interval type. It is precisely this common requirement of injectivity that justifies the fact that the transformation functions preserve the information acquired in the experimental interaction with the object under measurement, as expressed in the recognition of its membership in a given ≈p-equivalence class.
Must the relations among the objects under measurement be directly observable by an experimental comparison process Rp(xi, xj), or can their existence be inferred from the comparisons R(p(xi), p(xj))?
The Level C description only requires the experimental observability of the relations Rp(si, sj), which generate the reference scale, not necessarily of Rp(xi, xj). Let us first assume that the Level B description also allows the relations Rp to be obtained indirectly (let us call this a weak representational point of view). Accordingly, Level C specifies Level B, which fails to be a theory of measurement only because it is too generic: the weak representational point of view gives a necessary but not sufficient condition to characterize measurement. If, on the other hand, the relations among the objects in X are required to be directly observable (a strong representational point of view), the situation becomes more complex: a property evaluation could satisfy all the Level C requirements and at the same time the relations Rp(xi, xj) could remain unobserved. The strong representational point of view gives neither a sufficient nor a necessary condition to characterize measurement (a further discussion on this subject can be found in Mari, 2000).3
3 In its usual interpretation, is the representational point of view strong or weak? Consider, in this regard, a classical definition: “Measurement is the assignment of numbers to properties of objects or events in the real world by means of an objective empirical operation, in such a way as to describe them. The modern form of measurement theory is representational: numbers assigned to objects/events must represent the perceived relations between the properties of those objects/events” (Finkelstein and Leaning, 1984). This emphasis on perception seems to give a clear answer to the question.
A concept that has become more and more crucial for any scientific analysis and development is the concept of model. The current view on symbolization can be traced back to the concept of formal system as defined by David Hilbert: theories are purely symbolic constructions, and as such they can (and should) be consistent, but they are neither true nor false since, strictly speaking, they do not talk about anything.
Truth is not a property of symbols, and surely not even of empirical objects,
but of models, i.e., interpretations of theories that are deemed to be true when-
ever they manifest themselves as empirically coherent with the given domain
of observation. According to our current model-based view, numbers are not in
the (empirical) world simply because they cannot be part of it. Indeed, let us
compare the following two statements:
– “at the instant of the measurement the object under measurement is in a defi-
nite state”;
– “at the instant of the measurement the measurand has a definite value”.
While traditionally such statements would plausibly be considered synonymous, their conceptual distinction is a fundamental fact of Measurement Science: the former expresses a usual assumption of measurement (except when some kind of ontological indeterminism is taken into account, as in some interpretations of quantum mechanics); the latter is unsustainable from an epistemological point of view and in any case operationally immaterial (a further discussion on this subject can be found in Mari and Zingales, 2000).
The conceptual importance of the change implied in the adoption of the concept of model should not be underestimated. It is a shift from ontology to epistemology: measurement results report not directly on the state of the object under measurement, but on our knowledge of this state. Our knowledge usually aims at being coherent with the known objects (“knowledge tends to truth”, as is customarily said), but even a traditional standpoint, such as the one supported by the above mentioned VIM, is forced to recognize that “true values are by nature indeterminate”. The experimental situation which best approximates the concept of true value for a property is the check of the calibration of a sensor by means of a reference object. In this case, the value for the input property is assumed to be known before the process is performed, and therefore actually operates as a reference value. On the other hand, this operation is aimed at verifying the calibration of a device, not at obtaining information on a measurand. Indeed, if the reference value is 2.345 m and the value 2.346 m is instead experimentally obtained, then the usual conclusion is not that the reference object has changed its state (though surely a possible case), but that the sensor must be recalibrated. It is plausibly to describe this kind of peculiar situation that the odd term “conventional true value” has been proposed (the concept of “conventional truth” is not easy to understand . . .), but it should be clear that even in these situations truth is out of scope: reference values are not expected to be true, but only traceable. An outcome that is still conservative, and is adopted more and more, is of a purely lexical nature: if the reference to truth is not operational, then it can simply be removed. This has been, for example, the choice of
Measurement should produce information on both the measurand value and its quality, which can be interpreted in terms of reliability, certainty, accuracy, precision, etc. Each of these concepts has a complex, and sometimes controversial, meaning, also because its technical acceptation is usually intertwined with its common, non-technical usage (the term “precision” is a cogent example: the VIM (ISO, 1993) does not define it, and only recommends that it “should not be used for ‘accuracy’ ”, whereas it defines repeatability as the “closeness of the agreement between the results of successive measurements of the same measurand carried out under the same conditions of measurement”, called “repeatability conditions”. A second fundamental standard document, also released by ISO (ISO, 1998a), defines precision as “the closeness of agreement between independent test results obtained under stipulated conditions”, and then notes that repeatability is “precision under repeatability conditions”).
I do not think discussing terminology is important: words can be precious tools for knowledge, but too often discussions are only about words. Agreement should be reached on procedures and possibly on concepts, not necessarily on lexicon. In this regard I subscribe to the position of Willard Van Orman Quine: “science, though it seeks traits of reality independent of language, can neither get on without language nor aspire to linguistic neutrality. To some degree, nevertheless, the scientist can enhance objectivity and diminish the interference of language, by the very choice of language” (Quine, 1966). Indeed, what is important for our subject is an appropriate operative modeling of the quality of measurement, not the choice of the terms adopted to describe this modeling activity and its results.
The structure of the measurement process, which I have introduced and then variously extended in the previous pages, does not include any explicit component that allows information about the quality of the process itself to be formally derived. Such a structure can thus be thought of as an “ideal” one. Two prototypical situations are then traditionally mentioned to exhibit the possible presence of “non-idealities”:
– the measurement of a property whose value is assumed to be already known (analogous to checking the calibration of a sensor by means of a reference object): a difference of the obtained value from the known one can be interpreted as the effect of an error in the process, for example due to the usage of an uncalibrated sensor; this effect, which is not plausibly corrected
Fig. 3.6.
not at confirming the quality of the measuring system (which is instead the
task of calibration); as a consequence, systematic errors cannot usually be
evaluated;
– repeatability is surely not a necessary condition for measurement, but it can sometimes be assumed on the basis of an analysis of the empirical characteristics of both the measuring system and the object under measurement; as a consequence, random errors can sometimes be evaluated.
Apart from these epistemological issues, the traditional interpretation of the quality of measurement in terms of errors is hindered by the operative problem of formalizing these two types of error in a compatible way, so as to allow them to be properly combined into a single value. None of the several solutions which have been proposed has obtained general agreement, plausibly because of their nature as ad hoc prescriptions (either “combine them by adding them linearly”, or “. . . quadratically”, or “. . . linearly in the case . . . , and quadratically otherwise”). On the other hand, this problem has recently been dealt with successfully by the already mentioned GUM (ISO, 1995), according to a pragmatic standpoint which is aimed at unifying the procedure and the vocabulary while admitting different interpretations of the adopted terms. In the following this standpoint will be explicitly presented, and maintained as a background reference.
4 My opinion is that Measurement Science is currently in a transition phase, in which the historically dominant truth-based view is being more and more criticized and the model-based view is getting more and more support from younger researchers. On the other hand, the truth-based view is a paradigm that benefits from a long tradition: the scientists and the technicians who spent their whole lives thinking and talking in terms of true values and errors fiercely oppose the change. An indicator of this situation is linguistic: in response to the critical analyses highlighting the lack of any empirical basis for the concept of true value, the term “conventional true value” has been introduced (the VIM: ISO, 1993, defines it as “value attributed to a particular quantity and accepted, sometimes by convention, as having an uncertainty appropriate for a given purpose”). Despite its
The change from the truth-based view to the model-based one is a domain extension. Indeed, the uncertainty modeling does not prevent dealing with errors as a possible cause of quality degradation, but it does not force one to assume that any quality degradation derives from errors. If measurement is not able to acquire “pure data”, then it must be based on a model including the available relevant knowledge on the object under measurement, the measuring system and the measurand: this knowledge is generally required to evaluate the quality of a measurement. Indeed, several, not necessarily independent, situations of non-ideality can be recognized in the measurement process; in particular (denoting with x the object under measurement and with s the reference to which x is compared), it could happen that:5
– s is not stable, i.e., it changes its state during its usage, so that the value that was associated with it at the calibration time does not represent its state at the measurement time (formally: p(s(t)) ≠ p(s(t0)), i.e., cp(s(t0), s(t)) = 0, a comparison that can be performed only in an indirect way);
– the system used to compare x to s is not repeatable, i.e., x and s are mutually substitutable at a given time and subsequently turn out to be no longer substitutable even though they have not changed their state (formally: cp(s(t1), x(t1)) = 1 but cp(s(t2), x(t2)) = 0, even if c′p(s(t1), s(t2)) = 1 and c′p(x(t1), x(t2)) = 1, where c′p is a comparison process assumed to be more repeatable than cp);
– the system used to compare x to the adopted reference has a low resolution, i.e., x is substitutable with distinct reference objects (formally: both cp(si, x) = 1 and cp(sj, x) = 1 even if si ≠ sj) (this also applies to the relation between a reference and its replicas in the traceability system).
This list of situations of non-ideality does not include the item that the GUM
(ISO, 1995) states as the first source of uncertainty in a measurement: the in-
complete definition of the measurand.
Because of its relevance to the very concept of measurement uncertainty,
a short analysis of the problem of the definition of the measurand is appropriate.
aim of extreme defense of the traditional paradigm, the very concept of “conventional truth” is so manifestly oxymoronic that its adoption seems to be a cure worse than the disease. A further analysis of the current status of Measurement Science in terms of paradigms can be found in Rossi (2006).
5 In more detail, the GUM mentions as “possible sources of uncertainty in a measurement”: “a) incomplete definition of the measurand; b) imperfect realization of the definition of the
measurand; c) non-representative sampling – the sample measured may not represent the defined
measurand; d) inadequate knowledge of the effects of environmental conditions on the measure-
ment, or imperfect measurement of environmental conditions; e) personal bias in reading analogue
instruments; f) finite instrument resolution or discrimination threshold; g) inexact values of measure-
ment standards and reference materials; h) inexact values of constants and other parameters obtained
from external sources and used in the data-reduction algorithm; i) approximations and assumptions
incorporated in the measurement method and procedure; j) variations in repeated observations of the
measurand under apparently identical conditions”.
From the previous considerations the conclusion can be drawn that measurement
is not a purely empirical operation. Indeed, any measurement can be thought of
as a three-stage process (see also Mari, 2005b):
1. acquisition, i.e., experimental comparison of the object under measurement
to a given reference;
2. analysis, i.e., conceptual modeling of the available information (the comparison result, together with everything that is known about the measurement system: the measurand definition and realization, the instrument calibration diagram, the values of relevant influence quantities, etc.);
3. expression, i.e., statement of the gathered information according to an agreed
formalization.
The crucial role of the analysis stage is emphasized by considering it in the light of the truth-based view. Once more, were numbers “in the world”, questions such as “how many digits does the (true) length of this table have?” would be meaningful; yet as the power of the magnifying glass increases, the straight lines bounding the table become more and more blurred, and the very concept of length loses any meaning at the atomic scale. If these questions traditionally remained outside Measurement Science, it is plausibly because of the impressive effectiveness that the analytical methods based on differential calculus have shown in the prediction of system dynamics: this led to the assumption that the “numbers in the world” are real numbers.6 As a consequence, property values are usually hypothesized to be real numbers or, when the previous consideration is taken into account, rationals, and therefore always scalars. On the other hand, if it is recognized that (real) numbers are linguistic means to express our knowledge, then the conclusion should be drawn that scalars are only one possible choice to formalize property values, so that other options, such as intervals, probability distributions and fuzzy subsets, could be adopted. Apart from tradition, I suggest that a single, but fundamental, reason remains today to explain why property values are so commonly expressed as scalars: such values act
6 A direct consequence of this standpoint is the hypothesis that “true measurement” requires continuity, so that discreteness in measurement would always be the result of an approximation. I must confess that I am simply unable to understand the idea of numbers as empirical entities which grounds the position called the “realist view” in Mari (2005): “whether a physical phenomenon is continuous or not seems to be primarily a matter of Physics, not Measurement Science. Classical examples are electrical current and energy: while before Lorentz/Millikan and Planck they were thought of as continuously varying quantities, after them their discrete nature was discovered, with electron charge and quantum of action playing the role of ultimate discrete entities. What is the realist interpretation of these changes in terms of the measurability of such quantities? (they were measurable before the change, no longer after; they have never been actually measurable; etc.). In more general terms, from the fact that any physical measuring system has a finite resolution the conclusion follows that all measurement results must always be expressed as discrete (and actually with a small number of significant digits) entities: does it imply, according to the realist view, that ‘real’ measurements are only approximations of ‘ideal’ measurements, or what else?” (Mari, 2005).
as input data in inferential structures, as is the case with physical laws, which are deemed to deal with scalars. Indeed, no logical reason prevents expressing inferences, such as the above-mentioned “the acceleration generated on a body with mass m by a force F is equal to F/m”, in terms of non-scalar values, e.g., intervals or fuzzy subsets, provided that the functions appearing in such inferences, the ratio in this case, are properly defined for these non-scalar values.
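A minimal Python sketch (the interval bounds below are illustrative assumptions) shows how the inference a = F/m can be expressed over interval-valued rather than scalar property values:

def interval_div(f_lo, f_hi, m_lo, m_hi):
    # Divide the interval [f_lo, f_hi] by the strictly positive interval [m_lo, m_hi].
    candidates = [f_lo / m_lo, f_lo / m_hi, f_hi / m_lo, f_hi / m_hi]
    return min(candidates), max(candidates)

# Force known only as [9.9, 10.1] N, mass known only as [1.95, 2.05] kg:
print(interval_div(9.9, 10.1, 1.95, 2.05))   # -> interval of accelerations in m/s^2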
The analysis stage does not uniquely determine the form of measurement results, also because the quality of measurement is a multi-dimensional characteristic.
According to Bertrand Russell: “all knowledge is more or less uncertain and
more or less vague. These are, in a sense, opposing characters: vague knowledge
has more likelihood of truth than precise knowledge, but is less useful. One of
the aims of science is to increase precision without diminishing certainty” (Rus-
sell, 1926).
The same empirical knowledge available on a measurand value can be for-
mally expressed by balancing two components:
– one defining the specificity of the value: sometimes this component is called precision or, conversely, vagueness;
– one stating the trust attributed to it: neither accuracy nor trueness (the latter term is used in ISO, 1998a, but not in the VIM: ISO, 1993) has been mentioned here. If a reference value is not known, such quantities are simply undefined; in the opposite case, accuracy can be thought of as the subject-independent version of trust.
where the first term is a scalar and the second one is interpreted as a standard
deviation of the estimated measurand value, either derived from an experimen-
tal frequency distribution or obtained from an assumed underlying probability
density function.7 In the first case the estimated measurand value is defined as
the mean value of the experimental set {νi }, i = 1, . . . , n:
ν̄ = (1/n) Σi=1,...,n νi
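A minimal Python sketch (the readings below are assumed data, not from the text) of this “Type A” evaluation, in which the estimated measurand value is the mean of repeated observations and its standard uncertainty is the experimental standard deviation of that mean:

import statistics

def type_a_evaluation(observations):
    n = len(observations)
    mean = statistics.fmean(observations)
    u = statistics.stdev(observations) / n ** 0.5   # standard uncertainty of the mean
    return mean, u

readings_m = [2.345, 2.346, 2.344, 2.347, 2.345]    # repeated length readings in metres
print(type_a_evaluation(readings_m))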
7 This double option highlights, once more, the pragmatic orientation of the GUM: while tradi-
tional distinctions are aimed at identifying “types” of uncertainty (or of error, of course, as in the
case of random vs. systematic error), thus assuming an ontological basis for the distinction itself, the
GUM distinguishes between methods to evaluate uncertainty. Furthermore, the GUM removes any
terminological interference by adopting a Recommendation issued by the International Committee
for Weights and Measures (CIPM) in 1980 and designating as “Type A” the evaluations performed
“by the statistical analysis of series of observations”, and as “Type B” the evaluations performed “by
other means”. The GUM itself then stresses that “the purpose of the Type A and Type B classification
is to indicate the two different ways of evaluating uncertainty components and is for convenience of
discussion only; the classification is not meant to indicate that there is any difference in the nature
of the components resulting from the two types of evaluation”.
A secondary model is obtained from the primary model by multiplying the standard uncertainty u(ν̄) by a positive coefficient k, called the “coverage factor”, typically in the range 2 to 3, giving the expanded uncertainty U = ku(ν̄). Such an interval, ν = [ν̄ − U, ν̄ + U], has the goal “to encompass a large fraction of the distribution of values that could reasonably be attributed to the measurand”. The quality of the value is now in principle formalized in terms of both specificity and trust, related respectively to the expanded uncertainty and the encompassed “fraction of the distribution”, interpreted as a probability measure and called the “level of confidence” of the interval. On the other hand, since “it should be recognized that in most cases the level of confidence (especially for values near 1) is rather uncertain” and therefore difficult to assign, the standardized decision is to choose a level of confidence above 0.95, suitably increasing the expanded uncertainty as believed to be required: the expanded uncertainty, and therefore the specificity, is thus in practice the only component which expresses the quality of measurement.
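As a small illustrative sketch (the numbers are assumptions), the expanded uncertainty and the corresponding interval can be written as:

def expanded_uncertainty_interval(mean, u, k=2.0):
    # Expanded uncertainty U = k * u and the interval [mean - U, mean + U].
    U = k * u
    return mean - U, mean + U

mean, u = 2.3454, 0.0005          # assumed estimated value and standard uncertainty (metres)
print(expanded_uncertainty_interval(mean, u, k=2.0))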
The reasons for this double modeling are explicitly pragmatic:
– the primary model is aimed at propagating uncertainties through functional relationships;
– the secondary model is aimed at comparing property values to ascertain whether they are compatible with each other.
Let us introduce the main features of these application categories.
– ν̄q is obtained by applying the function f to the estimated values of the input
properties: ν̄q = f (ν̄1 , ν̄2 , . . . , ν̄k );
– u(ν̄q ) is obtained by assuming that the uncertainty on each property pj pro-
duces a deviation Δνj from the mean value ν̄j , so that the problem is to derive
the standard deviation of ν̄q from f (ν̄1 + Δν1 , ν̄2 + Δν2 , . . . , ν̄k + Δνk ).
The technique recommended by the GUM is based on the hypothesis that the function f can be approximated by its Taylor series expansion around the k-dimensional point (ν̄1, ν̄2, . . . , ν̄k). In the simplest case, in which all input properties are independent and f is “linear enough” around this point, the series expansion can be computed up to the first-order term:

u²(ν̄q) = Σj=1,...,k cj² u²(ν̄j)

where the coefficient cj, i.e., the partial derivative of the function f with respect to the jth property as computed at the point (ν̄1, ν̄2, . . . , ν̄k), operates as a “sensitivity coefficient”.
This dependence becomes more and more complex as higher-order terms in the Taylor series expansion and/or the correlations among the input properties are taken into account: in this regard some further technical considerations can be found in the GUM, and several cogent examples are presented in Lira (2002).
This solution to the problem of the propagation of uncertainty is based on the traditional choice of expressing the property value as a scalar entity, distinct from the parameter specifying its quality: property values are dealt with in a deterministic way, and analytical techniques are applied for formally handling the uncertainty.8
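A minimal Python sketch of this first-order propagation (the function, estimates and uncertainties below are illustrative assumptions; the sensitivity coefficients are approximated numerically rather than derived analytically):

def propagate_uncertainty(f, means, uncertainties, h=1e-6):
    # First-order, independent-inputs propagation: u^2(q) = sum_j (c_j * u_j)^2,
    # with sensitivity coefficients c_j estimated by finite differences.
    q = f(*means)
    var = 0.0
    for j, (m, u) in enumerate(zip(means, uncertainties)):
        shifted = list(means)
        shifted[j] = m + h
        c_j = (f(*shifted) - q) / h
        var += (c_j * u) ** 2
    return q, var ** 0.5

# Example: acceleration a = F / m with assumed estimates and standard uncertainties.
print(propagate_uncertainty(lambda F, m: F / m, [10.0, 2.0], [0.1, 0.05]))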
8 The working group that created the GUM is currently preparing some addenda to it, in particular “Supplement 1: Numerical methods for the propagation of distributions”, which presents an alternative solution to the problem of uncertainty propagation. Whenever input property values can be expressed as probability density functions, the whole functions can be propagated, to obtain a “combined propagated function”. This logic is in principle more general than the one endorsed by the GUM, since the mean value and its standard deviation are trivially derived from a probability density function. On the other hand, since the combined propagated function cannot generally be obtained by analytical techniques, the propagation can be performed numerically, typically by the Monte Carlo method.
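The numerical propagation of distributions mentioned in the footnote can be sketched as follows (the normal input distributions and all parameter values are assumptions chosen only for illustration):

import random

def monte_carlo_propagation(f, samplers, n=100_000, seed=0):
    # Propagate whole input distributions through f and summarize the output.
    rng = random.Random(seed)
    outputs = [f(*(draw(rng) for draw in samplers)) for _ in range(n)]
    mean = sum(outputs) / n
    var = sum((y - mean) ** 2 for y in outputs) / (n - 1)
    return mean, var ** 0.5

samplers = [lambda r: r.gauss(10.0, 0.1),    # force: assumed normal distribution
            lambda r: r.gauss(2.0, 0.05)]    # mass: assumed normal distribution
print(monte_carlo_propagation(lambda F, m: F / m, samplers))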
Fig. 3.7.
(See Fig. 3.7.) Whenever the property values are expressed with a non-null un-
certainty, case 3 of ambiguity can always appear in the borderline situations. On
the other hand, the frequency of this case statistically decreases as the width of
the interval ν decreases (whereas in the extreme situation in which the width
of ν is greater than the width of c the first case cannot occur): the quality of
measurement influences the ambiguity of the conformance decision.
                                     Object state
                        non-conformance          conformance
Decision    refuse      ok: correct refusal      type 2 error
            accept      type 1 error             ok: correct acceptance
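A small Python sketch (the names, thresholds and values are illustrative assumptions) of the conformance decision for a value expressed with expanded uncertainty U against a specification interval:

def conformance_decision(mean, U, c_lo, c_hi):
    lo, hi = mean - U, mean + U
    if c_lo <= lo and hi <= c_hi:
        return "conformance"         # the whole measurement interval lies inside the specification
    if hi < c_lo or lo > c_hi:
        return "non-conformance"     # the whole measurement interval lies outside the specification
    return "ambiguous"               # borderline case: the decision depends on measurement quality

print(conformance_decision(10.02, 0.03, 9.95, 10.05))   # conformance
print(conformance_decision(10.04, 0.03, 9.95, 10.05))   # ambiguous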
such resources with the quality of the measurement results. The stated goal of the process should allow one to identify a lower bound for acceptable quality and an upper bound for acceptable costs, so that the decision space can be split into three subspaces, for decisions leading respectively to (see Fig. 3.8):
Accordingly, the decision about the minimum acceptable quality threshold, expressed by the so-called target uncertainty, should be made before measurement, so that the measurement process should be performed according to the following procedure:
1. decide the minimum acceptable quality, i.e., the target uncertainty, uT, and the maximum acceptable costs, i.e., the resource budget (i.e., define the four subspaces in Fig. 3.8);
2. estimate the minimum costs required to obtain uT: if such costs are beyond the resource budget, then stop, the measurement being unfeasible (the procedure stops in subspaces 2 or 3);
3. identify the components which are deemed to be the main contributions to the uncertainty budget;
4. choose an approximate method to combine such contributions, likely leading to an overestimate of the combined uncertainty;
5. perform the measurement taking into account the identified contributions and evaluate the result by combining them, thus obtaining a measurement uncertainty uM;
6. compare uM to uT: if uM < uT, then exit the procedure, stating the obtained data as the result of an appropriate measurement (area 4);
7. estimate the current costs: if such costs are beyond the resource budget, then stop, the measurement being unfeasible (area 2);
8. identify further contributions and/or enhance the method to combine them;
9. repeat from 5.
The pragmatic ground of this algorithm is manifest: as soon as the available information allows the goal for which the measurement is performed to be unambiguously satisfied, the process should be stopped. Any further activity is unjustified, because it is useless and uselessly costly: the concept of “true uncertainty” is simply meaningless.
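The iterative logic of this procedure can be sketched in Python (the contributions, costs and the two combination methods are illustrative assumptions; in particular, the refinement step is represented only by moving from a conservative linear combination to a root-sum-of-squares combination):

def measure_to_target(contributions, target_u, budget, step_cost=1.0):
    # Step 4: start with a deliberately conservative (linear) combination of the
    # main uncertainty contributions, then refine it (steps 8-9) if needed.
    combine_methods = [
        ("linear", lambda cs: sum(cs)),
        ("quadratic", lambda cs: sum(c * c for c in cs) ** 0.5),
    ]
    cost = 0.0
    for name, combine in combine_methods:
        cost += step_cost                    # steps 5 and 7: each evaluation consumes resources
        if cost > budget:
            return "unfeasible: resource budget exceeded"
        u_m = combine(contributions)
        if u_m < target_u:                   # step 6: the target uncertainty is met
            return f"accepted with {name} combination, u_M = {u_m:.4f}"
    return "unfeasible: target uncertainty not reached within budget"

print(measure_to_target([0.02, 0.015, 0.01], target_u=0.03, budget=5.0))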
Fig. 3.8.
3.6. Conclusions
References
Benoit, E., Foulloy, L., Mauris, G. (2005). Fuzzy approaches for measurement. In: Sydenham, P., Thorn, R. (Eds.), Handbook of Measuring System Design. Wiley, pp. 60–67, ISBN 0-470-02143-8.
Boumans, M. (2007). Invariance and calibration. In: this book.
Bridgman, P.W. (1927). The Logic of Modern Physics. MacMillan, New York.
Carbone, P., Buglione, L., Mari, L., Petri, D. (2006). Metrology and software measurement: A com-
parison of some basic characteristics. Proc. IEEE IMTC, pp. 1082–1086, Sorrento, 24–27 Apr.
D’Agostini, G. (2003). Bayesian Reasoning in Data Analysis – A Critical Introduction. World Sci-
entific Publishing, Singapore.
Finkelstein, L., Leaning, M. (1984). A review of the fundamental concepts of measurement. Mea-
surement 2, 1.
International Organization for Standardization (ISO) (1993). International Vocabulary of Basic and
General Terms in Metrology. Second ed. Geneva, 1993 (published by ISO in the name of BIPM,
IEC, IFCC, IUPAC, IUPAP and OIML).
International Organization for Standardization (ISO) (1995). Guide to the Expression of Uncertainty
in Measurement. Geneva, 1993, amended 1995 (published by ISO in the name of BIPM, IEC,
IFCC, IUPAC, IUPAP and OIML) (also ISO ENV 13005: 1999).
International Organization for Standardization (ISO) (1998a). Accuracy (Trueness and Precision) of
Measurement Methods and Results – Part 1: General Principles and Definitions. Geneva.
International Organization for Standardization (ISO) (1998b). 14253-1: Geometrical Product Spec-
ification – Inspection by Measurement of Workpieces and Measuring Instruments. Part 1: Deci-
sion Rules for Proving Conformance or Non-Conformance with Specification. Geneva.
Kuhn, T.S. (1970). The Structure of Scientific Revolutions. Univ. of Chicago Press, Chicago.
Lira, I. (2002). Evaluating the measurement uncertainty – Fundamentals and practical guidance.
Institute of Physics, ISBN 0-750-30840-0.
Mari, L. (1997). The role of determination and assignment in measurement. Measurement 21 (3),
79–90.
Mari, L. (1999). Notes towards a qualitative analysis of information in measurement results. Mea-
surement 25 (3), 183–192.
Mari, L. (2000). Beyond the representational viewpoint: A new formalization of measurement. Mea-
surement 27 (1), 71–84.
Mari, L. (2003). Epistemology of measurement. Measurement 34 (1), 17–30.
Mari, L. (2005). The problem of foundations of measurement. Measurement 38 (4), 259–266.
Mari, L. (2005b). Explanation of key error and uncertainty concepts and terms. In: Sydenham, P.,
Thorn, R. (Eds.), Handbook of Measuring System Design. Wiley, pp. 331–335, ISBN 0-470-
02143-8.
Mari, L., Zingales, G. (2000). Uncertainty in measurement science. In: Karija, K., Finkelstein, L. (Eds.), Measurement Science – A Discussion. Ohmsha, IOS Press, ISBN 4-274-90398-2 (Ohmsha Ltd.)/1-58603-088-4.
Michell, J. (2007). Representational theory of measurement. In: this book.
Morgan, M.S. (2007). A brief and illustrated analytical history of measuring in economics. In: this
book.
Phillips, S.D., Estler, W.T., Doiron, T., Eberhardt, K.R., Levenson, M.S. (2001). A careful consid-
eration of the calibration concept. Journal of Research of the National Institute of Standards and
Technology 106 (2), 371–379 (available on-line: http://www.nist.gov/jresp).
Quine, W.V.O. (1966). The scope and language of science. In: The Ways of Paradox. Random House.
Rossi, G.B. (2006). An attempt to interpret some problems in measurement science on the basis of
Kuhn’s theory of paradigms. Measurement 39 (6), 512–521.
Russell, B. (1926). Theory of Knowledge. The Encyclopedia Britannica.
Shannon, C.E. (1948). A mathematical theory of communication. Bell System Technical Journal 27,
379–423, 623–656.
Taylor, B.N., Kuyatt, C.E. (1994). Guidelines for evaluating and expressing the uncertainty of NIST measurement results. Technical note 1297. National Institute of Standards and Technology (available on-line: http://physics.nist.gov/Pubs/guidelines/TN1297/tn1297s.pdf).
CHAPTER 4
Measurement With Experimental Controls
Abstract
Many predictions of economic theory depend on the assumed aversion of in-
dividuals towards risk. We examine statistical aspects of controlling for risk
aversion in the lab, and the implications that these have for the ability to test expected utility theory. The concerns expressed here regarding the importance
and difficulty of generating precise estimates of individual risk attitudes gen-
eralize to a wide range of other individual characteristics, such as inequality
aversion and trust. We show that imprecision in estimated individual charac-
teristics may result in misleading conclusions in tests of the underlying theory
of choice. We also show that the popular instruments and statistical models
used to estimate risk attitudes do not allow sufficiently precise estimates. Given
existing laboratory technology and statistical models, we conclude that con-
trols for risk aversion should be implemented using within-subjects, “revealed
preference” designs that utilize the direct, raw responses of the subject. These
statistical issues are generally applicable to a wide variety of experimental situ-
ations.
Experimental methods provide the promise that economists will be able to mea-
sure latent concepts with greater reliability. The reason is that experimental
methods offer the possibility of controlling potential confounds.
However, the use of experimental controls might not lead to more reliable
measurements. One reason is that the imposition of an artefactual control might
itself lead to changes in behavior compared to the naturally occurring coun-
terpart of interest. Concern with this problem has spurred interest in field ex-
periments, where the controls are less artificial than in many laboratory exper-
iments.1 It has also spurred renewed interest in sample selection and sorting
processes.2
Another reason that experimental controls might not generate more reli-
able measurements is that the latent data-generating process might simply be
misspecified. If the experimental design is motivated by a model of the data-
generating process that is invalid, then there can be no expectation that the
controls will improve measurement and inference. For example, if there are ac-
tually two or more distinct data-generating processes at work, and we assume
one, then systematically invalid inferences can result.3
We consider a third way in which experimental controls might influence measurement and inference, by allowing “unobservables” to become “observable.”
Concepts that previously needed to be assumed to take on certain values or dis-
tributions a priori, can now be measured and controlled. In turn, this allows
conditional measurements to be made unconditionally, akin to the integration of
“nuisance parameters” in Bayesian analysis. We consider a substantively fun-
damental application of these ideas, to the evaluation of choice behavior under
uncertainty when one has experimental control of risk attitudes.4
Many predictions of economic theory depend on the assumed aversion of in-
dividuals towards risk. Empirical research requires that one make a maintained
assumption about risk attitudes or devise controls for risk aversion. The first
strategy has the obvious disadvantage that the maintained assumption may be
false. The second strategy is becoming feasible, particularly with the develop-
ment of simple pre-tests for risk aversion in laboratory settings. We examine
statistical aspects of controlling for risk aversion in the lab, and the implications
that these have for the ability to test expected utility theory (EUT). The concerns
expressed here regarding the importance and difficulty of generating precise es-
timates of individual risk attitudes generalize to a wide range of other individual
characteristics, such as inequality aversion and trust.5 Imprecision in estimated
individual characteristics may result in misleading conclusions in tests of any
underlying theory of choice.
1 Harrison and List (2004) review this literature, and this concern with laboratory experiments.
2 For example, see Botelho et al. (2005), Harrison, Lau and Rutström (2005), Kocher, Strauß and
Sutter (2006) and Lazear, Malmendier and Weber (2006).
3 Harrison and Rutström (2005) illustrate this point by comparing estimates of choice behavior
when either expected utility theory or prospect theory are assumed to generate the observed data,
and contrast the results with those obtained from a finite mixture model that allows both to be
valid for distinct sub-samples. Similarly, Coller, Harrison and Rutström (2006) compare inferences
about temporal discounting models when one assumes that subjects discount exponentially or quasi-
hyperbolically, when the data are better characterized by again assuming that distinct sub-samples follow each model.
4 Other applications include Harrison (1990), Engle-Warnick (2004) and Karlan and Zinman
(2005).
5 Methodologically related experimental procedures are being used to identify the extent of “in-
equality aversion” in tests of the propensity of individuals to “trust” each other. Cox (2004) discusses
the need for controls in experiments in this context.
The way in which controls for risk aversion can be implemented varies with
the experimental design. If a “within-subjects” design is used, in which the same
subject takes part in a risk aversion test and some other task, one can directly use
the results for that subject to control for theoretical predictions in the other task.
In general such responses are likely to respect the individual heterogeneity that
one would expect from risk aversion, which is after all a subjective preference.
If a “between-subjects” design is used, in which different subjects6 are sam-
pled for the risk aversion test and the other task, one must construct a statistical
instrument for the risk attitudes of subjects in the latter task.
Instruments for risk attitudes can be generated by constructing a statistical
model of risk attitudes from the responses to the risk aversion task, and then
using that model to predict the attitudes of the subjects in the other tasks. Given
the time and monetary cost of eliciting risk attitudes in addition to some other
experimental task, such methodological short-cuts would be attractive to experi-
menters if reliable. Of course, relying on a statistical model means that one must
recognize that there is some sampling error surrounding the estimated risk at-
titude, even if one assumes that the correct specification has been used for the
statistical model.
The issue of imprecision in measuring risk is readily apparent when one uses
a statistical model to predict risk parameters. While less obvious, this issue still
arises when one uses a “direct test” in a within subjects design. Imprecision in
directly eliciting risk aversion may arise for several reasons.7 First, our risk elic-
itation task may not yield precise estimates due to “trembling hand” error on
behalf of the subject. For example, even when given a simple choice between
two lotteries, a subject may, with some positive probability, indicate one lottery
when they intended to choose the other. A second source of error occurs if our
risk attitude task does not elicit enough information to make sufficiently precise
inferences about the parameters of the choice model. We can reduce the impre-
cision by improving the design of the risk elicitation task, but we still need a
way to characterize the degree of imprecision in the estimated parameters and
to gauge its impact on any conclusions that can be drawn based on responses in
the subsequent choice task.8
We illustrate these procedures using a test of EUT as the “choice task” for
which one needs a control for risk aversion.9 We use data from a previous experi-
ment in which subject choices have been shown to be inconsistent with expected
6 We will assume that these are two samples drawn from the same population, and that there are
no sample selection biases to worry about. These potential complications are not minor.
7 “Imprecision” here is used in the usual econometric sense of minimizing the confidence intervals
for the underlying parameter.
8 Moffatt (2007) uses the concept of D-Optimal design to maximize the overall information content
of an experiment.
9 Rabin (2000) examines the theoretical role of risk aversion and EUT, and argues that EUT must
be rejected for individuals who are risk averse at low monetary stakes. If true, then further tests
of EUT are not needed for those individuals who are found to be risk averse in these low stake
lottery choices. He proves a calibration theorem showing that if individuals are risk averse over
utility given the estimate of the individual’s risk attitude. We begin our analysis
by allowing for the possibility that subjects are noisy decision-makers. One way
to incorporate subject errors is to calculate their cost and ignore those inconsis-
tent choices that have a trivial error cost. We show that the percentage of choices
violating EUT remains high even when we consider only those errors that are
costly to the subject. We then ask whether these results are sensitive to the preci-
sion with which we estimated an individual subject's risk attitude. The data we use were collected using a full within-subjects design, allowing us to compare the use of direct, raw risk aversion measures for each subject in the EUT task with the use of instruments generated by a statistical model. The method we
adopt is to examine the sensitivity of our conclusions about EUT to small pertur-
bations in the estimated risk preference parameters. We can think of this test of
empirical sensitivity as a counterpart to the formal sensitivity tests proposed by
Magnus (2007). Leamer (1978, p. 207) and Mayer (2007) remind us to consider
both economic significance as well as statistical significance when evaluating
estimates of a parameter of interest. Our objective is to evaluate whether our eco-
nomic conclusions are sensitive to small changes in the estimated parameters. In
the case of our statistical model, the estimated confidence interval provides the
standard region in which to conduct the perturbation study. When we use direct
measures of risk, the nature of the experiment suggests natural regions in which
to check for parameter sensitivity. These methods can also be used to address
the issue of precision in tests of choice models other than EUT. Tests of cumu-
lative prospect theory, for example, are conditional on the estimated parameters
of the choice function and the robustness of the conclusions will depend on the
precision with which the initial parameters were estimated.10
In Section 4.1 we review the need for estimates of risk attitudes in tests of
EUT and show how inference is affected by risk aversion. In Section 4.2 we
discuss experimental procedures for characterizing risk attitudes due to Holt
and Laury (2002). We present the distribution of estimated risk attitudes of our
low stakes lotteries then there are absurd implications about the bets those individuals will accept
at higher stakes. Following the interpretation of these arguments by Cox and Sadiraj (2006) and
Rubinstein (2002), a problem for EUT does indeed arise if (a) subjects exhibit risk aversion at low
stake levels, and (b) one assumes that utility is defined in terms of terminal wealth. If, on the other
hand, one assumes utility is defined over income, this critique does not apply. A close reading of
Rabin (2000, p. 1288) is consistent with this perspective, as is the model proposed by Charness and
Rabin (2002) to account for experimental data they collect. Whether or not one models utility as a
function of terminal wealth (EUTw) or income (EUTi) depends on the setting. Both specifications
have been popular. The EUTw specification was widely employed in the seminal papers defining risk
aversion and its application to portfolio choice. The EUTi specification has been widely employed
by auction theorists and experimental economists testing EUT, and it is the specification we employ
here. Fudenberg and Levine (2006) provide another framework for reconciling the EUTi and EUTw
approaches, by positing a “dual self” model of decision-making in which a latent EUTw-consistent
self constrains choices actually observed by the EUTi-motivated self.
10 For example, Harbaugh, Krause and Vesterlund (2002) test the fourfold pattern of risk attitude
predicted by cumulative prospect theory. Their tests are designed based on parameter estimates from
Prelec (1998) and others.
subject pool and examine its implications for subjects’ preferences over lot-
tery choices taken from the EUT choice experiments. In Section 4.3 we show
that, for each subject and each choice, the cost of choosing inconsistently with
EUT can be calculated. Conditional on our risk aversion estimates, we find that
subjects frequently violate EUT even when the cost of doing so is high. In
Section 4.4 we examine whether our risk aversion estimates provide sufficient
precision for reaching meaningful conclusions about EUT. We show that the use
of instruments, based on a statistical model, does not allow sufficiently precise
estimates of risk aversion for our purposes. We discuss various reasons for this
outcome. The implication is, however, that with existing laboratory technology
and statistical models, controls for risk aversion should be implemented using
within-subjects designs that utilize the direct, raw responses of the subject.11
Experiments that test EUT at the level of the individual typically require that the
subject make two choices, so that we can compare their consistency. The first
lottery choice can be used to infer the subject’s risk attitude, and then the second
choice can be used to test EUT, conditional on the risk attitude of the subject.
Thus, preferences have to be elicited over two pairs of lotteries for there to be a
test of EUT at all.
For a specific example of the frequently used “Common Ratio” (CR) test, suppose Lottery A consists of prizes $0 and $30 with probabilities 0.2 and 0.8, and Lottery B consists of prizes $0 and $20 with probabilities 0 and 1. One may then construct two additional compound lotteries, A∗ and B∗, by adding a front-end probability q = 0.25 of winning zero to lotteries A and B. That is, A∗ offers a (1 − q) chance to play lottery A and a q chance of winning zero. Subjects choosing A over B and B∗ over A∗, or choosing B over A and A∗ over B∗, are said to violate EUT.
To show precisely how risk aversion matters, assume that risk attitudes can be characterized by the popular Constant Relative Risk Aversion (CRRA) function, U(m) = m^(1−r)/(1 − r), where r is the CRRA coefficient. The certainty equivalents (CE) of the lottery pairs AB and A∗B∗ as a function of r are shown in the upper left and upper right panels, respectively, of Fig. 4.1. The CRRA coefficient ranges from −0.5 (moderately risk loving) up to 1.25 (very risk averse),
11 To what extent do our conclusions transfer beyond the lab to field experiments? In that setting one
often encounters apologies that it was not possible to control everything of theoretical interest, but
that the tradeoff was worth it because one is able to make more “externally valid” inferences about
behavior. Such claims should be viewed with suspicion, and are often made just to hide incomplete
experimental designs (Harrison, 2005). One can always condition on a priori distributions that might
have been generated by other samples in more controlled settings (e.g., Harrison, 1990), such that
inferences based on posteriors do not ignore that conditioning information. Or one can complement
“uncontrolled” field experiments with controlled lab experiments, acknowledging that controls for
risk attitudes in the latter might interact with other differences between the lab and the field.
Fig. 4.1: Risk attitudes and common ratio tests of EUT.
the expected values are $3.86 and $3.85, respectively. The other five PR lottery
pairs are similar. Because the lottery pairs in these experiments have virtually identical expected values, the difference in CE is zero if the CRRA coefficient r = 0.¹⁴ Figure 4.2 shows that the difference in CE varies over the 6 lotteries.15
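The certainty equivalent calculations behind Figs. 4.1 and 4.2 can be illustrated with a short Python sketch (the lottery parameters are those given in the text for the CR pairs; the code itself is an assumption for illustration, not the authors' own):

import math

def crra_u(m, r):
    # CRRA utility U(m) = m**(1 - r) / (1 - r), with the usual special cases.
    if m == 0:
        return 0.0 if r < 1 else float("-inf")
    return math.log(m) if r == 1 else m ** (1 - r) / (1 - r)

def certainty_equivalent(lottery, r):
    # lottery: list of (probability, prize) pairs; CE under CRRA coefficient r.
    eu = sum(p * crra_u(m, r) for p, m in lottery)
    return math.exp(eu) if r == 1 else ((1 - r) * eu) ** (1 / (1 - r))

q = 0.25                                                # front-end probability of winning zero
A = [(0.2, 0.0), (0.8, 30.0)]
B = [(0.0, 0.0), (1.0, 20.0)]
A_star = [(q + (1 - q) * 0.2, 0.0), ((1 - q) * 0.8, 30.0)]
B_star = [(q, 0.0), (1 - q, 20.0)]

for r in (-0.5, 0.0, 0.5, 0.9):
    ce = [round(certainty_equivalent(L, r), 2) for L in (A, B, A_star, B_star)]
    print(f"r = {r:5.2f}: CE(A), CE(B), CE(A*), CE(B*) = {ce}")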
Based on the observed distribution of risk attitudes in our sample, we can
calculate the EUT consistent choice in each of the 8 lottery pairs, assuming
a CRRA utility function.16 Abstracting from any consideration of the size of
the CE difference, only 52% of observed choices were consistent with EUT
when pooling over all 8 lottery pairs. Only in one PR lottery pair does one see a
proportion of choices that are markedly higher than 50% and therefore consistent
with EUT. However, even those observations would require us to have a high
tolerance for errors in the data in order to accept EUT. One would therefore
reject the predictions of EUT for this set of choices, conditional on the point
estimates of risk aversion being accepted.
While we observe a high rate of lottery choices inconsistent with EUT, this
analysis does not consider whether the apparent errors are costly from the per-
spective of the subject. We ask here whether consistency with EUT increases as
the cost of an error increases. Moreover, we investigate whether our rejection of
EUT, conditional on the point estimates of risk aversion being accepted, is sen-
sitive to the precision of our estimates of the underlying CRRA coefficients.
Since the calculation of the CE differences depends critically on the CRRA
estimates, we need to measure the robustness of our EUT findings to the im-
precision of those estimates. While we focus on tests of EUT, this question of
precision in estimating the parameters of the choice function is also relevant
for tests of cumulative prospect theory, models of choice with altruism, other-
regarding preferences, etc.
One might ask how one can test EUT when one must assume that EUT holds
in order to measure risk attitudes. Our point is that tests of EUT are incomplete
if they do not also include a joint hypothesis about risk attitudes and consistent
behavior over lottery choices. That is, one has to undertake such tests jointly
or else one cannot test EUT at all, since the subject might be indifferent to the
choices posed. Or, more accurately, without such tests of risk attitudes, the ex-
perimenter is unable to claim that he knows that the subject is not indifferent.
So, just as EUT typically entails consistent behavior across two or more pairs of
14 The payoffs in choice pair #6 have been altered slightly from the values in Grether and Plott
(1979) so that a risk neutral individual is not indifferent between the two.
15 Moffatt (2007) develops a technique to select the parameters of the experiment to maximize the
information content. Rather than use his technique, we adopted familiar tests from the literature,
which has the advantage of allowing comparisons to previous findings.
16 There are 8 lotteries in total, 6 PR lotteries and 2 CR lotteries. Each individual was presented
with all 6 PR choices but only one of the CR choices. Hence, we have 7 observed choices for
each individual. The analysis in Harrison et al. (2003) was undertaken in the usual manner from
the “preference reversal” literature: the direct binary choices of each subject were compared to the
implied choices from the selling prices elicited over the underlying lotteries.
Fig. 4.2: Risk attitudes and preference reversal choice pairs (difference in certainty equivalents favoring P-bet in each pair).
We use data from experiments reported in Harrison et al. (2003). These experi-
ments implement both a risk elicitation task and several lottery choices following
those used in earlier experimental tests of the CR and PR phenomena.
The risk elicitation task follows Holt and Laury (2002) who devise a simple
experimental measure for risk aversion using a multiple price list design. Each
subject is presented with a choice between two lotteries, which we can call A
and B. Table 4.1 illustrates the basic payoff matrix presented to subjects. The
first row shows that lottery A offered a 10% chance of receiving $2 and a 90%
chance of receiving $1.60. The expected value of this lottery, EVA , is shown
in the third panel as $1.64, although the EV columns were not presented to
subjects.17 Similarly, lottery B in the first row offers a 10% chance of $3.85
and a 90% chance of $0.10, for an expected value of $0.48. Thus the two lotteries have a relatively
large difference in expected values, in this case $1.17. As one proceeds down the
matrix, the expected value of both lotteries increases, but the expected value of
lottery B eventually becomes greater than the expected value of lottery A.
The subject chooses A or B in each row, and one row is later selected at
random for payout for that subject. The logic behind this test for risk aversion
is that only risk-loving subjects would take lottery B in the first row, and only
risk-averse subjects would take lottery A in the second-to-last row. Arguably,
the last row is simply a test that the subject understood the instructions, and has
no relevance for risk aversion at all. A risk neutral subject should switch from
choosing A to B when the EV of each is about the same, so a risk-neutral subject
would choose A for the first four rows and B thereafter.
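A sketch of this design (our own reconstruction from the payoffs described in the text, assuming the probability of the high prize rises by 0.10 per row and that numpy and scipy are available) computes the expected values and the CRRA coefficient at which a subject would be indifferent in each row:

import numpy as np
from scipy.optimize import brentq

def u(x, r):
    """Normalised CRRA utility; the normalisation cancels in utility differences."""
    return np.log(x) if np.isclose(r, 1.0) else (x ** (1 - r) - 1) / (1 - r)

def eu_diff(p, r):
    """EU(lottery A) - EU(lottery B) in the row where the high prize has probability p."""
    eu_a = p * u(2.00, r) + (1 - p) * u(1.60, r)
    eu_b = p * u(3.85, r) + (1 - p) * u(0.10, r)
    return eu_a - eu_b

for row in range(1, 11):
    p = row / 10
    ev_a, ev_b = p * 2.00 + (1 - p) * 1.60, p * 3.85 + (1 - p) * 0.10
    # Indifference point for this row; a subject making their first B choice here
    # has a CRRA coefficient between the previous row's value and this row's value.
    r_star = brentq(lambda r: eu_diff(p, r), -5, 5) if row < 10 else float("nan")
    print(f"row {row:2d}: EV(A) = {ev_a:.2f}  EV(B) = {ev_b:.2f}  indifference r = {r_star:5.2f}")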
Holt and Laury (2002) examine two main treatments designed to measure the
effect of varying incentives.18 They vary the scale of the payoffs in the matrix
shown in Table 4.1 by multiplying the payoffs by 20, 50, or 90. Thus, Table 4.1
shows the scale of 1.
17 There is an interesting question as to whether they should be provided. Arguably the subjects
are trying to calculate them anyway, so providing them avoids a test of the joint hypothesis that
“the subjects can calculate EV in their heads and will not accept a fair actuarial bet.” On the other
hand, providing them may cue the subjects to adopt risk-neutral choices. The effect of providing EV
information deserves empirical study.
18 Holt and Laury’s (2002) design provides in-sample tests of the hypothesis that risk aversion does
not vary with income, an important issue for those that assume specific functional forms such as
CRRA or constant absolute risk aversion (CARA), where the “constant” part in CRRA or CARA
refers to the scale of the choices. A rejection of the “constancy” assumption is not a rejection of
EUT in general, of course, but just these particular (popular) parameterizations.
Table 4.1: Design of the Holt and Laury risk aversion experiments (standard payoff matrix).
Harrison et al. (2003) adapt the Holt and Laury (2002) procedure by scaling it
appropriately for the present purposes. Multiplying by 10 the original payoff
scale of 1, which has prizes ranging between $0.10 and $3.85, provides re-
sponses that span prizes between $1.00 and $38.50. These two payoff scales
are referred to as 1x and 10x hereafter. The 10x payoffs comfortably cover the
range of prizes needed to apply the measures of risk aversion to our experiments.
All subjects were given the 10x test, but some were also given a 1x test prior
to the 10x, which we refer to as the 1x10x treatment since these payoffs are
comparable to the EUT decision tasks.19
Apart from conducting experiments to elicit subjects' attitudes toward risk (the
risk aversion experiments), Harrison et al. (2003) also conducted experiments
with the same subjects in order to test for violations of EUT, controlling for
risk aversion. To avoid possible intra-session effects, only one experiment was
run in each session. The same subjects were contacted again by e-mail and in-
vited to participate in subsequent experiments that were separated by at least one
week.20 Students were recruited from the University of South Carolina. In total,
152 subjects participated in a risk aversion experiment and, of those, 88 also
participated in the lottery choice experiments. Overall, there were 88 subjects
for whom we can match results from the risk aversion test to the lottery choice
task. No attempt was made to screen subjects for recruitment into subsequent
experiments based on their choices in earlier experiments.
All subjects received a fixed show-up fee of $5 in each of the three experi-
ments, consistent with our standard procedures.21 This is a constant across all
subjects, and does not vary with the decisions the subjects faced. No subject
faced losses.
The lower left panel of Fig. 4.3 displays the elicited CRRA coefficients for
our sample of 152 subjects. We employ the CRRA utility
function introduced earlier to define the CRRA intervals represented by each
row in the payoff matrix faced by the subject shown in Table 4.1, although other
functional forms could also be used and would lead to similar conclusions. Each
subject is assigned the midpoint of the CRRA interval at which they switch from
choosing lottery A to lottery B.22 The right column in Table 4.1 shows CRRA
intervals associated with each switch point. The resulting distribution of risk
attitudes is depicted in the bottom left panel in Fig. 4.3. While a small portion of
19 The reason for this design was to test for “order effects” in elicited risk attitudes, as reported in
Harrison et al. (2005b).
20 The time between sessions for a given subject was usually one or two weeks. Harrison et al.
(2005a) show that elicited risk attitudes for this sample are stable over time horizons of several
months.
21 The subjects were recruited in lectures for experiments run during the usual lecture time and they
received no show-up fee. In our case, subjects were recruited via Ex-Lab (http://exlab.bus.ucf.edu),
consisting of a combination of e-mail alerts and on-line registration schedules from a subject pool
database.
22 Several subjects switched two or more times. In this case we use the first and last switch points
to define a relatively “fat” interval for that subject.
Fig. 4.3: Observed risk attitudes and common-ratio tests of EUT.
23 In addition, there are problems asking a subject to give two “real responses” in the lab. First,
there might be wealth effects, or expected wealth effects, when the earnings from one lottery affect
valuations for the second lottery. Second, if one picks out one choice at random to pay the subject,
one is assuming that one of the axioms of EUT (independence) is correct. If it is not, then this
random payoff device can generate inconsistent preferences even if the underlying preferences are
consistent. These points are well known in the experimental literature, and are important if one is
attempting to identify which axioms of EUT might be in error.
24 For subjects that participated in the 1x10x experiments, the data constitute a panel consisting of
two observations for that subject, so we use panel interval regression models with random individual
effects. We included a binary indicator in the regression to control for order effects when subjects
did both 1x and 10x tasks.
25 These were binary indicators for sex, race (black), a Business major, Sophomore status, Junior
status, Senior status, high GPA, low GPA, Graduate student status, expectation of a post-graduate
education, college education for the father of the subject, college education for the mother of the
subject, and US citizen status. We also included age in years.
We first consider the possibility that subjects may be more likely to choose in-
consistently with EUT when the cost of doing so is trivial. Figure 4.4 shows the
fraction of EUT consistent choices as a function of CE differences, using all the
data from the two CR choices and the 6 PR choices. For each threshold listed
on the bottom axis, the calculations underlying these figures drop any choice
that entails a CE difference that is less than the indicated threshold. Thus, as
the threshold gets above several pennies, many of the A∗ B ∗ choices faced by
risk averse subjects are naturally dropped from consideration. Figure 4.4 shows
thresholds for the difference in CE varying from 0 cents up to 100 cents. The
thin, dashed line shows the fraction of choices above the threshold on the bottom
axis. Thus, for a threshold of 0 cents, 100% of the choices are considered (i.e.,
the choices from the 6 PR choices plus the 2 CR choices, for all individuals).
As the threshold increases, additional choices are dropped. Whether a choice is
dropped depends on the estimated risk aversion of the subject and the parame-
ters of the lotteries in each choice, since these are the factors determining the
CE. The heavy, solid line shows the fraction of the remaining choices that are
consistent with the EUT prediction.
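The mechanics of this calculation can be sketched as follows (hypothetical data standing in for the actual subject choices; only the thresholding logic is the point):

import numpy as np

rng = np.random.default_rng(0)
n = 600                                       # stand-in subject-choice observations
ce_diff = np.abs(rng.normal(0.0, 0.40, n))    # |CE difference| in dollars, hypothetical
consistent = rng.random(n) < 0.52             # whether each choice matched the EUT prediction

for threshold_cents in range(0, 101, 10):
    keep = ce_diff >= threshold_cents / 100
    share_kept = keep.mean()
    share_consistent = consistent[keep].mean() if keep.any() else float("nan")
    print(f"threshold {threshold_cents:3d}c: kept {share_kept:4.0%}, EUT-consistent {share_consistent:4.0%}")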
Surprisingly, the ability of EUT to predict choices does not appear to increase
as the threshold for CE differences is increased. Our earlier conclusion, that
there is little support for EUT, is therefore not affected by excluding observations
that were based on small differences in the CE of the lotteries (and where “small”
is defined parametrically, so that the reader may individually decide what is
small). Moreover, although not shown in Fig. 4.4, we find that EUT does not do
better than about 50% correctly predicted even if CE differences are required to
exceed $3.00. Simple random chance would explain these data better than EUT.
Fig. 4.4: Fraction of EUT consistent choices as a function of certainty equivalent differences.
Figure 4.5 undertakes the same analysis at the level of each individual task.26
These results show that the fraction of choices above the threshold, shown by
the thin dashed line, stays quite high for most of the preference reversal choice
tasks. This is by design, given that we have a generally risk averse subject pool.
By contrast, the fraction of choices above the threshold drops rapidly in the CR
task involving lotteries A∗ and B ∗ , as implied from Fig. 4.1. In fact, given the
CRRA values observed in our sample, no decisions have CE differences greater
than about 30 cents for the A∗ B ∗ pair. For this task and two of the preference
reversal tasks, Pairs 3 and 6, the fraction of choices consistent with EUT, shown
by the heavy, solid line, does increase as the threshold increases. However, this
only occurs for a small fraction of choices since the number of choices above
the threshold falls rapidly for these three tasks. For the remaining tasks, there
appears to be no relationship between the minimum threshold and the extent to
which choices are consistent with EUT. This is particularly telling since these
are the tasks for which a substantial fraction of choices exceed the threshold.
We have seen that error rates do not decline even when CE differences are large.
While we do not see any persuasive evidence that the size of CE differences
affects our conclusions about EUT, we must recognize that our risk aversion co-
efficient estimates for individuals may be imprecise. As seen in Figs. 4.1 and 4.2,
CE differences are very sensitive to the CRRA coefficient. Small changes in r
can change an observed choice from being considered a trivial violation to a
costly violation, or to no violation. Imprecision in estimating the CRRA coeffi-
cient must be taken into account when evaluating the data.
Imprecision may arise if our risk elicitation task does not yield precise esti-
mates due to "trembling hand" error on the part of the subject, or to our failure
to elicit sufficient information to make more precise inferences about the risk
attitudes of the subject. To illustrate an important but subtle point, imagine we
had collected information on hair color and used that to explain the risk aversion
choices of our subject. We anticipate that this would be a poor statistical model,
generating extremely wide standard errors on our forecast of the individual’s
risk attitudes. As a result, it would be very easy to find a predicted risk aversion
coefficient within a 95% confidence interval of the mean predicted value that
includes the point of indifference. Thus one could almost always find a risk atti-
tude that makes the observed choices consistent with EUT, but only because the
statistical model was so poor. We have selected individual characteristics that
are standard in empirical work of this kind, but there is always the risk that none
of these characteristics help us predict risk attitudes carefully.
26 The lines in Fig. 4.5 are defined identically to those in Fig. 4.4.
Fig. 4.5: Fraction of EUT consistent choices as a function of certainty equivalent differences.
For all the analyses and tests employed so far, the way we characterize risk
attitudes makes no difference to the conclusions we draw. Using the interval
regression model to generate average risk aversion estimates for each individual
yields indistinguishable results from the alternative, less parametric, approach
which measures risk aversion for each individual using the observed interval
at which the individual switches from safe to risky. The reason that these two
approaches to inferring risk attitudes generate the same conclusions is that the
averages from the interval closely approximate the average prediction from the
interval regression model.
However, when considering the impact of the precision of the risk aversion
estimates, the conclusions we draw are sensitive to how we statistically charac-
terize risk attitudes. Thus, we will first consider the precision of raw responses,
and then compare the precision of the CRRA coefficients estimated using inter-
val regression models.
First, consider a minimally parametric approach that does not condition on
the socio-demographic characteristics of the subjects. This allows us to focus on
the imprecision inherent in the experimental task rather than prediction error in
the regression model. Because subjects were only given ten questions in the risk
aversion task, we only know the interval at which the subject switched from the
safe to risky choice.27 For each individual we know the upper and lower bounds
of their “switching” interval. Any CRRA coefficient between these bounds is
consistent with the observed switching behavior of the individual, and equally
plausible a priori. Each CRRA coefficient in the interval is associated with a CE
difference; hence, there is a range of equally plausible CE differences. For each
individual and each choice, we pick the most “conservative” CRRA coefficient,
that is, we pick the CRRA coefficient associated with the smallest absolute value of
the CE difference. Then, if the CE difference is below the chosen threshold, this
observation is dropped. Thus, whenever it is plausible that the subject does not
care about the choice given the bounds on the subject’s risk aversion, that choice
is excluded. The bottom panel of Fig. 4.6 shows the results of this calculation.
The horizontal axis again shows threshold values up to 100 cents and the thin
dashed line shows the fraction of choices above the threshold. The darker line
shows the fraction of EUT consistent choices when we allow for uncertainty
over the precise CRRA coefficient for each individual. There do not appear to
be conservative CRRA values for each subject, taking into account the inter-
val nature of the CRRA estimate, such that the predicted consistency of EUT rises
much above 50%. For comparison, the top panel of Fig. 4.6 shows the fraction of
choices correctly predicted by EUT assuming no uncertainty in the risk aversion
measure and using the midpoint of each individual's raw CRRA response inter-
val. There is little difference in the fraction of EUT consistent choices between
the top and bottom panels of Fig. 4.6.
27 Of course, we could have asked more questions to "pin" the individual to a smaller interval.
This alternative is implemented in a risk aversion elicitation task by Harrison, Lau, Rutström and
Sullivan (2005).
Fig. 4.6: Sample uncertainty and EUT consistent choices.
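A minimal sketch of the "conservative" rule described above (our construction; the grid search, the epsilon for zero prizes, and the example interval are illustrative, and numpy is assumed available):

import numpy as np

def ce(lottery, r, eps=0.01):
    """CRRA certainty equivalent; zero prizes replaced by eps so the utility stays finite for r >= 1."""
    if np.isclose(r, 1.0):
        return np.exp(sum(p * np.log(max(x, eps)) for x, p in lottery))
    eu = sum(p * max(x, eps) ** (1 - r) / (1 - r) for x, p in lottery)
    return ((1 - r) * eu) ** (1 / (1 - r))

def conservative_ce_diff(pair, r_lo, r_hi, n_grid=200):
    """Smallest |CE difference| attainable for any r in the subject's switching interval."""
    return min(abs(ce(pair[0], r) - ce(pair[1], r)) for r in np.linspace(r_lo, r_hi, n_grid))

# Hypothetical example: a subject whose switch point implies r in [0.15, 0.41],
# facing the A/B pair from the common ratio example; the choice is dropped if even
# the most conservative CE difference falls below a 10-cent threshold.
pair = ([(0.0, 0.2), (30.0, 0.8)], [(20.0, 1.0)])
min_diff = conservative_ce_diff(pair, 0.15, 0.41)
print(round(min_diff, 2), min_diff >= 0.10)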
We also consider whether “trembling hand” errors in risk aversion could be
driving these apparent EUT violations. Suppose that the latent process that
drives an individual’s choices in the risk aversion experiment operates with
some error, so that individuals may be observed switching earlier or later than
their optimal switching points. To capture this idea, we expand the upper and
lower bounds of the individual’s observed CRRA interval to the midpoints of
the adjacent intervals.28 As above, we consider the range of CRRA values in
this expanded interval, and then pick the one that leads to the smallest CE differ-
ence in the same manner as before. The bottom panel of Fig. 4.7 shows that EUT
still performs poorly even under this less exacting test (the top panel of Fig. 4.6
is reproduced in the top panel of Fig. 4.7 for comparison). Allowing for uncer-
tainty over the risk aversion interval chosen does not provide any compelling
new evidence in favor of EUT.
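The expansion itself is simple to state in code (a sketch; the cut-points listed are illustrative approximations, not the exact values in Table 4.1):

def expanded_interval(bounds, k):
    """bounds: ordered CRRA cut-points defining the response intervals;
    k: index of the interval implied by the subject's switch point.
    The interval is widened to the midpoints of the two adjacent intervals."""
    lo, hi = bounds[k], bounds[k + 1]
    new_lo = (bounds[k - 1] + lo) / 2 if k > 0 else lo
    new_hi = (hi + bounds[k + 2]) / 2 if k + 2 < len(bounds) else hi
    return new_lo, new_hi

bounds = [-0.95, -0.49, -0.15, 0.15, 0.41, 0.68, 0.97, 1.37]
print(expanded_interval(bounds, 3))   # the interval [0.15, 0.41], widened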
Now ask the same questions using the interval regression model to charac-
terize risk attitudes. Figure 4.8 shows the effects of incorporating the forecast
error of the model’s prediction for each individual into the test of EUT. The top
panel shows the fraction of choices that are both EUT consistent and above the
CE threshold when the CRRA estimates are generated from the average of the
prediction from the interval regression. In the bottom panel, forecast error from
the regression is taken into account in a similar manner as described above for
Figs. 4.6 and 4.7. For each subject, we randomly draw a thousand CRRA esti-
mates from the estimated distribution of CRRA values for that subject, in this
case using the estimated mean and standard error of the forecast from the in-
terval regression model as the estimated distribution.29 We then use the CRRA
estimate for the individual that generates the smallest of the absolute values of
the CE difference between the two choices in each lottery pair.
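A sketch of this simulation step (the forecast mean and standard error below are hypothetical, and the ce helper repeats the illustrative CRRA calculation used above):

import numpy as np

def ce(lottery, r, eps=0.01):
    if np.isclose(r, 1.0):
        return np.exp(sum(p * np.log(max(x, eps)) for x, p in lottery))
    eu = sum(p * max(x, eps) ** (1 - r) / (1 - r) for x, p in lottery)
    return ((1 - r) * eu) ** (1 / (1 - r))

rng = np.random.default_rng(1)
forecast_mean, forecast_se = 0.65, 0.45      # hypothetical forecast for one subject
pair = ([(0.0, 0.2), (30.0, 0.8)], [(20.0, 1.0)])

draws = rng.normal(forecast_mean, forecast_se, 1000)     # 1000 CRRA draws
diffs = np.array([abs(ce(pair[0], r) - ce(pair[1], r)) for r in draws])
print(f"smallest |CE difference| across the draws: {diffs.min():.3f}")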
Figure 4.8 shows the results of this calculation. What is striking here is that
the fraction of choices that are above the threshold for CE differences drops to
nothing when the threshold exceeds 10 cents. Hence, for each individual and
each lottery, there exist “plausible” CRRA values such that the opportunity cost
of an error under EUT is trivial.
28 We do not extend the interval to the three intervals surrounding the chosen interval, since the
trembling hand argument does not justify a uniform distribution over the “outer intervals.” It simply
says that somebody may have had a CRRA of 0.16 but chosen the interval with upper bound 0.15
since it was “close enough.”
29 The standard error of a forecast takes into account the uncertainty of the coefficients in the interval
regression model. It is always larger than the standard error of the prediction, which assumes that
those estimates are known exactly. These draws reflect the normal distribution appropriate for this
estimated coefficient value, so 95% of the draws for each subject will be within ±1.96 standard
errors of the point estimate. Thus, to emphasize, we are not allowing the CRRA values for any
individual to take on values that are outside the realm of statistical precision given our experimental
procedures.
Fig. 4.7: Additional sample uncertainty and EUT consistent choices.
Fig. 4.8: Sample uncertainty and EUT consistent choices.
Figures 4.6 and 4.8 pose a dilemma for the interpretation of the lottery choices
from the perspective of EUT. One must decide which statistical characterization
of risk attitudes is the best, in terms of reflecting the precision of inferences pos-
sible from our experimental procedure. Although there is nothing wrong with
the interval regression characterization, we are firmly inclined towards the min-
imally parametric characterization since we have that for each individual in our
sample. It makes fewer assumptions about the process generating the observed
risk aversion choices, can be easily relaxed to undertake robustness checks as
shown in Fig. 4.7, and can be refined with simple extensions of the experimental
procedures we used.30 Thus, we conclude that if one cannot directly elicit risk
attitudes from the sample then EUT may be operationally meaningless since the
estimated risk attitude coefficients suffer from too much imprecision.
There are situations in which one might prefer the interval regression model,
despite the relative imprecision of the estimates that result. Assume that the risk
aversion test has been applied to a sample drawn from one population, and one
wants to define a risk aversion distribution for use in interpreting data drawn
from choices in a risk-sensitive task by a distinct sample drawn from the same
population or a distinct population. All that one might know about the new sam-
ple are individual characteristics, such as sex and age. One could then generate
conditional predictions for the new sample using the coefficient estimates from
the interval regression model estimated on the first sample and the information
on characteristics of the new, target sample. The minimally parametric char-
acterization is not so attractive here, since it cannot be so easily conditioned
on individual characteristics. In many experimental situations considerations
of cost may necessitate using predicted rather than elicited risk attitude coef-
ficients. However, more work on specifying good predictive models is needed
before such an approach can be meaningfully applied. Furthermore, there are
many situations in which one only needs to know broad qualitative properties of
the risk attitudes of subjects (e.g., are they risk-neutral or not), rather than pre-
cise estimates of degrees of risk aversion. For such purposes the within-sample
procedures may be overkill.
4.5. Conclusions
30 Harrison, Lau, Rutström and Sullivan (2005) consider two ways. One, noted earlier, is to “iterate”
the MPL procedure several times so that subjects get 10 intervals within the interval they chose on
the prior iteration. The other way in which one can tighten the CRRA interval is to administer
the procedure several times over distinct lottery prizes, so as to span a more refined set of CRRA
intervals.
Acknowledgements
We thank the US National Science Foundation for research support under grants
NSF/IIS 9817518 and NSF/POWRE 9973669 to Rutström, grant NSF/SES
0213974 to McInnes, and grants NSF/DRU 0527675 and NSF/SES 0616746 to
Harrison and Rutström. We are grateful to Maribeth Coller, Morten Lau, Graham
Loomes, Chris Starmer, Robert Sugden, Melonie Sullivan and Peter Wakker for
comments, as well as participants at the 11th Conference on the Foundations &
Applications of Utility, Risk & Decision Theory in Paris. Supporting data and
instructions are stored at the ExLab Digital Archive at http://exlab.bus.ucf.edu.
31 This need to account for the effect of errors that may arise in the elicitation task has been explicitly
considered in the literature on using the "trade-off method" of Wakker and Deneffe (1996) to elicit
the probability weighting function of rank-dependent utility theory. Using methods similar to the
ones proposed here, Bleichrodt and Pinto (2000) use simulations to assess the robustness of their
conclusions about the shape of the probability weighting function to subject errors. Even if errors
cannot "propagate" in the elicitation task, any test of a choice theory that is formed conditional
on a fitted parameter must take into account the precision with which that first stage parameter is
estimated. More generally, the literature provides many examples in which predicted behavior is
conditioned on risk attitudes, which then serve as a confound unless controlled for in some manner.
For example, Cox, Smith and Walker (1985) and Harrison (1990) consider the effect of calibrating
controls for risk attitudes on predicted bidding behavior in first-price sealed-bid auctions.
References
Ballinger, T.P., Wilcox, N.T. (1997). Decisions, error and heterogeneity. Economic Journal 107,
1090–1105.
Bleichrodt, H., Pinto, J.L. (2000). A parameter-free elicitation of the probability weighting function
in medical decision analysis. Management Science 46 (11), 1485–1496.
Botelho, A., Harrison, G.W., Pinto, L.M.C., Rutström, E.E. (2005). Social norms and social choice.
Working paper 05-23. Department of Economics, College of Business Administration, University
of Central Florida.
Carbone, E. (1997). Investigation of stochastic preference theory using experimental data. Eco-
nomics Letters 57, 305–311.
Charness, G., Rabin, M. (2002). Understanding social preferences with simple tests. Quarterly Jour-
nal of Economics 117 (3), 817–869.
Coller, M., Harrison, G.W., Rutström, E.E. (2006). Does everyone have quasi-hyperbolic prefer-
ences? Working paper 06–01. Department of Economics, College of Business Administration,
University of Central Florida.
Cox, J.C. (2004). How to identify trust and reciprocity. Games and Economic Behavior 46 (2), 260–
281.
Cox, J.C., Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity cali-
bration for decision theory. Games and Economic Behavior 56 (1), 45–60.
Cox, J.C., Smith, V.L., Walker, J.M. (1985). Experimental development of sealed-bid auction theory:
Calibrating controls for risk aversion. American Economic Review (Papers and Proceedings), 75,
160–165.
Engle-Warnick, J. (2004). Inferring decision rules from experimental choice data. Working paper.
Department of Economics, McGill University.
Fudenberg, D., Levine, D.K. (2006). A dual self-model of impulse control. American Economic
Review 96 (5), 1449–1476.
Grether, D.M., Plott, C.R. (1979). Economic theory of choice and the preference reversal phenom-
enon. American Economic Review 69, 623–648.
Harbaugh, W.T., Krause, K., Vesterlund, L. (2002). Risk attitudes of children and adults: Choices
over small and large probability gains and losses. Experimental Economics 5, 53–84.
Harless, D.W., Camerer, C.F. (1994). The predictive utility of generalized expected utility theories.
Econometrica 62 (6), 1251–1289.
Harrison, G.W. (1990). Risk attitudes in first-price auction experiments: A Bayesian analysis. Review
of Economics and Statistics 82, 541–546.
Harrison, G.W. (2005). Field experiments and control. In: Carpenter, J., Harrison, G.W., List, J.A.
(Eds.), Field Experiments in Economics, vol. 10. Research in Experimental Economics. JAI Press,
Greenwich, CT.
Harrison, G.W., List, J.A. (2004). Field experiments. Journal of Economic Literature 42 (4), 1013–
1059.
Harrison, G.W., Rutström, E.E. (2005). Expected utility theory and prospect theory: One wedding
and a decent funeral. Working paper 05-18. Department of Economics, College of Business Ad-
ministration, University of Central Florida.
Harrison, G.W., Johnson, E., McInnes, M.M., Rutström, E.E. (2003). Individual choice and risk
aversion in the laboratory: A reconsideration. Working paper 03-18. Department of Economics,
College of Business Administration, University of Central Florida.
Harrison, G.W., Johnson, E., McInnes, M.M., Rutström, E.E. (2005a). Temporal stability of esti-
mates of risk aversion. Applied Financial Economics Letters 1, 31–35.
Harrison, G.W., Johnson, E., McInnes, M.M., Rutström, E.E. (2005b). Risk aversion and incentive
effects: Comment. American Economic Review 95 (3), 897–901.
Harrison, G.W., Lau, M.I., Rutström, E.E. (2005). Risk attitudes, randomization to treatment, and
self-selection into experiments. Working paper 05-01. Department of Economics, College of
Business Administration, University of Central Florida.
Harrison, G.W., Lau, M.I., Rutström, E.E., Sullivan, M.B. (2005). Eliciting risk and time preferences
using field experiments: Some methodological issues. In: Carpenter, J., Harrison, G.W., List, J.A.
(Eds.), Field Experiments in Economics, vol. 10. Research in Experimental Economics. JAI Press,
Greenwich, CT.
Hey, J.D. (1995). Experimental investigations of errors in decision making under risk. European
Economic Review 39, 633–640.
Hey, J.D., Orme, C. (1994). Investigating generalizations of expected utility theory using experi-
mental data. Econometrica 62 (6), 1291–1326.
Holt, C.A., Laury, S.K. (2002). Risk aversion and incentive effects. American Economic Review 92
(5), 1644–1655.
Karlan, D., Zinman, J. (2005). Observing unobservables: Identifying information asymmetries with
a consumer credit field experiment. Working paper. Department of Economics, Yale University.
Kocher, M., Strauß, S., Sutter, M. (2006). Individual or team decision-making: Causes and conse-
quences of self-selection. Games and Economic Behavior 56 (2), 259–270.
Lazear, E.P., Malmendier, U., Weber, R.A. (2006). Sorting in experiments, with application to social
experiments. Working paper. Department of Economics, Stanford University.
Leamer, E.E. (1978). Specification Searches: Ad Hoc Inference with Nonexperimental Data. Wiley,
New York.
Loomes, G., Sugden, R. (1995). Incorporating a stochastic element into decision theories. European
Economic Review 39, 641–648.
Loomes, G., Sugden, R. (1998). Testing different stochastic specifications of risky choice. Econom-
ica 65, 581–598.
Loomes, G., Moffatt, P.G., Sugden, R. (2002). A microeconometric test of alternative stochastic
theories of risky choice. Journal of Risk and Uncertainty 24 (2), 103–130.
Magnus, J.R. (2007). Local sensitivity in econometrics. In: Boumans, M. (Ed.), Measurement in
Economics: A Handbook. Elsevier, San Diego, CA (this book).
Mayer, T. (2007). The empirical significance of models. In: Boumans, M. (Ed.), Measurement in
Economics: A Handbook. Elsevier, San Diego, CA (this book).
Moffatt, P.G. (2007). Models of decision and choice. In: Boumans, M. (Ed.), Measurement in Eco-
nomics: A Handbook. Elsevier, San Diego, CA (this book).
Prelec, D. (1998). The probability weighting function. Econometrica 66, 497–527.
Rabin, M. (2000). Risk aversion and expected utility theory: A calibration theorem. Econometrica
68, 1281–1292.
Rubinstein, A. (2002). Comments on the risk and time preferences in economics. Unpublished man-
uscript. Department of Economics, Princeton University.
Wakker, P.P., Deneffe, E. (1996). Eliciting von Neumann–Morgenstern utilities when probabilities
are distorted or unknown. Management Science 42, 1131–1150.
CHAPTER 5
An Analytical History of Measuring Practices: The Case of Velocities of Money
M.S. Morgan
5.1. Introduction
This paper is concerned with the kind of measurements that are often taken
for granted in modern economics, namely the measurement of the entities that
economists theorise about such as: prices, money, utility, GNP, cycles, and
so forth. Two important issues are addressed here. One concerns the general
constitution of such measurements in economics, or for that matter, in the so-
cial sciences more generally: What makes good economic measurements? The
second is an enquiry into the historical trajectory of economists’ successive at-
tempts to provide reliable measurements for the concepts in their field. Is there a
recognisable transit, and if so, what are its characteristics? I shall seek and frame
answers to both these questions with the help of that useful, but perhaps unusual,
concept to find in the practice of economics: namely, “measuring instruments”.1
The discussion makes extensive use of a case example: the history of attempts
to measure the velocity of money – as a way of analysing the nature of econo-
mists’ measuring instruments. In this case, as we will see, economists began by
measuring velocity as a free standing entity using statistical data from various
sources. They went on to use identities or equations from monetary theory to
derive measurements of velocity that were still understood as an observable fea-
ture of the economy. More recently, economists defined and measured velocity
using econometric models that embedded the mathematical idealised notions of
theory in terms of statistical data relations. The case provides material for an
analytical history that illuminates both general issues noted above: the criteria
for measuring instruments; and the historical development of such instruments
in economics.
1 Marcel Boumans first introduced this terminology in an insightful series of papers on measure-
ment in economics – see his 1999, 2001 and 2005 contributions, as well as this volume.
There are three kinds of literature that help us to think seriously and analyti-
cally about the history of economic measurements, particularly the problems of
measuring things that are not easy to measure. These literatures come from the
philosophy of science, from metrology, and from the history and social studies
of science. As we shall see, they are complementary.
The mainstream philosophy of science position, known as the representational
theory of measurement, is associated particularly with the work of Patrick Sup-
pes.2 This theory was developed by Suppes in conjunction with Krantz, Tversky
and Luce, and grew out of their shared practical experience of experiments in
psychology into a highly formalised approach between the 1970s and 1990s (see
Michell, this volume). The original three volumes of their studies ranged widely
across the natural and social sciences and have formed the basis for much further
work on the philosophy of measurement.
Formally, this theory requires one to think about measurement in terms of a
correspondence, or mapping: a well defined operational procedure between an
empirical relational structure and a numerical relational structure. Measurement
is defined as showing that “the structure of a set of phenomena under certain
empirical operations and relations is the same as the structure of some set of
numbers under corresponding arithmetical operations and relations” (Suppes,
1998). This theory is, as already remarked, highly formalised, but informally,
Suppes himself has used the following example.3 Imagine we have a mechanical
balance – this provides an empirical relational structure whose operations can
be mapped onto a numerical relational structure for it embodies the relations of
equality, and more/less than, in the positions of the pans as weights are placed
in them. The balance provides a representational model for certain numerical
relations, and there is an evident homomorphism between them. Though this
informal example nicely helps us remember the role of the representation, and
suggests how the numerical relations can lead to measurement, it is unclear how
you find the valid representation.4
Finkelstein and Sydenham, both in the Handbook of Measurement Science
(see Sydenham, 1982), offer more pragmatic accounts to go alongside and in-
terpret the representational theory’s formal requirements to ensure valid mea-
surement. Finkelstein’s informal definition talks of the assignment of numbers
to properties of objects, stressing the role of objectivity and that “measurement
2 For the original work, see Krantz et al. (1971). For recent versions see Suppes (1998 and 2002).
A more user-friendly version is found in Finkelstein (1974 and 1982).
3 At his Lakatos award lecture at LSE (2004).
4 See Rodenburg (2006) on how these representations are found in one area of economics, namely
unemployment measurement.
5 See Mari (this volume) for an account in this pragmatic tradition that stresses the importance of
processes of measurability over the purely logic approach.
6 In this context, I should stress that this is not solely a discussion of econometrics, which uses
models as measuring instruments to measure the relations between entities: on which see Chao,
2002 and this volume (and his forthcoming book).
7 An infamous UK example of this is the way in which Thatcher’s government insisted on succes-
sive changes in the definitional rules of counting unemployment so that the measurements of this
entity would fall.
of information and so forth. Since these kinds of rules are obviously endemic
in the production of most economic data, Porter’s thesis is particularly salient
to economic measurement; the onus however is on how our numbers gain trust,
not on how we overcome the problem of turning our concepts and ideas about
phenomena into numbers in the first place.8
An analysis of effective measurement in economics engages us in considering
the aspects of measuring entailed in these three approaches – the philosophical,
the metrological and social/historical. All of these approaches are concerned
with making economic entities, or their properties, measurable, though that
means slightly different things according to these different ideas. For the repre-
sentational theorists, it means finding an adequate empirical relational structure
for an entity or property and constructing a mapping to a numerical relational
structure. This enables measurements – numbers – to be constructed to repre-
sent that entity/property. For Boumans, it means developing a model or formula
which has the ability to capture the variability in numerical form of the prop-
erty or entity, but itself to remain stable in that environment. For Porter, it means
developing standardised quantitative rules (by the scientific or bureaucratic com-
munity or some combination thereof) that allow us to construct, in an objective
and so trustworthy manner, measurements for the concepts we have. We can
interpret these three notions as having in common the idea that we need a mea-
suring instrument, though the nature of such instruments (an empirical relational
structure, a model formula, or a standardised quantitative rule), and the criteria
for their adequacy, have been differently posed in the three literatures.
If we look back over the past century or so of how economists have developed
ways of measuring things in economics, we can certainly find the kinds of for-
mulae and models – measuring instruments – that Boumans describes. We can
also interpret them using the notions of Porter’s standardised quantitative rules,
and the kind of representational approach outlined by Suppes. For example,
Boumans (2001) analysed the construction of the measuring instrument for the
case of Irving Fisher’s “ideal index” number. Fisher attempted to find a formula
that would simultaneously fulfil a set of axioms or requirements that he believed
a good set of aggregate price measurements should have. Boumans showed how
Fisher came to understand that, although these were all desirable qualities, they
were, in practice, mutually incompatible in certain respects. Different qualities
had to be traded-off against each other in his ideal index formula – the formula
that became his ideal measuring instrument.
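For readers who want the formula itself (it is not spelled out in this chapter): Fisher's ideal index is the geometric mean of the Laspeyres (base-weighted) and Paasche (current-weighted) price indices. A minimal Python sketch with made-up prices and quantities for two goods:

import math

p0, q0 = [1.00, 2.00], [10, 5]    # base-period prices and quantities (hypothetical)
p1, q1 = [1.10, 2.50], [9, 6]     # current-period prices and quantities (hypothetical)

laspeyres = sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0))
paasche = sum(p * q for p, q in zip(p1, q1)) / sum(p * q for p, q in zip(p0, q1))
fisher = math.sqrt(laspeyres * paasche)

print(round(laspeyres, 3), round(paasche, 3), round(fisher, 3))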
8 The development of the measurement rules is not neglected by Porter (for example, see his 1995
discussion of the development of the rules of cost-benefit analysis), but such processes of gaining
trust are less susceptible to the kinds of generalisation I seek to use here.
Fisher’s initial axioms – his design criteria – can be interpreted within the
representational theory of measurement as laying out the empirical relational
structure that the measurements would have to fulfil. But Fisher’s empirical re-
lational structure (his axioms) could not be fully and consistently mapped onto
numbers from the economic world. Only when one or two of the axioms or
criteria were relaxed, could the empirical structure map onto the numerical struc-
ture.9 While this might seem a partial failure according to the criteria of the
representational theory, the actual index number formula that Fisher developed
on the modified set of axioms was interpreted by Boumans as a successful at-
tempt to arrive at a more accurate measuring instrument given the variation in
the material to be measured.10 In addition, the kinds of rules and procedures
that were developed to take measurements using Fisher’s index or similar kinds
of instruments11 can be understood within Porter’s discussion of standardised
quantitative rules. The fact that numbers produced with such measuring instru-
ments, are, by and large, taken for granted is evidence of our trust in these
numbers, and that trust is lost when we notice something amiss with the rules
or formulae used to calculate them. For example, Banzhaf (2001) gives an ac-
count of how price indices lost their status as trustworthy numbers when quality
changes during the Second World War undermined the credibility of the index
number formula which assumed constant qualities.12
Historians of economics writing about measurement issues typically focus on
one particular measuring instrument such as input–output matrices or macro-
accounting. But each of these particular instruments can be classified based on
family likeness, for we have different kinds of measuring instruments in eco-
nomics in the same way that we have different categories of musical instruments.
An orchestra can be divided into classes of instrument labelled as strings, wood-
wind, brass, keyboard, percussion etc, according to the way that sounds are
produced within each group and so to the kinds of musical noises we asso-
ciate with each group. Similarly, we can define several different kinds of generic
measuring instruments and associated kinds of measurements in economics (see
Morgan, 2001 and 2003), and within each kind we can find a number of spe-
cific instruments. These generic measuring instrument groups are constructed
9 See Reinsdorf, this volume, for a broader and deeper discussion of the issues raised here on index
numbers.
10 While Fisher decided to compromise on the axioms, an alternative is to reject the data cases that
do not fit the axioms. A recent seminar paper by Steven Dowrick (see Ackland et al., 2006) at ANU
on poverty measurement suggests that it is common in this field to use as a measuring instrument an
ideal index in which the Afriat conditions have to be met. If a particular set of countries do not meet
these tests (and thus the axioms on which the index was based), then those data points are omitted
from the data set.
11 The data requirements of Fisher’s ideal index means that many index numbers are based on the
simpler, less demanding, Laspeyres formula.
12 The recent Boskin report on the US cost of living index offers another case for the investigation
of trusty numbers; see the special symposium on the report in Journal of Economic Perspectives, 12
(1), Winter 1998.
aggregates must sum to the same amount (e.g. rows and columns in input–output
analysis). Such principles are really important – they are the glue that holds the
necessary elements of the measuring instrument together; they give form to the
standardised quantitative rules and provide constraints to the structure; they give
shape to the representation of the empirical and numerical relational structures
and help define the locations of variance and invariance.
In looking at the history of economic measurements, then, we need to attend
to the measuring instruments, to their principles of construction, and to the
techniques and judgements required for their practical use. The literatures
on measurement from philosophy, metrology and science studies are com-
plementary here for they offer more general criteria relevant for all classes of
instrument. Measuring instruments, regardless of their general kind or particular
construction, should ideally fulfil Suppes’, Boumans’ and Porter’s requirements
for the characteristics of measuring systems. How the instruments are used, and
what happens when the requirements are not fulfilled, are explored in the case
below.
Now Petty set out to measure the amount of necessary money stock given
the total “expenses” of the nation, not to measure velocity, but it is easy to see
that he had to make some assumptions or estimates of the circulation of money
according to the two main kinds of payments. He supposed, on grounds of his
knowledge of the common payment modes, that the circulation of payments
was 52 times per year for one class of people and their transactions and 4 for the
other, and guesstimated the shares of such payments in the whole (namely that
payments were divided half into each class), in order to get to his result of the
total money needed by the economy.
If we simply average Petty's circulation numbers, we would get a velocity
number of 28 times per year (money circulating once every 13 days); but Petty
was careful enough to realise that for his purpose to find the necessary money
stock, these must be weighted by the relative amounts of their transactions. Such
an adjustment must also be made to find a velocity measurement according to
our modern ideas. If we employ the formula velocity = total expenditure/money
stock to Petty's circulation numbers, we get a velocity equal to 7.3 (or that money
circulates once every 50 days).
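To make the weighting explicit: on the reading of Petty's assumptions given above, with total expenditure E split evenly between a class circulating 52 times a year and a class circulating 4 times a year, the necessary money stock is M = (E/2)/52 + (E/2)/4 = 14E/104, so that V = E/M = 104/14 ≈ 7.4, that is, money turning over roughly once every 50 days (365/50 ≈ 7.3, the figure quoted above); the simple average of 52 and 4, by contrast, gives 28.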
One immediate contrast that we can notice between these two episodes is that
in Petty’s discussion, the original circulation figures for the two kinds of trans-
action – the figures relating to velocity – were needed to derive the money stock
necessary for the functioning of the economy and having found this unknown, it
was then possible (though Petty did not do this) to feed this back into a formula
to calculate an overall velocity figure from the velocity of circulation numbers
for the two classes of payments. We used the formula here to act as a calcu-
lation device for overall velocity, though that measurement was dependent on
independent “guesstimates” of the two classes of such circulation by Petty. This
is in contrast to the modern way used by the Fed, where the velocity number is
derived only from V = GNP/M; this simple formula acts as a measuring device,
yet there were no separate numbers constituting independent measurements (or
even guesses) of monetary circulation or velocity in this calculation.
These two methods of measuring velocity – Petty’s independent way and the
modern derived way – are very different. It is tempting to think that the Fed’s was
a better measure because it was based on real statistics not Petty’s guess work,
and because its formula links up with other concepts of our modern theories. But
we should be wary of this claim. We should rather ask ourselves: What concept
in economics does the Fed’s formula actually measure? And, Does it measure
velocity in an effective way?
Beginning again with Petty’s calculations, recall that he had guesstimated the
amounts of money circulating on two different circuits in the economy of his
day. He characterised the two circuits both by the kind of monetary transactions
and the economic class of those making expenditures in the economy. I label
these “guesstimates” because these two main circuits of transactions and their
timing were probably well understood within the economy of his day but the
exact division between the two must have been more like guess work. We find
further heroic attempts, using a similar approach, to estimate the velocity or
“rapidity” of circulation in the late 19th century. For example, Willard Fisher
(1895) drew on a number of survey investigations into check and money de-
posits at US banks in 1871, 1881, 1890 and 1892 to estimate the velocity of
money in the American economy. Although these survey data provided for two
different ways of estimating the amount of money going through bank accounts,
the circulation of cash was less easy to pin down, and he was unhappy with the
ratio implied from the bank data that only 10% of money circulated in the form
did assume that the numerical relations between the separately measured series
should hold in the same format as his empirically defined relations. Thus, he
constructed measurements of all the elements independently and numerical dif-
ferences between the two sides of the relation MV = PT were taken to indicate
how far his series of measurements of each side of the equation might be in er-
ror. The formula here operated neither as a calculation device nor as a measuring
instrument, but it was part of a post-measurement check system which had the
potential to create trust or confidence in his measurements.
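Schematically (the numbers below are entirely hypothetical), the check works by measuring each element of the identity independently and reporting the discrepancy between the two sides:

M, V = 8.8e9, 21.5          # money stock and its velocity, measured independently
P, T = 1.07, 1.81e11        # price level and volume of trade, measured independently

lhs, rhs = M * V, P * T
print(f"MV = {lhs:.3e}, PT = {rhs:.3e}, discrepancy = {(lhs - rhs) / rhs:+.1%}")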
This indeed was the same use that Irving Fisher made of his equation of
exchange MV = PT, but in a much more explicit way that takes us back imme-
diately to Suppes’ informal example of the mechanical balance. In my previous
examination of Irving Fisher’s use of the analogy of the mechanical balance for
his equation of exchange (see Morgan, 1999), I wrote briefly about the mea-
surement functions of his mapping of the various numbers he obtained for the
individual elements of the equation of exchange onto a visual representation
of a double-armed balance.14 I suggested that the mechanical balance was not
the measuring instrument in this case, for, like Kemmerer, he measured all the
elements depicted on the balance in separate procedures and both tabled and
graphed the series to show how far the two sides of the equation were equal –
see Fig. 5.2 (where money and trade are the weights on the arms; velocity and
14 The original diagram is in his 1911 book, Fig. 17, opposite p. 306. See also Harro Maas (2001)
on the way Jevons used the mechanical balance analogy to bootstrap a measurement of the value of
gold, and to understand certain properties of unobservable utility; and Sandra Peart (2001) on his
measurements of the wear of coins.
prices are shown on the left and right arms respectively). Nevertheless, he did
use the mechanical balance visual representation to discuss various measure-
ment issues: the mapping enabled him to show the main trends in the various
series at a glance and in a way which immediately made clear that the quan-
tity theory of money (a causal relation running from changes in money stock
to changes in prices) could not be “proved” simply by studying the equation
of exchange measurements. He was also prompted by this analogy to solve the
problem of weighted averages by developing index number theory (which is
where Boumans’s case analysis of Section 5.2.2 fits in). All of this takes us
somewhat away from the point at issue – the measurement of velocity – but the
use of equations of exchange returns again later.
In thinking about all these measurement problems, Irving Fisher took the op-
portunity to develop not only the fundamentals of measuring prices by index
numbers (see Boumans, 2001), but also new ways of measuring the velocity of
money. He regarded his equation of exchange as an identity which defined the
relationship of exchange based on his understanding that money’s first and fore-
most function was as a means of transaction. Thus, he thought it important to
measure velocity at the level of individuals: it was individuals that spent money
and made exchanges with others for goods and services. From this starting point,
he developed two neat new methods of measuring velocity.
I will deal with the second innovation first as it can be understood as working
within the same tradition as that used by Petty and Kemmerer, but instead of
simply estimating two numbers for the two different cash circulations as had
Petty, or two different circulations of cash and check money as had Kemmerer,
Irving Fisher (1909, and then 1911) proposed a more complex accounting in
which banks acted as observation posts in tracing the circulation of payments in
and out of a monetary “reservoir”. This innovation in measuring velocities was
introduced as follows:15
The method is based on the idea that money in circulation and money in banks are not two in-
dependent reservoirs, but are constantly flowing from one into the other, and that the entrance
and exit of money at banks, being a matter of record, may be made to reveal its circulation
outside. . . . We falsely picture the circulation of money when we think of it as consisting of
a perpetual succession of transfers from person to person. It would then be, as Jevons said,
beyond the reach of statistics. But we form a truer picture if we think of banks as the home of
money, and the circulation of money as a temporary excursion from that home. If this be true,
the circulation of money is not very different from the circulation of checks. Each performs
one, or at most, a few transactions outside of the bank, and then returns home to report its
circuit (1909, pp. 604–605).
15 This work was reported in his 1909 paper “A Practical Method of Estimating the Velocity of
Circulation of Money” and repeated in his 1911 book under the title “General Practical Formula for
Calculating V ”, Appendix to his Chapter XII, para 4, pp. 448–460.
him map the circulation of money in exchange for goods – first a visual rep-
resentation, and, from using that, a second model, an algebraic formula which
allowed him to calculate velocity.
The first visual model (his Fig. 18 – shown here in Fig. 5.3 – and his Fig. 19,
pp. 453 and 456 of Fisher, 1911) portrayed the circulation from banks into pay-
ment against goods or services, possibly on to further exchanges, and thence
back to banks. This “cash loop” representation enabled him to define all the rel-
evant payments that needed to go into his formula and to determine which ones
should be omitted. The relevant payments that he wanted to count for his calcu-
lation of velocity were ones of circulation for exchanges of money against goods
and services, not those into and out of banks, that is, the ones indicated on the
triangle of his diagram, not on the horizontal bars. But banks were his obser-
vation posts – they were the place where payment flows were registered and so
the horizontal bars were the only places where easy counting and so measurement
could take place. Thus his argument and modelling were concerned with clas-
sifying all the relevant payments that he wanted to make measurable and then
relating them, mapping them, in whatever ways possible, to the payments that he
could measure using the banking accounts.16 He used the visual model to create
the mathematical equation for the calculation using the banking statistics, and
this in turn used the flows that were observed (and could be measured) in order
to bootstrap a measurement of the unobservable payments and thus calculate a
velocity of circulation.
Irving Fisher applied his calculation formula – his measuring instrument for
velocity – to the 1896 statistics on banks that Kinley had discussed earlier. This
part of his work is also very careful, detailing all the assumptions and adjust-
ments he needed to make as he went along (for example for the specific char-
acteristics of the reporting dates). Some of these steps enabled him to improve
on his model-based formula. For example, his diagram assumed that payments
to non-depositors circulated straight back to depositors, so such money changes
hands only twice before it returns to banks, not more. Yet in the process of mak-
ing a consistent set of calculations with the statistical series, he found that he
could specify how much of such circulated money did change hands more than
twice. In other words, his measuring instrument formula acted not just as a rule
to follow in taking the measurement, but as a tool to interrogate the statistics
given in the banking accounts and to improve his measurements.
The velocity measure that Irving Fisher arrived at in 1909 by taking the ratio
of the total payments (calculated using his formula) to the amount of money in
circulation for 1896 was 18 times a year (or a turnover time of 20 days). Kin-
ley (1910) immediately followed with a calculation for 1909 based on Fisher’s
formula and showing velocity at 19. Kinley’s calculations paid considerable at-
tention to how wages and occupations had changed since the 1890 population
census, and Fisher in turn responded by quoting directly this section of Kin-
ley’s paper, and his data, in his own 1911 book. With Kinley’s inputs, and after
some further adjustments, Fisher had two measurements for velocity using this
cash loop analysis: 18.6 for 1896 and 21.5 for 1909. The calculation procedure
had been quite arduous and required a lot of judgement about missing elements,
plausible limits, substitutions and so forth. Nevertheless, on the basis of this
experience and the knowledge gained from making these calculations, Fisher
claimed that a good estimate of velocity could be made from the “measurable”
parts (rather than the “conjectural” parts) of his formula (1911, p. 475, shown
as Fisher’s “barometer” equation above). He concluded confidently that “money
deposits plus wages, divided by money in circulation, will always afford a good
barometer of the velocity of circulation” (1911, p. 476). It is perhaps surprising
that he did not use this modified equation, his “barometer”, to calculate the fig-
ures for velocity between 1897 and 1908! Rather, the two end points acted as a
calibration for interpolation. Nevertheless, the way that he expressed this shows
that his cash loop model and subsequent measurement formula can be classified
as a sophisticated measurement instrument in Petty’s tradition of using the class
of payers and payments to determine the velocity measurement.
16 In doing this, he argued through an extraordinarily detailed array of minor payments to make sure
that he had taken account of everything, made allowances for all omissions, and so forth.
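The arithmetic behind these figures is worth making explicit (a simple illustrative note, not Fisher’s own working): a velocity of 18 circuits per year corresponds to a turnover time of roughly 360/18 = 20 days, taking a round 360-day year for the calculation. The “barometer” quoted above can likewise be written, in paraphrase, as V ≈ (money deposits + wages)/money in circulation – that is, the “measurable” parts of the fuller cash loop formula retained and the “conjectural” parts dropped.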
The other new way of measuring velocity introduced by Irving Fisher was an
experimental sample survey that he undertook himself and reported briefly in
1897. A fuller report of this survey was included in his The Purchasing Power
of Money (1911). In his 1897 paper, he wrote of the possibility of taking a direct
measurement of velocity:
. . . just as an index number of prices can be approximately computed by a judicious selection
of articles to be averaged, so the velocity of circulation of money may be approximately com-
puted by a judicious selection of persons. Inquiry among workmen, mechanics, professional
men, &c., according to the methods of Le Play might elicit data on which useful calculations
could be based, after taking into account the distribution of population according to occupa-
tions (Fisher 1897, p. 520).
17 See Chapter VIII of Fisher (1911), to which the report of the Yale experiment forms an Appendix,
pp. 379–382.
for his methods of measuring velocity were essentially designed to measure the
money flow. Although his sample survey method appears to be measuring the
cash balance (in the students’ pockets overnight), Fisher’s aim was to measure
the cash as it went through his student subjects’ pockets each day. The students
were acting as his observation posts here for a flow measure, just as in his cash
loop method, he used the banks as observation posts for payments into and out
of a position of rest as a way to get at the flows of money. Fisher’s idea of money
velocity can be well characterised by Holtrop’s idea of an “energy” or compound
property of money, somehow inseparable from its quantity.
Open disagreement about such conceptual issues in discussions of velocity
in the late nineteenth century continued into the inter-war period. The masterly
treatment of these arguments by Arthur Marget (1938) in the context of the the-
ory of prices provides an exhaustive analysis of theorising about velocity. But
Marget’s analysis and critique could not stem the tide; in place of the earlier
“transactions velocity”, the number of times money changes hands for transac-
tions during a certain period (the concept that we have found in the examples
of measurement from Petty to Irving Fisher), velocity was re-conceived as “in-
come velocity”: the number of times the circular flow of income went around
during the period. Although the velocity measures of the later twentieth century
are conceptualised by thinking about individuals’ demand for money in rela-
tion to their income, and might even be considered a cash-balance approach in
Holtrop’s terms, this concept of velocity came to be “expressed” in a measuring
formula of the ratio of national income to money in circulation, i.e. a macro-
level instrument. And, as we shall see, the issue of compound properties comes
back to strike those grappling with the problem of uncertainty and variability in
these measurements of monetary aggregates at the US Federal Reserve Board.
Economists considering questions about velocity in the latter half of the 20th
century have tended to stick with an income notion of velocity, not only in dis-
cussion, but also in measurement. Yet their measuring instruments are far from
providing numbers that fit the concept of the individuals’ demand for money im-
plied by Holtrop. Instead, since 1933, measurements have been constructed on the basis of macro-aggregates rather than at the individual level that the concept would seem to require.
Michael Bordo’s elegant New Palgrave piece on Equations of Exchange (1987) discussed how equations of aggregate exchange, considered as identities, have been important in providing building blocks for quantity theories and causal macro-relations. Nor was their role confined to theory building, for, as we have seen, equations of exchange also provided resources for measuring the properties of money.
In the work of Kemmerer and Fisher, their equation of exchange, the identity
MV = PT, provided a checking system for their independent measurements of
transactions circulation and so velocity. In the more recent history of velocity,
the income equation of exchange, namely M = PY/V, has formed the basis
for measuring instruments that enable the economist to calculate velocity with-
out going through the complicated and serious work of separately measuring
velocity as done by Fisher and Kemmerer.
This income equation of exchange, rearranged to provide V = PY/M (ve-
locity = nominal income divided by the money stock), became a widely used
measuring instrument for velocity in the mid-twentieth century, in which differ-
ent money stock definitions provide different associated velocities, and different
income definitions and categories alter the measurements of velocity made. For
example, Richard Selden’s (1956) paper on measuring velocity in the US reports
38 different series of “estimates” for velocity made by economists between 1933
and 1951 and adds 5 more himself. These use various versions of income as the numerator (personal income, national income, even GDP) and various versions of M as
the denominator. These are called “estimates” both because the measurers could
not yet simply take their series for national income (or equivalent) and money
stock ready made from some official source (national income figures were only
just being developed during this time), and because many of the measurers, like Selden, wished to account for the behaviour of velocity. They wished to see if
income velocity exhibited long-term secular changes, so understood themselves
to be estimating some kind of function to capture the changing level of veloc-
ity as the economy developed. As with the late nineteenth-century measurers of transactions velocity whose work we considered earlier, there was considerable variation in the resulting measurements.
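To see how such an equation-of-exchange instrument behaves in use, the following is a minimal sketch in Python with purely hypothetical numbers (the series, magnitudes and variable names are invented for illustration and are not Selden’s, or anyone else’s, data). It shows how the same formula V = PY/M yields a different “velocity” series for each choice of income numerator and money-stock denominator, and how any movement in the resulting series is, by construction, just movement in the ratio of the two inputs.

    import numpy as np

    # Purely hypothetical annual series (illustrative only): nominal income under
    # two definitions, and two money-stock definitions.
    years = np.arange(1933, 1952)
    national_income = np.linspace(55.0, 280.0, len(years))    # invented figures
    personal_income = 0.9 * national_income                   # invented figures
    m_narrow = np.linspace(20.0, 120.0, len(years))           # invented narrow money stock
    m_broad = 1.6 * m_narrow                                  # invented broader money stock

    # The measuring instrument V = PY/M: each pairing of numerator and
    # denominator defines a different velocity series.
    v_national_narrow = national_income / m_narrow
    v_personal_broad = personal_income / m_broad

    # By construction, variation in these series is nothing but variation in the
    # ratio of the chosen income measure to the chosen money stock.
    for year, v1, v2 in zip(years, v_national_narrow, v_personal_broad):
        print(year, round(float(v1), 2), round(float(v2), 2))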
Boumans (2005) has placed considerable emphasis on variation and invari-
ance in measurement. It is useful to think about that question here. Clearly, we
want our measuring instrument to be such that it could be used reliably over
periods of time, and could be applied to any country for which there are relevant
data, to provide comparable (i.e. standardised) measurements of velocity. At the
same time, we want our measuring instrument to capture variations accurately,
either between places or over time. In the context of this measuring instrument,
clearly, if the ratio PY/M were absolutely constant over time, for example,
velocity measured in this way would also be unvarying, suggesting something
like a natural constant perhaps. But the evidence from our history suggests that
velocity is not a natural constant, so the question is: does the formula work well
as a measuring instrument – like for example a thermometer – to capture that
variation? From that formula, V = PY/M, we can see that the variations in the
measurements for velocity (for example, shown in Fig. 5.1 earlier) are due to
variations in either, or both, the numerator and denominator of the right hand
side term. The measuring formula appears to operate as a measuring instrument
to capture variations in velocity, but in fact it merely displays variations that are
reflections of changes in one or both of the money supply and nominal income.19
19 This parallels a similar confusion over testing the hypothesis that k is a constant in the equation
V = kY/M; such a test is only valid if there are separate (independent) measurements available for V, Y and M; otherwise it is just a test of the constancy of the ratio Y/M.
20 I am indebted to Tom Mayer who points out that Alvin Hansen has a strong statement of the same
kind in his The American Economy of 1957, p. 50.
money supply growth was high, and it was an uncertainty that came from sev-
eral sources. First there was the normal problem of predicting the economic
future of the real economy and the monetary side of the economy in relation to
that. Secondly, and equally problematic, seemed to be the uncertainty associated
with the difficulty of locating a reliable measure of money supply in relation to
transactions demand, the inverse of velocity. The Fed’s charts show the problem
of the day – after a period of trend growth, stability had broken down, as we see
in the 1980s part of the graph for the M1 velocity shown in Fig. 5.1 (above).
This may have been to do with institutional changes to which people reacted
by “blurring” the distinctions (and so their monetary holdings) between trans-
actions and savings balances. As Chairman Volcker expressed it, it is “not that we know any of these things empirically or logically” (p. 81).
The difficulty of locating a money supply definition that provided stability for measuring the relevant concept of money was matched by – and indeed, in-
timately associated with – the problem of velocity measurement.21 The target
ranges discussed in the committee were understood to be dependent on both
what happened to a money stock that was unstable and a velocity that was
subject to change. The instability of the money stock measurements was un-
derstood to be not only normal variation as interest rates changed, but also more
unpredictable changes in behaviour because of innovations in the services of-
fered to savers.22 Those factors in turn were likely to affect the velocity of money
conceived of as an independent entity. Here though, the situation is further con-
fused by the fact that, as the Governors were all aware, the velocity numbers
that they were discussing were neither defined nor measured as independent con-
cepts, but only by their measurement equation – namely as the result of nominal
income divided by a relevant money supply. Thus, variations in velocity were
infected by the same two kinds of reasons for variations as the money supply.
Velocity was as problematic as the money stock. The difficulties are nicely
expressed in this contribution from Governor Wallich:
We seem to assume that growth in velocity is a special event due to definable changes in
technology. But if people are circumventing the need for transactions balances right and left
by using money market funds and overnight arrangements and so forth, then really all that
is happening is that M-1B is becoming a smaller part of the transactions balances. And its
velocity isn’t really a meaningful figure; it’s just a statistical number relating M-1B to GNP.
But it doesn’t exert any constraints. That is what I fear may be happening, although one can’t
be very sure. But that makes a rise in velocity more probable than thinking of it in terms of a
special innovation (FOMC transcript, July 1981, p. 88).
21 For background to the troubles the Fed had in setting policy in this period, see Friedman (1988).
22 This may be interpreted as Goodhart’s Law, that any money stock taken as the object of central
bank targeting will inevitably lose its reliability as a target. However, the reasons for the difficulty
here were not necessarily financial institutions finding their way around constraints but the combi-
nation of expected savings behaviour in response to interest rate changes and unexpected behaviour
by savers in response to new financial instruments.
So, velocity in the equation V = PY/M now has three faces or interpretations. On one side, it is simply the measured ratio between two things, each of which is determined elsewhere than in the equation of exchange: because velocity has
no autonomous causal connections, it provides for no measure of V that can
be used for policy setting. On the second, it is thought of as an independent
concept and its measurements might exhibit its own (autonomous) trend growth
rate (though sometimes unreliably so) which might be useful for prediction and
so monetary policy setting. On the third, it has a relationship to the behaviour
of money demand, a relationship which is both potentially reliable and poten-
tially analysable, so that it could be useful for understanding the economy and
for policy work, but here the focus has been reversed: understanding the deter-
minants of velocity now seems to be the device to understand the behaviour of
the money stock, even while the measuring instrument works in the opposite
direction.
Standing back from this episode and using our ideas on measuring instru-
ments, it seems clear that the problem in the early 1980s was not so much that the
instrument was just unreliable in these particular circumstances, but that the in-
strument itself had design flaws. In taking the formula V = nominal GDP/money
stock as a measuring instrument that is reliable for measuring velocity, there is a
certain assumption of stability between the elements that make up the measuring
instrument and within their relationships. If the dividing line between velocity
and money supply (that is, between one property or stuff that is defined as veloc-
ity and another property or stuff that constitutes money) is not strict, the latter
These second and third faces mentioned by Axilrod are ones that many econo-
mists have taken up when they assume that velocity does indeed have its own be-
haviour. Arguments over what determines the behaviour of velocity and whether
it declines or rises with commercialisation were a feature of those late 19th and
early 20th century measurers. They can be seen as following suggestions made
by many earlier economists (mostly non-measurers) who discussed both eco-
nomic reasons (such as changes in income and wealth) as well as institutional
reasons (changes in level of monetisation or in financial habits) for velocity to
change over time.25
In the twentieth century, economists have assumed that velocity’s behaviour
can be investigated just like that of any other entity through an examination of
the patterns made by its measurements. Some, like Selden (1956), have used
correlations and regressions to try to fix the determinants of variations in ve-
locity. Particular attention was paid to the role of interest rates in altering
people’s demand for money and so its velocity. Selden himself reported re-
gressions using bond yields, wholesale prices and yields on common stocks
to explain the behaviour of velocity (though without huge success). More re-
cently and notably, Michael Bordo and Lars Jonung (1981 and 1990) have completed a considerable empirical investigation into the long-run behaviour of
velocity measurements using regression equations to fix the causes of these be-
haviours statistically and thence to offer economic explanations for the changes
implied in the velocity measurements. Others have argued that there is no eco-
nomically interesting behavioural determinant, that velocity follows a random
walk and can be characterised so statistically (for example Gould and Nelson,
1974).26
The use of regression equations in the context of explaining the behaviour of
velocity is but one step removed from using regression equations as a measur-
ing instrument to measure velocity itself. Regression forms a different kind of
generic measuring instrument (see Section 5.2.2) from the calculations based on
bank and individual sample surveys used from Petty to Irving Fisher and from
the calculating formulae provided by equations of exchange (such as those surveyed by Selden or in the Fed’s formulae). The principles of regression depend on the statistical framework and theories which underlie all regression work.
An additional principle here is that by tracing the causes which make an entity
change, we can track the changes in the entity itself. Alfred Marshall suggested
this as one of the few means to get at monetary behaviour: “The only practicable
method of ascertaining approximately what these changes [in prices or velocity
of money] are is to investigate to what causes they are due and then to watch
the causes” (Marshall, 1975, p. 170). Marshall did not of course use regression
for this, but his point is relevant here: regression forms a measuring instrument
not only for tracing these changes but for measuring them – and so velocity –
too.27
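A stylised sketch of regression used in this way as a measuring instrument, in the spirit of Selden’s regressions on bond yields and wholesale prices (all of the data below are invented for illustration; the specification is not that of any of the studies cited): the fitted values from a regression of measured velocity on its supposed causes serve as a regression-based measurement of velocity’s systematic component.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 40  # hypothetical annual observations

    # Invented explanatory variables of the kind Selden used: a bond yield and
    # a wholesale price index.
    bond_yield = 3.0 + 0.1 * rng.normal(0.0, 0.5, n).cumsum()
    wholesale_prices = 100.0 + 1.5 * np.arange(n) + rng.normal(0.0, 2.0, n)

    # An invented velocity series, as it might come from V = PY/M.
    velocity = 2.0 + 0.3 * bond_yield + 0.005 * wholesale_prices + rng.normal(0.0, 0.1, n)

    # Regress velocity on its supposed causes; the fitted values then act as the
    # regression-based "measurement" of velocity.
    X = np.column_stack([np.ones(n), bond_yield, wholesale_prices])
    coef, *_ = np.linalg.lstsq(X, velocity, rcond=None)
    fitted_velocity = X @ coef

    print("estimated coefficients:", np.round(coef, 3))
    print("fitted velocity, first five years:", np.round(fitted_velocity[:5], 2))

The point of the sketch is only the direction of use: the regression does not check an independently measured velocity against its causes, it generates the velocity measurements from them, which is Marshall’s suggestion of watching the causes carried through to measurement.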
One option has been to use regression to measure or “estimate” velocity by
using the opportunity cost of holding money as the estimator (see for example,
Orphanides and Porter, 1998). The velocity concept measured here is a different
one from the kind supposed and measured by Fisher. It is, by constitution, an
idealised entity named the “equilibrium velocity”; it may be well defined con-
26 William Barnett has used Divisia index numbers to try to isolate a stable velocity; see Chapter 6
of Barnett and Serlitis (2000).
27 See Backhouse, this volume, for a more general account of representational issues of models and
measurement, and Chao, this volume, for a considered account of issues of representation particularly
related to econometrics.
28 Even if such models locate the actual velocity in an error term, this too would be a velocity
measured with respect to some hypothetical equilibrium level.
In a further contrast, the regression methods of the most recent work rely
on well-established measuring instruments (the statistical family of regression
methods), and use a variety of good quality input data on other variables that are
causally related to the velocity that modern economists seek to measure. No-
tably, these instruments are aimed at a different concept of velocity, one defined
by idealised economic models rather than one that might be immediately valid
in the real economy. In this respect, there is a step change in the kind of en-
tity being made measurable from the earlier transactions and income velocities,
both of which had seemed to have the status of empirically valid entities (how-
ever difficult it was to get at them and make them measurable), to ones that in
principle seem non-observable.
The historical trajectory of this case suggests that economists in the later
nineteenth century began by treating the things they wanted to measure as in-
dependent free standing entities which could be measured using clever designs
for collecting and then manipulating statistical data: an approach that under-
stood observation as close to measurement and fashioned measuring instruments
accordingly. This was not a naive empiricist approach to measuring, for well de-
fined conceptual properties and relations were used to help define measuring
instruments (e.g. surveys), but the relationships (causal relations, accounting
identities) in which velocity was thought to be embedded were cast as back-
ground constraints or as checking systems, not as measuring instruments in
themselves. In the middle of the twentieth century, economists’ approach had
changed to using those same kinds of descriptive identities or relationships as
the measuring instruments themselves: enabling economists to derive measure-
ments of velocity by easy application of equations in which other elements had
already been observed and measured. In the late twentieth century, economists
moved further away from an empirical starting point to focus on measuring con-
cepts that were defined in, and by, the idealised, theory-based, economic models
which had come to dominate economics. Such concepts of velocity could be
considered unobservables in the sense that they were hypothesised, though the
point of the measurement process was to bring them into measurable status by
using their causal and functional relationships to other entities – both idealised
and pragmatic – so as to put numbers to them.
This trajectory of beginning by measuring some economic quantity by inde-
pendent means, to processes of making measurable some empirical entity by
Acknowledgements
This paper was originally written for the “History and Philosophy of Money”
Workshop, Peter Wall Institute for Advanced Study, University of British
Columbia, 12–14th November 2004 and then given at the Cachan/Amsterdam
Research Day, 10th December 2004. Later versions were given at the ESHET
conference in Stirling (June 2005), at the HES sessions at the ASSA, January
2006, at departmental seminars at LSE and Australian National University in
2005 and at the Tinbergen Institute, University of Amsterdam, April 21–22nd
2006 for the workshop for the Elsevier Measurement in Economics: A Hand-
book (editor: Marcel Boumans). I thank participants for comments on all these
occasions, but particularly Malcolm Rutherford, David Laidler, Marshall Reins-
dorf, Tom Mayer, Frank den Butter and Janet Hunter. I thank Arshi Khan, Xavier
Duran and Sheldon Steed for research assistance; Peter Rodenburg, Hsiang-Ke
Chao and Marcel Boumans (editor of this volume) for teaching me about mea-
surement in economics; and The Leverhulme Trust and ESRC funded project
“How Well Do ‘Facts’ Travel?” (Grant F/07 004/Z, held at the Department of
Economic History, LSE) who sponsored the research.
References
Ackland, R., Dowrick, S., Freyens, B. (2006). Measuring global poverty: Why PPP methods matter.
Working paper. ANU.
Axilrod, S.H. (1983). Velocity presentation for October FOMC Meeting. At http://www.
federalreserve.gov/FMC/transcripts/1983/831004StaffState.pdf.
Banzhaf, S. (2001). Quantifying the qualitative: Quality-adjusted price indexes in the United States,
1915–1961. In: Klein, J.L., Morgan, M.S. (Eds.) (2001). The Age of Economic Measurement.
Annual Supplement to vol. 33: History of Political Economy. Duke Univ. Press, Durham, pp.
345–370.
Barnett, W., Serlitis, A. (Eds.) (2000). The Theory of Monetary Aggregation. Elsevier, Amsterdam.
Board of Governors of the Federal Reserve System (1984, 1986). Federal Reserve Chart Book.
Bordo, M.D. (1987). Equations of exchange. In: Eatwell, J., Milgate, M., Newman, P. (Eds.), The
New Palgrave: A Dictionary of Economics, vol. 2. Macmillan, London, pp. 175–177.
Bordo, M.D., Jonung, L. (1981). The long-run behaviour of the income velocity of money in five
advanced countries, 1870–1975: An institutional approach. Economic Inquiry 19 (1), 96–116.
Bordo, M.D., Jonung, L. (1990). The long-run behaviour of velocity: The institutional approach
revisited. Journal of Policy Modeling 12 (2), 165–197.
Boumans, M. (1999). Representation and stability in testing and measuring rational expectations.
Journal of Economic Methodology 6 (3), 381–401.
Boumans, M. (2001). Fisher’s instrumental approach to index numbers. In: Klein, J.L., Morgan,
M.S. (Eds.) (2001). The Age of Economic Measurement. Annual Supplement to vol. 33: History
of Political Economy. Duke Univ. Press, Durham, pp. 313–344.
Boumans, M. (2005). How Economists Model the World into Numbers. Routledge, London.
Chang, H. (2004). Inventing Temperature: Measurement and Scientific Progress. Oxford Univ. Press,
Oxford.
Chao, H.-K. (2002). Representation and Structure: The Methodology of Economic Models of Con-
sumption. University of Amsterdam thesis, and Routledge, London (in press).
Cramer, J.S. (1986). The volume of transactions and the circulation of money in the United States,
1950–1979. Journal of Business & Economic Statistics 4 (2), 225–232.
Federal Open Market Committee (1981). Transcripts of July 6–7 Meeting at http://www.
federalreserve.gov/FOMC/transcripts/1981/810707Meeting.pdf.
Finkelstein, L. (1974). Fundamental Concepts of Measurement: Definition and Scales. Measurement
and Control 8, 105–111 (Transaction paper 3.75).
Finkelstein, L. (1982). Theory and philosophy of measurement. Chapter 1. In: Sydenham, P.H. (Ed.),
Handbook of Measurement Science, vol. 1: Theoretical Fundamentals. Wiley, New York.
Fisher, I. (1897). The role of capital in economic theory. Economic Journal 7, 511–537.
Fisher, I. (1909). A practical method of estimating the velocity of circulation of money. Journal of
the Royal Statistical Society 72 (3), 604–618.
Fisher, I. (1911). The Purchasing Power of Money. Macmillan, New York.
Fisher, W. (1895). Money and credit paper in the modern market. Journal of Political Economy 3
(4), 391–413.
Friedman, B.M. (1986). Money, credit, and interest rates in the business cycle. In: Gordon, R.J. (Ed.),
The American Business Cycle, vol. 25. NBER Studies in Business Cycles. Univ. of Chicago Press,
Chicago, pp. 395–458.
Friedman, B.M. (1988). Lessons on monetary policy from the 1980s. Journal of Economic Perspec-
tives 2 (3), 51–72.
Gould, J.P., Nelson, C.R. (1974). The stochastic structure of the velocity of money. American Eco-
nomic Review 64 (3), 405–418.
Hansen, A. (1957). The American Economy. McGraw–Hill, New York.
Holtrop, M.W. (1929). Theories of the velocity of circulation of money in earlier economic literature.
Economic Journal Supplement: Economic History 4, 503–524.
Humphrey, T.M. (1993). The origins of velocity functions. Economic Quarterly 79 (4), 1–17.
132 M.S. Morgan
Jevons, W.S. (1909) [1875]. Money and the Mechanism of Exchange. Kegan Paul, Trench, Trübner
& Co., London.
Kemmerer, E.W. (1909). Money and Credit Instruments in their Relation to General Prices, 2nd ed. Henry Holt & Co., New York.
Kinley, D. (1897). Credit instruments in business transactions. Journal of Political Economy 5 (2),
157–174.
Kinley, D. (1910). Professor Fisher’s formula for estimating the velocity of the circulation of money.
Publications of the American Statistical Association 12, 28–35.
Klein, J.L., Morgan, M.S. (Eds.) (2001). The Age of Economic Measurement. Annual Supplement
to vol. 33: History of Political Economy. Duke Univ. Press, Durham.
Krantz, D.H., Luce, R.D., Suppes, P., Tversky, A. (1971). Foundations of Measurement, vol. 1.
Academic Press, New York.
Maas, H. (2001). An instrument can make a science: Jevons’s balancing acts in economics. In: Klein
and Morgan, 2001, pp. 277–302.
Marget, A.W. (1938). The Theory of Prices, vol. I. Prentice Hall, New York.
Marshall, A. (1975). The Early Economic Writings of Alfred Marshall, vol. 1. Whitaker, J.K. (Ed.),
Macmillan for the Royal Economic Society.
Mitchell, W.C. (1896). The quantity theory of the value of money. Journal of Political Economy 4
(2), 139–165.
Morgan, M.S. (1999). Learning from models. In: Morgan, M.S., Morrison, M. (Eds.), Models as
Mediators: Perspectives on Natural and Social Science. Cambridge Univ. Press, Cambridge, pp.
347–388.
Morgan, M.S. (2001). Making measuring instruments. In: Klein, J.L., Morgan, M.S. (Eds.) (2001).
The Age of Economic Measurement. Annual Supplement to vol. 33: History of Political Economy.
Duke Univ. Press, Durham, pp. 235–251.
Morgan, M.S. (2002). Model experiments and models in experiments. In: Model-Based Reasoning:
Science, Technology, Values. Magnani, L., Nersessian, N.J. (Eds.), Kluwer Academic/Plenum,
pp. 41–58.
Morgan, M.S. (2003). Business cycles: Representation and measurement. In: Monographs of Official
Statistics: Papers and Proceedings of the Colloquium on the History of Business-Cycle Analysis.
Ladiray, D. (Ed.), Office for Official Publications of the European Communities, Luxembourg, pp.
175–183.
Orphanides, A., Porter, R. (1998). P∗ revisited: Money-based inflation forecasts with a changing
equilibrium velocity. Working paper. Board of Governors of the Federal Reserve System, Wash-
ington, DC.
Peart, S.J. (2001). “Facts carefully marshalled” in the empirical studies of William Stanley Jevons.
In: Klein, J.L., Morgan, M.S. (Eds.) (2001). The Age of Economic Measurement. Annual Supple-
ment to vol. 33: History of Political Economy. Duke Univ. Press, Durham, pp. 252–276.
Petty, W. (1997) [1899]. The Economic Writings of Sir William Petty, vol. I. Cambridge Univ. Press,
Cambridge. Reprinted Routledge: Thoemmes Press.
Porter, T.M. (1994). Making Things Quantitative. In: Accounting and Science: Natural Enquiry and
Commercial Reason. Power, M. (Ed.), Cambridge Univ. Press, Cambridge, pp. 36–56.
Porter, T.M. (1995). Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Prince-
ton Univ. Press, Princeton.
Rodenburg, P. (2006). The construction of measuring instruments of unemployment. University of
Amsterdam thesis.
Selden, R.T. (1956). Monetary velocity in the United States. In: Studies in the Quantity Theory of
Money. Friedman, M. (Ed.), Univ. of Chicago Press, Chicago, pp. 179–257.
Suppes, P. (1998). Measurement, theory of. In: Craig, E. (Ed.), Routledge Encyclopedia of Phi-
losophy. Routledge, London. Retrieved October 27, 2004, from http://www.rep.routledge.com/
article/QO66.
Suppes, P. (2002). Representation and Invariance of Scientific Structures. CSLI, Stanford.
Sydenham, P.H. (1982). Measurements, models, and systems. Chapter 2. In: Handbook of Measure-
ment Science, vol. 1: Theoretical Fundamentals. Sydenham, P.H. (Ed.), Wiley, New York.
PART II
Representation in Economics
CHAPTER 6
Representation in Economics
Roger E. Backhouse
University of Birmingham and London School of Economics, UK
E-mail address: R.E.Backhouse@bham.ac.uk
1 The word “even” is used, because it is widely accepted, by supporters as well as critics, that
falsificationism is not what goes on in economics. See Hausman (1992), Blaug (1992).
2 Lipsey’s flow chart and Figures 1 and 2 can be compared with the ones offered in Backhouse
(1997, pp. 140–142).
one another, it is helpful to think of them in five categories. Together, they con-
stitute the background knowledge relevant to economists’ modelling activities.
(a) Statistics are numerical measurements of the economy. There is, of course,
much to be said about the creation of statistics, which are themselves repre-
sentations, and what they measure – for example about how the national ac-
counts are constructed (see den Butter, Chapter 9) or whether index numbers
can have the properties that economists want them to have (see Reinsdorf,
Chapter 8). Taking into account the process whereby statistics are created
reinforces the argument given here: it adds to the complexity of the picture,
making the contrast with the conventional view even more pronounced; and
it provides further places through which informal arguments enter, raising
further questions about whether representation can be understood in terms
of formal mappings between entities.
(b) History/experience is a very loose label for knowledge about the economy
that is not the result of any formal modelling process. It is discussed further
in Section 6.2.
(c) Metaphysical assumptions are included as a separate category as a reminder
that economists appear, much of the time, to be committed to some of the many assumptions made in their models for reasons that appear to have
mentioned, economists may decide that the result is so clearly right that it does
not need empirical testing: the stock of knowledge has been enlarged (iv). A
third possibility is that the result may feed into empirical work (v). Two arrows
are shown here, to denote the fact that economists may decide to test theoretical
results with or without evaluating them first.
Empirical work starts with a set of economic relationships, formulated in such
a way that they can be confronted with data using formal, statistical techniques.
These relationships may be the theoretical results discussed above, but typically
they will be different. The reason for this is the requirement that they can be
confronted with data: they must refer to variables on which statistical data exist,
or for which proxies can be found; functional forms must be precisely specified
and amenable to statistical implementation. These equations are then confronted
with statistics (vi), and empirical results are derived. These results are then eval-
uated against the background knowledge (vii). They augment that background
knowledge (viii) either positively (the results are considered to stand up) or neg-
atively (the results indicate that the model is inferior to some other representation
of the phenomenon). The process then starts again given the new set of repre-
sentations of the economy.
The significance of this view can best be seen by comparing it with the con-
ventional, hypothetico-deductive model. This is depicted in Fig. 6.2, which is
kept as close as possible to Fig. 6.1, to aid comparison. The economist starts
with assumptions that reflect what has been learned from previous modelling ex-
ercises and logical or mathematical methods are used to derive results. These
results are then tested (note that the step of evaluating the theoretical results is
dropped) against statistical data (it is assumed that it is the theoretical results
themselves that are tested, cutting out another element in Fig. 6.1). Empirical
results are then evaluated against previous results, and the stock of knowledge is
augmented, either positively or negatively, as before.
This might be thought an oversimplification of the hypothetico-deductive
model, which allows for the possibility that assumptions may come from any-
where, so long as their implications are tested empirically. There should perhaps
also be direct feedback from evaluation of the empirical results to theoretical
assumptions, the process iterating until some results receive a positive eval-
uation. However, the heart of the hypothetico-deductive model is represented
here, for there is a clear link from assumptions to evaluation of empirical re-
sults (as in Lipsey, 1975, p. 15). In addition, Fig. 6.2 captures the fact that in the
hypothetico-deductive model, testing leads to progressive improvement in the
empirical representation of the phenomenon.
The process of representation outlined in Fig. 6.1 thus encompasses the
hypothetico-deductive model, but is more complex. This additional complexity
points to additional questions concerning representation in economics:
(a) What is the origin of representations that are not the outcome of the formal
modelling analysed here (notably History/experience)?
(b) How are theories and the evaluation of theories related to existing represen-
tations (i) and (ii)?
(c) What is the relationship between theoretical models, data and empirical models (v)?
(d) How do economists learn from the results of their modelling activities (iv)
and (viii)?
The conventional view has simple answers to all of these. In other words, the
conventional view, represented by the hypothetico-deductive model, presumes
the following:
(a) Models are based on assumptions that reflect what is known about the econ-
omy as a result of preceding empirical work.
(b) Theoretical models are used to generate predictions that are then tested.
(c) These predictions typically take the form of equations, and testing them
implies estimating the parameters of these equations, thereby generating an
empirical model.
(d) These numerical parameters provide quantitative knowledge of the phenom-
ena under analysis, thereby increasing the stock of knowledge.
In practice, however, the picture is much more complicated: none of these four
links is as simple as the conventional view suggests. Each will be considered in
turn. It is worth noting that even this picture is not comprehensive in that little
attention is paid to the statistical estimation process itself and to issues such as
the choice of confidence intervals and criteria for accepting or rejecting hypothe-
ses. Given that this involves judgement (see Mayer, Chapter 14), this reinforces
the arguments made here. This picture also misses out other relations between
representations. Empirical models may be used to generate statistical data (as
a measuring instrument) – for example, estimates of full-capacity output or the
NAIRU/natural rate of unemployment. Such data may be used directly in test-
ing models. However, such data may also inform other types of representation:
they may inform interpretations of historical experience. “Stylised facts”, which
frequently inform theory, may be simplifications of or generalisations from em-
pirical results, or they may be based on prior beliefs such as that the economic
system is not going to grind to a halt.
5 It should go without saying that it is ridiculous to describe work as non-analytical simply because
it does not use mathematics.
6 See Backhouse (1998).
7 The wording reflects McCloskey (1986).
dence, rests on implicit models, the point is that the models are not explicit and
their relationship to the results is, at best, tenuous.
9 I make no claim that this is the only, or even the best, way to view models, merely that it is the
appropriate one for the arguments being made here.
10 I use the word “comparatively” because even apparently simple data may be open to question.
Take the assertion that the market for groceries in the UK is dominated by four large firms. This
depends on how that market is defined: on whether “convenience stores” are considered to be a
distinct market from “supermarkets”.
chemist’s knowledge of how sodium and water behave when they come together.
Economists are themselves economic agents, so can draw upon introspection:
they can consider how they would themselves behave in the situation they are
considering. Thus one prominent economist could go so far as to claim that “The
method of economics remains . . . that of the mental experiment aided by intro-
spection” (Georgescu-Roegen, 1936, p. 546). Not exactly the same, but in the
same broad category are arguments from rationality. Maximisation of expected
utility is a normative theory in that it describes how agents ought to behave,
which provides economists with a strong reason for assuming that economists
do behave in this way. In addition, for many economists, explanation means
explaining observed behaviour as the outcome of rational behaviour. Some ex-
plore other assumptions about behaviour, such as bounded rationality, or other
characterisations of behaviour derived from experimental work (“behavioural
economics”) but that is a minority.
Thus most theoretical models are based not on empirically observed behav-
iour but on the assumption of rationality. Agents are assumed to be rational,
and to maximise an objective function subject to the relevant constraints. These
constraints are drawn from economists’ knowledge of institutions, which deter-
mine, for example, how market structures are modelled (as perfect competition,
monopoly, or interaction between a small number of agents, each of whom has
to take account of how others will respond to his or her own actions). However,
increasingly, economists have ceased to treat institutions simply as constraints, seeing them instead as part of what is to be explained. Contracts, for example, should not be seen as
constraints, but as the result of a process involving decisions by rational agents.
Government decisions are the result, not of policy-makers standing “outside” the
system, taking decisions based on what improves social welfare, but of decisions
taken by politicians and bureaucrats who are seeking to achieve their own ends.
Though they are more reticent in print, in less formal situations, economists
will talk about analysing the implications of rational choice as “the” method of
economics.
This approach, defining economics not in terms of its subject matter but in
terms of its method, has a history going back at least to Lionel Robbins (1932)
who provided the most commonly cited definition of economics: the science
which studies the allocation of scarce resources, which have alternative uses,
between competing ends. The whole of economic theory, he suggested, could be
derived from the assumption of scarcity, taken to be a fundamental feature of the
human condition. It resulted in an approach to economics that paid little atten-
tion to empirical work, and which provoked the move to “positive economics”,
represented by Friedman (1953) and Lipsey (1975). Under the banner of pos-
itive economics, and facilitated by increased quantities of economic data and
improved computing facilities, statistical work aimed at testing theories became
more and more common.
Despite the mass of empirical work, the assumption of rationality has contin-
ued to drive economic theory. One reason for its persistence has been that the assumption of rationality is hard to falsify. However, the attractions of the model
would appear to go deeper than this. Hausman (1992) has argued that, not only
have economists frequently attached more weight to such theoretical arguments
than to empirical results, but that there are reasons why they should do this. Not
only are the arguments in favour of rationality compelling, to the extent that
economists find ways to preserve it rather than accept apparently conflicting ex-
perimental evidence, but also the data available to economists are often of low
quality, containing many errors, and not measuring precisely what economists
want them to measure. The result is that, faced with a conflict between theory
and data, it is the data that they call into question. The consequence is that propositions
from economic theory, even if not tested against data, or even if they have been
tested and found inadequate, may influence subsequent work. The assumption of
rationality may exert a more powerful influence on models than does the result
of statistical testing.
In an ideal world, the models that economists confront with data (assumed in
this section to be statistical) would be the same as their theoretical models. In
practice this is not always possible: theories may involve unobservable variables;
other variables may not be measurable or measured properly; theories may
specify functional forms that cannot be estimated given the available techniques;
and theories may simply be too complicated or too imprecise to be testable (see
Mayer, Chapter 14). The result is that, probably in most cases, the model that
is tested is not the same as the one that is produced by the theory. It may not
even be a special case of the theoretical model, but one that has been modified
in ways that make it possible to confront it with data.
Cartwright (2002) suggests that when there is such a gap between theoret-
ical and empirical models, the link between them is too weak to consider the
empirical work a meaningful test of the theoretical model; empirical models do
not have any nomological machine underlying them, but are effectively plucked
from the air. Against this, it can be argued that the theoretical model should be
seen as interpretive, the lack of a precise correspondence between the two mod-
els being the result of the model being an incomplete representation of reality
(Hoover, 2002a). The model leads us to discover robust regularities in the data.
An alternative way to put this is that the theoretical model establishes possible
causal links between variables, and that those causal links then form building
blocks for the empirical model that is confronted with the data (Backhouse,
2002a). Whichever interpretation of the link we accept, the result is the same:
the nature of economic data means that the links between theory and empirical
models involve a degree of informality.
The most highly visible, though not necessarily the most commonly-used,
way of confronting models with data involves the imprecisely-defined package
of techniques going under the name of econometrics. These methods are covered
by Qin and Gilbert in Chapter 11, but some features of this process need to be
discussed here. The most important point is that, though econometricians em-
phasise the statistical foundations of their work, economic considerations and
consequently less formal forms of reasoning become important when statisti-
cal methods are employed in practice. These enter both in formulating and estimating the model, and in drawing conclusions.
Econometric models rarely work properly the first time they are applied to
data – say, the first time a regression equation is calculated. Theory typically
does not tell the economist what functional form to use. Often it merely says that
a function should be increasing, decreasing, or perhaps that it should be convex
or concave, leaving room for an infinite number of functional forms. Decisions
need to be made about which variables to include, and how they are measured.
Even something apparently simple such as a price index can be measured in
many ways, none of which is clearly better than the others. Lag structures and
control variables (particularly important in cross-section data sets on individu-
als) are something else over which it is hard to make a decision before coming to
the data. Where theory does indicate clearly that variables should be included,
their statistical properties in the particular data set may make it impossible to
include them in the way that theory suggests they should be. The result is that
decisions on these matters have to be made in the light of initial empirical re-
sults. Variables are added or dropped; alternative functional forms and lags are
tried out; and the economist experiments with different measures of included
variables.
Such practices are often referred to, derogatorily, as data mining. Mayer
(Chapter 14) offers a parallel discussion of this problem, and suggests remedies.
Given conventional statistical theory, it undermines the theoretical foundations
of the hypothesis tests on which econometricians rely: at its simplest, if the
econometrician will carry on calculating regressions until he or she finds one
where a particular coefficient is positive, a statistical test that shows it to be
positive is meaningless. In practice, however, not only does data mining oc-
cur – it has to occur. It can be compared with the way experimental scientists
have to tune their experiments before they work, whether working means that
anticipated results are found or are not found (Backhouse and Morgan, 2000).
There are no rules for such tuning, and economic criteria enter. Attempts have
been made to formalise the process, even incorporating such strategies into au-
tomated computer software routines but choices have to be made before even
the most sophisticated software can be applied. For example, to use PcGets,11
which automates the process of model selection, requires the user to input a list
of variables and to specify basic parameters such as lag lengths to be analysed.
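The force of this point about data mining can be shown with a small simulation (a minimal sketch, written for this discussion rather than drawn from any of the studies cited): if a researcher searches across many candidate regressors that are in fact pure noise and keeps only the best-looking one, conventionally “significant” coefficients appear far more often than the nominal size of the test suggests.

    import numpy as np

    rng = np.random.default_rng(1)
    n_obs, n_candidates, n_trials = 50, 20, 500
    hits = 0

    for _ in range(n_trials):
        y = rng.normal(size=n_obs)                 # dependent variable: pure noise
        best_t = 0.0
        for _ in range(n_candidates):
            x = rng.normal(size=n_obs)             # candidate regressor: pure noise
            X = np.column_stack([np.ones(n_obs), x])
            coef, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ coef
            sigma2 = resid @ resid / (n_obs - 2)
            se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
            best_t = max(best_t, abs(coef[1]) / se)
        if best_t > 1.96:                          # nominally "significant" at 5 per cent
            hits += 1

    print("share of searches producing a 'significant' regressor:", hits / n_trials)

With twenty independent candidate regressors the chance that at least one clears a nominal 5 per cent threshold is roughly 0.64 (that is, 1 - 0.95 raised to the twentieth power), even though nothing real is present – which is exactly why the reported test statistic from such a search cannot be read at face value.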
Even once a satisfactory empirical model is found, judgements have to be
made about its economic significance. Of particular importance is the generality
of the conclusions reached – about the domain of the theory – a process that is
11 See http://www.pcgive.com/pcgets/index.html.
Empirical models clearly add to the stock of knowledge. What is less clear is
how they add to the stock of economic knowledge, rather than simply knowledge
of the form “If you correlate x, y and z, the result is p”. Econometric results
can establish generalisations (see Hoover, 2002a; Sutton, 2000) but the stock
of economic knowledge resulting from such work is arguably much less than
the stock of results. Summers (1991) has argued that econometric results have
had much less influence on macroeconomics than much more informal work,
such as the calculation of averages, trends, and other “low level” techniques.
The reason, he contends, is that the latter are more robust. This is related to the
problem of the domain of the results discussed above. Typically, economists are
not concerned, say, with the properties of UK demand for money between 1979
and 1987, but with the properties of money demand functions in general. For
example, if someone is constructing a model of the business cycle, is it more
realistic to assume that the elasticity of substitution between labour and leisure
is zero, a half or one? Meta-analysis works in some disciplines, but in economics
it has not been particularly effective in narrowing the range of disagreement.13
However, if it became more widespread, the effects might be greater: knowing
that their results would be subject to meta-analysis, and compared systematically
with other results, researchers might alter their behaviour.
The issue of how economists learn from empirical and theoretical results is
not merely a practical problem, but a profound conceptual one. The stock of
knowledge is multi-dimensional, the various elements being, at least in part,
incommensurable. Furthermore, there is no formal procedure that can specify
12 Backhouse (1997), following Harry Collins’s “experimenter’s regress”, calls this the “econome-
trician’s regress”.
13 See Backhouse (1997) and Goldfarb (1995).
14 They approach the complexity of the relationships between models and the policy process by
analysing their case studies as examples of different market structure, characterised by the variables
of number and the degree of product differentiation (den Butter and Morgan, 2000, p. 283).
15 See Bank of England (2000).
16 Downward and Mearman (2005) use the metaphor of ‘triangulation’, which has recently entered
British political discourse, to describe this use of a variety of methods and sources. It is typical of
the way economists use empirical evidence (see Backhouse, 1997, 2002b, two examples from which
are discussed below).
knowledge. However, it is the models themselves, rather than what they say
about the economy, that is relevant, for the aim is to produce other models that
can be judged superior. It is not even necessary that the models are taken seri-
ously as representations of the economy for them to play a role in subsequent
research.
Perhaps more interesting is the way that econometric results are used by
economists in constructing economic theories. Generally, economists have been
reluctant to rely on generalisations established by econometricians, these doubts go-
ing beyond the widely-known Lucas critique (Lucas, 1976). There is usually
scepticism about whether quantitative generalisations will be robust in the pres-
ence of exogenous shocks and changes within the systems being considered.
Thus, whereas economists were at one time willing to assume that the propensity
to save was approximately constant, and might have considered using evidence
on its value, they would nowadays not be willing to do this. Yet empirical re-
sults are not irrelevant. Rather, they inform theory in ways that have something
in common with the way the MPC uses the results of models (Backhouse, 1997,
Chapter 13). Two examples make this point.17 These may be particularly good
examples, but the conclusions drawn are probably representative of much theo-
retical work.
Peter Diamond (1994) is concerned with theory, seeking to go beyond the sta-
tics of Marshallian period analysis and the textbook IS-LM model, to construct
a more dynamic theory. Though his purpose is theoretical, and he does not even
try to derive empirical models, he makes extensive use of evidence, some of
which came from models, some of which did not. This ranged from survey evi-
dence on the frequency of price changes and price dispersion to evidence on the
link between US monetary policy decisions and subsequent change in national
output. He does not use numbers directly in his theory, but quantitative results
are important in establishing that things are important. Thus he establishes that
flows into and out of unemployment are very large in relation to the stock of
unemployment, that entry is more concentrated than exit, that seasonal changes
are large relative to the growth of national product. Because he is interested in
propositions that are more general – more abstract – than those offered by the
studies he cites, he brings together evidence from a range of sources, looking in-
formally for common patterns. Thus price stickiness is established by bringing
together evidence on input prices, mail order catalogue prices and news-stand
prices of magazines.18
Another example is Seater’s (1993) survey of Ricardian equivalence. This
brings together evidence from econometric studies of the life-cycle theory of
consumption, on which Ricardian equivalence depends, tests relating to assump-
tions made in theory, direct evidence (effectively from reduced-form models)
17 Both are taken from Backhouse (1997, pp. 190–203), where they are discussed in much more
detail.
18 Backhouse (1997, pp. 203–205) argues that this can be thought of as replication, comparable to
the replication of experimental results.
Representation does not imply resemblance, even in visual arts. This is doubly
true in economics. Economic models, even though they may be thought ade-
quate representations of the economic world, rest on assumptions that are often
wildly unrealistic as descriptions of the world. Indeed, in his classic analysis of
the issue, Friedman (1953) made a virtue of models being unrealistic. Models
are based on caricatures of agents.20 Furthermore, in some cases it is hard even
to think about resemblance, for models deal with concepts for which it is hard
to identify real-world counterparts with which a comparison can be made. Take
markets as an example. The market for equities may correspond to an identifi-
able institution, defined in space and time, but many of the markets that appear in
economic models have no such counterparts. Many markets are, at most, loose
networks of buyers and sellers with no tangible existence.
This has led many economists to be sceptical about the link between models
and understanding reality, this scepticism extending both to economic theory
and to econometric work. On economic theory, many would echo the following
critique:
More often than not, the method of economics consists either of the application of an existing
theory with little attention to whether it is closely related to the system being considered or,
worse still, of recommending that the system be changed to bring it into conformity with the
assumptions of theory (Phillips, 1962, p. 361).
19 Backhouse (1997, pp. 194–199) uses the example of the textbook by Blanchard and Fischer to
make this point.
20 See Gibbard and Varian (1978).
It might be concluded that the informality of this process means that models are dispensable. As the
Bank of England (2000, p. 3) put it,
Why bother with models at all? Could policy judgements not simply be based on observation
of current economic developments, in the light of lessons from past experience of how the
economy works? That is indeed the basis for policy judgements, but making them without the
aid of models would be extraordinarily difficult, not simple.
The same could be applied, mutatis mutandis, to the use of models elsewhere
in economics, for purposes other than monetary policy.
Where does this leave representation in economics? The main lesson is that
representation is multi-dimensional or multi-layered.21 It is trivial to say that
economics is full of representations of economic phenomena. What is interest-
ing is how these relate to each other, how they evolve, and how they contribute to
the economist’s stock of knowledge. Individual representations, whether based
on experiments, econometric work or “lower-level” methods, should not be seen
in isolation. This has been widely recognised. Friedman (1953) stressed the im-
portance of looking at the data before theorising, reflecting the view associated
with the National Bureau of Economic Research that empirical work was an
engine of discovery as much as a way of testing theories. Boumans (1999) has
drawn attention to the variety of roles played by models and the way evidence
affects theorising at different levels. “Data mining” is acknowledged to be a
widespread practice in econometrics, implying a more complex relationship be-
tween theory and data than standard views about hypothesis testing would imply.
When this broader view is taken, the common theme is that formal rules, such as
the rules of experimental or econometric practice, fail to encompass the process.
Insofar as the representational theory of measurement implies formal mappings between entities and sees models purely as logical-mathematical structures, this broader view involves moving away from that theory. The wider picture may be less precise and highly informal, but this is not to say that it is unsystematic.22
Acknowledgements
This chapter was finished while the author was Ludwig Lachmann Research Fellow in the Department of Philosophy, Logic and Scientific Method at the London School of Economics. I am grateful to the Charlottenberg Trust for its support, and to Marcel Boumans and Thomas Mayer for helpful remarks on this paper.
References
Backhouse, R.E. (1997). Truth and Progress in Economic Knowledge. Edward Elgar, Cheltenham.
Backhouse, R.E. (1998). If mathematics is informal, perhaps we should accept that economics must
be informal too. Economic Journal 108, 1848–1858.
Backhouse, R.E. (2002a). Economic models and reality: The role of informal scientific methods.
In: Maki, U. (Ed.), Fact and Fiction in Economics: Models, Realism and Social Construction.
Cambridge Univ. Press, Cambridge, pp. 202–213.
Backhouse, R.E. (2002b). How do economic theorists use empirical evidence? Two case studies. In:
Dow, S.C., Hillard, J. (Eds.), Beyond Keynes, vol. 1: Post-Keynesian Econometrics, Microeco-
nomics and the Theory of the Firm. Edward Elgar, Cheltenham and Lyme, VT, pp. 176–190.
Backhouse, R.E., Morgan, M.S. (2000). Is data mining a methodological problem? Journal of Eco-
nomic Methodology 7 (2), 171–181.
Bank of England (2000). Economic Models at the Bank of England. Bank of England, London.
Available at http://www.bankofengland.co.uk/publications/other/beqm/modcobook.htm.
Blaug, M. (1992). The Methodology of Economics, second ed. Cambridge Univ. Press, Cambridge.
Boumans, M.J. (1999). Built-in justification. In: Morgan, M.S., Morrison, M. (Eds.), Models as
Mediators: Perspectives on Natural and Social Science. Cambridge Univ. Press, Cambridge,
pp. 66–96.
Cartwright, N. (2002). The limits of causal order, from economics to physics. In: Maki, U. (Ed.), Fact
and Fiction in Economics: Models, Realism and Social Construction. Cambridge Univ. Press,
Cambridge, pp. 137–151.
den Butter, F.A.G., Morgan, M.S. (2000). Empirical Models and Policy-Making: Interaction and
Institutions. Routledge, London.
Diamond, P. (1994). On Time: Lectures on Models of Equilibrium. Cambridge Univ. Press, Cam-
bridge.
Downward, P., Mearman, A. (2005). Methodological triangulation at the Bank of England: An in-
vestigation. Discussion paper 05/05. School of Economics, University of the West of England.
Friedman, M. (1953). The methodology of positive economics. In: Friedman, M. (Ed.), Essays in
Positive Economics. Chicago Univ. Press, Chicago, IL.
Georgescu-Roegen, N. (1936). The pure theory of consumer's behaviour. Quarterly Journal of Economics 50 (4), 545–593.
Gibbard, A., Varian, H.R. (1978). Economic models. Journal of Philosophy 75, 664–677.
Goldfarb, R. (1995). The economist-as-audience needs a methodology of plausible inference. Jour-
nal of Economic Methodology 2 (2), 201–222.
Hausman, D.M. (1992). The Inexact and Separate Science of Economics. Cambridge Univ. Press,
Cambridge.
Hoover, K.D. (2002a). Econometrics and reality. In: Maki, U. (Ed.), Fact and Fiction in Economics:
Models, Realism and Social Construction. Cambridge Univ. Press, Cambridge, pp. 152–177.
Hoover, K.D. (2002b). The Methodology of Empirical Macroeconomics. Cambridge Univ. Press,
Cambridge.
Keuzenkamp, H.A., Magnus, J.R. (1995). On tests and significance in econometrics. Journal of
Econometrics 67 (1), 5–24.
Lipsey, R.G. (1975). An Introduction to Positive Economics, fourth ed. Weidenfeld and Nicolson,
London.
Lucas, R.E. (1976). Econometric policy evaluation: A critique. In: Brunner, K., Meltzer, A. (Eds.),
The Phillips Curve and Labor Markets. North-Holland, Amsterdam.
McCloskey, D.N. (1986). The Rhetoric of Economics. Wheatsheaf Books, Brighton.
Nickell, S.J. (1980). A picture of male unemployment in Britain. Economic Journal 90, 776–794.
Phillips, A. (1962). Operations research and the theory of the firm. Southern Economic Journal 28
(4), 357–364.
Robbins, L.C. (1932). An Essay on the Nature and Significance of Economic Science. Macmillan, London.
Seater, J. (1993). Ricardian equivalence. Journal of Economic Literature 31, 142–190.
Summers, L. (1991). The scientific illusion in empirical macroeconomics. Scandinavian Journal of
Economics 93 (2), 129–148.
Sutton, J. (2000). Marshall’s Tendencies: What Can Economists Know? MIT Press, Cambridge,
MA.
CHAPTER 7

Axiomatic Price Index Theory
made by John Maynard Keynes (1930), who argued that commodity price trends
diverge in ways that are too persistent to be explainable as differences in random
draws from some distribution, and that the prices in an economy are function-
ally interdependent and simultaneously determined. Nevertheless, after a long
period of neglect, the stochastic approach has recently enjoyed something of a
revival, particularly for applications to inter-area comparisons.
Although the first systematic uses of price index tests occurred late in the
nineteenth century, people have been selecting index formulas to achieve certain
properties for as long as they have sought to go beyond the use of a single, pur-
portedly representative, commodity for measurement of aggregate price change.
Perhaps the earliest discussion of a price index property (quoted in Wirth Fer-
ger, 1946, p. 56) concerned the ability to track the cost of a constant basket of
commodities. This discussion was in a treatise written in 1707 by William Fleet-
wood, Bishop of Ely, on the change in the cost of living for Fellows at Oxford
since the establishment of a cap on their outside income of £5 per year in the
time of Henry VI. Letting p0 and pt represent the vectors of prices in periods 0
and t and letting q∗ be a vector of quantities that is taken as representative of
both periods, the fixed basket price index P FB (p0 , pt , q∗ ) compares the cost of
purchasing the same quantities at the different vectors of prices from the refer-
ence time period 0 and the comparison time period t:
P^{FB}(p_0, p_t, q^*) = \frac{p_t \cdot q^*}{p_0 \cdot q^*}.    (7.1)
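As a rough numerical illustration (not part of the original text), the fixed basket calculation can be sketched in a few lines of Python; the prices, quantities, and the function name below are invented for the example.

    def fixed_basket_index(p0, pt, q_star):
        # Cost of the fixed basket q_star at comparison period prices,
        # relative to its cost at base period prices (Eq. (7.1)).
        cost_t = sum(p * q for p, q in zip(pt, q_star))
        cost_0 = sum(p * q for p, q in zip(p0, q_star))
        return cost_t / cost_0

    p0 = [1.00, 2.50, 4.00]   # hypothetical base period prices
    pt = [1.10, 2.40, 4.60]   # hypothetical comparison period prices
    q_star = [10, 4, 2]       # basket treated as representative of both periods
    print(round(fixed_basket_index(p0, pt, q_star), 4))   # 1.0643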
Unfortunately, William Fleetwood does not get the credit for the first use of
the fixed basket index formula: surprisingly enough, he departed from Eq. (7.1)
in calculating his results. The groundwork for the first documented use of the
fixed basket index came some decades later, in 1747, when the Massachusetts Bay
Colony passed legislation calling for the use of “the prices of provisions and
other necessaries of life” for the escalation of inflation-adjusted public debt to
avoid the spurious volatility and manipulation that had occurred when a sin-
gle commodity (silver) was used (Willard Fisher, 1913, p. 426). Since keeping
track of the prices of the “necessaries of life” was impractical given the re-
sources available at the time, the idea had to be simplified before it could be
implemented. This was done in 1780, when Massachusetts specified a basket consisting of 5 bushels of corn, 68 4/7 pounds of beef, 10 pounds of wool, and
16 pounds of sole leather for indexation of interest-bearing notes used to pay its
soldiers in the Revolutionary War.
The rudimentary Massachusetts basket was not an approximation to the aver-
age basket that was actually consumed by the erstwhile colonists, so it also falls
short of a complete implementation of the fixed basket index idea. The first index bas-
ket design that included a plan for making the weights truly reflect expenditure
patterns came nearly a half century later in 1823, when Joseph Lowe proposed
such a consumer price index for Britain (W. Erwin Diewert, 1993, p. 34).
Another index number property that early measures of price change were
designed to achieve is independence from the definitions of the units of mea-
surement of the items in the index. Letting Λ represent a matrix with arbitrary
positive values on its main diagonal and zeros elsewhere, the commensurability
axiom, also known as the “change of units test,” requires that the index formula
P (p0 , pt , q0 , qt ) have the property:
P(p_0, p_t, q_0, q_t) = P(\Lambda p_0, \Lambda p_t, \Lambda^{-1} q_0, \Lambda^{-1} q_t).    (7.2)
A formula that fails to satisfy this axiom is useful only for items that are
homogeneous and measured in identical units. An example of such a formula was used by Dutot in 1738:

P^{Dutot}(p_0, p_t) = \frac{\sum_i p_{it}}{\sum_i p_{i0}}.    (7.3)
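To see what the commensurability axiom rules out, the following sketch (hypothetical data, not from the chapter) rescales one item's unit of measurement and shows that the Dutot index moves while a fixed basket index using the corresponding quantities does not.

    # Change-of-units check: halve the unit of item 0, so its price per (new)
    # unit doubles in both periods and its quantity halves.
    def dutot(p0, pt):
        return sum(pt) / sum(p0)

    def laspeyres(p0, pt, q0):
        return sum(pt[i] * q0[i] for i in range(len(p0))) / \
               sum(p0[i] * q0[i] for i in range(len(p0)))

    p0, pt, q0 = [1.0, 10.0], [1.2, 10.5], [100.0, 3.0]
    p0_r, pt_r, q0_r = [2.0, 10.0], [2.4, 10.5], [50.0, 3.0]   # rescaled units

    print(dutot(p0, pt), dutot(p0_r, pt_r))                    # differ: axiom violated
    print(laspeyres(p0, pt, q0), laspeyres(p0_r, pt_r, q0_r))  # identical: axiom satisfied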
The commensurability axiom is critical for the main purpose of price indexes,
which Ragnar Frisch (1936, p. 1) identified as the uniting of individual measure-
ments for which no common physical unit exists. If the units of measurement for
diverse commodities could all be converted into some common unit by means
of physical equivalency ratios, index numbers would be unnecessary. Instead,
the aggregate price level could be measured by the unit value (the ratio of total
expenditures to total equivalency units consumed) of the single composite com-
modity. Price change would then be measured by the change in the aggregate
unit value:
unit value ratio = \frac{\left[\sum_i p_{it} q_{it}\right] / \left[\sum_i q_{it}\right]}{\left[\sum_i p_{i0} q_{i0}\right] / \left[\sum_i q_{i0}\right]}.    (7.4)

Another formula that uses no quantity information is the Carli index, the simple arithmetic mean of the price relatives:

P^{Carli}(p_0, p_t) = \frac{1}{N} \sum_{i=1,\dots,N} p_{it}/p_{i0}.    (7.5)
A fixed basket index satisfies the commensurability axiom if and only if the
elements of q∗ in Eq. (7.1) are determined in a way that takes units of mea-
surement into account. The basket used by Massachusetts in 1780, for example,
assumed equal expenditures in the base period. This procedure insures that a
change in units will have no effect on the index; indeed, it makes the fixed bas-
ket index equivalent to the Carli index. Two more substantive examples of fixed
basket indexes that satisfy the commensurability axiom are the Laspeyres index,
defined as P FB (p0 , pt , q0 ), and the Paasche index, defined as P FB (p0 , pt , qt ).
Defining si0 as (pi0 qi0 )/(p0 ·q0 ), the reference period expenditure share of com-
modity i, the Laspeyres index is shown to satisfy the commensurability axiom
by writing it as a weighted average of the price relatives:
P^{Laspeyres} = \sum_i s_{i0}\,(p_{it}/p_{i0}).    (7.6)

Similarly, with s_{it} defined as the comparison period expenditure share of commodity i, the Paasche index can be written as a weighted harmonic mean of the price relatives:

P^{Paasche}(p_0, p_t, q_t) = \frac{1}{\sum_i s_{it}\,(p_{i0}/p_{it})}.    (7.7)
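A small numerical check (hypothetical data, not from the chapter) of the two representations just given: the share-weighted arithmetic mean of price relatives reproduces the Laspeyres ratio of basket costs, and the share-weighted harmonic mean reproduces the Paasche ratio.

    p0, pt = [2.0, 5.0, 1.0], [2.2, 4.8, 1.3]
    q0, qt = [10.0, 3.0, 20.0], [9.0, 4.0, 18.0]

    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    s0 = [p * q / dot(p0, q0) for p, q in zip(p0, q0)]   # base period shares
    st = [p * q / dot(pt, qt) for p, q in zip(pt, qt)]   # comparison period shares

    laspeyres_basket = dot(pt, q0) / dot(p0, q0)
    laspeyres_shares = sum(s * (pt[i] / p0[i]) for i, s in enumerate(s0))

    paasche_basket = dot(pt, qt) / dot(p0, qt)
    paasche_shares = 1.0 / sum(s * (p0[i] / pt[i]) for i, s in enumerate(st))

    print(abs(laspeyres_basket - laspeyres_shares) < 1e-12)   # True
    print(abs(paasche_basket - paasche_shares) < 1e-12)       # True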
A closely related test that is satisfied by any index that satisfies the base-
independence test was discussed by Harald Westergaard. This test is known as
the circularity test, and is much-discussed in the subsequent price index litera-
ture. It requires that a chained index calculated as the product of the index from
period 0 to period s and the index from period s to period t equal the direct
index from period 0 to period t:

P(p_0, p_s, q_0, q_s)\, P(p_s, p_t, q_s, q_t) = P(p_0, p_t, q_0, q_t).
Another much-discussed test is, in turn, satisfied by any index that satisfies
the circularity test and the identity test. The time reversal test requires agreement
between the value that a price index formula assigns to a set of price changes,
and the value that the formula assigns to a reversal of those price changes, given
that quantities also return to their original values:

P(p_0, p_t, q_0, q_t)\, P(p_t, p_0, q_t, q_0) = 1.
N.G. Pierson (1896) (who also pointed out the failure of the commensurability
axiom by the Dutot index) thought this test of such importance that its failure by
the indexes known to him caused him to recommend that the entire enterprise of
trying to construct price indexes be abandoned.
A failure of the time reversal test that reveals a bias occurs in the case of
the Carli index. Let r be the column vector of the price relatives and r^{-1} be the vector of their inverses p_{i0}/p_{it}. Letting \iota be a vector of ones, the product of the Carli index (1/N)\iota' r and its time-reversed counterpart (1/N)\iota' r^{-1} is the quadratic form (1/N^2)\,\iota'[r(r^{-1})']\iota. The main diagonal of the matrix r(r^{-1})' consists of 1s, and the average of all the elements of r(r^{-1})' equals the chained Carli index.
We can calculate this average in two stages. The first stage combines each
element above the main diagonal of r(r−1 ) with a counterpart from below the
main diagonal. Letting \delta_{ij} = (p_{it}/p_{i0})(p_{j0}/p_{jt}) - 1, the average of element ij and element ji of the matrix r(r^{-1})' is:

\frac{1}{2}\left[(1 + \delta_{ij}) + \frac{1}{1 + \delta_{ij}}\right] = 1 + \frac{\delta_{ij}^{2}}{2(1 + \delta_{ij})}.

This average is greater than 1 if \delta_{ij} \neq 0, so unless all the price relatives are equal, the average of all the
pairwise averages exceeds 1. Paradoxically, after every price and every quan-
tity has returned to its original value, the chained Carli index registers positive
inflation!
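The paradox is easy to reproduce numerically; the prices below are invented for illustration.

    # Chained Carli index over a round trip 0 -> t -> 0.
    # Prices return exactly to their starting values, yet the chain exceeds 1.
    def carli(p_from, p_to):
        return sum(b / a for a, b in zip(p_from, p_to)) / len(p_from)

    p0 = [1.0, 4.0, 10.0]
    pt = [2.0, 3.0, 12.0]

    chained = carli(p0, pt) * carli(pt, p0)
    print(round(chained, 4))   # about 1.17: spurious "inflation" of roughly 17%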
The beginning of the twentieth century saw the first systematic use of the test
approach to evaluate and design price index formulas. A book by Correa M.
Walsh (1901) discussed a version of the circularity test that adds a third link to
the chain and makes the ultimate prices and quantities identical to the original
ones, as they are in the time reversal test. Walsh also discussed a proportionality
axiom (also known as the strong proportionality test), which requires that an
index containing identical price relatives equal that price relative:

P(p_0, \lambda p_0, q_0, q_t) = \lambda.
Finally, Walsh applied the constant basket test to price index formulas that
attempt to account for the effects of changes in the basket that is consumed.
These formulas should agree with the fixed basket formula in the special case of
an unchanging consumption basket:
P(p_0, p_t, q, q) = \frac{p_t \cdot q}{p_0 \cdot q}.    (7.12)
The constant basket test is trivially satisfied either by a fixed basket index
formula that uses only the base period market basket, disregarding the bas-
ket from the other period, or by a fixed basket index formula that uses only
the comparison period market basket. Use of the base period basket had been sug-
gested by Laspeyres in 1871, and use of the comparison period basket had been
suggested by Hermann Paasche in 1874. Yet Walsh inferred from numerical tri-
als that the Laspeyres price index P FB (p0 , pt , q0 ) and the Paasche price index
P FB (p0 , pt , qt ) were both biased.
To satisfy the constant basket test while allowing the opposite biases of the
Laspeyres and Paasche indexes to offset one another, Walsh favored an average
of the baskets from the base and comparison periods. The simple average of the
quantities in the two baskets proposed earlier by Edgeworth and Alfred Marshall
was acceptable. Walsh found, however, that a geometric mean performed better,
so his preferred index was:
P^{Walsh}(p_0, p_t, q_0, q_t) = \frac{\sum_i p_{it}\,(q_{i0} q_{it})^{0.5}}{\sum_i p_{i0}\,(q_{i0} q_{it})^{0.5}}.    (7.13)
A decade after Walsh’s book appeared, Irving Fisher wrote The Purchasing
Power of Money. This book contained some important new tests, but it is even
more notable because Fisher took a systematic and thorough approach that el-
evated the question of index number properties to the level of a formal field of
study.
Inspired by the right hand side of the equation of exchange MV = P T , where
M is the stock of money in circulation, V is its velocity of circulation, P is the
price level and T is the volume of trade, Fisher proposed the product test.1 This
test states that when a price index and a quantity index are specified simulta-
neously, their product must equal the expenditure relative pt · qt /p0 · q0 . Using
the product test, Fisher developed the concept of the “correlative form” of the
quantity index corresponding to a price index, a concept that is now known as
the implicit quantity index.
1 As den Butter [this volume] explains, today National Accounts use index numbers to decompose
changes in nominal expenditure into price and volume effects. This procedure has its origins in
Fisher’s (1911) discussion of the product test. The Laspeyres quantity index derived there is used to
measure real GDP in most countries.
The product test can also be used to derive the implicit price index cor-
responding to a directly specified quantity index. To avoid a violation of the
commensurability axiom, Fisher defined the units for each item in T as a "dollar's worth" in the base year, which makes T equal to the numerator of the Laspeyres
quantity index p0 · qt . By substituting pt · qt for MV in the equation MV = P T
and solving for P , Fisher obtained the implicit price index implied by the use
of base period prices to measure volume change. The result showed that for
the Laspeyres quantity index, the implicit price index is a Paasche price index.
Divide both sides of the equation pt · qt = P (p0 · qt ) by base period nomi-
nal expenditures p0 · q0 to obtain pt · qt /p0 · q0 = P (p0 · qt /p0 · q0 ), where
p0 · qt /p0 · q0 = QLaspeyres . Then the price index P must equal:
P = \frac{p_t \cdot q_t / p_0 \cdot q_0}{Q^{Laspeyres}} = \frac{p_t \cdot q_t}{p_0 \cdot q_t}.    (7.14)
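A quick numerical illustration of the product test logic just described (hypothetical data): deflating the expenditure relative by the Laspeyres quantity index yields exactly the Paasche price index.

    p0, pt = [1.0, 3.0], [1.5, 2.5]
    q0, qt = [8.0, 2.0], [6.0, 3.0]
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))

    expenditure_relative = dot(pt, qt) / dot(p0, q0)
    q_laspeyres = dot(p0, qt) / dot(p0, q0)          # quantity change at base prices
    implicit_price = expenditure_relative / q_laspeyres
    p_paasche = dot(pt, qt) / dot(p0, qt)

    print(abs(implicit_price - p_paasche) < 1e-12)   # True: the product test holds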
7. Test of independence from the choice of base and the closely related cir-
cularity test and time reversal test. Given the identity axiom (i.e., given
that P (p, p, q, q) = 1), these latter tests are special cases of the base-
independence test.
8. The commensurability axiom.
None of the 44 price indexes that Fisher considered passed all these tests.
(Indeed, no index can.) Fisher, however, emphasized the test of proportionality
in quantities because of the importance he attached to the equation of exchange.
If we restrict attention to indexes that also satisfy the proportionality axiom in
prices, only the Paasche price index is able to satisfy the implicit quantity index
version of the comparative proportionality test, which considers time periods
other than the base period. Substituting λqs for qt , the change in the Laspeyres
quantity index implied by the Paasche price index from time s to time t is:
Q(p_0, p_t, q_0, \lambda q_s) / Q(p_0, p_s, q_0, q_s) = \frac{p_0 \cdot \lambda q_s / p_0 \cdot q_0}{p_0 \cdot q_s / p_0 \cdot q_0} = \frac{p_0 \cdot \lambda q_s}{p_0 \cdot q_s} = \lambda.    (7.15)
Fisher’s magnum opus on index numbers, The Making of Index Numbers, ap-
peared a year later. This book tabulated the performance of nearly 150 formulas
on the tests of proportionality, determinateness, and withdrawal or entry, and it
identified the class of formulas that failed to satisfy the fundamentally important
commensurability test. It supplemented this deductive reasoning based on tests
with inductive reasoning based on trials of how formulas performed with ac-
tual data. These trials gave empirical evidence of such properties as the upward
bias of arithmetic averages of price relatives and the downward bias of harmonic
averages (which are simply reciprocals of time-reversed arithmetic averages).
Fisher’s new treatment of tests differed from his original one in some impor-
tant ways. Fisher renounced the circularity test and also the “comparative” tests,
which focused on the change in the index rather than the index itself. Fisher
also dismissed the base-independence test as irrelevant because of its inapplica-
bility to the chained indexes that he now favored. (Chained price indexes use
the baskets from years being compared, not the basket from some base year of
questionable germaneness.) Finally, these changes in approach implied an aban-
donment of the recommendation of the Paasche price index as the best formula
for deflation purposes.
Three tests that were unknown at the time of Fisher’s earlier book are men-
tioned in The Making of Index Numbers. Fisher’s discussion of index formulas
that behaved “erratically” or “freakishly” implied a test of continuity in prices
and quantities. Second, Fisher (1922, pp. 220–221 and 402) justified his prefer-
ence for “crossing” formulas (as is done in the Fisher index) rather than crossing
weights (as is done in the Edgeworth–Marshall and Walsh indexes) by arguing
that only the former procedure would insure that the final index remained within
the bounds of the Laspeyres index and the Paasche index. (This was not the first
mention of the Laspeyres–Paasche bounds test; it had already been discussed
by Arthur L. Bowley and by Pigou, 1912 and 1920.) Third, Fisher placed great
emphasis on his new factor reversal test. After excluding erratic or freakish in-
dex formulas and focusing on crosses of formulas rather than of weights, Fisher
identified P Fisher as “ideal” because it was the only straightforward formula that
satisfied the time reversal test and the factor reversal test.
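The two defining properties of the "ideal" index are easy to verify numerically; the sketch below uses invented data and the textbook definitions of the Fisher price and quantity indexes (the helper names are my own).

    from math import sqrt

    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    def fisher_p(p0, pt, q0, qt):
        # Geometric mean of the Laspeyres and Paasche price indexes.
        return sqrt((dot(pt, q0) / dot(p0, q0)) * (dot(pt, qt) / dot(p0, qt)))

    def fisher_q(p0, pt, q0, qt):
        # Same formula with the roles of prices and quantities interchanged.
        return fisher_p(q0, qt, p0, pt)

    p0, pt = [1.0, 2.0, 5.0], [1.2, 1.8, 5.5]
    q0, qt = [30.0, 10.0, 4.0], [28.0, 12.0, 4.5]

    # Time reversal: the forward index times the backward index equals 1.
    print(abs(fisher_p(p0, pt, q0, qt) * fisher_p(pt, p0, qt, q0) - 1) < 1e-12)
    # Factor reversal: price index times quantity index equals the expenditure relative.
    print(abs(fisher_p(p0, pt, q0, qt) * fisher_q(p0, pt, q0, qt)
              - dot(pt, qt) / dot(p0, q0)) < 1e-12)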
2 In his discussant’s comments Walsh (1921) showed how to derive other formulas that satisfied
Fisher’s new test besides P Fisher , thereby undermining Fisher’s initial argument for the superiority
of P Fisher. The name later given to this formula reflects Fisher's role in demonstrating its axiomatic
advantages; Walsh was the first to mention it.
Indeed, P Fisher also satisfies all the tests on Fisher’s original list if they are
properly framed. The circularity test or base-independence test, which Fisher
now disavowed, becomes the time reversal test when applied to two periods
only. Of the remaining seven tests, Fisher reported that five were satisfied. The
two tests that Fisher reported as violated by his ideal index are the price with-
drawal or entry test, and the quantity withdrawal or entry test. Fisher, however,
made no restriction on the quantities when he tested the effect on the price index
of withdrawal or entry of an item with a price relative equal to P Fisher . The ap-
propriate assumption for testing an index that depends on prices and quantities
in both periods is that the entering or withdrawing item matches both the origi-
nal price index and the original quantity index. A simultaneous test of price and
quantity withdrawal or entry is satisfied by P Fisher and QFisher .
7.2.4. Criticisms of Fisher’s tests and the rise of the economic approach
Following the publication of The Making of Index Numbers, the focus of index
number research shifted to the economic approach, with a host of contributions
advancing the field far beyond the state in which Pigou and other pioneers of this
approach had left it.3 Furthermore, the test approach research that continued to
be performed shifted in focus from the use of tests to select index formulas to the
selection of the tests themselves. Certain tests were singled out for criticism as
unjustifiable according to the economic approach or as incompatible with other
tests. For example, Samuelson and Swamy (1974, p. 575) discussed the lack of
an economic justification for the factor reversal test, concluding: “A man and
his wife should be properly matched, but that does not mean I should marry my
identical twin!”
The discovery that important tests can be incompatible with each other
pointed to a weakness of the axiomatic approach: the question of which axioms
are vital can neither be avoided by finding a formula that simultaneously satisfies
them all, nor answered in a way that is beyond all controversy.4 A noteworthy
example of controversy involves three tests from Fisher’s list that Frisch (1930)
identified as impossible to satisfy simultaneously. These are the circularity (or
base-independence) test, the commensurability test, and the determinateness
test.5 At different times, each member of Frisch’s set of incompatible tests has
been identified as the one to abandon. Frisch (1930, p. 405) suggested the sacri-
fice of the commensurability test. Fisher had, of course, discarded the circularity
3 Important contributions to the economic approach from this era include Corrado Gini (1924,
1931), Gottfried Haberler (1927), Bowley (1928), R.G.D. Allen (1935 and 1949), Hans Staehle
(1935), Abba P. Lerner (1935), A.A. Konus (1939) and Erwin Rothbarth (1941).
4 The impossibility of satisfying every axiom is interpreted within the representational theory of
measurement by Morgan [this volume]. She also discusses two approaches that have been used to
respond to this problem.
5 Frisch overlooked the need for the proportionality axiom, without which the expenditure relative
pt · qt /p0 · q0 would satisfy all the tests on the list (Eichhorn, 1976, p. 251).
test, and had – like Frisch in 1936 – identified the commensurability test as fun-
damental. Finally, Swamy (1965, p. 625) discarded the determinateness test.
The circularity test is particularly prone to incompatibility with other axioms,
including some that are indispensable. In particular, an important impossibility
theorem states that it is impossible to satisfy the axioms of circularity, commen-
surability and proportionality simultaneously if the price index uses information
on the quantities (see Appendix A). These three axioms constitute a charac-
terization for a geometric average of price relatives that has exponents that
are constants that sum to 1 but that need not be identical, as they are in the
Jevons index. (A characterization for an index is a combination of tests and ax-
ioms that is uniquely satisfied by that index.) An additional axiom that prevents
negative exponents must also be included to make the formula that is charac-
terized admissible as a price index. One such axiom, introduced in a later vein
of the literature by Wolfgang Eichhorn and Joachim Voeller, is the monotonicity
axiom. This axiom requires that the price index be strictly increasing in com-
parison period prices and strictly decreasing in base period prices. Combining
this axiom with the other three, we have a characterization for a version of the
Cobb–Douglas index that has predetermined weights s∗ .6 In log-change form,
this index is:
\log P^{Cobb-Douglas}(p_0, p_t, s^*) = \sum_i s_i^* \log(p_{it}/p_{i0}).    (7.19)
Another perspective on the difficulty of satisfying the circularity test was of-
fered by Samuelson and Swamy. They showed that a price index can use the
quantity data and still satisfy the circularity test if the quantities behave in a way
that is consistent with homothetic utility maximization.7 This price index need
not sacrifice the proportionality test nor the commensurability axiom. Unfortu-
nately, however, homotheticity is a strong assumption: it means that marginal
rates of substitution do not depend on the utility level, making the composition
of the consumption basket invariant to income and dependent only on prices.
Samuelson and Swamy conclude:
[I]n the nonhomothetic cases of realistic life, one must not expect to be able to make the naïve
measurements that untutored common sense always longs for; we must accept the sad facts of
life, and be grateful for the more complicated procedures economic theory devises (p. 592).
The rise of the economic approach to index numbers did not mean the end
of progress on index number axiomatics, nor even the limiting of axiomatic
6 The name comes from the economic approach. Constant expenditure shares are implied by a
Cobb–Douglas utility function. Used as weights in a log-change price index, these shares yield the
Cobb–Douglas cost of living index.
7 They were not the first to show this: the homotheticity condition had been identified a year earlier
by Charles Hulten using a method that is discussed in the appendix.
Note also that Eq. (7.20) allows an additive decomposition of the change in
P Fisher , and similarly for Eq. (7.21) and QFisher . These equations are therefore
used in the national economic accounts of the US and Canada to calculate the
tables of contributions to change in their Fisher indexes of price and volume
change (Reinsdorf et al., 2002).
Starting in the 1970s, the field of axiomatic index theory began to experience
a renaissance. Yrjö Vartia (1976) introduced the test of consistency in aggre-
gation, which requires that a multi-stage application of the index formula in
various levels of aggregation yield the same result as a single stage application
that calculates the top-level aggregate directly from the detailed data.9 Another
watershed event was the independent discovery by Kazuo Sato (1976) and Var-
tia (1976) of the ideal log-change (i.e. geometric) index, where “ideal” means
that an index satisfies the factor reversal and time reversal tests. This index has
8 This weakness of the Edgeworth–Marshall index is also avoided by the Walsh index.
9 Diewert (2005, fn. 24) notes that a variant of this test had already appeared in a book by J.K.
Montgomery (1937). This test is also discussed in Charles Blackorby and Diane Primont (1990).
with logmean(si0 , si0 ) ≡ si0 . Normalizing the weights so that they sum to 1, the
natural logarithm of the Sato–Vartia index (also known as the Vartia II index),
has the form:

\log P^{Sato-Vartia} = \sum_i w_i \log(p_{it}/p_{i0}), \qquad w_i = \frac{\mathrm{logmean}(s_{i0}, s_{it})}{\sum_j \mathrm{logmean}(s_{j0}, s_{jt})}.
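A minimal sketch of this construction in Python (hypothetical data), using the logarithmic mean of the base and comparison period expenditure shares as weights, normalized to sum to 1, as the text describes:

    from math import exp, log

    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    def logmean(a, b):
        # Logarithmic mean, with logmean(a, a) = a by convention.
        return a if a == b else (a - b) / (log(a) - log(b))

    def sato_vartia(p0, pt, q0, qt):
        s0 = [p * q / dot(p0, q0) for p, q in zip(p0, q0)]
        st = [p * q / dot(pt, qt) for p, q in zip(pt, qt)]
        raw = [logmean(a, b) for a, b in zip(s0, st)]
        w = [x / sum(raw) for x in raw]          # normalize the weights to sum to 1
        return exp(sum(wi * log(pt[i] / p0[i]) for i, wi in enumerate(w)))

    p0, pt = [1.0, 2.0], [1.3, 2.1]
    q0, qt = [10.0, 5.0], [8.0, 6.0]
    print(round(sato_vartia(p0, pt, q0, qt), 6))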
At about the same time as the research leading to the Sato–Vartia index, the
study of index number tests themselves experienced a rebirth as “the axiomatic
theory of index numbers,” which was the name of a paper by Eichhorn. Eich-
horn, with his student Voeller, and also Janos Aczél, replaced Fisher’s pragmatic
quest for good measurement tools – termed the “instrumental approach” by
Marcel Boumans (2001, p. 336) – with the functional equation approach. This
literature provided further theorems on the mutual inconsistency of various sets
of axioms, but it also introduced the new concerns of identifying mathematically
independent sets of axioms and the discovery of characterizations for the impor-
tant price index formulas. Theorems on mutual inconsistency and independence
of sets of axioms were proven by Eichhorn (1976), for example. For examples
of characterizations for the Fisher index, see Bert Balk (1995), H. Funke and
Voeller (1978, 1979) and for characterizations for other indexes see Manfred
Krtscha (1984, 1988), and Arthur Vogt (1981).
A set of independent axioms that can be viewed as a definition of the fun-
damental properties of a price index function was introduced by Eichhorn and
Voeller (1983). This set consists of the monotonicity axiom, the proportionality
axiom, the commensurability axiom and the price dimensionality axiom. The
price dimensionality axiom requires that multiplying all base and comparison
period prices by the same positive scalar leave the index unchanged, so Balk
(1995, p. 72) calls it “homogeneity of degree zero in prices.”
These four axioms are independent because price indexes exist that violate
any one of them while satisfying all others. Eichhorn and Voeller show that any
index that satisfies them all also satisfies some additional tests, most notably the
mean value test. The mean value test requires that P (p0 , pt , q0 , qt ) lie within
the range defined by the smallest and largest price relative:

\min_i (p_{it}/p_{i0}) \le P(p_0, p_t, q_0, q_t) \le \max_i (p_{it}/p_{i0}).
The Fisher index satisfies a range of tests that none of its main rivals can satisfy completely. A particularly noteworthy rival to
the Fisher index is the Leo Törnqvist (1936) index:
\log P^{T\ddot{o}rnqvist} = \sum_i \tfrac{1}{2}(s_{i0} + s_{it}) \log(p_{it}/p_{i0}).    (7.26)
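In practice the Törnqvist and Fisher indexes are usually extremely close; the sketch below (invented data) computes both for comparison.

    from math import exp, log, sqrt

    p0, pt = [1.0, 2.0, 5.0], [1.2, 1.9, 5.6]
    q0, qt = [30.0, 10.0, 4.0], [27.0, 11.0, 4.2]
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))

    s0 = [p * q / dot(p0, q0) for p, q in zip(p0, q0)]
    st = [p * q / dot(pt, qt) for p, q in zip(pt, qt)]

    tornqvist = exp(sum(0.5 * (s0[i] + st[i]) * log(pt[i] / p0[i])
                        for i in range(len(p0))))
    fisher = sqrt((dot(pt, q0) / dot(p0, q0)) * (dot(pt, qt) / dot(p0, qt)))

    print(round(tornqvist, 6), round(fisher, 6))   # typically agree to several decimals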
Many researchers who prefer the economic approach think of the Törnqvist
price index as superior because, in a much-celebrated paper, Diewert (1976)
demonstrated its exact equality to a cost of living index from the versatile
translog model of economic behavior.10 This claim to superiority from the
economic approach made Diewert’s subsequent finding of a relatively poor per-
formance for the Törnqvist index on axiomatic criteria all the more striking. The
test violations of the Törnqvist index generally involve only small discrepancies,
but they are surprisingly numerous and they include some important properties.
The constant basket test, the Laspeyres–Paasche bounds test, the determinate-
ness test (which was omitted from Diewert’s list), the monotonicity axiom, and
the mean value test for the implicit quantity index are not satisfied by the Törn-
qvist index.11 The violation of the Laspeyres–Paasche bounds test may seem
inconsistent with the equivalence of the Törnqvist index to a cost of living in-
dex in the translog case, but when the data do not fit the translog model, the
Törnqvist index may not mimic a cost of living index so well.
The next round in the debate came in a survey of axiomatic price index the-
ory. Here Balk (1995, p. 87) observed that every known characterization of the
Fisher index includes a questionable test. This leaves open the possibility that
some other index could fulfill just as many of the important tests as the Fisher
index does. Indeed, Balk singled out the Sato–Vartia index as doing just that.
Though Balk did not explore the properties of the Sato–Vartia index in detail,
remarkably, it can be shown to satisfy the same slightly weakened version of
Fisher’s original list of tests that the Fisher index satisfies.
Tests not found on Fisher’s (1911) list are a different matter, however. The
question of the test parity of the Sato–Vartia index with the Fisher index turns
on how much importance is attributed to two of these tests. First, Reinsdorf and
Alan Dorfman (1999) demonstrated that the Sato–Vartia index fails to satisfy
the monotonicity axiom. Since the monotonicity axiom has been viewed as fun-
damental – Balk lists it first among his core axioms – there would seem to be no
hesitation in proclaiming the Fisher index superior.
10 The paper also categorized the Fisher index and some other indexes as superlative, but the Törn-
qvist index is exact for the economic model with the widest use and the most appeal. The intuition
that the superlative index corresponding to the best model must itself be best has recently been
vindicated in research by Robert J. Hill (2006).
11 Diewert subsequently discovered some axiomatic advantages of the Törnqvist index, which he
details in chapter 16 of the International Labor Organization (2004) manual on consumer price
indexes.
Yet the economic approach shows that things are not so simple. The rela-
tionship between price changes and quantity changes implies a value for the
elasticity of substitution in the economic model that generates the Sato–Vartia
index, so with quantities held constant, a different price change implies a dif-
ferent degree of item substitutability. For large price changes, the degree of
substitutability matters greatly. A larger price increase (or smaller price decline)
for a highly substitutable item can therefore have less effect on the cost of liv-
ing than a smaller price increase (larger decrease) for a less substitutable item.
A believer in the economic approach could, then, argue that the fault lies with
the monotonicity axiom, not the Sato–Vartia index. In particular, according to
the economic approach, the property of monotonicity should be required locally
in the region where price log-changes do not exceed 1 in absolute value. The
Sato–Vartia index indeed satisfies such local monotonicity.
The second important test failure of the Sato–Vartia index – which occurs
only when it includes three or more items – is of the Laspeyres–Paasche bounds
test. As is argued below, the economic approach does support the validity of the
Laspeyres–Paasche bounds test. Thus, if the economic approach is used to ex-
cuse the failure of the monotonicity axiom, the failure of the Laspeyres–Paasche
bounds test cannot at the same time be dismissed.
The failure of the Sato–Vartia index to equal the test performance of the Fisher
index does not by itself rule out the possibility that some new rival to the Fisher
index could be discovered. We can rule out this possibility, however, if we accept
the importance of the Laspeyres–Paasche bounds test. Fisher’s contention that
this test rules out a crossing of the weights can be proven under the assumptions
that more than two goods are present and that the index satisfies tests of time
reversal, continuity, and proportionality. Consistent with this, Hill (2006) shows
that the only superlative index that satisfies the Laspeyres–Paasche bounds test
is the Fisher index.
Since Pigou, the main argument for the Laspeyres–Paasche bounds test has come
from the economic approach. The basic logic is straightforward: adjusting con-
sumers’ base period income p0 · q0 for the price change to pt by means of the
Laspeyres index would enable them to purchase basket q0 again, though they
would likely choose a better basket that would also cost pt · q0 . A change in
income in proportion to the Laspeyres index is, therefore, at least adequate to
maintain the base period standard of living, and quite possibly more than adequate. This makes the Laspeyres index an upper bound for the cost of liv-
ing index evaluated at the base period standard of living. Similarly, consumers’
comparison period income pt · qt deflated by the Paasche index is adequate at
base period prices p0 to purchase basket qt , or some other, possibly better, bas-
ket that also costs p0 · qt . Therefore the Paasche index is a lower bound for the
index that compares the cost of the comparison period standard of living at com-
parison period prices to the cost of that same standard of living at base period
prices.
The existence of two relevant standards of living creates complications that
easily lead to mistakes. Only in the simple case of homotheticity are the
Laspeyres and Paasche indexes upper and lower bounds for the same cost of
living index. One possible mistake is, therefore, to ignore the need to assume
homotheticity to make the cost of living index a function of prices alone.
A sound theory should allow for the possibility of changes in the composi-
tion of the consumption basket due to income effects. For small changes in the
standard of living, income effects are generally so dominated by price effects
that they may be ignored. At the other extreme, if q0 and qt represent very dis-
parate standards of living, the Laspeyres–Paasche bounds may plausibly contain
neither the cost of living index for the standard of living of period 0, nor the
one for the standard of living of period t. A relaxation of the Laspeyres–Paasche
bounds test is justifiable when the value of the quantity index is far below 1
or far above 1, because large effects on consumption patterns attributable to a
changing standard of living widen the range of possible values for the relevant
cost of living indexes beyond the Laspeyres and Paasche bounds.
Nevertheless, to discard the Laspeyres–Paasche bounds test entirely is a sec-
ond possible mistake. Under a wide variety of assumptions, any cost of liv-
ing index that is outside the desired bounds will be approximately equal to a
Laspeyres or Paasche index and hence be at least approximately inside their
bounds. Furthermore, the regions of the index domain where the relevant cost
of living indexes are outside the Laspeyres–Paasche bounds are limited to sub-
spaces of the domain where the standard of living varies widely. As a result, an
index number formula that violates the Laspeyres–Paasche bounds test is likely
to do so in the region where the relevant cost of living indexes are necessarily
between those bounds.
To identify the region where the index must lie within the range defined by
the Laspeyres and Paasche indexes, we can use the weak axiom of revealed pref-
erence (WARP). According to revealed preference theory, if bundle q0 (qt ) is
chosen when qt (q0) would be less expensive, the costlier bundle is superior ("revealed preferred") to the less expensive one. In terms of index numbers, this means that QLaspeyres(p0, pt, q0, qt) ≤ 1 implies that q0 is at least equivalent to qt, and that QPaasche(p0, pt, q0, qt) ≥ 1 implies that qt is at least equivalent to q0. Letting Q* represent the implicit quantity index implied by the price index being tested, WARP requires that Q* ≤ 1 if QLaspeyres(p0, pt, q0, qt) ≤ 1, and that Q* ≥ 1 if QPaasche(p0, pt, q0, qt) ≥ 1. If QLaspeyres = QPaasche = 1, then Q* must equal 1.
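A sketch of this screening rule as code (the data and the candidate formula, here a Törnqvist index, are invented for illustration): whenever the Laspeyres and Paasche quantity indexes agree on the direction of the quantity change, the implicit quantity index of the candidate formula is required to agree as well.

    from math import exp, log

    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    def warp_consistent(p0, pt, q0, qt, price_index):
        # Check the WARP-based restriction on the implicit quantity index.
        q_lasp = dot(p0, qt) / dot(p0, q0)
        q_paas = dot(pt, qt) / dot(pt, q0)
        q_star = (dot(pt, qt) / dot(p0, q0)) / price_index   # implicit quantity index
        if q_lasp <= 1 and q_star > 1:
            return False
        if q_paas >= 1 and q_star < 1:
            return False
        return True

    p0, pt = [1.0, 2.0], [1.4, 1.9]
    q0, qt = [10.0, 6.0], [9.0, 7.0]
    s0 = [p * q / dot(p0, q0) for p, q in zip(p0, q0)]
    st = [p * q / dot(pt, qt) for p, q in zip(pt, qt)]
    tornqvist = exp(sum(0.5 * (s0[i] + st[i]) * log(pt[i] / p0[i]) for i in range(2)))
    print(warp_consistent(p0, pt, q0, qt, tornqvist))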
Moreover, the version of revealed preference theory for strictly quasi-concave
preferences (Vartia and Weymark, 1981, p. 411) implies that QPaasche <
QLaspeyres when prices change in a way that keeps the standard of living un-
changed. In this case, in a region of positive measure consisting of a neighbor-
hood around the locus of points where the standard of living is constant, the cost
of living index evaluated at the comparison period or reference period utility
Samuelson and Swamy counsel us to accept the sad facts of life regarding the
circularity test because our hopes for satisfying this test must depend on an un-
realistic assumption of homotheticity. Yet price indexes that take expenditure
patterns into account via their weighting structure must exhibit at least a min-
imal amount of internal consistency to be meaningful. This requisite internal
consistency can be defined as the absence of contradictions in the ordinal rank-
ing of consumption (or output) bundles: a price measure that simultaneously
implies that real consumption is up and that it is down does not seem to measure
anything useful. Fortunately, economic optimization based on some stable util-
ity or production function – a much weaker assumption than homotheticity! – is
sufficient to rule out such contradictions.
The ordinal circularity axiom uses the absence of ranking contradictions as a
criterion for determining whether the quantities and prices behave too inconsis-
tently for construction of indexes that conform to the Laspeyres–Paasche bounds
test. Recalling the logic of the weak axiom of revealed preference for consump-
tion, purchase of qt at price pt when q0 would have cost less implies that qt
yields more utility than q0 . That is, QPaasche (p0 , pt , q0 , qt ) > 1 implies that qt
can be ranked as a higher level of real consumption (consumer welfare or pro-
ducer use of inputs) than q0 . Similarly, QLaspeyres (p0 , pt , q0 , qt ) < 1 implies that
q0 can be ranked as superior. An analogous theory for a producer of multiple out-
puts states that QLaspeyres (p0 , pt , q0 , qt ) > 1 implies that qt represents a higher
level of real output than q0 , while QPaasche (p0 , pt , q0 , qt ) < 1 implies that q0 is
superior. In the producer case, the logic is that selling q0 when qt would have
yielded more revenue shows that qt is on a higher production possibility fron-
tier. Of course, these restrictions on the quantity indexes imply restrictions on
the Laspeyres and Paasche price indexes via the product test.
The ordinal circularity axiom forbids a transitive contradiction in rankings
when we form a closed loop of Laspeyres and Paasche indexes. The sim-
plest version of this axiom uses just three time periods to form the loop,
though a complete characterization of this axiom would allow for loops of
any length. Letting the first link in the loop run from period 0 to period s,
if min[QLaspeyres (p0 , ps , q0 , qs ), QPaasche (p0 , ps , q0 , qs )] > 1, then qs repre-
sents a larger volume of production or consumption than q0 . Similarly, if
min[QLaspeyres (ps , pt , qs , qt ), QPaasche (ps , pt , qs , qt )] > 1, then qt represents a
larger volume of consumption or production than qs . The ordinal circularity ax-
iom states that if qs is an improvement on q0 and qt is an improvement on qs ,
then a return to q0 cannot be still another improvement. That is:
\min\{Q^{Laspeyres}(p_0, p_s, q_0, q_s),\, Q^{Paasche}(p_0, p_s, q_0, q_s)\} \ge 1 \quad \text{and} \quad \min\{Q^{Laspeyres}(p_s, p_t, q_s, q_t),\, Q^{Paasche}(p_s, p_t, q_s, q_t)\} \ge 1
\;\Rightarrow\; \max\{Q^{Laspeyres}(p_0, p_t, q_0, q_t),\, Q^{Paasche}(p_0, p_t, q_0, q_t)\} \ge 1.    (7.27)
In the loop formed by combining inequality (7.28), the negation of this conclusion (that is, the case in which both quantity indexes from period 0 to period t fall below 1), with the first two inequalities in expression (7.27), the transitive property implies that each point is strictly superior to itself!
The Laspeyres and Paasche indexes will satisfy the ordinal circularity test if
and only if the price and quantity data are consistent with economic optimization
behavior (utility maximization, cost minimization, profit maximization). If we
accept that the absence of ranking contradictions is necessary for indexes to be
meaningful, we must conclude that the construction of Laspeyres, Paasche or
similar indexes implies a belief in the existence of some economic concept that
is being maximized, such as utility or profits. This belies the claims that are
occasionally made that Laspeyres and Paasche indexes are devoid of economic
content.
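The three-period version of the check can be written down directly; the function below is a sketch (the data and function names are invented, and real applications would supply observed prices and quantities) that flags the transitive contradiction the axiom forbids.

    def dot(x, y):
        return sum(a * b for a, b in zip(x, y))

    def q_lasp(p0, pt, q0, qt):
        return dot(p0, qt) / dot(p0, q0)

    def q_paas(p0, pt, q0, qt):
        return dot(pt, qt) / dot(pt, q0)

    def ordinal_circularity_ok(periods):
        # periods: list of (p, q) pairs for periods 0, s, t forming a three-period loop.
        (p0, q0), (ps, qs), (pt, qt) = periods
        step1_up = min(q_lasp(p0, ps, q0, qs), q_paas(p0, ps, q0, qs)) >= 1
        step2_up = min(q_lasp(ps, pt, qs, qt), q_paas(ps, pt, qs, qt)) >= 1
        if step1_up and step2_up:
            # The direct comparison 0 -> t must not rank q0 above qt.
            return max(q_lasp(p0, pt, q0, qt), q_paas(p0, pt, q0, qt)) >= 1
        return True   # premises not met, so the axiom imposes no restriction

    periods = [([1.0, 2.0], [10.0, 5.0]),
               ([1.1, 2.1], [11.0, 5.2]),
               ([1.2, 2.3], [12.0, 5.5])]
    print(ordinal_circularity_ok(periods))   # True for these hypothetical data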
Tests of ordinal circularity have an interesting history. The economic signif-
icance of the existence of intransitive loops was first pointed out by Jean Ville
(1951–1952), an engineer who was called upon to teach economics at the Uni-
versity of Lyon because of a post-war faculty shortage. Ville, however, based
his tests on a purely theoretical construct known as a Divisia index, which is
explained in Appendix B. Ordinal circularity of Laspeyres and Paasche indexes
was first used to test for the existence of a utility function that rationalizes the
data by Sidney Afriat (1967), who also developed non-parametric bounds for
the cost of living index. The narrowing of the hypothesis to be tested to one of
utility maximization allows Eq. (7.27) to be simplified by substituting QPaasche
for min[QLaspeyres , QPaasche ] and QLaspeyres for max[QLaspeyres , QPaasche ].
The importance of Afriat’s tests was explained in Diewert (1973), and they
were extended to include a test for homothetic utility maximization in Diewert
(1981). Hal Varian (1982 and 1984) developed algorithms for the implemen-
tation of these tests, and Dowrick and Quiggin (1994 and 1997) adapted these
algorithms for use in inter-area comparisons. Varian’s algorithms were used
by Marilyn Manser and Richard McDonald (1988), with the surprising result
that US aggregate consumption data were consistent with homothetic utility
maximization. Finally, using enhanced algorithms, Blow and Crawford (2001)
found that data from the British Family Expenditure Survey, which furnishes
the weights for the British Retail Price Index (RPI), were consistent with utility
maximization. They also determined ranges for the annual substitution bias of
the Laspeyres index used for the official RPI. The range was centered some-
where between 0.1 and 0.25 percentage points in most years, a result that agrees
with Manser and McDonald’s estimates of substitution bias in a Laspeyres price
index for the US.
Formulas with better axiomatic properties than the Laspeyres and Paasche for-
mulas have been recommended since the earliest days of the field in the late
1800s, and the economic approach also implies that other formulas are better.
Balk (1996, p. 357) provides a general characterization for the price indexes
that satisfy consistency in aggregation. He shows that this set of indexes P can
be defined implicitly by requiring the existence of a function f (P , p0 ·q0 , pt ·qt )
such that:
f(P, p_0 \cdot q_0, p_t \cdot q_t) = \sum_i f(p_{it}/p_{i0},\, p_{i0} q_{i0},\, p_{it} q_{it}).    (7.30)
The Montgomery index is the only formula that is consistent in aggregation and
that satisfies the factor reversal test. Unfortunately, it gains these remarkable
properties at the cost of sacrificing the vital proportionality axiom. Its usefulness
is therefore limited to theoretical purposes, such as Diewert’s (1978) proof that
the Törnqvist and Fisher indexes satisfy approximate versions of the consistency
in aggregation test.
If we restrict f (·) to be a linear combination of (p0 · q0 )P and (pt · qt )P , then
the family of generalized Stuvel indexes comprises the admissible indexes that
solve Eq. (7.30) exactly. To derive the Stuvel indexes, begin by recalling that a
pairing of the Paasche price index with the Laspeyres quantity index passes the
product test, but a pairing of two Laspeyres indexes does not. Since raising a
quantity index has the effect of lowering the implicit price index that it implies,
we can adjust the Laspeyres price index and the Laspeyres quantity index in the
same direction to arrive at a pair of indexes P and Q that pass the product test.
Moreover, by making both adjustments identical, we can derive a formula that
satisfies the factor reversal test.
Two ways to do this were identified by van IJzeren (1958). The system of
equations formed by the product test P Q = (pt · qt )/(p0 · q0 ) and an equality of
proportional adjustments can be solved for P and Q to obtain the Fisher indexes.
The proportional adjustments equality is:

P / P^{Laspeyres} = Q / Q^{Laspeyres}.
Alternatively, we can make the adjustments equal in absolute terms, setting P − P^{Laspeyres} = Q − Q^{Laspeyres}. Combined with the product test, this leads to the equations:
P^{Stuvel} = \frac{P^{Laspeyres} - Q^{Laspeyres}}{2} + \frac{\left[(P^{Laspeyres} - Q^{Laspeyres})^2 + 4\,(p_t \cdot q_t)/(p_0 \cdot q_0)\right]^{1/2}}{2},    (7.34)

Q^{Stuvel} = \frac{Q^{Laspeyres} - P^{Laspeyres}}{2} + \frac{\left[(Q^{Laspeyres} - P^{Laspeyres})^2 + 4\,(p_t \cdot q_t)/(p_0 \cdot q_0)\right]^{1/2}}{2}.    (7.35)
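A sketch (invented data) of the two adjustment schemes just described: scaling the Laspeyres pair proportionally reproduces the Fisher indexes, while shifting both by a common absolute amount reproduces the Stuvel indexes of Eqs. (7.34) and (7.35); both pairs satisfy the product test.

    from math import sqrt

    p0, pt = [1.0, 3.0], [1.2, 3.3]
    q0, qt = [10.0, 4.0], [9.0, 5.0]
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))

    V = dot(pt, qt) / dot(p0, q0)          # expenditure relative
    PL = dot(pt, q0) / dot(p0, q0)         # Laspeyres price index
    QL = dot(p0, qt) / dot(p0, q0)         # Laspeyres quantity index

    # Proportional adjustments: P = k*PL, Q = k*QL with P*Q = V  ->  Fisher.
    k = sqrt(V / (PL * QL))
    P_fisher, Q_fisher = k * PL, k * QL

    # Absolute adjustments: P - PL = Q - QL with P*Q = V  ->  Stuvel.
    d = PL - QL
    P_stuvel = d / 2 + sqrt(d * d + 4 * V) / 2
    Q_stuvel = -d / 2 + sqrt(d * d + 4 * V) / 2

    print(abs(P_fisher * Q_fisher - V) < 1e-12, abs(P_stuvel * Q_stuvel - V) < 1e-12)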
The Stuvel indexes are “ideal” because they satisfy both the time reversal test
and the factor reversal test. They can also be generalized. The family of gen-
eralized Stuvel indexes is defined by weighting the absolute price and quantity
index adjustments by some λ ∈ [0, 1]:
λ P − P Laspeyres (·) = (1 − λ) Q − QLaspeyres (·) . (7.36)
Combined with the product test Q = (p_t \cdot q_t)/[(p_0 \cdot q_0)P], Eq. (7.36) can be rewritten as:

\lambda (p_0 \cdot q_0) P - (1 - \lambda)(p_t \cdot q_t) P^{-1} = \sum_i \left[\lambda (p_{i0} q_{i0})(p_{it}/p_{i0}) - (1 - \lambda)(p_{it} q_{it})(p_{it}/p_{i0})^{-1}\right].    (7.37)
Despite the controversies over which tests to set aside in light of the impos-
sibility of simultaneously satisfying all of them, some sets of axioms have
gained acceptance as ways of defining an admissible price index. Balk (1995,
p. 86) identifies two such sets of fundamental axioms. One combination con-
sists of the monotonicity axiom, the proportionality axiom, the price dimen-
sionality axiom and the commensurability axiom, as discussed by Eichhorn
and Voeller (1983). The other combination of axioms, which Balk prefers, re-
places the proportionality axiom with two axioms, the strong identity test and
an axiom requiring linear homogeneity in comparison period prices, i.e. that
P (p0 , λpt , q0 , qt ) = λP (p0 , pt , q0 , qt ). A formula can satisfy proportionality
yet fail to exhibit linear homogeneity in comparison prices. Whether such a for-
mula should be excluded from consideration as a price index is debatable, so
\partial \log P^{Cobb-Douglas} / \partial \log p_{i0} = s_{i0}(1 - s_{i0}) \log(p_{it}/p_{i0}) - s_{i0}.    (7.38)

If \log(p_{it}/p_{i0}) > 1/(1 - s_{i0}), the index is increasing in p_{i0}, thus violating the monotonicity axiom.12
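The violation is easy to exhibit numerically. The sketch below (an invented two-good example) uses a geometric index weighted by base period expenditure shares and shows the index rising when a base period price is raised, holding everything else fixed.

    from math import exp, log

    def share_weighted_geometric(p0, pt, q0):
        # Geometric mean of price relatives weighted by base period expenditure shares.
        total = sum(a * b for a, b in zip(p0, q0))
        s0 = [p * q / total for p, q in zip(p0, q0)]
        return exp(sum(s * log(pt[i] / p0[i]) for i, s in enumerate(s0)))

    pt, q0 = [20.0, 1.0], [1.0, 1.0]
    low = share_weighted_geometric([1.0, 1.0], pt, q0)    # base price of good 1 is 1.0
    high = share_weighted_geometric([1.1, 1.0], pt, q0)   # base price of good 1 raised
    print(low < high)   # True: a higher base period price yields a HIGHER index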
The problems of severe implications and lack of theoretical justification can
be resolved by weakening the monotonicity axiom whenever a log price changes
by more than one. We retain the local monotonicity axiom as a requirement on
the price index in the region defined by |log(pit/pi0)| ≤ 1 for all i because this axiom guarantees that P(p0, pt, q0, qt) > 1 whenever pit ≥ pi0 for all i with at least one inequality strict, and that P(p0, pt, q0, qt) < 1 when pt < p0. We also main-
tain the global necessity of a monotonicity axiom that holds item expenditures
constant by treating the quantities as inversely dependent on prices. Using the
diagonal matrix Λ from the commensurability axiom to adjust prices and quan-
tities in one period only, the weak monotonicity axiom requires that:
\partial P(\Lambda p_0, p_t, \Lambda^{-1} q_0, q_t) / \partial \Lambda_{ii} < 0    (7.39a)

and

\partial P(p_0, \Lambda p_t, q_0, \Lambda^{-1} q_t) / \partial \Lambda_{ii} > 0.    (7.39b)
Eichhorn and Voeller’s set of four core axioms is valid if the monotonicity
axiom is replaced with a combination of the local monotonicity axiom and the weak monotonicity axiom.
12 To obtain a general result for the log-change indexes, note that functions of the form x^{ap+b} are non-monotonic for small p if a > 0 and 0 ≤ b < e^{-2}a. Let p represent a price and let a and b be parameters such that ap + b approximates the function for the weight of the Cobb–Douglas, Törnqvist or Sato–Vartia index.
The specifics of the problem at hand, including the purpose of the index and the
characteristics of the data, determine the relative merits of the possible attributes
of the index formula. In selecting tests, therefore, the key principle is that the
answer depends on the question. Even formulas with serious defects, such as
the Dutot index and the ratio of unit values, can be useful in the right context.
However, Fisher's (1922, p. 361) oft-neglected warning about the Carli index –
“[it] should not be used under any circumstances” – is best observed.
Six kinds of tests are of practical value for comparison of prices over time.
First, failure to satisfy the time reversal test is a sign of bias if the discrepancies
tend to be in one direction, and if the discrepancies are necessarily in one di-
rection, the bias is severe. Second, the requirements of continuity in prices and
quantities may be necessary for avoiding erratic behavior of the index. Third,
if the data contain extreme price relatives, the determinacy test is important for
avoiding excessive sensitivity to outliers. Fourth, the test of consistency in ag-
gregation is relevant when an index is constructed in stages, especially if index
users are interested in the index components along with the top level aggregate.
Fifth, for indexes that have an interpretation using the economic approach,
such as a cost of living index, item weights must reflect expenditure patterns.
An index that stays within the bounds defined by the Laspeyres and Paasche
indexes will do this, but this test may be treated as approximate to avoid
automatically limiting the choice of index to a Laspeyres index, a Paasche index,
or some kind of average of the two.
Lastly, if the index is to be used for deflation of nominal expenditures, the
product test, and tests of the properties of the implicit quantity index that is
implied by the product test are critical. Satisfaction of the factor reversal test
(which requires that the implicit quantity index have the same functional form
as the price index) is a convenient way to insure that the implicit quantity index
has axiomatic properties that are as good as those of the price index, but this
The pendulum that swung so strongly towards the economic approach starting
in the 1930s began to swing back starting in the 1970s. One cause of this re-
vitalized interest is an increased recognition of the usefulness of the axiomatic
approach.
13 A chained Törnqvist index is a good discrete time approximation for François Divisia’s continu-
ous time concept, as explained in Pravin K. Trivedi (1981) and Balk (2005).
The problem of formula bias in the US Consumer Price Index (CPI),
which caused hundreds of billions of dollars in excess payments and played a
key role in the decision to name a commission to investigate the CPI led by
Michael Boskin, provides a dramatic example of this. In the early 1990s, re-
search revealed that a narrow focus on the stochastic approach had prevented a
full consideration in the 1970s of the axiomatic properties of a formula for the
lowest-level aggregates of the CPI that had the characteristics of a Carli index
(Reinsdorf, 1998). Another example of the usefulness of the axiomatic approach
comes from the selection in the 1990s of the Fisher index for the US and Cana-
dian national economic accounts. Even though the factor reversal test has been
criticized for its lack of support from the economic approach, this test showed
that the alternatives to the Fisher index (such as the Törnqvist index) do not per-
mit the kind of unified approach to the construction of both price indexes and
quantity indexes that is desirable for national accounts.
A second cause of revitalized interest in the axiomatic approach is disillusion-
ment with its competitor, the economic approach. Concerns about the applica-
bility of the economic approach to groups of heterogeneous households have
received renewed emphasis from some researchers (e.g. Angus Deaton, 1998).
Furthermore – although having an imperfect but explicit conceptual framework
would seem preferable to having a framework that cannot be articulated or that
is not relevant to important elements of the problem – a few index number users
question whether the underlying assumptions of the economic approach are suf-
ficiently descriptive of reality to constitute a useful paradigm. Because of such
controversies, a recent US National Academy of Sciences Panel was unable
to reach a consensus on the question of whether the underlying measurement
concept for the US Consumer Price Index should be based on the economic ap-
proach or on a kind of basket test (National Research Council, 2002). A turning
of the tables on the economic approach by the axiomatic approach is not on the
horizon, for the reasons discussed by Jack Triplett (2001). Instead, we can hope
that index number researchers in either tradition will become increasingly aware
that both traditions offer critical advantages.
Acknowledgements
I am grateful to Keir Armstrong, Bert Balk, Erwin Diewert and Jack Triplett for
helpful comments. The views expressed are my own and should not be attributed
to the Bureau of Economic Analysis.
PROOF. The commensurability axiom states that a change in the quantity units
for any item i must have no effect on the index. A change in the units of mea-
surement for the arbitrary item i will change the values of qi0 and qit and the
values of pi0 and pit , so an index that satisfies the commensurability axiom
must be expressible as a function that does not have the qi0 , the qit , the pi0 , or
the pit as arguments. In particular, all the information about item i that matters
for the index must be contained in three functions of (qi0 , qit , pi0 , pit ) that are
unaffected by a change in its units of measurement: (a) the price relative pit /pi0 ;
(b) the expenditure pi0 qi0 ; and (c) pit qit . Therefore, an index that satisfies the
commensurability axiom is expressible in the form
P (p0 , pt , q0 , qt ) = f (p1t /p10 , . . . , pN t /pN 0 , s20 , . . . , sN 0 , s2t , . . . , sN t ),
where s10 and s1t are omitted from the argument list because they are deter-
mined from the other shares as 1 − (s20 + · · · + sN 0 ) and 1 − (s2t + · · · + sN t ),
respectively.
The circularity test states that:
f ((p1s /p10 )(p1t /p1s ), . . . , (pN s /pN 0 )(pN t /pN s ), ·)
= f (p1s /p10 , . . . , pN s /pN 0 , ·)f (p1t /p1s , . . . , pN t /pN s , ·). (A7.2)
Equation (A7.2) implies that multiplying pis /pi0 by any positive scalar k
must change log f (p1s /p10 , . . . , pN s /pN 0 , ·) by minus the amount that divid-
ing pit /pis by k changes log f (p1t /p1s , . . . , pN t /pN s , ·). Let r represent the
vector of price relatives from time 0 to time s and ρ represent the vector of
price relatives from time s to time t. Then, using the first price relative as the
representative case, for all (r, ρ) we have the equality:
This equality holds if and only if, for some predetermined constant w1 ,
Now suppose that pit /pi0 = λ > 0 for all i. Then λ(pi0 /pis ) can be substi-
tuted for pit /pis in Eq. (A7.2). Furthermore, using the proportionality test, λ can
be substituted for f (p1t /p10 , . . . , pN t /pN 0 , ·). Equation (A7.2) then becomes:
Consequently, ∑i=1,...,N wi = 1. Finally, monotonicity of f (·) implies that
wi ≥ 0 ∀i.
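The role of the circularity test in this argument can also be seen numerically: a fixed-weight geometric (Cobb–Douglas) index chains up exactly over an intermediate period, whereas an index such as the Fisher generally does not. The sketch below uses hypothetical data.

    import numpy as np

    def cobb_douglas(p0, pt, w):
        # Fixed-weight geometric index: prod_i (pit/pi0)**wi with the wi summing to one
        return np.prod((pt / p0) ** w)

    def fisher(p0, pt, q0, qt):
        laspeyres = (pt @ q0) / (p0 @ q0)
        paasche = (pt @ qt) / (p0 @ qt)
        return np.sqrt(laspeyres * paasche)

    # Hypothetical prices at times 0, s and t, plus quantities for the Fisher index
    p0 = np.array([1.0, 2.0, 4.0]); ps = np.array([1.5, 1.9, 4.4]); pt = np.array([1.3, 2.5, 5.0])
    q0 = np.array([8.0, 3.0, 2.0]); qs = np.array([7.0, 3.2, 1.9]); qt = np.array([7.5, 2.8, 1.7])
    w = np.array([0.5, 0.3, 0.2])

    print(cobb_douglas(p0, pt, w), cobb_douglas(p0, ps, w) * cobb_douglas(ps, pt, w))
    # identical: the fixed-weight geometric index satisfies circularity
    print(fisher(p0, pt, q0, qt), fisher(p0, ps, q0, qs) * fisher(ps, pt, qs, qt))
    # generally different: the Fisher index fails the circularity test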
In the economic approach, the price index concept is a ratio of expenditure func-
tions, cost functions or revenue functions. These functions are derived from the
primal economic utility or production functions via the maximization and min-
imization problems studied in duality theory. They therefore exist if and only if
the demand system or output system could have been generated by some form
of economic optimization behavior, such as utility maximization.
To identify the implications of economic optimization behavior, we need an
index concept that does not presume the existence of expenditure functions, cost
functions, or revenue functions, which are used to define the economic indexes.
The Divisia index can be adapted to this purpose. The Divisia index may be
considered a generalization of the economic index concept (such as the cost of
living index) because in cases where the functions needed to define an economic
index exist, the Divisia index can be evaluated in a way that makes it equal to the
economic index. A detailed explanation of this point is beyond the scope of this
appendix, but briefly, to equal the economic index based on some indifference
curve (or isoquant), the line integral that defines the Divisia index must be eval-
uated over a path consisting of a segment that runs along the indifference curve
(or isoquant) in price space, and a segment (or possibly a pair of segments) that
runs along a ray emanating from the origin.
A derivation of the Divisia index of consumption is as follows. Let {(pt , Yt );
t ∈ [0, 1]} define a continuously differentiable mapping from t to the price vec-
tor pt and income level Yt . François Divisia defined an analogous path for the
quantity vector, but in the version of the Divisia index used to study the proper-
ties of a demand model, the quantities must be specified as functions of prices
and income. To represent the demand model, let s(p, Y ) be a continuously dif-
ferentiable mapping of prices and income to a vector of expenditure shares.
Letting sit and pit represent the share and price of the ith item at time t, a price
change from pt to pt+Δt implies a Laspeyres index of ∑i sit (pi,t+Δt /pit ). In
the limit as Δt approaches 0, the log-change in this Laspeyres index equals
s(pt , Yt ) · (∂ log(pt )/∂t)Δt, and the log-change in the Paasche index has an
identical limit. We therefore define the Divisia price index as the solution to
the differential equation:
∂ log Pt^{Divisia} /∂t = s(pt , Yt ) · ∂ log(pt )/∂t. (B7.1)
∫_0^1 [∂ log(Yt )/∂t − (∂ log e(p, u)/∂ log p) · (∂ log(pt )/∂t)] dt. (B7.3)
This integral equals log(Y (1)/Y (0)) − log(e(p(1), u)/e(p(0), u)) regardless of
the path.
For more information on Divisia indexes see Balk (2005).
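The discrete approximation mentioned in footnote 13 can be illustrated with a small simulation. The sketch below assumes Cobb–Douglas expenditure shares (so that the exact cost-of-living index is the fixed-weight geometric index of the endpoint prices) and shows that a finely chained Törnqvist index, which approximates the line integral behind Eq. (B7.1), reproduces that value.

    import numpy as np

    # Cobb-Douglas demand: expenditure shares are constant, s(p, Y) = alpha, and the
    # cost-of-living index between two price vectors is prod_i (p1_i/p0_i)**alpha_i.
    alpha = np.array([0.4, 0.35, 0.25])
    p_start = np.array([1.0, 2.0, 5.0])
    p_end = np.array([1.3, 1.6, 6.5])

    def price_path(t):
        # One possible smooth path between the two price vectors (log-linear)
        return p_start ** (1 - t) * p_end ** t

    # Chained Törnqvist over many small steps approximates the Divisia integral
    steps = 1000
    log_index = 0.0
    for k in range(steps):
        p_a, p_b = price_path(k / steps), price_path((k + 1) / steps)
        s_a = s_b = alpha                  # shares do not vary under Cobb-Douglas
        log_index += np.sum(0.5 * (s_a + s_b) * np.log(p_b / p_a))

    print(np.exp(log_index), np.prod((p_end / p_start) ** alpha))
    # the two numbers coincide up to rounding error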
B7.1. Glossary
Product. Requires that the price index and the quantity index together decom-
pose the expenditure change; i.e. P (p0 , pt , q0 , qt )Q(p0 , pt , q0 , qt ) = (pt · qt )/
(p0 · q0 ).
Proportionality. P (p0 , λp0 , q0 , qt ) = λ. Also known as “strong proportional-
ity” to distinguish it from the weaker requirement that P (p0 , λp0 , q0 , λq0 ) = λ.
Proportionality, comparative. P (p0 , λps , q0 , qt )/P (p0 , ps , q0 , qs ) = λ, where
pt has been assumed to equal λps .
Time reversal. P (p0 , pt , q0 , qt )P (pt , p0 , qt , q0 ) = 1.
Weak axiom of revealed preference. Implies that the standard of living index is
greater than or equal to 1 whenever the Paasche quantity index is greater than 1
and that the standard of living index is less than or equal to 1 whenever the
Laspeyres quantity index is less than 1. If expenditures are constant, equivalent
conditions are that the cost of living index is greater than or equal to 1 whenever
the Paasche price index is greater than 1, and the cost of living index is less than
or equal to 1 whenever the Laspeyres index is less than 1.
Paasche. Uses final (comparison) period quantities as its basket. If price rela-
tives are averaged, the averaging formula is a weighted harmonic mean, and the
weights are the expenditure shares from the same period as the numerator of the
price relatives.
Sato–Vartia. In log-change form, an average of price log-changes with
weights proportional to logarithmic means of base and comparison period ex-
penditure shares.
Stuvel. Uses the quadratic formula to define implicitly an average of the
Laspeyres and Paasche indexes that satisfies the factor reversal test and the time
reversal test.
Törnqvist. In log-change form, an average of price log-changes with weights
equal to simple averages of base and comparison period expenditure shares.
Some authors refer to the chained Törnqvist index as a “Divisia index.”
Walsh. Uses a two-period geometric mean of quantities as its fixed basket.
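Several of these entries describe averaging formulas in words; the Python sketch below (hypothetical data, illustrative function names) spells out the Paasche, Törnqvist, Sato–Vartia and Walsh price indexes as defined above.

    import numpy as np

    def shares(p, q):
        v = p * q
        return v / v.sum()

    def logmean(a, b):
        # Logarithmic mean, taken to equal a when a and b coincide
        return np.where(np.isclose(a, b), a, (a - b) / (np.log(a) - np.log(b)))

    def paasche(p0, pt, q0, qt):
        # Comparison-period basket (a weighted harmonic mean of price relatives)
        return (pt @ qt) / (p0 @ qt)

    def tornqvist(p0, pt, q0, qt):
        w = 0.5 * (shares(p0, q0) + shares(pt, qt))
        return np.exp(np.sum(w * np.log(pt / p0)))

    def sato_vartia(p0, pt, q0, qt):
        w = logmean(shares(p0, q0), shares(pt, qt))
        w = w / w.sum()                    # weights normalised to sum to one
        return np.exp(np.sum(w * np.log(pt / p0)))

    def walsh(p0, pt, q0, qt):
        basket = np.sqrt(q0 * qt)          # two-period geometric mean of quantities
        return (pt @ basket) / (p0 @ basket)

    p0 = np.array([1.0, 2.0, 5.0]); pt = np.array([1.2, 1.8, 6.0])
    q0 = np.array([10.0, 4.0, 2.0]); qt = np.array([9.0, 5.0, 1.8])
    for idx in (paasche, tornqvist, sato_vartia, walsh):
        print(idx.__name__, round(float(idx(p0, pt, q0, qt)), 4))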
References
Afriat, S.N. (1967). The construction of utility functions from expenditure data. International Eco-
nomic Review 8, 67–77.
Allen, R.G.D. (1935). Some observations on the theory and practice of price index numbers. Review
of Economic Studies 3, 57–66.
Allen, R.G.D. (1949). The economic theory of index numbers. Economica 16, 197–203.
Armstrong, K. (2003). A restricted-domain multilateral test approach to the theory of international
comparisons. International Economic Review 44, 31–86.
Balk, B.M. (1995). Axiomatic price index theory: A survey. International Statistical Review 63,
69–93.
Balk, B.M. (1996). Consistency-in-aggregation and Stuvel Indices. Review of Income and Wealth
42, 353–364.
Balk, B.M. (2003). Aggregation methods in international comparisons, ERIM Report Series Refer-
ence No. ERS-2001-41-MKT. http://ssrn.com/abstract=826866.
Balk, B.M. (2005). Divisia price and quantity indices: 80 years after. Statistica Neerlandica 59,
119–158.
Banerjee, K.S. (1959). A generalisation of Stuvel’s index number formulae. Econometrica 27, 676–
678.
Blackorby, C., Primont, D. (1990). Index numbers and consistency in aggregation. Journal of Eco-
nomic Theory 22, 87–98.
Blow, L., Crawford, I. (2001). The cost of living with the RPI: Substitution bias in the UK retail
prices index. Economic Journal 111, F357–F382.
Boumans, M. (2001). Fisher’s instrumental approach to index numbers. In: Klein, J.L., Morgan,
M.S. (Eds.), The Age of Economic Measurement, Duke Univ. Press, Durham, pp. 313–344.
Bowley, A.L. (1928). Notes on index numbers. Economic Journal 38, 216–237.
Deaton, A. (1998). Getting prices right: What should be done? Journal of Economic Perspectives
12, 37–46.
Diewert, W.E. (1973). Afriat and revealed preference theory. Review of Economic Studies 40, 419–
426.
Diewert, W.E. (1976). Exact and superlative index numbers. Journal of Econometrics 4, 115–145.
Reprinted in: Diewert, W.E., Nakamura, A.O. (Eds.), Essays in Index Number Theory, vol. 1.
Elsevier Science Publishers, Amsterdam, 1993, pp. 223–252.
Diewert, W.E. (1978). Superlative index numbers and consistency in aggregation. Econometrica 46,
883–900. Reprinted in: Diewert, W.E., Nakamura, A.O. (Eds.), Essays in Index Number Theory,
vol. 1. Elsevier Science Publishers, Amsterdam, 1993, pp. 253–275.
Diewert, W.E. (1981). The economic theory of index numbers: A survey. In: Deaton, A. (Ed.), Es-
says in the Theory and Measurement of Consumer Behavior in Honour of Sir Richard Stone,
Cambridge Univ. Press, London, 163–208. Reprinted in: Diewert, W.E., Nakamura, A.O. (Eds.),
Essays in Index Number Theory, vol. 1. Elsevier Science Publishers, Amsterdam, 1993, pp. 177–
222.
Diewert, W.E. (1984). Group cost of living index: Approximations and axiomatics. Methods of Op-
erations Research 48, 23–45.
Diewert, W.E. (1992). Fisher ideal output, input, and productivity indexes revisited. Journal of Pro-
ductivity Analysis 3, 211–248.
Diewert, W.E. (1993). The early history of price index research. In: Diewert, W.E., Nakamura, A.O.
(Eds.), Essays in Index Number Theory, vol. 1, Elsevier Science Publishers, Amsterdam, pp. 33–
66.
Diewert, W.E. (1999). Axiomatic and economic approaches to international comparisons. In: Hes-
ton, A., Lipsey, R.E. (Eds.), International and Inter-Area Comparisons of Income, Output and
Prices, pp. 13–87.
Diewert, W.E. (2005). Index number theory using differences rather than ratios. American Journal
of Economics and Sociology 64:1, 311–360.
Dowrick, S., Quiggin, J. (1994). International comparisons of living standards and tastes: A revealed
preference analysis. American Economic Review 84, 332–341.
Dowrick, S., Quiggin, J. (1997). True measures of GDP and convergence. American Economic Re-
view 87, 41–64.
Ehemann, C. (2005). Chain drift in leading superlative indexes. Working paper WP2005-09 BEA.
Available at http://www.bea.gov/bea/working_papers.htm.
Eichhorn, W. (1976). Fisher’s tests revisited. Econometrica 44, 247–256.
Eichhorn, W., Voeller, J. (1983). The axiomatic foundation of price indexes and purchasing power
parities. In: Diewert, E., Montmarquette, C. (Eds.), Price Level Measurement. Ministry of Supply
and Services, Ottawa.
Eltetö, Ö., Köves, P. (1964). On the problem of index number computation relating to international
comparisons. Statisztikai Szemle 42, 507–518 (in Hungarian).
Ferger, W.F. (1946). Historical note on the purchasing power concept and index numbers. Journal
of the American Statistical Association 41, 53–57.
Fisher, I. (1911). The Purchasing Power of Money. MacMillan, New York.
Fisher, I. (1921). The best form of index number. Quarterly Publications of the American Statistical
Association 17, 533–537.
Fisher, I. (1922; 3rd ed. 1927). The Making of Index Numbers: A Study of Their Varieties, Tests, and
Reliability. Houghton Mifflin Co., Boston.
Fisher, W.C. (1913). The tabular standard in Massachusetts history. Quarterly Journal of Economics
27, 417–452.
Frisch, R. (1930). Necessary and sufficient conditions regarding the form of an index number which
shall meet certain of Fisher’s Tests. Journal of the American Statistical Association 25, 397–406.
Frisch, R. (1936). Annual survey of general economic theory: The problem of index numbers.
Econometrica 4, 1–38.
Funke, H., Voeller, J. (1978). A note on the characterization of Fisher’s ideal index. In: Eichhorn,
W., Henn, R., Opitz, O., Shephard, R.W. (Eds.), Theory and Applications of Economic Indices.
Physica-Verlag, Würzburg, pp. 177–181.
Funke, H., Hacker, G., Voeller, J. (1979). Fisher’s circular test reconsidered. Schweizerische
Zeitschrift für Volkswirtschaft und Statistik 115, 677–687.
Gini, C. (1924). Quelques considérations au sujet de la construction des nombres indices des prix et
des questions analogues. Metron 2.
Gini, C. (1931). On the circular test of index numbers. Metron 9, 3–24.
Geary, R.C. (1958). A note on the comparison of exchange rates and purchasing power between
countries. Journal of the Royal Statistical Society 121, 97–99.
Haberler, G. (1927). Der Sinn der Indexzahlen. Mohr, Tübingen.
Hill, R.J. (2006). Superlative index numbers: Not all of them are super. Journal of Econometrics
130, 25–43.
Hulten, C.R. (1973). Divisia index numbers. Econometrica 41, 1017–1025.
Hurwicz, L., Richter, M. (1979). Ville axioms and consumer theory. Econometrica 47, 603–620.
International Labor Organization (2004). Consumer Price Index Manual: Theory and Practice. ILO
Publications, Geneva.
Keynes, J.M. (1930). A Treatise on Money. MacMillan, London.
Khamis, S.H. (1972). A new system of index numbers for national and international purposes. Jour-
nal of the Royal Statistical Society 135, 96–121.
Konus, A.A. (1939). The problem of the true index of the cost of living. Econometrica 7, 10–29.
Krtscha, M. (1984). A Characterization of the Edgeworth–Marshall Index. Athenaüm/
Hain/Hanstein, Königstein.
Krtscha, M. (1988). Axiomatic characterization of statistical price indices. In: Eichhorn, W. (Ed.),
Measurement in Economics. Physica-Verlag, Heidelberg.
Lerner, A.P. (1935). A note on the theory of price index numbers. Review of Economic Studies 3,
50–56.
Manser, M., McDonald, R. (1988). An analysis of substitution bias in measuring inflation. Econo-
metrica 46, 909–930.
Montgomery, J.K. (1937). The Mathematical Problem of the Price Index. Orchard House, P.S. King
& Son, Westminster.
National Research Council (2002). At what price? Conceptualizing and measuring cost-of-living
and price indexes. In: Schultze, C.L., Mackie, C. (Eds.), Panel on Conceptual, Measurement, and
Other Statistical Issues in Developing Cost-of-Living Indexes. Committee on National Statistics,
Division of Behavioral and Social Sciences and Education, National Academy Press, Washing-
ton, DC.
Pierson, N.G. (1896). Further considerations on index numbers. Economic Journal 6, 127–131.
Pigou, A.C. (1912). Wealth and Welfare. Macmillan, London.
Pigou, A.C. (1920). The Economics of Welfare. 4th ed., Macmillan, London.
Pollak, R. (1981). The social cost-of-living index. Journal of Public Economics 15, 311–336.
Reinsdorf, M.B. (1998). Formula bias and within-stratum substitution bias in the US CPI. Review of
Economics and Statistics 80, 175–187.
Reinsdorf, M., Dorfman, A. (1999). The monotonicity axiom and the Sato–Vartia Index. Journal of
Econometrics 90, 45–61.
Reinsdorf, M.B., Diewert, W.E., Ehemann, C. (2002). Additive decompositions for Fisher, Törnqvist
and geometric mean indexes. Journal of Economic and Social Measurement 28, 51–61.
Rothbarth, E. (1941). The measurement of changes in real income under conditions of rationing.
Review of Economic Studies 8, 100–107.
Samuelson, P.A., Swamy, S. (1974). Invariant economic index numbers and canonical duality: Sur-
vey and synthesis. American Economic Review 64, 566–593.
Sato, K. (1976). The ideal log-change index number. Review of Economics and Statistics 58, 223–
228.
Staehle, H. (1935). A development of the economic theory of price index numbers. Review of Eco-
nomic Studies 2, 163–188.
Stuvel, G. (1957). A new index number formula. Econometrica 25, 123–131.
Swamy, S. (1965). Consistency of Fisher’s tests. Econometrica 33, 619–623.
Szulc, B. (1964). Indices for multiregional comparisons. Przeglad Statystyczny 3, 239–254.
Törnqvist, L. (1936). The Bank of Finland’s consumption price index. Bank of Finland Monthly
Bulletin 10, 1–8.
Triplett, Jack E. (2001). Should the cost-of-living index provide the conceptual framework for a
Consumer Price Index? Economic Journal 111, F311–F334.
Trivedi, P.K. (1981). Some discrete approximations to Divisia integral indices. International Eco-
nomic Review 22, 71–77.
van IJzeren, J. [van Yzeren] (1952). Over de plausibiliteit van Fisher’s ideale indices (On the
plausibility of Fisher’s ideal indices). Statistische en Econometrische Onderzoekingen (C.B.S.)
7, 104–115.
van IJzeren, J. [van Yzeren] (1958). A note on the useful properties of Stuvel’s index numbers.
Econometrica 26, 429–439.
Varian, H.R. (1982). The non-parametric approach to demand analysis. Econometrica 50, 945–974.
Varian, H.R. (1984). The non-parametric approach to production analysis. Econometrica 52, 579–
597.
Vartia, Y.O. (1976). Ideal log-change index numbers. Scandinavian Journal of Statistics 3, 121–126.
Vartia, Y.O., Weymark, J.A. (1981). Four revealed preference tables. Scandinavian Journal of Eco-
nomics 83, 408–418.
Ville, J. (1951–1952). The existence-conditions of a total utility function. Review of Economic Stud-
ies 19, 123–128.
Vogt, A. (1981). Characterizations of indexes, especially of the Stuvel index and the Banerjee index.
Statistische Hefte 22, 241–245.
Walsh, C.M. (1901). The Measurement of General Exchange Value. Macmillan and Co., New York.
Walsh, C.M. (1921). The best form of index number: Discussion. Quarterly Publications of the
American Statistical Association 17, 537–544.
CHAPTER 8
National Accounts and Indicators
F.A.G. den Butter
Abstract
National accounts generate a variety of indicators used in economics for deter-
mining the value of goods and services. This chapter highlights two problems
in the measurement of such indicators, namely the construction of the data at
the macro level using individual observations from different sources, and the in-
terpretation of the data when economic relationships are empirically investigated
using these data at the macro level. The chapter pays ample attention to the insti-
tutional set-up of national accounting, and to the use of indicators derived from
the national accounts in policy analysis in various industrialised countries. Major
difficulties in interpretation arise when the indicators are used in the assessment
of (social) welfare and in separating developments in prices and volumes.
8.1. Introduction
given empirical content by the statistics from the national accounts. National in-
come may have different meanings and connotations in various macro economic
analyses. However, when national accounts’ data are used in these analyses to
represent the concept of national income empirically, it is the definition of na-
tional income according to the rules of national accounting which determines
how this concept is made operational. To give another example: many inhabi-
tants of the European Union had the impression that after the introduction of the
Euro life had become much more expensive. Yet, according to the price deflators
computed by the national statistical offices (NSOs), following the standard aggregation
methods, “in reality” only a slight increase in inflation could be observed. Obviously
there was a discrepancy between the view of the man and woman in the street on inflation
and the way this concept is made operational in statistical accounting.
This chapter focuses on the conceptual problem of the construction and use
of indicators from the national accounts in policy analysis. As the author is
especially familiar with the situation in the Netherlands, most examples and
historical anecdotes stem from that country. The contents of the article are as
follows. The next section describes the characteristics and methodology of the
national accounts. Section 8.3 surveys the history of national accounting and
Section 8.4 discusses the interaction between the collection and use of data at
the macro level in the last two centuries in the Netherlands. Section 8.5 consid-
ers the role of statistics and economic policy analysis in the institutional set-up
of the polder model in the Netherlands. Section 8.6 discusses the history of na-
tional accounting and the institutional set-up of policy preparation in some other
industrialised countries. Sections 8.3–8.6 provide insight into the confrontation
between scientific knowledge and practical policy needs, which has been crucial
in the development of the national accounts. Section 8.7 examines the present
situation, issues for discussion and prospects for national accounting. The rela-
tionship between construction and use of various main economic indicators from
the national accounts is discussed in Section 8.8. This section also gives exam-
ples where the conceptual problem of measurement and use has been the subject of
fierce debate, such as the use of NA statistics as welfare indicators and the cor-
rection of national income for environmental degradation. Finally Section 8.9
concludes.
lective sector makes use of national resources. National accounts data also give
an answer to the question to what extent the policy goals have been realised.
This illustration of the scope and use of the national accounts is indicative for
what must be included in the description of the economic process. On the one
hand the selection of data is motivated by the needs from economic theory, and
on the other side by the demands from policy analysis. With respect to the latter,
demands do not only stem from the government. Trade unions and employer
associations base their policy likewise on data from the national accounts. An
example is the development of prices and labour productivity, which play a ma-
jor role in the wage negotiations.
The system of the national accounts can be characterised as a coherent and in-
tegrated data set at the macro level. The consistency of the data in the accounting
scheme is guaranteed by using definition equations and identities, which relate
the underlying observations from various statistical sources to each other. This
quality of the system is crucial for its use in economic analysis and policy: its
structure of interdependent definitions enables a uniform analysis and compari-
son of various economic phenomena. However, it also makes the system rather
rigid. It is impossible to change individual concepts and/or definitions in the sys-
tem. For instance, inclusion of a new component in domestic production is only
possible if at the same time the concepts of income, consumption, savings and
investments are adapted.
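A minimal sketch of how such definition equations work in practice is given below: with purely hypothetical figures, the expenditure components and the income components compiled from different sources must add up to the same domestic product, and any remaining discrepancy signals an inconsistency to be resolved during compilation.

    # Hypothetical figures (billions of euros) from two different source statistics
    expenditure_side = {
        "household_consumption": 310.0,
        "government_consumption": 160.0,
        "gross_capital_formation": 130.0,
        "exports": 450.0,
        "imports": 400.0,
    }
    income_side = {
        "compensation_of_employees": 330.0,
        "gross_operating_surplus": 250.0,
        "taxes_less_subsidies_on_production": 70.0,
    }

    # Expenditure identity: GDP = C + G + I + (X - M)
    gdp_expenditure = (expenditure_side["household_consumption"]
                       + expenditure_side["government_consumption"]
                       + expenditure_side["gross_capital_formation"]
                       + expenditure_side["exports"] - expenditure_side["imports"])

    # Income identity: the same GDP seen from the income side
    gdp_income = sum(income_side.values())

    print(gdp_expenditure, gdp_income, "discrepancy:", gdp_expenditure - gdp_income)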
The consistency of the system of national accounts is of great importance for
the way the data are used in practice. A number of possibilities has already been
mentioned. The domestic product and national income are frequently used as
an overall indicator for the functioning of the national economy. The success
of the economic policy and the financial power of a nation are based on these
indicators. In this line of reasoning the extent to which a country should pro-
vide development aid is expressed as a percentage of national income. National
income is also the benchmark for payments of the various member states to
the European Union. A higher national income means more payments. There-
fore, it is extremely important that the calculation of national income is based
as much as possible on objective criteria and is calculated according to inter-
national guidelines. It should not become subject of dispute between countries,
and of political manipulation.
The same applies when the national income is taken as a basis for various
economic indicators to guide and judge government policy. See for instance the
debt and budget deficit of the government, which are, according to the Maas-
tricht criteria and the limits set in the Stability and Growth Pact (SGP) of the
EU, expressed as a percentage of national income. Moreover the relative impor-
tance of a specific economic sector, e.g. agriculture, industry or retail trade, can
be illustrated by calculating its relative share in domestic production. However,
the fact that national account data should be undisputed when used in policy
practice does not preclude considerable dispute among experts about proper
definitions. By way of example Mellens (2006) discusses the various de-
finitions of savings.
8.2.1. Methodology
National accounts are set up for a number of possible uses. The consequence
of such diversity is that the definition of the various concepts in the national ac-
counts (e.g. of income) is not always completely in accordance with the intention
and wishes of the users. An important choice in this respect is that between pro-
viding a description from the angle of the economic actors versus reproducing
as correctly as possible economic processes. The first is called the institutional
approach and the second the functional approach.
In the institutional approach the producers are the focus of the description
of the production process. Their value added in production is classified on the
basis of their main activities in sectors of the economy. Producers who perform
mainly transport activities, therefore will be classified in the transport sector.
This provides good information on total production value of producers in a spe-
cific branch of industry or services. However, it also implies that other activities
of the producers in the transport sector, for instance some trading activities,
are not counted as such in the national accounts. When the analysis focuses
on the characteristics of the production activities themselves, such an institutional
approach is less adequate and a functional approach is warranted.
The question of how to define a concept plays an important role in the national
accounts and in the interpretation of the data from these accounts. Examples
are construction and decorating activities of house owners and their families,
and unpaid domestic work. Should these be included in the domestic product?
One can think of pros and cons. The argument for inclusion is that they are
productive services that would be included in the domestic product if they were
performed for payment by third parties. The counter argument is that inclusion
would imply large changes in the domestic production, which would limit the
use of this indicator in analysing the developments of the market economy. In
fact, taxable income is used here as a criterion (see Bos, 2003, pp. 145–147).
The problem of definition is, of course, very much connected to the desire for
international comparability. An individual country or a statistical office does not
decide about the definition of, for instance, income autonomously, but has to
follow the definition laid down in the international directives. Of course there
are always border cases and grey areas in these definitions. A typical example in
the Netherlands is the (home) production from small rented gardens at a distance
from the homes (so called “volkstuintjes”). It is now included in the production
statistics because the official directives suggest it should, but only after a foreign
expert asked questions about the production of these gardens when he had seen
them when travelling to the CBS (Netherlands Central Bureau of Statistics).
However, most of such cases relate to small amounts which will not influence
interpretation of the data.
A major question of this chapter is how national accounts’ data can be used in
measurement of economic phenomena and relationships. From a theoretical per-
spective this question relates to the way the construction and compilation of data
of the national accounts are related to the theory of measurement. According
to Boumans (2007, this volume) today’s measurement theory is the Represen-
tational Theory of Measurement. It is described as taking “measurement as a
process of assigning numbers to attributes of the empirical world in such a way
that the relevant qualitative empirical relations among these attributes are re-
flected in the numbers themselves as well in important properties of the number
system”. Boumans distinguishes two different foundational approaches in eco-
nomics in the theory of measurement: the axiomatic and the empirical approach.
When considering measurement and national accounts the empirical approach
is most relevant. For the use of these data in policy analysis modelling eco-
nomic relationships based on economic theory plays a major role. That is why
this chapter pays ample attention to the interaction between the provision of
data at the macro level, the empirical analysis of economic relationships using
these data and the policy analysis based on these relationships, or “models” of
the economy. Loosely speaking, measurement theory is, in this respect, con-
cerned with determining the parameter values of these models using the data
constructed by the methodology of the national accounts. Modern econometric
methodology, time series analysis in particular, teaches us how to establish this
empirical link between data and characteristics of the model (see e.g. Chao, this
volume). However, a number of methodological issues remain unsolved, which
nowadays have considerably reduced the role of econometric methodology in
macroeconomic model building (see e.g. Don and Verbruggen, 2006). Three
issues can be mentioned. A first issue is that consistency of the models with the-
oretical requirements and with long run stylised facts is often at variance with
parameter estimates which are a mere result of applying econometric methods
to one specific data set. A second issue is that econometric methodology re-
quires specific conditions on the specification of a model, e.g. linearity, which
are too binding for a proper use of the model. Thirdly, the theoretical concept
warranted in the model may be much at variance with
the practical construction method according to which the data in the empirical
analysis are obtained. This latter issue is most relevant for this chapter.
Important historical events such as wars, economic crises and revolutions have
always called the need for good quantitative data on the economy at the macro
level, and have therefore contributed considerably to the development of national
accounting. A look into the early history teaches us that a need for such data for
policy analysis formed the reason for the first estimates of national income. They
were made respectively by Sir William Petty and Gregory King in 1665 and
1696 for the United Kingdom (see Kendrick, 1970; Bos, 1992, 2003). Petty tried
to show that the state could raise a much larger amount of taxes to finance the
war expenditure than it actually did, and that the way of collecting taxes could
be much improved. Moreover, Petty wanted to show that the United Kingdom
was not ruined by its revolutions and by the wars with the foreign enemies, and
that it could compare itself with the Netherlands and France with respect to the
amount of trade and military potential.
The estimates by King can be regarded as an improvement to those of Petty.
In his calculation method, King used a broad concept of income and production,
similar to what it is today according to the guidelines of the United Nations.
Production comprises the added value of both the production of goods and of
services. This concept is in strong contrast with that of the physiocrats, who
reasoned that only agriculture produces value added and that all remaining pro-
duction is ‘sterile’. Yet already Adam Smith argued that not only agriculture but
also occupations in the trade and the industry produce added value. However,
according to Smith, services, both by the government and by private businesses,
do not generate additional value. In that sense the income concept of King was
even broader and more modern than that of Smith. Beside the use of a ‘modern’
concept of income, a second important characteristic of the estimates of King is
that he calculated national income already in three different ways, as it is done
today, namely from the perspective of (i) production, (ii) income distribution
and (iii) expenditure. Moreover, the calculations by King showed remarkable
detail. He did not restrict himself to the outcomes for total annual na-
tional income and the total annual expenditure and savings, but made a split up
of these data with respect to social groups, to the various professions, and to
different income groups. He also made an estimate of the national wealth (gold,
silver, jewels, houses, livestock, etc.). King compared national income and na-
tional wealth of the United Kingdom with those of the Netherlands and France. It is
interesting to note that this aspect of international comparability – an important
aim of the international guidelines – already played a role in the first estimates
of national income ever. King constructed time series for national income for
the period 1688–1695. Using these time series he calculated income forecasts
for the years 1696, 1697 and 1698.
At about the same time in France estimates of national income were made
by Boisguillebert and Vauban. It is unclear to what extent these estimates were
influenced by the way national income was originally calculated in United King-
dom. However, the estimates of the English national income by Petty and King
can be regarded as unique, in that their quality and scope were hardly matched
in the following two centuries. After the pioneering work of King the number
of countries for which national bookkeepings were established gradually
increased. Around 1900, estimates were available for eight
countries: United Kingdom, France, the United States, Russia, Austria, Ger-
many, Australia and Norway. Compiling national accounts was not yet always
considered as a task for the government. In this respect Australia was an early
bird: here the government already started in 1886.
International historic reports do not include the Netherlands in the above list of
eight countries. Nevertheless the first estimates of national income in the Nether-
lands were already made much earlier (see Den Bakker, 1993). In fact the history
of the national bookkeeping in the Netherlands starts at the beginning of the 19th
century, with the calculations of national income by Hora Siccama and Van Rees
in 1798, by Keuchenius in 1803, and by Metelerkamp in 1804. And again war
was the reason for making these calculations. The major goal of these calcula-
tions was that they enabled a comparison of the wealth in the Netherlands with
that of the neighbouring countries from the economic and military perspective.
The calculations by Hora Siccama and Van Rees were part of a plan at the re-
quest of the national assembly of the new Batavian republic for revision of the
tax system. The reason was to see how taxes could be levied efficiently, in pro-
portion to personal wealth (see Bos, 2006). Keuchenius, a member of the city
council of Schiedam, constructed a hypothetical estimate of national income
which was based on the situation as if war in Europe would have ended and
peace would have been established. Keuchenius estimated national income of
the Netherlands to be about 221 million guilders, that is, 117 guilders per head of
the population. The share of agriculture and fishery in this income amounted to
45%, whereas 27% was transfer income from abroad (think of the rich import
from the colonies). Metelerkamp, who knew the work of Keuchenius, intro-
duced some improvements, and arrived at an estimate of national income for
the Netherlands in 1792 of 250 million guilders, that is 125 guilders per head of
the population.
The first systematic estimates of national income in the Netherlands were
made by Bonger. The first year for which data were calculated, was 1908. It
was published in 1910. The first official calculations of national income by the
Netherlands Central Bureau of Statistics (CBS) were published in 1933 and refer
to the year 1929. Finally it was Van Cleeff who constructed a coherent system
of national accounts for the Netherlands in a two article publication in the Dutch
periodical ‘De Economist’ in 1941. Subsequently, on 19 January 1943 a com-
mission for national accounting was installed at CBS. Today the installation of
this commission is considered the official beginning of the Netherlands’ national
accounting (see Bos, 2006, for an extensive review of the history of national ac-
counting in the Netherlands).
The 1930s and 1940s provided inspiration for the modern system of national ac-
counts. Three aspects played an important role. In the first place the discussion
on what concepts of income to use at the macro level revived. Secondly devel-
opments in economic theory underlined the importance of national accounting.
Thirdly the first coherent and approved systems of national accounts were devel-
oped. The two most important protagonists in the discussions on the problems
of the definition of national income (what should, and should not be included
in income data) in the inter-bellum were Clark and Kuznets. Clark argued that
services from house ownership were to be included in income, but services of
durable consumer goods were not to be included. Clark already suggested
subtracting every verifiable exhaustion of natural resources from income. Moreover
he considered problems of purchasing power and international and intertempo-
ral comparability of the national income data. This discussion of comparability
continues today and has, for instance, resulted in the large Penn World Table
project of data collection and construction, where national income data are made
comparable by using a constructed international price. More specifically, for
each country the costs of a differentiated basket of goods are calculated and the
national income data are corrected by means of the observed cost differences
(see Summers and Heston, 1991).
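A stylised version of this correction is sketched below; the basket, prices and income figure are purely hypothetical. Incomes already expressed in the reference currency are deflated by the relative cost of a common basket, instead of being compared at market exchange rates alone.

    # Hypothetical common basket and national price data
    basket = {"bread": 100, "housing": 12, "transport": 40}        # common quantities
    prices = {
        "reference": {"bread": 1.0, "housing": 50.0, "transport": 2.0},
        "country_a": {"bread": 0.5, "housing": 20.0, "transport": 1.5},
    }

    def basket_cost(price_list):
        return sum(basket[item] * price_list[item] for item in basket)

    # Relative price level of country A (its basket cost over the reference cost)
    price_level_a = basket_cost(prices["country_a"]) / basket_cost(prices["reference"])

    income_a = 8000.0                    # per-capita income converted at the exchange rate
    income_a_corrected = income_a / price_level_a
    print(round(price_level_a, 3), round(income_a_corrected, 1))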
Much more than Clark, Kuznets was also a prominent theoretician. He pub-
lished on the link between changes in national income and welfare, on the
valuation of production by the government and on the difference between in-
termediate and final production. Moreover he contributed a number of techni-
calities in data processing (interpolation, extrapolation). In 1936, Leontief made
a next major step in the statistical description of an economy by presenting in-
put/output tables. Although the basic idea of the input/output table is already
present in Quesnay’s ‘tableau économique’ and in the way Walras described the
working of the economy, Leontief’s main innovation was the formulation of the
model that directly connects the outputs with the inputs in an operational man-
ner. In this way it portrays the complete production structure of a country and it
enables one to calculate which changes in inputs are needed in order to bring about a
warranted change of the outputs. It should be noted that there need not be
a direct link between the input/output tables and the national accounts. As
a matter of fact in a large number of countries input/output tables are calculated
only on an incidental basis, and outside the framework of the annual calculation
of the national accounts. The Netherlands is an exception. Already for a long
time in this country input/output tables are published annually together with the
tables of the national accounts. In this case the input/output tables do not only
form a separate source of information, but are also exploited as the main statis-
tical tool to calculate the data from the production accounts.
In the 1930s, the start of macro economic model building and the consequent
development of new econometric techniques were important innovations that
increased the need for statistical data collection at the macro level, and hence
for national accounting. In 1936, Tinbergen constructed the first macro model
for the Dutch economy. In order to make the model describe the actual work-
ing of the economy empirically, the behaviour parameters of the model were
estimated using time series data on all endogenous and exogenous variables of
the model. For that reason other and longer time series at the macro level were
needed than originally available. Moreover, the quality of the existing data had
to be improved. Although Tinbergen realised the need for a good and compre-
hensive system of national accounts, he himself has not been involved directly
in the drawing up of such a social accounting system. However, the CBS started
already in 1937 at the request of Tinbergen a project that aimed at improved
estimates of the national income. Its focus was a better statistical foundation
of cyclical analysis. At the CBS it was Derksen who managed this project
that contributed much to improve the calculation methodology of income data.
Nowadays the demands of the builders and users of macro economic models still
play a major role in the set-up and development of national accounting.
Undoubtedly the most important support for further elaboration of the national
bookkeeping was the publication of Keynes’ “General Theory” in 1936. It marks
the beginning of macro economic analysis. This Keynesian analysis directly
connects economic theory with national accounting: both use the same set of
identities. The consequence of the theory of Keynes was that a shift occurred in
the main concept of income used in policy analysis: net national income at factor
cost was more and more replaced by gross national income at market prices.
The reason was to provide a better insight into the link between the different
expenditure categories and income. The Keynesian revolution also prompted
governments to pursue an active countercyclical policy. This created a need for
a system of national accounts where the government sector was added to the
sector accounts. All in all, thanks to the Keynesian revolution it was widely
recognised how important national accounting is for the preparation and conduct
of economic policy. Keynes himself actively stimulated the advancement of na-
tional accounting schemes, particularly in the United Kingdom. At his initiative
the most important experts of the national accounting in the United Kingdom,
Stone and Meade, made estimates of national income and expenditures in 1941.
These data were used to assess the receipts and expenditures of the government
into a scheme of balances for the whole economy. And again it was a war which
contributed to a prompt implementation of this work. According to Stone the
major aim of this exercise was to map the problem of financing the war expendi-
tures. These data were indeed used in the discussions on the government budget
during war time.
This marks also the beginning of the era in which national accounting was con-
ducted on the basis of international guidelines in order to promote international
comparability. For that reason, the League of Nations (the pre-war predecessor
of the United Nations) had already asked for such guidelines in 1939. How-
ever, the activities were postponed because of the war. At last, in 1947, the first
guidelines were published by the United Nations in a report which consisted
mainly of an appendix, drafted by Stone. This appendix can be regarded as the
first fully fledged and detailed description of a system of national accounts. The
next step was the guidelines that Stone published in 1951 at the request of the
Organisation of European Economic Co-operation (OEEC, the predecessor of
the OECD). These guidelines were a simplification as compared to those of the
United Nations: in fact the guidelines of the United Nations were much too am-
bitious for most European countries. After a number of following rounds with
new guidelines the United Nations published in 1968 a fully revised and very
detailed set of guidelines for the construction of national accounts (SNA). To-
gether with the guidelines of the EC from 1970, which were mainly meant to
clarify the guidelines of the United Nations, these guidelines have, for a long
period, been the basis for the set-up of the systems of national accounts in the
world. As a matter of fact, in order to guarantee the continuity in national ac-
counting, modification of the guidelines should not take place too frequently. It
was only in 1993 that the United Nations issued new guidelines.
The previous survey of the history of national accounts illustrates the long road
from the early calculations of total income and wealth of a nation to today’s
extended and sophisticated systems of national accounts. In order to obtain a
better view on how indicators from the national accounts are used in economic
policy analysis, a look into the history of the interaction between data collection
and policy analysis is also useful. Here the history in the Netherlands is taken
as an example. A historical overview for other countries, especially the United
Kingdom, Norway and the United States, is given by Kenessey (1993).
Today empirical analysis and measurement play an essential role in the debate
on policy measures in the Netherlands. This interest in actual measurement only
slowly and partially emerged between 1750 and 1850 (see Klep and Stamhuis,
2002; Den Butter, 2004). Yet, it was mainly private initiatives of individual
scientists and practitioners, and not so much of the government, which brought
about this attitude. The estimates by the forerunners Hora Siccama, Van Rees,
Keuchenius and Metelerkamp were already mentioned in the previous section.
together these apparently disorganised individuals obey the law of errors in devi-
ating from the ideal “average man”. Obviously this is one of the basic notions in
econometric methodology, used in the evaluation of economic policy measures.
So Quetelet can be seen as a first bridge-builder between the mathematically
oriented statistical approach and the descriptive and qualitative-quantitative ap-
proach. However, Quetelet’s ideas did not reach Vissering and his people. It was
only after the 1930s that, with Tinbergen as the great inspirer and teacher, a full
integration of both lines of thought in statistics took place in the Netherlands. It
is remarkable that, whereas these two lines in statistics had been separated for
such a long time, from then on the Netherlands obtained a strong position in
econometrics and applied economics.
Vissering and his people have played a major role in promoting that the gov-
ernment should regard statistical data collection as a public good and therefore
should take its responsibility in collecting these data. However, in the second
half of the 19th century the government was very reluctant to take up this re-
sponsibility. Therefore, in 1866 Vissering took a private initiative to compose
and publish general statistics for the Netherlands. However, this large project
has never been finished (see Stamhuis, 1989, 2002). In 1884, when the Dutch
government was still not willing to collect statistical data in the public domain,
a Statistical Institute was established by these private people. At last, in 1892,
after questions in the Second Chamber of the Parliament by, amongst others,
the socialist member of parliament, F.J. Domela Nieuwenhuis, the “Centrale
Commissie voor de Statistiek” (Central Committee for Statistics) was installed.
Finally, in 1899 the Central Bureau of Statistics (CBS) was founded, which from
then on conducts its task to collect independent and undisputed data for public
use in the Netherlands. The Central Committee for Statistics still exists and has a
role as supervisory board for the Central Bureau of Statistics. Its responsibilities
were even expanded by decision of the Parliament in 2003. In fact, the lobby
to have the government collect statistical data at the level of the state was much
conducted by the “Society of Statistics”, founded in 1849 (see Mooij, 1994). Af-
ter 1892, now that the lobby of the society for data collection by the government
had finally been successful, the main focus of the society became more and more
on economics. Therefore, in 1892, its name was changed into the Society for Politi-
cal Economy and Statistics. Yet it was more than half a century later, namely in
1950, that the focus of the society was really reflected in its name which now be-
came Netherlands Economic Association. Finally, in 1987 the Queen honoured
the society by granting it the label “Royal”. So since 1987 we have the Royal
Netherlands Economic Association, which, given its start in 1849, is probably
the oldest association of political economists in the world.
The analysis of the Dutch Central Planning Bureau has from its start played
an important role in the design of the policy preparation in the Netherlands.
Nowadays the bureau calls itself CPB Netherlands Bureau for Economic Policy
Analysis, because there is no true “planning” involved in the activities of the
bureau. More specifically the analysis is an important input for the negotiations
and social dialogue on policy issues in what has become known as the Dutch
“polder model”. It has already been noted that Tinbergen, who became the CPB’s
first director in 1945, has built the first econometric policy model (Tinbergen,
1936). Therefore, it is understandable that model based policy analysis has, from
the origin, constituted an important part of the work of the CPB. The CPB’s
‘model’ early acquired a high status in academic circles and has come to be
regarded in Dutch society as a more or less “objective” piece of economic
science (Den Butter and Morgan, 1998).
However, in the first few years of the CPB there was a fierce internal discus-
sion in the CPB about the way the bureau should give shape to its advices (see
Van den Bogaard, 1999). On the one side was Van Cleeff, who had the view that
the CPB should follow a normative approach, while on the other side Tinbergen
supported the idea of disentangling the positive and normative elements of the
analyses. The crucial question in this controversy was about the way economic
policy advice would be the most successful in the pillarised economy. Van Cle-
eff tried to develop an all-embracing normative theory which would integrate the
ideas of the different pillars. As in industry, that would lead to a formal policy
“plan” which could be implemented by the government in a coordinated effort
of all citizens. On the other hand, Tinbergen wanted to develop a method that
would give the most objective description of reality. The differences between the
pillars would then be minimised to their different normative proportions. In other
words, he wanted to make a clear distinction between the working of the econ-
omy (model) and the policy goals (welfare functions), and then “try to agree on
the first and compromise on the second issue”. Tinbergen won this battle. Since
then, economic policy preparation in the Netherlands has been organised in three au-
tonomous parts: data, model and norms. As discussed in the previous section,
the data and statistics are collected by the Central Bureau of Statistics (CBS) in
an independent and (hopefully) undisputed manner, the working of the economy
is described by the models of the CPB and the balancing of different points of
view is done by the government in dialogue with unions, employer organisations
and other associations of organised interest. This method of splitting facts and
politics has, up to now, always been a prominent feature in creating consensus
in the Dutch society where all belong to a cultural minority or minority party.
In this institutional set-up the CPB has a major role in describing the working
of the economy. It takes the data collected (and, in the case of national accounts,
constructed) by the CBS as given. The task of the CPB is to provide a quantita-
tive analysis of the state of the Dutch economy, based on scientific knowledge.
In doing so it tries to establish a consensus view on economic developments and
the effects of policy measures. Of course others (other institutions) also have a
say in this analysis of the Dutch economy based on scientific insights. An ex-
ample is the Dutch central bank, that makes its own model based analysis of
developments and policy measures in the Netherlands. Moreover, in some cases
a major discussion emerges with academics and other scientists working outside
the CPB (e.g. the Ministry of Economic Affairs, private research institutes) on
matters of interpretation of economic developments. Examples are discussions
on Keynesian demand policies versus neo-classical policies in the second half
of the 1970s, on the need for general equilibrium modelling in the early 1990s,
and on the effectiveness of a prolonged policy of wage moderation in the early
2000s. However, these disputes did not refer to the measurement of economic
data at the macro level, nor to the construction methods of data.
Nowadays, the analyses of the CPB are widely used as input for social eco-
nomic policy discussions, e.g. in the Social Economic Council (see below).
A typical example of the role of the CPB in using their model based analysis
for policy purposes is the calculation of the effects of the policy proposals in the
election programmes of the political parties on economic growth, employment,
income distribution and so on. Seemingly, it is almost a realisation of Tinber-
gen’s dream to separate the knowledge on the working of the economy, which
is contained in the models used by the CPB, and the normative preferences on
trade-offs between policy goals, which will differ for each political party. In fact,
the CPB has two major tasks. The first is that of national auditor: this implies
economic forecasting and assessment of the effects of policy measures for the
government and for other groups involved in the policy making process. The
second task consists of the CPB conducting, in a more general sense, applied
economic research (see Don, 1996). Nowadays the latter task is gaining in importance:
extensive scenario analyses and cost benefit analyses are conducted with respect
to various aspects of the Dutch economy. There is also a shift towards micro-
economic research and evaluation studies. Typical for the institutional set-up of
Dutch policy-making are the numerous formal and informal contacts between
the staff of the CPB and the economists at ministries, researchers in academia
and the staff of the social partners. On the one hand, they provide relevant infor-
mation to the CPB, but, on the other hand, they will, if needed, be critical of the work of the CPB.
Another major institution in the set-up of policy preparation in the Nether-
lands is the Social Economic Council (SER) that plays (together with the Foun-
dation of Labour) the central role in negotiations between the various stake-
holders to come to a compromise agreement on matters of economic and social
policy (see for a more elaborate survey: Den Butter and Mosch, 2003; Den But-
ter, 2006). This is the arena where interaction between scientific knowledge and
the policy dispute takes place. The SER is the main policy advisory board for
the government regarding social economic issues. Its composition is tripartite. Labour unions, employer associations and independent “members of the crown” each hold one third of the seats. The “members of the crown” consist of
professors in economics or law, politicians, the president of the Dutch Central
Bank and the director of the CPB.
It is through these independent members that the policy discussions within the
SER benefit from the insights of scientific research. The analyses of the CPB and
also of the Dutch Central Bank carry a large weight in these discussions. Policy
The role of the CBS in the institutional set-up of economic policy preparation
in the Netherlands is closely linked to Tinbergen’s strict separation of the task of independent data collection from the tasks of consensus and compromise formation on economic policy analysis and political decision making. In this respect
the institutional set-up in other countries differs from that in the Netherlands, al-
beit that independence of data collection and compilation carries a large weight
in all industrialised and democratic economies.
The 19th and early 20th century history of data collection at the macro level
in the UK is somewhat comparable to that in the Netherlands. The major gov-
ernment body to collect data at a national level was the statistical department
of the Board of Trade. After two journalists had headed that department, great concerns arose in the early 1870s about the quality of the data. The
idea was to establish a central statistical department to service the requirements
of all Departments of State. Recommendations were continually made over the
years to establish a small central statistical department but they were rejected
because of difficulties arising from the laws, customs and circumstances under
which the different statistics were collected. In addition to the objections raised
by the Board of Trade, Mr Gladstone, then Chancellor of the Exchequer,
feared that such a central Department might extend its functions beyond the lim-
its required by economy and expediency, and so the recommendations to form a
Central Statistical Office were rejected.
Calls for improvements in statistical services continued throughout the 1920s
and the 1930s. The outbreak of the Second World War saw proponents for
change brought together in the team supporting the War Cabinet. Finally the
Central Statistical Office (CSO) was set up on 27 January 1941 by Sir Winston
Churchill with the clear aim of ensuring coherence of statistical information and
to service the war effort. It quickly established itself as a permanent feature of
government. It is interesting to note that again it was during wartime that a major
step in the provision of statistical data at the macro level was taken. After 1945
there was an expansion in the work of official statisticians. This resulted mainly
from the aim to manage the economy through controlling government income
and expenditure by the use of an integrated system of national accounts. The
passing of the Statistics of Trade Act in 1947 made it possible to collect more
information from industry on a compulsory basis.
The late 1960s saw the performance of the statistical system again come un-
der scrutiny. Following a report of the Estimates Committee of the House of
Commons a reorganisation was effected. This reorganisation had four central
elements:
• Establishment of the Business Statistics Office (BSO) to collect statistics from
businesses irrespective of the department requiring information.
• Establishment of the Office of Population Censuses and Surveys to collect in-
formation from individuals and households through programmes of censuses,
surveys and registers.
• An enhanced role for the CSO in managing government statistics.
• Development of the Government Statistical Service (GSS), including a cadre
of professional statisticians across government.
A new, expanded CSO was established in July 1989. This brought together
responsibility for collecting business statistics (previously with the BSO), re-
sponsibility for compilation of trade and financial statistics (previously with the
Department of Trade and Industry) and responsibility for the retail prices index
and family expenditure survey (previously with the Employment Department)
with the old responsibilities of the CSO. In early 1990 the quality of economic
statistics continued to be of concern to the Treasury and to the CSO. John Ma-
jor, then the Chancellor of the Exchequer, indicated to Parliament his continuing
concern about the statistical base. This was quickly followed by an announce-
ment in May 1990 of a package of measures (known as the Chancellor’s Initia-
tive), backed up by substantial additional resources, to improve quality. Finally
the CSO was renamed Office for National Statistics (ONS) on 1 April 1996 when
it merged with the Office of Population Censuses and Surveys (OPCS).
In Norway, national accounting was, earlier than in most other countries, defined as the framework for overall economic policy. It was Ragnar Frisch (together with Tinbergen the first Nobel Prize winner in economics) who was responsible for this
special type of integration of national accounting and economic policy analy-
sis in Norway, which differed from the Anglo-American approach. Frisch had
already in the late 1920s worked on a system of accounting concepts for describ-
ing the economic circulation. In 1933 Frisch had recommended the construction
of ‘national accounts’, introducing this term for the first time in Norwegian.
Frisch reworked his national accounting ideas several times in the following
years, adopting the eco-circ system as the name for his accounting framework
(and elaborate eco-circ graphs as a way of presenting it).
Frisch’s national accounting ideas and his active role in the economic policy
discussion in the 1930s led in 1936 to a project with colleagues at the University
of Oslo, where he started to develop national accounts for Norway. Funds were
provided by the Rockefeller Foundation and by private Norwegian sources. In
1940 Frisch had elaborated the eco-circ system from a theoretical level to a quite
sophisticated system of national accounts.
The compilation of national accounts tables according to Frischian ideas was
continued by some of his former students within the Central Bureau of Statistics
(renamed Statistics Norway in 1991). In the first years after WWII, national ac-
counting was at a preliminary stage and international standards were still years
away. That is why the early national accounting in Norway in the Frischian
tradition had distinct national features, which made it differ from the stan-
dard national accounting framework. In the Frischian conception of national
accounts, it was above all the ‘real phenomena’ that mattered. The accounts
should distinguish clearly between the real sphere and the financial sphere and
show the interplay between them. The entries in the accounts should repre-
sent flows (or stocks) of real and financial objects. This ‘realist’ conception of
national accounting, supported by Frisch’s detailed structure of concepts, was
later modified by adopting elements from Richard Stone’s work, and further
enhanced by embracing the input–output approach of Wassily Leontief as an in-
tegral part. For years the Norwegian approach was one of very few accounting
systems producing annual input–output tables. The result was a detailed set of
accounts comprising thousands of entries, rather than just a few tables of ag-
gregate figures. It gave the impression that an empirical representation of the
entire economic circulation had been achieved and it looked like a wholly new
foundation for scientifically-based economic policy analysis.
The use of macroeconomic models for economic policy in Norway has been
closely related to the reliance upon ‘national budgeting’ in the management of
economic policy. The idea was that of a budget, not for the government’s fiscal
accounts, but in real terms for the entire national economy, spelt out in the spirit
and concepts of the Frischian national accounts. The national budget served as
a conceptual framework as well as a quantitative instrument for economic plan-
ning. The national budgeting process was organised by the Ministry of Finance
as a network of ministries, other government agencies, semi-official bodies, and
coordinating committees. The national budgeting in the early postwar period
took place in a highly-regulated and rationed economy, and called for the kind
of detail that the new national accounts could provide. The value of the national
budget was seen in its role as an integrating tool, linking the sub-budgets of min-
istries, subordinate government agencies and semi-official bodies in the process
of working out the economic prospects and economic policies for the coming
year.
This programmatic national budget as something different from a forecast of
national accounting aggregates raised problems of interpretation and realism.
The national budget would not constitute a plan in a meaningful sense unless it
was based upon a realistic assessment of the functioning of the economy. The
various sub-budgets had to be combined in such a way that all relationships
in the economy would be taken into account. However, with national accounts
still in their infancy, large-scale models unavailable and computers in a mod-
ern sense non-existent, this was a daunting task. In fact it was resolved by the
‘administrative method’ which at best was an imperfect iterative administrative
procedure.
Thus, together with the Netherlands, Norway is an example of a country
where interaction between data collection at the macro level and model based
economic analysis had an early start. Even more so than in the Netherlands,
the Norwegian experiment was, in those early days, directed at detailed eco-
nomic planning, where the economy was run like an enterprise. In that sense
the planning exercise in Norway was much in line with the proposals of Van
Cleeff for ‘central planning’ in the Netherlands. A remarkable difference with
the Netherlands (and reflecting differences in opinion between Tinbergen and
Frisch) is that in Norway model-based economic policy analysis and forecasting was originally conducted at the same institute as the data collection,
namely Statistics Norway. As mentioned before, in the Netherlands Tinbergen
advocated a strict separation between on the one hand data collection and on the
other hand economic policy analysis and forecasting.
Unlike most other countries, the US has no single NSO which collects all statisti-
cal data. There are several institutions financed by the government which collect
and compose data on the state of the economy. The Bureau of Labor Statis-
tics publishes inflation and unemployment figures. The Census Bureau collects
statistics specifically with respect to production, stock building, and population
data. The Bureau of Economic Analysis (BEA) composes the national accounts
based on data collected by the Census Bureau. Finally the Federal Re-
serve Board (Fed), apart from monetary data, also collects and composes data
on the cyclical situation of the economy. This division of labour between the
various institutes brings about coordination problems. The different institutions,
in many cases, use their own methodology, which makes the data difficult to
compare, and makes policy analysis based on the data somewhat troublesome.
It also leads to much discussion on the quality of the data between the various
producers, so that the data are less undisputed than, for instance, in the Netherlands.
A powerful institution in the US where economic policy analysis of statistical
data at the macro level takes place is the Council of Economic Advisers (CEA).
The council consists of a chairperson and two members, appointed by the Pres-
ident of the US. The members are assisted by a relatively small staff, consisting mostly of university professors on leave from their universities, statistical assistants and graduate students. For this reason the CEA has been strongly related
to the academic world. Each year the CEA makes forecasts of macroeconomic
developments. An important publication is The Economic Report of the Presi-
dent, which contains the political vision of the CEA. Obviously the composition
of this advisory body changes with the political colour of the President. As a
consequence, both the contents of the recommendations and the advice process
itself depend much on the composition of the government. Although the major
obligation of the CEA is to give policy recommendations to the President, it
has a broader task in policy preparation. The members of the CEA frequently
take part in committee meetings at several levels and can therefore try to per-
suade, besides the President, other policy makers of their vision. This strong link between the political colour of the President and the composition of the CEA has meant that policy advice has been less consistent than, for example, at the
The Statistisches Bundesamt is the central institution for collecting statistical data
in Germany. Some 2780 staff members collect, process, present and analyse sta-
tistical information in this Federal Statistical Office. Seven departments and the
offices of the President and the Vice-President are located in Wiesbaden’s main
office, while two further departments are situated in the Bonn branch office. The Berlin
Information Point directly provides information and advisory services based on
official statistical data to Members of the Bundestag, the German federal gov-
ernment, embassies, federal authorities, industry associations, and all those who
are interested in official statistics in the Berlin–Brandenburg region.
In accordance with the federal state and administrative structure of the Fed-
eral Republic of Germany, federation-wide official statistics (federal statistics)
are produced in cooperation between the Federal Statistical Office and the sta-
tistical offices of the 16 Länder. This means that the system of federal statistics
is largely decentralised. In the context of that division of labour, the Federal
Statistical Office has mainly a coordinating function. Its main task is to ensure
that federal statistics are produced without overlaps, based on uniform methods,
and in a timely manner. The tasks of the Federal Statistical Office include (i) the
methodological and technical preparation of the individual statistics, (ii) the fur-
ther development of the programme of federal statistics, (iii) the coordination of
individual statistics, (iv) the compilation and publication of federal results. With
just a few exceptions, conducting the surveys and processing the data up to the
Land results fall within the competence of the statistical offices of the Länder.
So in fact a major part of the statistical data in Germany are collected by
these regional statistical institutions. Many cyclical indicators are constructed
and published by the Bundesbank. Moreover the Institut für Arbeitsmarkt- und
Berufsforschung der Bundesagentur für Arbeit (IAB) collects, publishes and
analyses data on developments in the labour market.
An important link between science and policy advice in Germany is the
Sachverständigenrat zur Begutachtung der gesamtwirtschaftlichen Entwicklung
(SVR). This council consists of five members, in most cases university pro-
fessors. They are the so-called ‘five wise men’. The members of the council are
appointed for five years, on the proposal of the federal government, by the Bundespräsident. In practice three members have no links with political parties and
interest groups. For the remaining places the employees’ and employers’ organisations can present a candidate, but the current members of the SVR also have
a say in these appointments. The Sachverständigenrat publishes each year be-
fore November 15th a report on economic developments. Important topics in the
analysis are the stability of the price level, developments on the labour market,
including the unemployment problem, steady economic growth and an assess-
ment of the position of the balance of payments. Moreover the council must take the income distribution into consideration. The council is asked to propose several policy measures for reaching the policy goals, but no choice should be made between them.
The advice of the council is not bound to be unanimous; members may include
a minority opinion in the report. The Sachverständigenrat regularly commissions
research from other scientists. In contrast to the CEA in the US, the Sachverständi-
genrat is politically independent. Moreover, the way new members are appointed
ensures that their economic views will not differ radically from those of their
predecessors.
Both the Ministry of Finance and the Ministry of Economic Affairs also have
their own scientific advisory councils (wissenschaftliche Beiräte), composed of
university professors. The current members of these councils propose the new
members, so that here there is also some continuity in the line of advice. The
task of members of these councils is to give opinions on policy suggestions and
to suggest proposals themselves.
An important role in economic policy analysis in Germany is played by the
six independent research institutes. These each have their own specialisations, although all report on (inter)national economic developments. Although none
of these institutes has a specific political background, or is linked to a political
party, they do represent different schools of economic thought. For instance, the
Deutsches Institut für Wirtschaftsforschung (DIW) in Berlin has a more Keyne-
sian orientation, whereas the Institut für Weltwirtschaft (IfW) of the university
of Kiel frequently pleads for letting the market forces work and for less govern-
ment regulation. Twice a year these institutes meet in order to draft a report on
the stance of the business cycle for the current year (in April) and for the coming
year (in October). It is possible to add a minority opinion to the report. Espe-
cially the DIW has often used this possibility. Moreover each of the research
institutes publishes its own monthly report. So there is no equivalent to the CPB
in Germany. The common (consensus) forecast of the research institutes is not
the outcome based on one macroeconomic model, but the result of consultation
between the institutes. An important aspect is also that policy makers and politi-
cians in Germany are not very familiar with, or enthusiastic about, model-based policy analysis.
In Germany the social partners also have their own research institutes. The
Institut der Deutschen Wirtschaft (IW) in Cologne, financed by the employers
organisations, is even one of the largest scientific research institutes in Germany.
The counterpart on the trade unions’ side, the Wirtschafts- und Sozialwissenschaftliches Institut of the DGB (WSI), is somewhat smaller. These institutes publish their
own bulletins with analyses of the economic situation and prospects in advance
of the autumn report of the six independent institutes, in order to influence the
discussion.
As in the UK, the most powerful institution in economic policy analysis and
policy preparation in France is the Ministry of Finance. The power of the Minis-
ter for Finance over its colleagues stems from delegation by the President of the
Republic. Because of this, a situation can arise where the Prime Minister has no
influence on economic policy, because the President imposes another opinion by
means of the Minister for Finance.
National accounts’ data and other data on the state of the French economy are
collected by the Institut National de la Statistique et des Études Économiques
(INSEE). It is a “General Directorate” of the French Ministry of Finance and
it is subject to government-accounting rules: it is mainly funded from the central government’s general budget. The INSEE has a rather long history. In
1833 Adolphe Thiers (then Minister of the Interior) founded the Bureau de la
Statistique. It became the Statistique Générale de la France (SGF) in 1840. In
1946 the National Institute of Statistics and Economic Studies for Metropoli-
tan France and Overseas Possessions (Institut National de la Statistique et des
Études Économiques pour la Métropole et la France d’Outre-Mer) was estab-
lished. It was later renamed as the INSEE.
Around 1960, the formulation of “Le Plan” in France led to the application of
statistics to economic planning and economic-regulation policies. Immediately
after the war, a task force had engaged in preliminary national-accounting work.
The program was originally carried out by the Finance Ministry’s Economic
and Financial Studies Office (Service des Études Économiques et Financières:
SEEF), and then transferred to the INSEE. National accounting and medium-
term forecasting gained momentum in the 1960s. The contacts with potential
Today, most NSOs publish quarterly national accounts data and some data are even available on a monthly basis. An important aim of the quarterly estimates is to provide consistent and timely information on recent economic developments in the country. However, NA data, including the quarterly estimates, suffer from long publication delays. In most cases it takes more than two years before final data can be published. Data published earlier are all preliminary and provisional, and subject to revisions. Therefore the analysis of recent developments takes place by means of data which may change considerably. In
spite of these uncertainties with respect to the quality of the data, most NSOs
provide a quarterly “flash estimate” in order to cope with the need for very re-
cent information. In the case of the CBS this is an estimate of the development
of gross domestic product, released by means of a press bulletin eight weeks
after the end of the respective quarter. Magnus et al. (2000) designed a method-
ology using available information on indicator ratios, which can be helpful to enhance the accuracy of recent national accounts estimates. Yet, there always is a trade-off between timeliness and accuracy in these estimates (see also Fixler,
2007, this volume). It can pose a problem when much weight is attached to these
recent data, for instance by financial markets. Market developments and strate-
gic decisions may, with the benefit of hindsight, turn out to have been based on data with a very poor information content. Therefore NSOs should carefully monitor the quality of their flash estimates and refrain from publishing them when quality is
too poor. They should do that in spite of public pressure to come up with recent
information.
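To make the general idea of indicator-based flash estimation concrete, the following sketch is a deliberately simplified illustration with hypothetical numbers; it is not the Magnus et al. (2000) methodology, which is considerably more elaborate. A timely indicator is scaled by the average ratio of final GDP to that indicator observed in past quarters:

# Illustrative sketch (hypothetical data): flash-estimating quarterly GDP by
# scaling a timely indicator with the historical ratio of final GDP to that indicator.
final_gdp = [118.2, 119.0, 120.5, 121.1]   # final (revised) quarterly GDP, billions
indicator = [101.3, 101.9, 103.2, 103.8]   # timely indicator, e.g. a turnover index

ratios = [g / i for g, i in zip(final_gdp, indicator)]
avg_ratio = sum(ratios) / len(ratios)      # historical GDP-to-indicator ratio

indicator_latest = 104.6                   # only the indicator is available for the latest quarter
flash_estimate = avg_ratio * indicator_latest

print(f"average ratio: {avg_ratio:.4f}")
print(f"flash GDP estimate: {flash_estimate:.1f} billion")

Even in this toy setting the quality concern raised above is visible: the flash estimate is only as good as the stability of the GDP-to-indicator ratio, which is exactly what later revisions may overturn.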
8.7.2. Revisions of NA
On average, every five to seven years a major revision of the national accounts data
takes place (see e.g. Blades, 1989). Reasons for these revisions are (i) new basic
observations becoming available; (ii) improvements in the construction methods; and (iii) changes in the definitions and set-up of the system (for instance in
response to new international guidelines). These revisions may bring about sub-
stantial changes in the final figures of the national accounts. In the Netherlands
the last revision was published in 2005, with 2001 as the revision year. This revision had the following consequences for the assessment of the
state of the economy and for the economic policy indicators:
1. Gross domestic product was raised by 18.4 billion euros, which implies an increase of 4.3% (a rough check of the implied levels is sketched below). This increase was mainly caused by the introduction of new insights in the use of statistical information.
2. Gross national income increased by 24.8 billion euros.
3. The financial deficit of the government (according to the EMU definition)
now amounted to 0.2% of gross domestic product instead of 0.1% according
to the original calculations.
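As a rough consistency check on the first item (back-of-the-envelope arithmetic implied by the figures quoted above, not numbers taken from the revision documents), an upward adjustment of 18.4 billion euros that corresponds to 4.3% implies approximate GDP levels of

\[
\mathrm{GDP}_{\text{pre-revision}} \approx \frac{18.4}{0.043} \approx 428 \text{ billion euros}, \qquad
\mathrm{GDP}_{\text{post-revision}} \approx 428 + 18.4 \approx 446 \text{ billion euros}.
\]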
Obviously these revisions have considerable consequences for the interpretation
of historic economic developments, and also, in the above case, for the ranking
of nations according to their per capita income. This ranking is often used to
illustrate the relative prosperity of nations (see also Table 8.1).
Table 8.1. Ranking of countries according to the Human Development Index (HDI, 2005 index based on 2003 figures); the right-hand column gives the rank according to GDP per capita.

HDI rank   Country           GDP per capita rank
1          Norway             3
2          Iceland            6
3          Australia         10
4          Luxemburg          1
5          Canada             7
6          Sweden            20
7          Switzerland        8
8          Ireland            2
9          Belgium           12
10         United States      4
11         Japan             13
12         Netherlands       11
13         Finland           16
14         Denmark            5
15         United Kingdom    18
16         France            15
17         Austria            9
18         Italy             19
19         New Zealand       22
20         Germany           14
Of course this is not by definition the most suitable system for an analysis of the
national economy with its specific institutional characteristics. Although already
in its current form the accounting framework satisfies a large number of user
wishes, information relevant for a specific policy analysis may not be contained
in the system. Here the trade-off between the criteria of flexibility and invari-
ance (and international comparability) referred to above, plays a part. Moreover
the current NA in principle has been set up from the institutional approach (see
Section 8.2). The international guidelines have chosen a specific definition of
income, which excludes, for instance, household (domestic) production but also the nega-
tive consequences of the use of the environment in production. More in general,
NA do not provide information on other aspects which are, beside financial in-
come and wealth, of importance for the prosperity of a country (see the next
section).
In order to meet the need for multi-purpose information a more flexible system of NA has been designed. It consists of an (institutionally oriented) core and vari-
ous types of modules (see Bloem et al., 1991; Bos, 2006). The core focuses on
transactions which are in reality expressed in money terms. These transactions
are booked (exclusively) for the actors who are actually involved in the transac-
tions. This core-and-modules system offers a number of clear advantages over the
current system of presentation of NA. In this alternative set-up the users avail of
Today’s emphasis on the flexibility of the system of national accounts re-
flects the wishes of national accountants to make the system more user friendly
and to adapt to changes in the needs for data in economic analysis. In this
perspective there is an analogy to the argument by Mayer (this volume). He
describes the relationship between readers and authors of scientific articles as
a principal agent relationship. The author (as agent) has more information on
his/her research, but the description of the research should, in a concise way,
provide the essentials of the information so that the reader (as principal) can
make a good judgement on the value and importance of the research. Likewise
the national accountant (as agent) should in the construction of the data provide
as much as possible the information which the user of the data (the principal)
needs. Tinbergen’s organisational set-up of economic policy preparation can,
along these lines, be seen as a multilayered principal agent relationship. The
statistical office is the agent for the modelling and forecasting agency, and, in turn,
these model builders, model users and forecasters are the agents of the policy
makers who use these analyses in their debates and compromise agreements
on proper policy measures. A major advantage of such a strict organisation and separation of responsibilities is that it minimises transaction costs in the pol-
icy discussions. In the context of the principal agent model these transaction
costs can be associated with bonding costs, monitoring costs and residual loss.
The more the national accountants are prepared and able to fulfil the wishes of
the users, and to communicate the information content of the data in an adequate manner, the less effort the users of the data need to conduct their research properly. In the multilayered principal agent model discussed above, all
experts involved in policy preparation – statisticians, model builders and model
users, policy makers – should familiarise themselves with the concepts used in
the analysis. Such a common economic framework, in which all “speak the same language”, greatly contributes to the efficiency of the policy discussions. Of course,
as Den Butter and Morgan (1998) note, there is much interaction between pol-
icy makers, model builders and model users. So there is no one-way stream of
information from agent to principal (or vice versa). In the context of the prin-
cipal agent model this interaction could be seen as a way of goal alignment,
so that the residual loss (agent has different goals than principal, or principal
has no clear goals given the external conditions) as part of transaction costs is
minimized.
The major aggregate economic indicators from the national accounts are na-
tional income and national product in their various definitions. These data are
often used as indicators for economic welfare and prosperity. There is ample
theoretical literature on the representation of economic welfare by national ac-
counting (e.g. Weitzman, 1976; Asheim, 1994). Asheim and Buchholz (2004)
developed a framework for national income accounting using a revealed wel-
fare approach that covers both the standard utilitarian and the maximin criteria
for welfare as special cases. They show that the basic welfare properties of na-
tional income accounting do not only cover the discounted utilitarian welfare
functions, but extend to a more general framework of welfare functions. In par-
ticular, under a wider range of circumstances, it holds that real NNP growth indicates welfare improvement.
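In stylised form (a standard statement from this literature, not a formula given in this chapter), Weitzman’s result for an economy that maximises the present value of consumption at a constant discount rate \( \rho \) is

\[
\mathrm{NNP}(t) \;=\; C(t) + \dot{K}(t) \;=\; \rho \int_t^{\infty} C(s)\, e^{-\rho (s-t)}\, ds ,
\]

that is, current net national product (consumption plus net investment) equals the stationary, annuity-like equivalent of the future consumption stream, so that a higher NNP signals a higher sustainable level of consumption.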
Also from the empirical perspective, developments in real national income (per capita) show a substantial correlation with indicators which are specifically used as indicators of non-material welfare, such as child mortality, literacy, educational attainment and life expectancy. The
Human Development Index (HDI), published annually by the UN, ranks na-
tions according to their citizens’ quality of life rather than strictly by a nation’s
traditional economic figures. The ranking of countries according to HDI in Ta-
ble 8.1 shows that the top of the list consists only of industrialised countries
with high national per capita incomes. The table uses the 2005 index which is
based on 2003 figures. Yet, the table also shows that within this group of in-
dustrialised countries, the ranking according to HDI and according to GDP per
capita may differ considerably. For instance, Australia and Sweden obtain much
better scores for HDI than for GDP per capita. The opposite holds for the United
States, and, surprisingly, for Ireland and Denmark.
However, from a more operational perspective there is much criticism and
discontent with national accounting data as indicators for welfare and specific
economic developments. For instance, Van Ark (1999) mentions a number of
problems when national accounts data are used for the analysis of long-term eco-
nomic growth. In that case long and internationally comparable time series are
needed on (changes) in real GDP and its components. Van Ark’s first concern is
the weighting procedure. Changes in volume terms necessarily need to be related to a benchmark year with a given basket of goods and services. The weights of the benchmark year are representative of the volume index or price index used for the calculation of volume data over the whole time period. Ideally one
would wish to use the regular shifts in weights in benchmark years every five or
ten years, and some coordination amongst various countries would be highly de-
sirable. However, such data are not available and one has to rely at most on a few
benchmark years, and sometimes even on only one benchmark year. The second
concern by Van Ark is the estimation of intermediate inputs, capital and labour,
which are important ingredients of an empirical study of economic growth. With
the exception of manufacturing, which in many (trading) countries comprises
only a relatively small part of total production, there is very little comprehen-
sive evidence on intermediate inputs in the production process before the era
of input–output tables. Historical sources on capital stock and capital services
are only available for a very limited number of countries and the consistency
of historical labour statistics with national accounts is weak in many cases. The
third concern of Van Ark is the treatment of services. The measurement of real
output in services remained somewhat neglected as much of the work of his-
torical accounts focused primarily on the commodity sectors of the economy.
Historical accounts often assume no productivity changes in services and rely
largely on changes in the wage bill of services. It appears that on the whole real
output growth in services is likely to be understated in most accounts, because
the assumption of no productivity growth seems to be unrealistic. It may also imply
that productivity increases in services are attributed to industry and commodity
sectors.
More generally, one of the most troublesome parts of national accounting from
the perspective of the interpretation of the data is the separation of the observed
(changes in) nominal values in prices and volumes (for reviews see Diewert,
2004 and Reinsdorf, 2007, this volume). Index number theory gives statistical
agencies some guidance on what is the “right” theoretical index for determin-
ing prices of commodities and services and for aggregation of these prices. The
problem, however, is that there have been many alternative index number theo-
ries and that statistical agencies have been unable to agree on a single theory to
guide them in the preparation of their consumer price indices or their indices of
real output.
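To illustrate what is at stake (standard textbook index-number formulas, not the specific proposals discussed in the reviews cited above): with prices \( p \) and quantities \( q \) of items \( i \) in a base period 0 and a current period \( t \), the change in nominal value can be split into a price and a volume component in more than one way, for instance

\[
P_L = \frac{\sum_i p_{i,t}\, q_{i,0}}{\sum_i p_{i,0}\, q_{i,0}}, \qquad
P_P = \frac{\sum_i p_{i,t}\, q_{i,t}}{\sum_i p_{i,0}\, q_{i,t}}, \qquad
\frac{\sum_i p_{i,t}\, q_{i,t}}{\sum_i p_{i,0}\, q_{i,0}} = P_L \times Q_P = P_P \times Q_L ,
\]

where the Laspeyres price index \( P_L \) uses base-period quantity weights and the Paasche price index \( P_P \) current-period weights. Since these (and alternatives such as the Fisher index \( P_F = \sqrt{P_L P_P} \)) generally give different answers, the chosen index directly affects the measured split between inflation and real growth.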
One of the major operational problems is to adjust prices for quality changes
in the attributes of goods and services. For instance, a price increase of a new
version of a car may come together with some improvements (higher engine
power, more luggage space, new safety provisions) as compared to the older
version of the same car. In that case a correction has to be made for these im-
provements, which may imply that the corrected price change is much lower, or even negative, compared to the actual price change. These implicit changes in the quality of the goods and services in the basket used for determining the consumer price index (CPI) have been a major concern of the Boskin Commission (although, strictly speaking, measuring inflation by the CPI using a basket of consumer commodities is not part of national accounting). When quality changes are not properly taken into con-
sideration, price indices overestimate inflation and hence underestimate volume
changes and productivity increases. A method of adjusting prices for quality
changes is the so-called hedonic method, in which prices of goods and services are regressed on (changes in) the quality attributes of those goods and services.
As yet one should be cautious in the use of hedonic regressions because many
issues have not yet been completely resolved. Moreover questions have been
raised about the usefulness of hedonic regressions as several alternative hedonic
regression methodologies proved to yield different empirical results. Therefore
Diewert (2004) notes that there is still some work to be done before a consensus
on “best practice” hedonic regression techniques emerges.
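As a minimal illustration of the hedonic idea (a deliberately simplified sketch with hypothetical car data, not one of the contested methodologies referred to above): log prices are regressed on quality attributes, and the estimated coefficients are then used to strip the quality improvement out of an observed price change.

# Minimal hedonic-regression sketch with hypothetical data: regress log prices of
# car models on quality attributes, then quality-adjust an observed price change.
import numpy as np

power   = np.array([ 66,  74,  85,  96, 110, 125])               # engine power (kW)
luggage = np.array([350, 380, 420, 450, 480, 520])                # luggage space (litres)
price   = np.array([15000, 16200, 18000, 19800, 22500, 25500])   # price (euros)

# OLS of log price on a constant and the two attributes.
X = np.column_stack([np.ones_like(power, dtype=float), power, luggage])
beta, *_ = np.linalg.lstsq(X, np.log(price), rcond=None)

# New model version: list price rises by 6%, but power +8 kW and luggage +30 litres.
quality_effect = beta[1] * 8 + beta[2] * 30   # predicted log-price effect of the improvements
observed_change = np.log(1.06)                # observed log price change
adjusted_change = observed_change - quality_effect

print(f"observed log price change:        {observed_change:+.3f}")
print(f"estimated quality component:      {quality_effect:+.3f}")
print(f"quality-adjusted log price change: {adjusted_change:+.3f}")

With these made-up numbers the quality-adjusted price change comes out close to zero or slightly negative, which is exactly the kind of correction described in the text.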
A related problem with respect to the construction of price indices is the introduction of new products. Here the solution is the reservation-price methodology,
already suggested by Hicks, which has, however, not been adopted by any sta-
tistical agency as yet. Moreover, a final solution for the problem of separating
price and volume movements will never be possible as there are, especially in
services, categories of products whose prices are difficult, or even impossible, to observe. Diewert (2004) gives the following list: (i) unique products: that is,
in different periods, different products are produced; it prevents routine match-
ing of prices and is a pervasive problem in the measurement of the prices of
services; (ii) complex products: many service products are very complicated;
e.g., telephone service plans; (iii) tied products: many service products are bun-
dled together and offered as a single unit; e.g., newspapers, cablevision plans,
banking services packages; (iv) joint products; for this type of product, the value
depends partially on the characteristics of the purchaser; e.g., the value of a year
of education depends not only on the characteristics of the school and its teach-
ers but also on the social and genetic characteristics of the student population;
(v) marketing and advertising products: this class of service sector outputs is
dedicated to influencing or informing consumers about their tastes; a standard
economic paradigm for this type of product has not yet emerged; (vi) heavily
subsidized products: in the limit, subsidized products can be supplied to con-
sumers free of (explicit) charge: the question then is whether zero is the “right” price for this type of product; (vii) financial products: what is the “correct”
real price of a household’s monetary deposits? (viii) products involving risk and
uncertainty: what is the correct pricing concept for gambling and insurance ex-
penditures? What is the correct price for a movie or a record original when it is
initially released?
Diewert also mentions the problem for statistical agencies of how to deal with
transfer prices when constructing import and export price indexes. A transfer
price is a border price set by a multinational firm that trades products between
subsidiaries in different countries. It is unlikely that currently reported transfer
prices represent “economic” prices that reflect the resource costs of the exports
or imports. As the proportion of international trade that is conducted between
subsidiaries of multinational firms is about 50%, it becomes an increasingly
difficult challenge for statistical agencies to produce price indexes for exports
and imports that are meaningful.
which connect individual welfare with happiness. These studies show that some-
where between 1950 and 1970 the increase in individual welfare (or happiness)
stopped, or even turned into a negative trend, in most industrialised
(OECD) countries, whereas there has been a steady and continuous growth of
real GNP. There seems to be a ‘decoupling’ between income and individual sub-
jective welfare at the level of about 15 000 to 20 000 dollars income per year
(see also Layard, 2006; Helliwell, 2006).
allot individual observations at the firm level to these various sectoral accounts.
Sectoral disaggregation becomes even more difficult now that more and more
production processes are split up due to subcontracting and outsourcing. Even
at the plant level firms fulfil various different functions in the production chain
so that a functional approach would be better suited for the purposes of data
analysis than the present institutional approach in sectoral accounting. Think of
multinationals like Shell, Unilever and Philips, which are in the statistics part of
the industry sector, but which have in their home countries mainly an orches-
trating function where goods and services are produced all over the world at
lowest prices and sold at highest prices. Reductions of transaction costs (e.g.
by innovations in subcontracting and outsourcing, or by creating much value
by smart marketing) will, according to the sectoral accounting, result in pro-
ductivity gains of the industry. The economic interpretation of such a productivity increase is often that it is caused by product innovations, which is not true in this
case (see WRR, 2003). In fact, macroeconomic research in this field of produc-
tivity analysis and growth accounting increasingly uses microeconomic data sets with individual firm data which cover the whole economy. Modern computer facilities and empirical methodology facilitate such analysis. NSOs are able and willing to make these data sets available to professional researchers.
The second example relates to the consumer price index (CPI). The CPI is
used for indexation of all kinds of economic quantities such as wages and pen-
sion income. Calculation of the CPI is based on a basket of goods and services
for the average of all individuals. However, the price inflation calculated by the
CPI differs for each individual and group. Frequently specific groups, such as
the elderly, are dissatisfied with indexation according to the average CPI when
they believe that inflation has been above average for their group. On such occasions they ask the NSO to calculate a CPI for their specific group – obviously no
demand for a group CPI occurs when the inflation of that group is believed to
be below average. In principle NSOs are able to calculate a CPI for each indi-
vidual person – or to be more precise: for each individual basket of goods and
services. So they can comply with the demand for CPIs for various (sub)groups
of the population. The question is whether such proliferation of CPIs is wise
from both a political and a statistical viewpoint. From a political viewpoint it is
not wise because the use of these disaggregated CPIs will always be asymmet-
ric and biased towards more inflation. From a statistical viewpoint, researchers at the Dutch CBS, Pannekoek and Schut (2003), have shown that it is not
wise either. They looked at price increases within and between four different
groups of income earners, namely (i) households with wage incomes (workers);
(ii) households with income from capital and own occupation (self employed);
(iii) households living on social security and assistance; (iv) households with old-age pensions (elderly). There appeared to be some persistent (but hardly signif-
icant) differences in inflation rates between these groups. However, differences
within these groups appeared to be much larger. Therefore the CBS decided, for
the time being, not to comply with the demand to publish CPIs for various groups on a regular basis.
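To fix ideas about what such a group-specific CPI involves (a stylised sketch with hypothetical expenditure shares and item indices, not the CBS computation): the same item-level price indices are aggregated with each group’s own expenditure weights, so different groups can face different measured inflation even when they face identical prices.

# Stylised sketch (hypothetical weights and item indices): group-specific CPIs as
# expenditure-share weighted averages of the same item price indices.
price_index = {"food": 103.0, "housing": 105.5, "health": 107.0, "recreation": 101.0}

weights = {
    "workers": {"food": 0.25, "housing": 0.35, "health": 0.10, "recreation": 0.30},
    "elderly": {"food": 0.30, "housing": 0.30, "health": 0.30, "recreation": 0.10},
}

def group_cpi(shares, prices):
    """Weighted average of item indices, using the group's expenditure shares."""
    return sum(shares[item] * prices[item] for item in shares)

for group, shares in weights.items():
    print(f"{group:8s} CPI: {group_cpi(shares, price_index):6.2f}")

In this toy example the group with the larger health share records somewhat higher inflation; the Pannekoek and Schut (2003) finding reported above is that, in practice, such between-group differences are small relative to the differences within groups.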
The third example is somewhat related to the previous one, albeit that the re-
sult here is a presentation of data at the micro level rather than (solely) at the
macro level. Traditionally the Netherlands CPB calculates short-term prospects for the purchasing power of Dutch households. The outcomes of these calculations carry a heavy weight in the policy discussions in the Netherlands. The
effect of each policy measure on purchasing power is closely looked at by politi-
cians and the media, and often policy measures are very much fine tuned (and
therefore sometimes made too specific and complicated) in order to avoid losses
of purchasing power, especially for low income groups. As a matter of fact,
in the Netherlands it is the indicator which carries the largest weight in pol-
icy discussions on measures which affect the income distribution and in the
yearly negotiations on the government budget. The CPB used to present (and
still presents) the effects on purchasing power for the average of different
income groups: minimum wage earner; modal wage earner; two times modal
wage earner, etc. However it was perceived that these average outcomes at the
macro level did not provide a sufficient picture of the underlying effects at the
individual level. For instance, when the government declared that, on the basis of
the average outcomes, through a combination of policy measures, the purchas-
ing power of the whole population would increase, the media and politicians
of the opposition were always able to find an unfortunate and poor individual,
who suffered a substantial decrease in disposable income by the combination
of the policy measures. The Social Economic Council even published a lengthy
advice on how to present indicators of purchasing power. It made the CPB de-
cide to present the development of purchasing power in scatter diagrams, where
each point in the scatter represents a specific small group of similar house-
holds. These scatters for six different categories of households are reproduced
in Fig. 8.1. They show for most households of all categories an increase of pur-
chasing power in 2006 as compared to 2005. Policy measures seem to be most
favourable to households with a single wage earner. Most households with two
wage earners will also see their purchasing power increase, but here there is a
considerable number of households that will not profit from the policy measures
(and in this case, start of the cyclical upturn). The same holds true for the other
categories of the figure. So the scatter diagram brings more sophistication to the
policy discussions than a simple presentation of averages at the macro level in
a table. Although the scatter diagrams may seem complicated and difficult to
understand at first sight, nowadays all participants in the social economic policy
debate in the Netherlands know perfectly well how to interpret this represen-
tation of the indicator. A disadvantage of this indicator is, as in the case of
aggregated purchasing power indicators, that it does not reveal the dynamics of
moving to another group (e.g. from unemployed to employed). Policy measures
often aim to give incentives for such transitions.
Fig. 8.1: Purchasing power by household type, source of income and household income (changes
in %), 2006. Source: CPB, Purchasing power in 2006 according to MEV 2007.
8.9. Conclusions
National accounts (NA) and the indicators derived from the system of national
accounts play a major role in economic policy preparation and in the political
debate on welfare and well being. For a structured discussion on these matters
it is essential that the technical aspects of data construction are separated as much as possible from the policy interpretation of these composed data, which often has
a normative and political character. This separation of responsibilities leads to a
References
Asheim, G.B. (1994). Net national product as an indicator of sustainability. Scandinavian Journal
of Economics 96, 257–265.
Asheim, G.B., Buchholz, W. (2004). A general approach to welfare measurement through national
income accounting. Scandinavian Journal of Economics 106, 361–384.
Bjerkholt, O. (1998). Interaction between model builders and policy makers in the Norwegian tradi-
tion. Economic Modelling 15, 317–339.
Blades, D. (1989). Revision of the system of national accounts: A note on the objectives and key
issues. OECD Economic Studies 12, 205–219.
Bloem, A.M., Bos, F., Gorter, C.N., Keuning, S.J. (1991). Vernieuwing van de Nationale rekeningen
(Improvement of national accounts). Economisch Statistische Berichten 76, 957–962.
Bos, F. (1992). The history of national accounting. National Accounts Occasional Paper Nr. NA-048.
CBS, Voorburg.
Bos, F. (2003). The national accounts as a tool for analysis and policy; past, present and future.
Academic thesis. University of Twente.
Bos, F. (2006). The development of the Dutch national accounts as a tool for analysis and policy.
Statistica Neerlandica 60, 215–258.
Boumans, M.J. (2007). Representational theory of measurement. In: Durlauf, S., Blume, L. (Eds.),
New Palgrave Dictionary of Economics, 2nd ed. Macmillan, to appear.
Clark, C. (1937). National Income and Outlay. MacMillan, London.
Comim, F. (2001). Richard Stone and measurement criteria for national accounts. In: History of
Political Economy. Annual Supplement to vol. 33, pp. 213–234.
de Boo, A.J., Bosch, P.R., Garter, C.N., Keuning, S.J. (1991). An environmental module and the
complete system of national accounts. National Accounts Occasional Paper Nr. NA-046. CBS,
Voorburg.
de Haan, M., Keuning, S.J. (1996). Taking the environment into account: The NAMEA approach.
Review of Income and Wealth 42, 131–148.
den Bakker, G.P. (1993). Origin and development of Dutch National Accounts. In: de Vries, W.F.M.,
et al. (Eds.), The Value Added of National Accounting. CBS, Voorburg/Heerlen, pp. 73–92.
den Butter, F.A.G. (2004). Statistics and the origin of the Royal Netherlands Economic Association.
De Economist 152, 439–446.
den Butter, F.A.G. (2006). The industrial organisation of economic policy preparation in the Nether-
lands. Paper presented at the conference on Quality Control and Assurance in Scientific Advice to
Policy. Berlin–Brandenburg Academy of Sciences and Humanities, Berlin, January 12–14, 2006.
den Butter, F.A.G., Morgan, M.S. (1998). What makes the models-policy interaction successful?
Economic Modelling 15, 443–475.
den Butter, F.A.G., Mosch, R.H.J. (2003). The Dutch miracle: Institutions, networks and trust. Jour-
nal of Institutional and Theoretical Economics 159, 362–391.
den Butter, F.A.G., van der Eyden, J.A.C. (1998). A pilot index for environmental policy in the
Netherlands. Energy Policy 26, 95–101.
den Butter, F.A.G., Verbruggen, H. (1994). Measuring the trade-off between economic growth and
a clean environment. Environmental and Resource Economics 4, 187–208.
Diewert, W.E. (2004). Index number theory: Past progress and future challenges. Paper presented at
SSHRC Conference on Price Index Concepts and Measurement, Vancouver, Canada, June/July
2004.
Don, F.J.H. (1996). De positie van het Centraal Planbureau (The position of the Central Planning
Bureau). Economisch Statistische Berichten 81, 208–212.
Don, F.J.H., Verbruggen, J.P. (2006). Models and methods for economic policy: An evolution of 50
years at the CPB. Statistica Neerlandica 60, 145–170.
Gerlagh, R., Dellink, R., Hofkes, M.W., Verbruggen, H. (2002). A measure of sustainable national
income for the Netherlands. Ecological Economics 41, 157–174.
Helliwell, J.F. (2006). Well-being, social capital and public policy: What’s new? Economic Journal
116, C34–C45.
Hope, C., Parker, J., Peake, S. (1992). A pilot environmental index for the UK in the 1980s. Energy
Policy 20, 335–343.
Hueting, R., Bosch, P., de Boer, B. (1992). Methodology for the calculation of sustainable national
income. Statistical Essays M44 (Central Bureau of Statistics, Voorburg).
Kendrick, J.W. (1970). The historical development of National-Income accounts. History of Political
Economy 2, 284–315.
Kenessey, Z. (1993). Postwar trend in national accounts in the perspective of earlier develop-
ments. In: de Vries, W.F.M., et al. (Eds.), The Value Added of National Accounting. CBS,
Voorburg/Heerlen, pp. 33–70.
Klep, P.M.M., Stamhuis, I.H. (Eds.) (2002). The Statistical Mind in a Pre-statistical Era: The
Netherlands 1750–1850. Aksant, Amsterdam, NEHA Series III.
Keuning, S.J. (1991). Proposal for a social accounting matrix which fits into the next system of
national accounts. Economic Systems Research 3, 233–248.
Keuning, S.J. (1993). An information system for environmental indicators in relation to the Na-
tional Accounts. In: de Vries, W.F.M., et al. (Eds.), The Value Added of National Accounting.
Netherlands Central Bureau of Statistics, Voorburg/Heerlen, pp. 287–305.
Keuning, S.J., de Ruijter, W.A. (1988). Guidelines to the construction of a social accounting matrix.
Review of Income and Wealth 34, 71–100.
Layard, R. (2006). Happiness and public policy: A challenge to the profession. Economic Journal
116, C24–C33.
Magnus, J.R., van Tongeren, J.W., de Vos, A.F. (2000). National accounts estimation using indicator
ratios. Review of Income and Wealth 46, 329–350.
Mäler, K.-G. (1991). National accounts and environmental resources. Environmental and Resource
Economics 1, 1–15.
Mellens, M. (2006). Besparingen belicht: Samenhang en verschillen tussen definities (A look at sav-
ings: Links and differences between definitions). CPB Memorandum 145. The Hague, February.
Mooij, J. (1994). Denken over Welvaart, Koninklijke Vereniging voor de Staathuishoudkunde, 1849–
1994. Lemma, Utrecht.
Morgan, M.S. (1990). The History of Econometric Ideas. Cambridge Univ. Press, Cambridge.
Pannekoek, J., Schut, C.M. (2003). Geen inflatie op maat (No inflation index at request). Economisch
Statistische Berichten 88, 412–414.
Stamhuis, I.H. (1989). ‘Cijfers en Aequaties’ en ‘Kennis der Staatskrachten’; Statistiek in Nederland
in de negentiende eeuw. Rodopi, Amsterdam/Atlanta.
Stamhuis, I.H. (2002). Vereeniging voor de Statistiek (VVS); Een gezelschap van juristen (The
Statistical Society; A society of lawyers). STAtOR 3 (2), 13–17.
Summers, R., Heston, A. (1991). The Penn World Table (Mark 5): An expanded set of international
comparisons, 1950–1988. Quarterly Journal of Economics 106, 327–368.
Tinbergen, J. (1936). Kan hier te lande, al dan niet na overheidsingrijpen een verbetering van de
binnenlandse conjunctuur intreden, ook zonder verbetering van onze exportpositie? Welke lering
kan ten aanzien van dit vraagstuk worden getrokken uit de ervaringen van andere landen? In:
Praeadviezen voor de Vereeniging voor de Staathuishoudkunde en de Statistiek. Nijhoff: Den
Haag, pp. 62–108.
Tinbergen, J. (1952). On the Theory of Economic Policy. North-Holland, Amsterdam.
Tinbergen, J. (1956). Economic Policy: Principles and Design. North-Holland, Amsterdam.
Wetenschappelijke Raad voor het Regeringsbeleid (2003). Nederland handelsland: Het perspectief
van de transactiekosten (The Netherlands as a nation of traders: The transaction costs’ perspec-
tive). Reports to the Government No. 66. Sdu Publishers, The Hague.
Weitzman, M.L. (1976). On the welfare significance of national product in a dynamic economy.
Quarterly Journal of Economics 90, 156–162.
van Ark, B. (1999). Accumulation, productivity and technology: Measurement and analysis of long
term economic growth. CCSO Quarterly Journal 1 (2). June.
van den Bergh, J.C.J.M. (2005). BNP, weg ermee (BNP, let’s get rid of it). Economisch Statistische
Berichten 90, 502–505.
van den Bogaard, A. (1999). Configuring the economy, the emergence of a modelling practice in the
Netherlands, 1920–1955. Thela-Thesis.
van Zanden, J.L. (2002). Driewerf hoera voor het poldermodel (Three hoorays for the polder model).
Economisch Statistische Berichten 87, 344–347.
CHAPTER 9
Invariance and Calibration
Abstract
The Representational Theory of Measurement conceives measurement as es-
tablishing homomorphisms from empirical relational structures into numerical
relational structures, called models. Models function as measuring instruments by
transferring observations of an economic system into quantitative facts about
that system. These facts are evaluated by their accuracy. Accuracy is achieved
by calibration. For calibration standards are needed. Then two strategies can be
distinguished. One aims at estimating the invariant (structural) equations of the
system. The other strategy is to use known stable facts about the system to ad-
just the model parameters. For this latter strategy, the requirement of models as
homomorphic mappings is not required anymore.
The objects of economic measurements have a different ontology than the ob-
jects of classical theories of measurement. Measurement is assigning numbers
to properties. In the classical view of measurement, which arose in the physical
sciences and received its fullest exposition in the works of Campbell (1928),
these numbers represent properties of things. Measurement in the social sci-
ences does not necessarily have this thing-relatedness. It is not only properties
of ‘things’ that are measured but also those of other kinds of phenomena: states,
events, and processes.
To arrive at an account of measurement that acknowledges this different on-
tology, Woodward’s (1989) distinction between phenomena and data is helpful.
According to Woodward, phenomena are relatively stable and general features
of the world and therefore suited as objects of explanation and prediction. Data,
that is, the observations playing the role of evidence for claims about phenom-
ena, on the other hand involve observational mistakes, are idiosyncratic and
reflect the operation of many different causal factors and are therefore unsuited
for any systematic and generalizing treatment. Theories are not about observa-
tions – particulars – but about phenomena – universals.
Woodward characterizes the contrast between data and phenomena in three
ways. In the first place, the difference between data and phenomena can be indi-
cated in terms of the notions of error applicable to each. In the case of data the
Underlying the contrast between data and phenomena is the idea that theories
do not explain data, which typically will reflect the presence of a great deal of
noise. Rather, an investigator first subjects the data to analysis and processing,
or alters the experimental design or detection technique, in an effort to separate
out the phenomenon of interest from extraneous background factors. Although
phenomena are investigated by using observed data, they themselves are in gen-
eral not directly observable. To ‘see’ them we need instruments, and to obtain
numerical facts about the phenomena in particular we need measuring instru-
ments. In social science, we do not have physical instruments, like thermometers
or galvanometers. Mathematical models function as measuring instruments by
transforming sets of observations into a measurement result.
Theories are incomplete with respect to the quantitative facts about phenom-
ena. Though theories explain phenomena, they often (particularly in economics)
do not have built-in application rules for mathematizing the phenomena. More-
over, theories do not have built-in rules for measuring the phenomena. For ex-
ample, theories tell us that metals melt at a certain temperature, but not at which
temperature (Woodward’s example); or they tell us that capitalist economies give
rise to business cycles, but not the duration of recovery. In practice, by mediating
between theories and the data, models may overcome this dual incompleteness
of theories. As a result, models that function as measuring instruments medi-
ate between theory and data by transferring observations into quantitative facts
about the phenomenon under investigation:
Because facts about phenomena are not directly measured but must be in-
ferred from the observed data, we need to consider the reliability of the data.
These considerations cannot be derived from theory but are based on a closer
investigation of the experimental design and the equipment used, and they need a statistical underpinning. This message was well laid out for econometrics by Haavelmo
(1944, p. 7): ‘The data [the economist] actually obtains are, first of all, nearly
always blurred by some plain errors of measurement, that is, by certain extra
“facts” which he did not intend to “explain” by his theory’.
If we look at the measuring practices in economics and econometrics, we
see that their aims can be formulated as follows: measurements are the results of modeling efforts whose goal is to obtain quantitative information about economic phenomena. To give an account of these economic measurement practices, the
subsequent sections will explore in which directions the representational theory
has to be extended. This extension will be based on accounts that deal explicitly
with measuring instruments and measurement errors.
1 I have replaced the symbols Q and R in the original text by the symbols Y and X, respectively,
to make the discussion of the measurement literature uniform.
2 Ellis’ account of associative measurement is based on Mach’s (1968) chapter ‘Kritik des Tem-
peraturbegriffes’ from his book Die Principien der Wärmelehre (Leipzig, 1896). This chapter was
translated into English and added to Ellis’ (1968) book as Appendix I.
(1963), where it was called ‘pointer measurement’, but its discussion disap-
peared in later accounts of RTM. Generally, by instrument measurement we
mean a numerical assignment based on the direct readings of some validated
instrument. A measuring instrument is validated if it has been shown to yield
numerical values that correspond to those of some numerical assignments under
certain standard conditions. This is also called calibration, which in metrology
is defined as: ‘set of operations that establish, under specified conditions, the
relationship between values of quantities indicated by a measuring instrument
or measuring system, or values represented by a material measure or a refer-
ence material, and the corresponding values realized by standards’ (IVM, 1993,
p. 48). To construct a measuring instrument, it is generally necessary to utilize
some established empirical law or association.
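As a minimal numerical sketch of this definition, the relationship between the values indicated by an instrument and the corresponding values realized by standards can be estimated under fixed conditions and then used to correct subsequent readings; the readings, the standard values, and the linear form of the relationship below are assumptions made only for illustration.

import numpy as np

# Hypothetical calibration data: values realized by standards and the values
# indicated by the instrument under the same, specified conditions (assumed).
standard_values  = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
indicated_values = np.array([1.1, 10.8, 20.9, 31.2, 40.7])

# Calibration: establish the relationship indicated = a + b * standard.
# (The linear form is itself an assumption, not a general rule.)
b, a = np.polyfit(standard_values, indicated_values, deg=1)

def calibrated(reading):
    # Convert a raw instrument reading into the calibrated value.
    return (reading - a) / b

print(f"offset a = {a:.3f}, slope b = {b:.3f}")
print("calibrated value for a reading of 25.4:", round(calibrated(25.4), 2))

The sketch only illustrates that calibration fixes a mapping from indications to standard values; a validated instrument measurement presupposes such a mapping.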
One difference between Ellis’ associative measurement and Heidelberger’s
correlative interpretation of measurement, that is, instrument measurement, is
that, according to Heidelberger, the mapping of X into numbers, φ(X), is not the
result of (direct) measurement but is obtained by calibration (see Heidelberger’s
quote above). To determine the scale of the thermometer no prior measurement
where φ(X) is the measure of X on some previously defined scale. The cor-
relation F also involves other influences, indicated by OC. OC, an abbreviation of ‘other circumstances’, is a collective term for all other quantities that might have
an influence on X.
The central idea of instrument measurement is that in measuring any at-
tribute Y we always have to take into account its empirical lawful relation to (at
least) another attribute X. To establish this relation we need a measurement ap-
paratus or experimental arrangement, A. In other words, a measuring instrument
has to function as a nomological machine. This idea is based on Cartwright’s account that a law of nature – a necessary regular association between properties – holds only relative to the successful repeated operation of a ‘nomological ma-
chine’, which she defines as:
a fixed (enough) arrangement of components, or factors with stable (enough) capacities that
in the right sort of stable (enough) environment will, with repeated operation, give rise to the
kind of regular behavior that we represent in our scientific laws (Cartwright, 1999, p. 50).
This error term, representing noise, reflects the operation of many different,
sometimes unknown, influences. Now, accuracy of the observation is obtained
by reducing the noise as much as possible. One way of obtaining accuracy is
by taking care that the other influence quantities, indicated by OC, are held as
constant as possible, in other words, that ceteris paribus conditions are imposed.
To show this idea, Eq. (9.2) is rewritten to express how Y and possible other
circumstances (OC) influence the observations:
Equation (9.3) shows that accuracy can be obtained ‘in the right sort of stable
(enough) environment’ by imposing ceteris paribus conditions (cp), which also
might include even stronger ceteris absentibus conditions: OC ≈ 0. As a result
the remaining factor Y can be varied in a systematic way to gain knowledge
about the relation between Y and X:
FY = ΔXcp/ΔY. (9.4)
If the ratio of the variation of Xcp and the variation of Y appears to be stable,
the correlation is an invariant relationship and can thus be used for measurement
aims.
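A minimal simulation sketch of this idea follows; the data-generating correlation F, the coefficients, and the noise level are assumptions made only for illustration. Y is varied systematically while the other circumstances OC are held constant, and the stability of the ratio of the variations is then checked.

import numpy as np

rng = np.random.default_rng(0)

def observe_X(Y, OC=0.0, noise=0.01):
    # Assumed data-generating correlation F: X responds to Y and to the other
    # circumstances OC, plus a small observational error.
    return 2.5 * Y + 0.8 * OC + rng.normal(0.0, noise)

# Ceteris paribus: OC is held (approximately) constant while Y is varied
# systematically, as in a controlled experiment.
Y_grid = np.linspace(1.0, 5.0, 9)
X_cp = np.array([observe_X(Y, OC=0.0) for Y in Y_grid])

# Ratio of the variation of X_cp to the variation of Y, as in Eq. (9.4).
ratios = np.diff(X_cp) / np.diff(Y_grid)
print("estimated F_Y along the grid:", np.round(ratios, 2))
# An (approximately) constant ratio indicates an invariant relationship that
# can serve measurement aims: an observed X_cp can be translated back into Y.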
So, an observation in a controlled experiment is an accurate measurement because of the stabilization of background noise (E = 0 → E is stable: E = S).

xcp = φ(Xcp) = φ(F(Y, S)). (9.5)

xi = f(y) + εi (i = 1, . . . , k). (9.6)
the degree imposed by the required accuracy of the measurement result, addi-
tional input quantities must be included in M to eliminate this inaccuracy. This
may require introducing input quantities to reflect incomplete knowledge of a
phenomenon that affects the measurand. This means that the model has to incor-
porate a representation of the full nomological machine A, denoted by a; that is, it should represent both the properties of the phenomenon to be measured and the background conditions influencing the observations. To take account of this
aspect of measurement, Fig. 9.3 has to be further expanded as shown in Fig. 9.4.
When one has to deal with a natural measuring system A that can only be
observed passively, the measurement procedure is first to infer from the obser-
vations Xi nature’s design of this system and then to determine the value of the
measurand Y . So, first an adequate representation a of system A has to be speci-
fied before we can estimate the value of Y. A measurement result is thus given by

ŷ = M(xi; a). (9.7)
If one substitutes Eq. (9.6) into model M, one can derive, assuming that M is a linear operator (usually the case):

ŷ = M(f(y) + εi; a) = My(y; a) + Mε(εi; a). (9.8)
A true signal, that is the true value of Y , however, can only be obtained by a
perfect measurement, and so is by nature indeterminate. The reliability of the
model’s outputs cannot be determined in relation to a true but unknown signal,
and thus depends on other aspects of the model’s performance. To describe the
performance of a model that functions as a measuring instrument the term ac-
curacy is important. In metrology, accuracy is defined as a statement about the
closeness of the mean taken from the scatter of the measurements to the value
declared as the standard (Sydenham, 1979, p. 48).
The procedure to obtain accuracy is calibration, which is the establishment
of the relationship between values indicated by a measuring instrument and the
corresponding values realized by standards. This means, however, that accuracy
can only be assessed in terms of a standard. In this context, a standard is a rep-
resentation (model) of the properties of the phenomenon as they appear under
well-defined conditions.
To discuss this problem in more detail, we split the measurement error into three
parts:
ε̂ = ŷ − y = Mε + (My − S) + (S − y) (9.9)
where S represents the standard. The error term Mε is reduced as much as pos-
sible by reducing the spread of the error terms, in other words by aiming at
precision. (My − S) is the part of the error term that is reduced by calibration. So, both error terms can be dealt with by mechanical procedures. However, the reduction of the last term (S − y) can only be dealt with by involving theoretical as-
sumptions about the phenomenon and independent empirical studies. Note that
the value y is not known. Often the term (S − y) is reduced by building as ac-
curate representations a of the economic system as possible. This third step is
called standardization.
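The following is a minimal numerical sketch of the decomposition in Eq. (9.9); all values are assumed for illustration only, and in practice the true value y is of course unknown. It merely shows how the three terms – precision, calibration, and standardization – add up to the total measurement error.

import numpy as np

rng = np.random.default_rng(1)

y   = 100.0   # true value of the measurand (unknown in practice) -- assumed
S   =  99.0   # the standard: a representation of the phenomenon under
              # well-defined conditions; it differs from y by (S - y)
M_y =  98.2   # value indicated by the model under the standard conditions -- assumed

readings = M_y + rng.normal(0.0, 0.5, size=50)   # scatter of repeated readings

precision_term       = readings.mean() - M_y     # M_eps: reduced by aiming at precision
calibration_term     = M_y - S                   # (My - S): reduced by calibration
standardization_term = S - y                     # (S - y): reduced by standardization

total_error = precision_term + calibration_term + standardization_term
print("M_eps           :", round(precision_term, 3))
print("(My - S)        :", round(calibration_term, 3))
print("(S - y)         :", round(standardization_term, 3))
print("total (yhat - y):", round(total_error, 3))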
Model I: x̂_it+1^I = Σ_{j=1}^{k} α_ij^I x_jt (i = 1, . . . , k), (9.10)

Model II: x̂_it+1^II = Σ_{j=1}^{k+1} α_ij^II x_jt (i = 1, . . . , k + 1). (9.11)
If |x_it+1 − x̂_it+1^II| < |x_it+1 − x̂_it+1^I| for the majority of these error terms
mimics the world as closely as possible along a limited but clearly specified number of dimensions.
5. Run the experiment.
Kydland and Prescott’s specific kind of assessment is similar to Lucas’ idea
of testing, although Lucas didn’t call it calibration. To test models as ‘useful
imitations of reality’ we should subject them to shocks ‘for which we are fairly
certain how actual economies, or parts of economies, would react. The more
dimensions on which the model mimics the answer actual economies give to
simple questions, the more we trust its answer to harder questions’ (Lucas, 1980,
pp. 696–697). This kind of testing is similar to calibration as defined by Franklin
(1997, p. 31): ‘the use of a surrogate signal to standardize an instrument. If an
apparatus reproduces known phenomena, then we legitimately strengthen our
belief that the apparatus is working properly and that the experimental results
produced with that apparatus are reliable’.
The economic questions for which we have known answers – or the standard facts with which the model is calibrated – were most explicitly given by Cooley and Prescott (1995). They describe calibration as a selection of the parameter
values for the model economy so that it mimics the actual economy on dimen-
sions associated with long-term growth by setting these values equal to certain
‘more or less constant’ ratios. These ratios were the so-called ‘stylized facts’
of economic growth, ‘striking empirical regularities both over time and across
countries’, the ‘benchmarks of the theory of economic growth’.
What we have seen above is that in modern macroeconomics, the assessment
of models as measuring instruments is not based on the evaluation of the ho-
momorphic correspondence between the empirical relational structure and the
numerical relational structure. The assessment of these models is more like
what is called validation in systems engineering. Validity of a model is seen
as ‘usefulness with respect to some purpose’. Barlas (1996) notes that for an
exploration of the notion of validation it is crucial to make a distinction between
white-box models and black-box models. In black-box models, what matters is
the output behavior of the model. The model is assessed to be valid if its out-
put matches the ‘real’ output within some specified range of accuracy, without
any questioning of the validity of the individual relationships that exist in the
model. White-box models, on the contrary, are statements as to how real systems
actually operate in some aspects. Generating an accurate output behavior is not
sufficient for model validity; the validity of the internal structure of the model is
crucial too. A white-box model must not only reproduce the behavior of a real
system, but also explain how the behavior is generated.
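A minimal sketch of a black-box assessment in this sense is given below; the output series and the accuracy range are assumptions for illustration. The test compares output behaviour only and says nothing about the internal structure of the model.

import numpy as np

def behavior_pattern_test(model_output, real_output, tolerance=0.05):
    # Black-box check: does the model output match the 'real' output within
    # the specified range of accuracy, without inspecting internal structure?
    relative_error = np.abs(model_output - real_output) / np.abs(real_output)
    return bool(np.all(relative_error <= tolerance))

# Assumed series: observed behaviour and the behaviour generated by the model.
real  = np.array([1.00, 1.04, 1.10, 1.08, 1.15])
model = np.array([1.01, 1.05, 1.08, 1.10, 1.13])

print("valid as a black-box model:", behavior_pattern_test(model, real))
# A white-box assessment would, in addition, subject the individual
# relationships inside the model to direct structural tests.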
Barlas (1996) discusses three stages of model validation: ‘direct structural
tests’, ‘structure-oriented behavior tests’ and ‘behavior pattern tests’. For white
models, all three stages are equally important, for black box models only the last
stage matters. Barlas emphasizes the special importance of structure-oriented
behavior tests: these are strong behavior tests that can provide information on
potential structure flaws. The information, however, provided by these tests does
not give any direct access to the structure, in contrast to the direct structure tests.
References
Barlas, Y. (1996). Formal aspects of model validity and validation in system dynamics. System Dy-
namics Review 12 (3), 183–210.
Campbell, N.R. (1928). An Account of the Principles of Measurement and Calculation. Longmans,
Green, London.
Cartwright, N. (1999). The Dappled World. A Study of the Boundaries of Science. Cambridge Univ.
Press, Cambridge.
Cooley, T.F., Prescott, E.C. (1995). Economic growth and business cycles. In: Cooley, T.F. (Ed.),
Frontiers of Business Cycle Research. Princeton Univ. Press, Princeton, pp. 1–38.
Ellis, B. (1968). Basic Concepts of Measurement. Cambridge Univ. Press, Cambridge.
Finkelstein, L. (1975). Fundamental concepts of measurement: Definition and scales. Measurement
and Control 8, 105–110.
Franklin, A. (1997). Calibration. Perspectives on Science 5, 31–80.
Friedman, M. (1951). The methodology of positive economics. In: Essays in Positive Economics.
Univ. of Chicago Press, Chicago, pp. 3–43.
Haavelmo, T. (1944). The probability approach in econometrics. Econometrica 12. Supplement.
Heidelberger, M. (1993). Fechner’s impact for measurement theory. Behavioral and Brain Sciences
16 (1), 146–148.
Heidelberger, M. (1994a). Alternative Interpretationen der Repräsentationstheorie der Messung. In:
Meggle, G., Wessels, U. (Eds.), Proceedings of the 1st Conference “Perspectives in Analytical
Philosophy”. Walter de Gruyter, Berlin and New York, pp. 310–323.
Heidelberger, M. (1994b). Three strands in the history of the representational theory of measure-
ment. Working paper. Humboldt University, Berlin.
IVM (1993). International Vocabulary of Basic and General Terms in Metrology, second ed. Inter-
national Organization for Standardization, Geneva.
Kydland, F.E., Prescott, E.C. (1996). The computational experiment: An econometric tool. Journal
of Economic Perspectives 10 (1), 69–85.
Lucas, R.E. (1976). Econometric policy evaluation: A critique. In: Brunner, K., Meltzer, A.H. (Eds.),
The Phillips Curve and Labor Markets. North-Holland, Amsterdam, pp. 19–46.
Lucas, R.E. (1980). Methods and problems in business cycle theory. Journal of Money, Credit, and
Banking 12, 696–715.
Mach, E. [1896] (1968). Critique of the concept of temperature. In: Ellis, B. (Ed.), Basic Concepts of
Measurement. Cambridge Univ. Press, Cambridge, pp. 183–196 (translated by M.J. Scott-Taggart
and B. Ellis).
Morgan, M.S. (2003). Experiments without material intervention: Model experiments, virtual exper-
iments, and virtually experiments. In: Radder, H. (Ed.), The Philosophy of Scientific Experimen-
tation. Univ. of Pittsburgh Press, Pittsburgh, pp. 216–235.
Simon, H.A. (1969). The Sciences of the Artificial. MIT Press, Cambridge.
Stevens, S.S. (1959). Measurement, psychophysics, and utility. In: Churchman, C.W., Ratoosh, P.
(Eds.), Measurement. Definitions and Theories. Wiley, New York, pp. 18–63.
Suppes, P., Zinnes, J.L. (1963). Basic measurement theory. In: Luce, R.D., Bush, R.R., Galanter, E.
(Eds.), Handbook of Mathematical Psychology. Wiley, New York, London and Sydney, pp. 1–76.
Sutton, J. (2000). Marshall’s Tendencies: What Can Economists Know? Leuven Univ. Press, Leuven
and The MIT Press, Cambridge and London.
Sydenham, P.H. (1979). Measuring Instruments: Tools of Knowledge and Control. Peter Peregrinus,
London.
White, K.P. (1999). System Design. In: Sage, A.P., Rouse, W.B. (Eds.), Handbook of Systems Engi-
neering and Management. Wiley, New York, pp. 455–481.
Woodward, J. (1989). Data and phenomena. Synthese 79, 393–472.
PART III
Representation in Econometrics
CHAPTER 10
Representation in Econometrics:
A Historical Perspective
Christopher L. Gilbert (a) and Duo Qin (b)
(a) Dipartimento di Economia, Università degli Studi di Trento, Italy
E-mail address: cgilbert@economia.unitn.it
(b) Department of Economics, Queen Mary, University of London, UK
E-mail address: d.qin@qmul.ac.uk
Abstract
Measurement forms the substance of econometrics. This chapter outlines the
history of econometrics from a measurement perspective – how have measure-
ment errors been dealt with and how, from a methodological standpoint, did
econometrics evolve so as to represent theory more adequately in relation to
data? The evolution is organised in terms of four phases: ‘theory and measure-
ment’, ‘measurement and theory’, ‘measurement with theory’ and ‘measurement
without theory’. The question of how measurement research has helped in the
advancement of knowledge is discussed in the light of this history.
10.1. Prologue
1 There was a strong sense that ‘modern economics’ should be made ‘scientific’, as opposed to the humanities, e.g. see Schumpeter (1933) and Mirowski (1989).
• the orthodox structural approach which closely follows the measurement ap-
proach of hard science;
• the reformist approach which places measurement in a soft system but does
not diverge methodologically from the scientific approach; and
• the heterodox approach which we discuss as ‘measurement without theory’.
Economists have been concerned with quantification from at least the nineteenth
century. Morgan’s (1990) history of econometrics starts with W.S. Jevons’ at-
tempts to relate business cycles to sunspots (Jevons, 1884). Jevons (1871) was
also the first economist to ‘fit’ a demand equation although Morgan (1990) at-
tributes the first empirical demand function to C. Davenant (1699) at the end of
the seventeenth century. Klein (2001) documents measurement of cyclical phe-
nomena commencing with W. Playfair’s studies of the rise and decline of nations
published during the Napoleonic War (Playfair, 1801, 1805). Hoover and Dow-
ell (2001) discuss the history of measurement of the general price level starting
from a digression in Adam Smith’s Wealth of Nations (Smith, 1776).
More focused empirical studies occurred during the first three decades of the
twentieth century. These studies explored various ways of characterising cer-
tain economic phenomena, e.g. the demand for a certain product, or its price
movement, or the cyclical movement of a composite price index by means
of mathematical/statistical measures which would represent certain regular at-
tributes of the phenomena concerned, e.g. see Morgan (1990), Gilbert and Qin
(2006) and the Chapter by Chao in this volume. These studies demonstrate a
concerted endeavour to transform economics into a scientific discipline through
the development of precise and quantifiable measures for the loose and unquan-
tified concepts and ideas widely used in traditional economic discussions.
This broad conception of the role of econometrics continued to be reflected in
textbooks written in the first two post-war decades in which econometrics was
equated to empirical economics, with emphasis on the measurability in eco-
nomic relationships. Klein (1974, p. 1) commences the second edition of his
1952 textbook by stating ‘Measurement in economics is the subject matter of
this volume’. In Klein (1962, p. 1) he says ‘The main objective of econometrics
is to give empirical content to a priori reasoning in econometrics’. This view
of econometrics, which encompassed specification issues and issues of mea-
surement as well as statistical estimation, lagged formal developments in the
statistical theory of econometrics.
The formalisation of econometrics was rooted directly in the ‘structural
method’ proposed by Frisch in the late 1930s (1937, 1938). Much of the for-
malisation was stimulated by the famous Keynes–Tinbergen debate, see Hendry
and Morgan (1995, Part VI), and resulted in econometrics becoming a dis-
tinct sub-discipline of economics. The essential groundwork of the formalisation
comprised the detailed theoretical scheme laid out by Haavelmo (1944) on the
basis of probability theory and the work of the Cowles Commission (CC) which
elaborated technical aspects of Haavelmo’s scheme, see Koopmans (1950) and
Hood and Koopmans (1953).3
2 This is from the title of the Cowles Commission twenty year research report, see Christ (1952).
3 For more detailed historical description, see Qin (1993) and Gilbert and Qin (2006).
A0 xt = Σ_{i=1}^{p} Ai xt−i + εt. (10.2.1)
4 Note that ‘identification’ carried a far wider connotation prior to this formalisation, e.g. see Hendry
and Morgan (1989) and Qin (1989).
xt = Σ_{i=1}^{p} A0^{−1} Ai xt−i + A0^{−1} εt = Σ_{i=1}^{p} Πi xt−i + ut. (10.2.2)
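The passage from (10.2.1) to (10.2.2) is simply premultiplication by the inverse of A0. The following minimal sketch, with assumed coefficient matrices for a two-variable system with p = 1, spells this out.

import numpy as np

# Assumed structural coefficient matrices for a two-variable system with p = 1.
A0 = np.array([[1.0, 0.5],
               [0.2, 1.0]])      # contemporaneous structural coefficients
A1 = np.array([[0.4, 0.1],
               [0.3, 0.2]])      # lag-1 structural coefficients

# Reduced form (10.2.2): Pi_1 = inv(A0) A1 and u_t = inv(A0) eps_t.
A0_inv = np.linalg.inv(A0)
Pi1 = A0_inv @ A1
print("Pi_1 =\n", np.round(Pi1, 3))

# The reduced-form errors inherit a covariance structure from the structural
# shocks: if Cov(eps_t) = Sigma_eps, then Cov(u_t) = inv(A0) Sigma_eps inv(A0)'.
Sigma_eps = np.diag([1.0, 0.5])
Sigma_u = A0_inv @ Sigma_eps @ A0_inv.T
print("Cov(u_t) =\n", np.round(Sigma_u, 3))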
5 The CC group was conscious of the problem and ascribed it to the lack of good theoretical
models, see Koopmans (1957) and also Gilbert and Qin (2006).
formulation, which they had initially wished to take as given (see e.g. Simon,
1953).
The CC’s work set the scientific standard for econometric research. Their work
was both further developed (tool adaptation) and subjected to criticism in the
decades that followed.
The controversy between maximum likelihood (ML) and least squares (LS)
estimation methods illustrates the limits of tool adaptation. The argument is re-
lated primarily to the validity of the simultaneous representation of economic
interdependence, a model formulation issue (e.g. see Wold, 1954, 1960, 1964).
The judgement or evaluation rested on actual model performance, e.g. the measured accuracy of modelled variables against actual values. The reversal from ML estimation methods back to LS estimation methods provided a clear illustration of the practical limits of tool adaptation rather than model adaptation. The Klein–Goldberger model (1955) provided the test-bed (see Christ, 1960, and Waugh, 1961), offering the final judgement in favour of LS methods.
This was one of a number of debates which suggested that there was relatively
little to be gained from more sophisticated estimation methods. An overriding
concern which came to be felt among researchers was the need for statistical
assessment of model validity. This amounted to a shift in focus from the mea-
surement of structural parameters within a given model to examination of the
validity of the model itself. It led to the development of a variety of specification
methods and test statistics for empirical models.
One important area of research related to the examination of the classical
assumptions with regard to the error term, as these sustain statistical optimal-
ity of the chosen estimators.6 Applied research, in particular consumer demand
studies, exposed a common problem: residual serial correlation (e.g. see Orcutt,
1948). From that starting point, subsequent research took two different direc-
tions. The first was to search for more sophisticated estimators on the basis of
an acceptance of a more complicated error structure but remaining within the
originally postulated structural model. Thus in the case of residual serial cor-
relation, we have the Cochrane–Orcutt procedure (1949), while in the case of residual heteroscedasticity we have feasible generalized least squares (FGLS), both of which involve two-stage estimation procedures. These were instances of tool
adaptation. The other direction was to modify the model in such a way as to
permit estimation on the basis of the classical assumptions (e.g. Brown’s 1952
introduction of the partial adjustment model into the consumption function, an early
instance of model adaptation).
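A minimal sketch of the two-stage logic behind the Cochrane–Orcutt procedure mentioned above is given below; the simulated data and the single pass through the two stages are assumptions for illustration (the published procedure iterates the steps until the estimate of the autocorrelation coefficient converges).

import numpy as np

rng = np.random.default_rng(2)

# Simulated data: y_t = 1 + 2 x_t + e_t with AR(1) errors e_t = 0.7 e_{t-1} + v_t.
n = 200
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + e

def ols(X, y):
    # Ordinary least squares coefficients.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: OLS on the original model; estimate rho from the residual serial correlation.
X = np.column_stack([np.ones(n), x])
beta_ols = ols(X, y)
resid = y - X @ beta_ols
rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])

# Stage 2: quasi-difference the data with the estimated rho and re-estimate by OLS.
y_star = y[1:] - rho * y[:-1]
X_star = np.column_stack([np.ones(n - 1) * (1 - rho), x[1:] - rho * x[:-1]])
beta_co = ols(X_star, y_star)

print("estimated rho:", round(rho, 3))
print("OLS coefficients:            ", np.round(beta_ols, 3))
print("Cochrane-Orcutt coefficients:", np.round(beta_co, 3))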
In later decades, it was model adaptation which came to dominate, especially
in the field of time-series econometrics. Statistically, this was facilitated by the
6 For a historical account of the error term in econometrics, see Qin and Gilbert (2001).
This section sets out how the second generation of econometricians made model search the focus of their research.
The RE movement, and especially the component associated with the Lucas (1976) critique, posed a profound methodological challenge to then current
9 The method of factor analysis in a cross-sectional setting was employed in economics as early as
the 1940s (see e.g. Waugh, 1942 and Stone, 1947).
10 Shephard (2006) provides a history of SV models.
could be interpreted as (or in terms of) the first order derivatives of the sup-
posedly underlying theoretical models. By contrast, parameters often lack clear
interpretation in nonlinear models and the model must be interpreted through
simulation.
The status of models, and hence structure, in philosophy of science, and specif-
ically in the methodology of economics, remains controversial. Even if in some
of the natural sciences, parameters may be seen as natural constants relating to
universal regularities, it makes more sense in economics to see parameters as
objects defined in relation to models, and not in relation either to theories or to
the world itself. Econometric measurement becomes co-extensive with model
specification and estimation.
The standard view is that models provide a means of interpreting theory into
the world. Cartwright (1983) regards models as explications of theories. For
Hausman (1992), models are definitional – they say nothing directly about the
world, but may have reference to the world. Further, a theory may assert that a
particular model does make such reference. These views are broadly in line with
the CC conception of econometrics in which models were taken as given by the
theorists.
Taking models as given proved unproductive in practice. Estimated models
often performed poorly, and more sophisticated estimation (measurement) meth-
ods failed to give much improvement; identification problems were often acute;
and the availability of richer data sets produced increasing evidence of misspec-
ification in ‘off the shelf’ economic models. The econometrician’s task shifted
from model estimation to adaptation. This view was captured by Morgan (1988)
who saw empirical models as intermediating theory and the world. For her, the
task facing the economist was to find a satisfactory empirical model from the
large number of possible models each of which would be more or less closely
related to economic theory.
The alternative view of the relationship between theory and models is less
linear, even messier. Morrison (1999) asserts that models are autonomous, and
may draw from more than one theory or even from observed regularities rather
than theories. Boumans (1999), who discusses business cycle theory, also views
models as eclectic, ‘integrating’ (Boumans’ term) elements from different the-
ories. In terms of our earlier discussion, this view is more in line with the
data-instigated approach to economic modelling which derives from the tradi-
tions of time series statistics. In this tradition, economic theory is often loosely
related to the estimated statistical model, and provides a guide for interpretation
of the estimates rather than a basis for the specification itself.
Wherein lies the measurement problem in econometrics? Econometricians in
the CC tradition saw themselves as estimating parameters of well-defined struc-
tural models. These structural parameters were often required to be invariant to
changes in other parts of the system, such as those induced by policy change.
Many of these parameters were first order partial derivatives. But the interpreta-
tion of any partial derivative depends on the ceteris paribus condition – what is
being held constant? The answer depends on the entire model specification. If
we follow Boumans (1999) and Morrison (1999) in regarding models as being
theoretically eclectic, parameters must relate to models and not theories. The
same conclusion follows from Morgan’s views of the multiplicity of possible
empirical models.
Subsequently, with the fading faith in the existence of a unique correct model
for any specific economic structure, measurement shifted away from parame-
ters, which are accidental to model specification, and towards responses, and in
particular in time series contexts, to dynamic responses. The VAR emphasis, for
example, is often on estimated impulse response functions, rather than the pa-
rameters of a particular VAR specification. Similarly, the main interest in error
correction specifications is often in the characterisation of the system equilib-
rium which will be a function of several parameters.
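A minimal sketch of this shift in the object of measurement follows; the coefficient matrix of the bivariate VAR(1) is an assumption for illustration. What is reported is the impulse-response function, a function of the parameters, rather than the parameters of the particular specification.

import numpy as np

# Assumed coefficient matrix of a bivariate VAR(1): x_t = Pi x_{t-1} + u_t.
Pi = np.array([[0.5, 0.1],
               [0.2, 0.3]])

def impulse_responses(Pi, horizon=6):
    # Responses of the system to a unit shock in each variable at t = 0.
    k = Pi.shape[0]
    responses = [np.eye(k)]
    for _ in range(horizon):
        responses.append(Pi @ responses[-1])
    return np.stack(responses)          # shape: (horizon + 1, k, k)

irf = impulse_responses(Pi)
print("response of variable 1 to a unit shock in variable 2, horizons 0 to 6:")
print(np.round(irf[:, 0, 1], 3))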
Models may be more or less firmly grounded in theory. The evolution of
econometrics may be seen as a continuous effort to pursue best possible statisti-
cal measurements for both ‘principle models’ and ‘phenomenological models’,
to use the model classification suggested by Boniolo (2004).11 The former are
assiduously sought by the orthodox structural econometricians. This probably
results from four major attractions of a ‘principle’ model, see De Leeuw (1990),
namely it serves as an efficient medium of cumulative knowledge; it facilitates
interpolation, extrapolation and prediction; it allows for deductive reasoning to
derive not so apparent consequences; it enables the distilling out of stable and
regular information.
Many classes of models in economic theory are deliberately and profoundly
unrealistic. This is true, for example, of general equilibrium theory and much of
growth theory. Such models make possible ‘conceptual, logical and mathemati-
cal exploration’ of the model premises. These models are useful in so far as they
‘increase our conceptual resources’ (Hausman, 1992, p. 77) and, we would add,
that they allow us to recognise similar aspects of the model behaviour which cor-
respond to real world economic phenomena. In a sense, these models substitute
for experiments which are seldom possible for entire economies.
Econometrics claims to be solely occupied with models which are realistic
in the sense that they account statistically for behaviour as represented by data
sets. For econometricians, the data are the world. Following Haavelmo’s (1944)
manifesto, Neyman–Pearson testing methodology became the accepted procedure for establishing congruency of models with data. But the claim to realism
is problematic in that models can at best offer partial accounts of any set of
phenomena. ‘The striving for too much realism in a model may be an obstacle
11 The third model category in Boniolo (2004) is ‘object models’, which correspond essentially to
computable general equilibrium (CGE) type models in econometrics.
to explain the relevant phenomena’ (Boumans, 1999, p. 92). During the initial
decades of modern econometrics, data sets were limited and sometimes rela-
tively uninformative. Over more recent decades, econometricians have benefited
both from larger and more informative data sets and from the computing power
to analyse these data. As Leamer anticipated, these rich data would oblige a
thorough-going classical econometrician to reject almost any model: ‘. . . since
a large sample is presumably more informative than a small sample, and since
it is apparently the case that we will reject the null hypothesis in a large sam-
ple, we might as well begin by rejecting the hypothesis and not sample at all’
(Leamer, 1978, p. 89). So either by the force of circumstance in the case of in-
adequate data, by design in the face of rich and informative data, or through
the imposition of strong Bayesian priors, econometricians have abandoned real-
ism in favour of simplicity. The situation is not very different from that of the
deliberately unrealistic theory models. Econometricians measure, but measure-
ments are model-specific and are informative about the world only in so far as
the models themselves are congruent with the world.
History reflects a gradual ‘externalisation’ of measurement in terms of Car-
nap’s terminology (1950): the development of measurement instruments is ini-
tially for ‘internal questions’ and moves gradually towards ‘external questions’.
For example, parameters are internal within models, whereas the existence of
models is external with respect to the parameters. Econometric research has
moved from the issue of how to optimally estimate parameters to the harder
issue of how to measure and hence evaluate the efficiency, fruitfulness and sim-
plicity of the models, i.e. the relevance of models as measuring instruments.
Acknowledgements
References
Banerjee, A., Marcellino, M., Masten, I. (2003). Leading indicators for Euro area inflation and GDP
growth. Working paper No. 3893. IGIR.
Beveridge, W.H. (1921). Weather and harvest cycles. Economic Journal 31, 429–452.
Bollerslev, T., Chou, R.Y., Kroner, K.F. (1992). ARCH modelling in finance. Journal of Economet-
rics 52, 5–59.
Boniolo, G. (2004). Theories and models: Really old hat? In: Yearbook of the Artificial, vol. II. Peter
Lang Academic Publishing Company, Bern, pp. 61–86.
Boumans, M. (1999). Built-in justification. In: Morgan, M.S., Morrison, M. (Eds.), Models as Me-
diators. Cambridge Univ. Press, Cambridge, pp. 66–96.
Boumans, M. (2005). Measurement in economic systems. Measurement 38, 275–284.
Box, G.E.P., Jenkins, G.M. (1970). Time Series Analysis, Forecasting and Control. Holden-Day, San
Francisco.
Brillinger, D.R. (2002). John W. Tukey’s work on time series and spectrum analysis. Annals of
Statistics 30, 1595–1618.
Brown, T.M. (1952). Habit persistence and lags in consumer behaviour. Econometrica 20, 361–383.
Burns, A.F., Mitchell, W.C. (1946). Measuring business cycles. National Bureau of Economic Re-
search, New York.
Camba-Mendez, G., Kapetanios, G. (2004). Forecasting Euro area inflation using dynamic factor
measures of underlying inflation. Working paper No. 402. ECB.
Carnap, R. (1950). Empiricism, semantics, and ontology. Revue Internationale de Philosophie 4,
20–40.
Cartwright, N. (1983). How the Laws of Physics Lie. Clarendon Press, Oxford.
Christ, C.F. (1952). History of the Cowles Commission, 1932–1952. In: Economic Theory and
Measurement: A Twenty-Year Research Report 1932–1952. Cowles Commission for Research
in Economics, Chicago, pp. 3–65.
Christ, C.F. (1960). Simultaneous equations estimation: Any verdict yet? Econometrica 28, 835–
845.
Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econo-
metrica 28, 591–605.
Cochrane, D., Orcutt, G. (1949). Application of least squares regression to relationships containing
autocorrelated error terms. Journal of the American Statistical Association 44, 32–61.
Davenant, C. (1699). An Essay upon the Probable Methods of Making a People Gainers in the
Balance of Trade. R. Horsfield, London.
De Leeuw, J. (1990). Data modelling and theory construction. Chapter 13 in: Hox, J.J., Jon-Gierveld,
J.D. (Eds.), Operationalization and Research Strategy. Swets & Zeitlinger, Amsterdam.
Diebold, F.X., Rudebusch, G.D. (1996). Measuring business cycles: A modern perspective. Review
of Economics and Statistics 78, 67–77.
Durbin, J., Watson, G.S. (1950). Testing for serial correlation in least squares regression, I. Biomet-
rika 37, 409–428.
Durbin, J., Watson, G.S. (1951). Testing for serial correlation in least squares regression, II. Biomet-
rika 38, 159–178.
Engle, R.F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of
United Kingdom inflation. Econometrica 50, 987–1008.
Engle, R.F., Granger, C.W.J. (1987). Cointegration and error correction: representation, estimation
and testing. Econometrica 55, 251–276.
Evans, M. (1966). Multiplier analysis of a post-War quarterly US model and a comparison with
several other models. Review of Economic Studies 33, 337–360.
Forni, M., Hallin, M., Lippi, M., Reichlin, L. (2005). The generalised dynamic factor model: One-
sided estimation and forecasting. Journal of the American Statistical Association 100, 830–840.
Frisch, R. (1933). Editorial. Econometrica 1, 1–4.
Frisch, R. (1937). An ideal programme for macrodynamic studies. Econometrica 5, 365–366.
Frisch, R. (1938). Autonomy of economic relations, unpublished until inclusion. In: Hendry, D.F.,
Morgan, M.S. (Eds.) (1995), The Foundations of Econometric Analysis. Cambridge Univ. Press,
Cambridge, pp. 407–419.
Gilbert, C.L. (1989). LSE and the British approach to time series econometrics. Oxford Economic
Papers 41, 108–128.
Gilbert, C.L., Qin, D. (2006). The first fifty years of modern econometrics. In: Patterson, K., Mills,
T.C. (Eds.), Palgrave Handbook of Econometrics. Palgrave MacMillan, Houndmills, pp. 117–
155.
Gordon, R.J. (1970). The Brookings model in action: A review article. Journal of Political Economy
78, 489–525.
Granger, C.W.J. (1969). Investigating causal relations by econometric models and cross-spectral
methods. Econometrica 37, 424–438.
Granger, C.W.J., Hatanaka, M. (1964). Spectral Analysis of Economic Time Series. Princeton Univ.
Press, Princeton.
Granger, C.W.J., Teräsvirta, T. (1993). Modelling Nonlinear Time Series. Oxford Univ. Press, Ox-
ford.
Greenstein, B. (1935). Periodogram analysis with special application to business failures in the
United States, 1867–1932. Econometrica 3, 170–198.
Griliches, Z. (1968). The Brookings model volume: A review article. Review of Economics and
Statistics 50, 215–234.
Haavelmo, T. (1944, mimeograph 1941), The probability approach in econometrics. Econometrica
12, supplement.
Hamilton, J.D. (1989). A new approach to the economic analysis of nonstationary time series and
the business cycle. Econometrica 57, 357–384.
Hamilton, J.D. (1990). Analysis of time series subject to changes in regime. Journal of Econometrics
45, 39–70.
Hausman, D.M. (1992). The Inexact and Separate Science of Economics. Cambridge Univ. Press,
Cambridge.
Hausman, J.A. (1978). Specification tests in econometrics. Econometrica 46, 1251–1271.
Hendry, D.F. (1980). Econometrics – alchemy or science? Economica 47, 387–406.
Hendry, D.F. (1995). Dynamic Econometrics. Oxford Univ. Press, Oxford.
Hendry, D.F., Morgan, M.S. (1989). A re-analysis of confluence analysis. Oxford Economic Papers
41, 35–52.
Hendry, D.F., Morgan, M.S. (Eds.) (1995). The Foundations of Econometric Analysis. Cambridge
Univ. Press, Cambridge.
Hood, W., Koopmans, T.C. (Eds.) (1953). Studies in Econometric Method. Cowles Commission
Monograph 14. New York.
Hoover, K.D., Dowell, M.E. (2001). Measuring causes: Episodes in the quantitative assessment of
the value of money. In: Klein, J.L., Morgan, M.S. (Eds.), The Age of Economic Measurement.
Duke Univ. Press, Durham (NC), pp. 137–161.
Hull, J., White, A. (1987). The pricing of options on assets with stochastic volatilities. Journal of
Finance 42, 281–300.
Jevons, W.S. (1871). The Theory of Political Economy. Macmillan, London.
Jevons, W.S. (1884). Investigations in Currency and Finance. Macmillan, London.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics
and Control 12, 231–254.
Johnston, J. (1963). Econometric Methods. McGraw-Hill, New York.
Klein, J.L. (2001). Reflections from the age of economic measurement. In: Klein, J.L., Morgan,
M.S. (Eds.), The Age of Economic Measurement. Duke Univ. Press, Durham (NC), pp. 111–136.
Klein, L.R. (1952, 2nd ed., 1974). A Textbook in Econometrics. Prentice Hall, Englewood Cliffs
(NJ).
Klein, L.R. (1962). An Introduction to Econometrics. Prentice Hall, Englewood Cliffs (NJ).
Klein, L.R., Goldberger, A.S. (1955). An Econometric Model of the United States 1929–1952. North-
Holland, Amsterdam.
Koopmans, T.C. (1947). Measurement without theory. Review of Economics and Statistics 29, 161–
179.
Koopmans, T.C. (Ed.) (1950). Statistical Inference in Dynamic Economic Models. Cowles Commis-
sion Monograph 10. Wiley, New York.
Koopmans, T.C. (1957). Three Essays on the State of Economic Science. McGraw-Hill, New York.
Koopmans, T.C., Reiersøl, O. (1950). The identification of structural characteristics. Annals of Math-
ematical Statistics 21, 165–181.
Leamer, E.E. (1978). Specification Searches. Wiley, New York.
Liu, T.-C. (1960). Underidentification, structural estimation, and forecasting. Econometrica 28, 855–
865.
Lucas, R.E. (1976). Econometric policy evaluation: A critique. In: Brunner, K., Meltzer, A.H. (Eds.),
The Phillips Curve and Labor Markets. Carnegie-Rochester Conference Series on Public Policy,
vol. 1. North-Holland, Amsterdam.
Luce, R.D., Krantz, D.H., Suppes, P., Tversky, A. (1990). Foundations of Measurement, vol. 3:
Representation, Axiomatisation and Invariance. Academic Press, New York.
Malinvaud, E. (1964, English ed. 1968). Statistical Methods in Econometrics. North-Holland, Am-
sterdam.
Marschak, J. (1946). Quantitative studies in economic behaviour (Foundations of rational economic
policy). Report to the Rockefeller Foundation, Rockefeller Archive Centre.
Mirowski, P. (1989). More Heat than Light. Cambridge Univ. Press, Cambridge.
Moore, H.L. (1914). Economic Cycles – Their Law and Cause. MacMillan, New York.
Morgan, M.S. (1988). Finding a satisfactory empirical model. In: de Marchi, N. (Ed.), The Popperian
Legacy in Economics. Cambridge Univ. Press, Cambridge, pp. 199–211.
Morgan, M.S. (1990). The History of Econometric Ideas. Cambridge Univ. Press, Cambridge.
Morgenstern, O. (1961). A new look at economic time series analysis. In: Hegeland, H. (Ed.), Money,
Growth, and Methodology and other Essays in Economics: In Honor of Johan Akerman. CWK
Gleerup Publishers, Lund, pp. 261–272.
Morrison, M. (1999). Models as autonomous agents. In: Morgan, M.S., Morrison, M. (Eds.), Models
as Mediators. Cambridge Univ. Press, Cambridge, pp. 38–65.
Nelson, C.R. (1972). The prediction performance of the FRB-MIT-PENN model of the US economy.
American Economic Review 62, 902–917.
Orcutt, G. (1948). A study of the autoregressive nature of the time series used for Tinbergen’s model
of the economic system of the United States 1919–1932. Journal of the Royal Statistical Society,
Series B 10, 1–45.
Pagan, A. (1987). Three econometric methodologies: A critical appraisal. Journal of Economic Sur-
veys 1, 3–24.
Persons, W.M. (1916). Construction of a business barometer based upon annual data. American
Economic Review 6, 739–769.
Persons, W.M. (1919). Indices of business conditions. Review of Economic Statistics 1, 5–110.
Phillips, A.W. (1954). Stabilisation policy in a closed economy. Economic Journal 64, 290–323.
Phillips, A.W. (1957). Stabilisation policy and the time form of lagged responses. Economic Journal
67, 256–277.
Phillips, P.C.B. (1997). The ET interview: Professor Clive Granger. Econometric Theory 13, 253–
303.
Playfair, W. (1801). The Statistical Breviary. Bensley, London.
Playfair, W. (1805). An Inquiry into the Permanent Causes of the Decline and Fall of Wealthy and
Powerful Nations. Greenland and Norris, London.
Qin, D. (1989). Formalisation of identification theory. Oxford Economic Papers 41, 73–93.
Qin, D. (1993). The Formation of Econometrics: A Historical Perspective. Oxford Univ. Press, Ox-
ford.
Qin, D. (1996). Bayesian econometrics: The first twenty years. Econometric Theory 12, 500–516.
Qin, D. (2006). VAR modelling approach and Cowles Commission heritage. Economics Department
Discussion Paper Series QMUL No. 557.
Qin, D., Gilbert, C.L. (2001). The error term in the history of time series econometrics. Econometric
Theory 17, 424–450.
Rothenberg, T.J. (1971). The Bayesian approach and alternatives in econometrics. In: Intriligator,
M.D. (Ed.), Frontiers of Quantitative Economics. North-Holland, Amsterdam, pp. 194–207.
Sargan, J.D. (1964). Wages and prices in the United Kingdom: A study in econometric methodology.
In: Hart, R.E., Mills, G., Whittaker, J.K. (Eds.), Econometric Analysis for National Economic
Planning. Butterworth, London, pp. 25–63.
Sargent, T.J. (1981). Interpreting economic time series. Journal of Political Economy 89, 213–247.
Sargent, T.J., Sims, C.A. (1977). Business cycle modelling without pretending to have too much
a priori economic theory. In: New Methods in Business Cycle Research: Proceedings from a
Conference. Federal Reserve Bank of Minneapolis, pp. 45–109.
Schumpeter, J. (1933). The common sense of econometrics. Econometrica 1, 5–12.
Sent, E.-M. (1998). The Evolving Rationality of Rational Expectations: An Assessment of Thomas
Sargent’s Achievements. Cambridge Univ. Press, Cambridge.
Shephard, N. (2006). Stochastic volatility. In: Durlauf, S., Blume, L. (Eds.), New Palgrave Dictio-
nary of Economics, 2nd ed. Nuffield College, Oxford University. Draft version: Working paper
17.
Simon, H.A. (1953). Causal ordering and identifiability. In: Hood, W., Koopmans, T. (Eds.), Studies
in Econometric Method. In: Cowles Commission Monograph 14, pp. 49–74.
Sims, C.A. (1980). Macroeconomics and reality. Econometrica 48, 1–48.
Sims, C.A. (1989). Models and their uses. American Journal of Agricultural Economics 71, 489–
494.
Smith, A. (1904). An Inquiry into the Nature and Causes of the Wealth of Nations. Methuen,
London. (E. Cannan’s edition first published in 1776.)
Stock, J.H., Watson, M.W. (1989). New indexes and coincident and leading economic indicators. In:
Blanchard, O., Fischer, S. (Eds.), NBER Macroeconomic Annual. MIT Press, Cambridge, MA,
pp. 351–394.
Stock, J.H., Watson, M.W. (1991). A probability model of the coincident economic indicators. In:
Lahiri, K., Moore, G.H. (Eds.), Leading Economic Indicators: New Approaches and Forecasting
Records. Cambridge Univ. Press, Cambridge, pp. 63–89.
Stock, J.H., Watson, M.W. (1993). A procedure for predicting recessions with leading indicators:
Econometric issues and recent experience. In: Stock, J.H., Watson, M.W. (Eds.), Business Cycles,
Indicators and Forecasting. Univ. of Chicago Press for NBER, Chicago, pp. 255–284.
Stone, R. (1947). On the interdependence of blocks of transactions. Journal of the Royal Statistical
Society (suppl.) 9, 1–45.
Theil, H. (1957). Specification errors and the estimation of economic relationships. Review of Inter-
national Statistical Institute 25, 41–51.
Theil, H. (1958). Economic Forecasts and Policy. North-Holland, Amsterdam.
Vining, R. (1949). Koopmans on the choice of variables to be studied and of methods of measure-
ment: A rejoinder. Review of Economics and Statistics 31, 77–86, 91–94.
Waugh, F.V. (1942). Regression between two sets of variables. Econometrica 10, 290–310.
Waugh, F.V. (1961). The place of least squares in econometrics. Econometrica 29, 386–396.
Wold, H. (1954). Causality and econometrics. Econometrica 22, 162–177.
Wold, H. (1960). A generalization of causal chain models. Econometrica 28, 443–463.
Wold, H. (Ed.), (1964). Econometric Model Building: Essays on the Causal Chain Approach. North-
Holland, Amsterdam.
CHAPTER 11
Structure
Hsiang-Ke Chao
Department of Economics, National Tsing Hua University, 101, Section 2, Kuang Fu Road,
Hsinchu 300, Taiwan
E-mail address: hkchao@mx.nthu.edu.tw
11.1. Introduction
1 Also see Michell (Chapter 2, this volume) for a historical account of the philosophical origins of
the representational theory of measurement.
There are generally two meanings of structure in econometrics. One refers to the
understanding that the relationships among variables are specified by theory or a
priori information. The other refers to the notion of invariance. In this chapter the
former is called the “theory view” while the latter is referred to as the “invariance
view”. Both the theory view and the invariance view are direct outgrowths of the
Cowles Commission approach to econometric modeling. They are compatible
rather than conflicting with each other. Each meaning leads to a different model
specification and a measurement strategy.
Structure and its measurement are discussed by considering four approaches
towards econometric models. They are: the Cowles Commission structural ap-
proach, the new classical macroeconomics, the vector autoregressive models,
and the London School of Economics (LSE) approach (Gilbert and Qin, Chap-
ter 10, this volume, also discuss similar issues).2
2 Kevin Hoover has explored the same issue extensively in a series of works, but his main
concern is the issue of causality. Structure is regarded as causal. Hoover’s account can be regarded
as a structural approach to causality. See Hoover (2001), Hoover and Jordá (2001), and Demiralp
and Hoover (2003).
3 For more detailed historical accounts, see Epstein (1987), Morgan (1990), Qin (1993), and
especially Hendry and Morgan (1995, pp. 60–76).
4 Qin (1993, p. 68) states that the indirect least squares method was first developed by Jan Tinbergen
in 1930.
Marschak (1953) and Hurwicz (1962) are particularly concerned with the is-
sue of invariance under policy intervention. They can be seen as a precedent for
the Lucas critique (Lucas, 1976; see below). Hurwicz, for example, along the same lines as Haavelmo’s view, argues that if the original model and the one modified after some policies are implemented are both unique up to an admissible transformation, then this model can be regarded as containing a structure
with respect to this policy intervention. Thus, structure is a relative concept:
Even though the Cowles Commission scholars have considered the theme of
invariance, their simultaneous equations models have become the targets of
the Lucas critique. Lucas (1976) challenges the standard econometric models
which do not exhibit invariant relationships as they should, because they do not properly deal with expectations. Macroeconomists’ reaction to the Lucas critique is to construct models based on microfoundations that employ the representative-agent assumption and derive a well-articulated optimization model. What is invariant in such a model can thus be regarded as structure. For instance, deep parameters, the policy-invariant parameters describing taste and technology, are regarded as structural in real business cycle
research. In consumption studies, the Euler equation, denoting the first-order
condition of the consumer’s intertemporal choices, represents the structure for
the new classical aggregate consumption function (Hall, 1978).6
5 Woodward’s recent work (2000, 2003) discusses extensively the degrees of autonomy and invari-
ance in the context of scientific explanation.
6 Hall (1990, p. 135): “For consumption, the structural relation, invariant to policy interventions
and other shifts elsewhere in the economy, is the intertemporal preference ordering.”
Sims’s vector autoregressive (VAR) (Sims, 1980) approach was inspired by Ta-
Chung Liu’s critique on the Cowles Commission method. Liu (1960, 1963)
asserts that the identifying restrictions exclude many variables that should be
included.7 One of the reasons is that in reality “very few variables can really be
legitimately considered as exogenous to economic system” (Liu, 1963, p. 162).
7 In this sense the identifying restrictions are sometimes called “exclusion restrictions”.
This is referred to by Maddala (2001, p. 375) as the “Liu critique”. The models are therefore by nature underidentified rather than overidentified. Sims wants to
abandon these “incredible restrictions” altogether and proposes an unrestricted
reduced-form model, in which all variables are regarded as endogenous.8
The simplest form of the VAR model is the reduced-form VAR. In a reduced-form VAR model each variable is a linear function of the past values of itself and of all other variables. Each equation can be estimated by the OLS method. However, the error terms of a reduced-form VAR, which usually denote shocks in macroeconomic theory, are typically correlated. A problem arises when these error terms are interpreted as particular economic shocks, which are normally regarded as uncorrelated. To solve the problem, econometricians can orthogonalize the shocks by using a Choleski factorization to decompose the covariance matrix (Sims, 1980). The Choleski decomposition implies a Wold causal chain on the contemporaneous variables – we have a specific hierarchical causal ordering among the contemporaneous variables. However, the order of the variables is arbitrarily chosen. There is a unique Choleski decomposition for each possible order – changing the order of the contemporaneous variables changes the VAR representation. Therefore, we have a class of observationally equivalent VAR representations.
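A minimal numerical sketch of this order dependence, with an assumed covariance matrix for the reduced-form errors, is the following; both factorizations reproduce the same covariance matrix, yet they imply different orthogonalized shocks.

import numpy as np

# Assumed covariance matrix of the (correlated) reduced-form errors u_t.
Sigma_u = np.array([[1.0, 0.6],
                    [0.6, 2.0]])

# Choleski factorization Sigma_u = P P', with P lower triangular: it imposes a
# Wold causal chain in which variable 1 affects variable 2 contemporaneously
# but not the other way round.
P_12 = np.linalg.cholesky(Sigma_u)

# Reordering the variables (2 first, then 1) yields a different factorization
# and hence a different hierarchical causal ordering.
reorder = np.array([[0, 1],
                    [1, 0]])
P_21 = np.linalg.cholesky(reorder @ Sigma_u @ reorder.T)

print("Choleski factor, ordering (1, 2):\n", np.round(P_12, 3))
print("Choleski factor, ordering (2, 1):\n", np.round(P_21, 3))
# Both orderings reproduce the same Sigma_u -- the representations are
# observationally equivalent -- but the implied orthogonalized shocks differ.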
In order to identify a VAR, additional a priori information is needed to choose among the many possible links between contemporaneous variables. A VAR that
requires these identifying restrictions is known as a structural VAR, where the
term “structural” is the same as the theory view that we can find in the Cowles
Commission models. See Bernanke (1985), Blanchard and Watson (1986), Sims
(1986) for some early papers on structural VARs.9
The LSE school of econometrics, led by David Hendry and his collaborators, of-
fers a methodologically promising approach to economic modeling (see Hendry,
1995, 2000). At the outset it is assumed that there exists an unobservable data-
generating process (DGP), represented by a conditional joint density function
of all the sample data, that is responsible for producing the data we observe.
While uncovering the real DGP seems impossible, the best that econometricians can do is to build a model which characterizes all types of information
at hand. In this sense a model can be said to be congruent with the information
sets. To achieve a congruent econometric model, the LSE approach provides
the theory of reduction, claiming that to obtain an empirical econometric model
is to impose a sequence of reductions on a hypothetical local DGP (LDGP),
a data-generating mechanism of variables under analysis. The purpose is to
8 Some think that Liu was the first to refer to such identification as "incredible". But in fact it is Sims (1980) who originally coined the term.
9 See also Stock and Watson (2001) for a recent review of the VAR approach.
ensure that the features of the data obtained are not lost in the derived em-
pirical model. The practical implementation of the theory of reduction is the
general-to-specific methodology. It directs econometricians to start with a gen-
eral unrestricted model containing all available information that the DGP or the
LDGP is supposed to have. They then apply various econometric tests to the general model so that no information is lost when deriving a specified final model.
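A deliberately stylized sketch of the general-to-specific idea, assuming a simple linear-regression setting, is given below; it only drops the least significant regressor step by step against a t-threshold, whereas actual LSE practice also tests the congruence of every reduction. All names and data are illustrative.

import numpy as np

def ols(y, X):
    """OLS with coefficient t-ratios (homoskedastic standard errors)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, beta / se

def general_to_specific(y, X, names, t_crit=2.0):
    """Drop the least significant regressor until all |t| exceed t_crit (constant kept)."""
    keep = list(range(X.shape[1]))
    while True:
        beta, tvals = ols(y, X[:, keep])
        candidates = [i for i, j in enumerate(keep) if names[j] != "const"]
        worst = min(candidates, key=lambda i: abs(tvals[i]))
        if abs(tvals[worst]) >= t_crit or len(candidates) == 1:
            return [names[j] for j in keep], beta
        del keep[worst]

rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n)] + [rng.standard_normal(n) for _ in range(4)])
names = ["const", "x1", "x2", "x3", "x4"]
y = 1.0 + 2.0 * X[:, 1] - 1.5 * X[:, 2] + rng.standard_normal(n)   # x3, x4 irrelevant
print(general_to_specific(y, X, names))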
In the LSE methodology, the notion of structure is as important to econometric models as it is in the Cowles Commission methodology. Structure can be
represented by a relation of the form yt = ρ Et−1[zt] + εt (Hendry, 1995, p. 33), where yt is the output variable, zt is the input variable for an agent's decision, and Et−1 denotes the conditional expectation given all available information at t − 1. When ρ is invariant, this equation can be said to define a structure. It
shows that the LSE methodology subscribes to the invariance view of structure.
Economic theory is not of much help in determining the invariance.
the model represents an invariant relation, the Chow test for structural change is
performed on ρ. Hendry (1997, p. 166) claims, “Succinctly, ‘LSE’ focuses on
structure as invariance under extensions of the information set over time, across
regimes, and for new sources of information." Hence structure is embedded in the congruence test, which checks whether parameters are invariant under extensions of the information sets.
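A minimal sketch of the kind of invariance check mentioned here, a classical Chow test for a parameter shift at a known break point, might look as follows; the break date, the data-generating process and the variable names are illustrative.

import numpy as np

def rss(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    return e @ e

def chow_statistic(y, X, break_at):
    """Classical Chow F-statistic for a parameter shift at a known break point."""
    n, k = X.shape
    rss_pooled = rss(y, X)
    rss_1 = rss(y[:break_at], X[:break_at])
    rss_2 = rss(y[break_at:], X[break_at:])
    return ((rss_pooled - rss_1 - rss_2) / k) / ((rss_1 + rss_2) / (n - 2 * k))

rng = np.random.default_rng(2)
n, split = 120, 60
x = rng.standard_normal(n)
rho = np.where(np.arange(n) < split, 0.8, 0.8)     # invariant coefficient; change the second
                                                   # value to, say, 0.3 to induce a break
y = rho * x + 0.5 * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])
print("Chow F =", chow_statistic(y, X, split))     # compare with an F(k, n - 2k) critical value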
The views on structure in the above-discussed four approaches to macro-
econometric modeling can be summarized as follows. The simultaneous equa-
tions models that the Cowles Commission proposes involve both the theory view
and the invariance view. The new classical and the RBC schools subscribe to
the invariance view, but also hold the belief that economic theory is capable of
specifying the structure of the model. Both the VAR approach and the LSE ap-
proach construe structure as invariance. What contrasts between the VAR and
the LSE approaches is that in the VAR approach economic theory is regarded as
incapable of imposing credible restrictions on the structure, while in the LSE ap-
proach economic theory and other types of measurable information are treated
on an equal footing. Yet the distinctions between these competing approaches
are rather subtler than this classification suggests. The controversy over structure
between the post-Cowles econometric approaches results from their attitudes to-
wards the Lucas critique.
11.2.5. Discussion
Although Sims’s VAR modeling is of great contrast with the Cowles Commis-
sion simultaneous equations models, he does not see the simultaneous equations
models as misrepresenting the structure. Sims’s favorite definition of structure
comes from Hurwicz (1962) as mentioned above. Sims (1980, 1982) accepts
Hurwicz’s idea that invariance is only a matter of degree. Hence, Sims’s view
is no different from Frisch’s and Haavelmo’s views on the autonomy and con-
fluence of econometric models. Sims also finds Lucas's assumption of a once-and-for-all policy choice too strong. A permanent policy action is rare to non-existent. The public would (rationally) use the information provided by history to form proper responses to policy interventions (Sims, 1982, 1998).10
The meaningfulness of the Lucas critique can be empirically evaluated by checking whether policies have permanent regime-switching effects. A
recent empirical study by Leeper and Zha (2003) shows that actual monetary
policy interventions may not be subject to the Lucas critique. Leeper and Zha
distinguish a policy intervention’s direct effect from the expectation-formation
effect that is induced by the change in people’s expectations about a policy
regime. They find that many monetary policies are in fact “modest” relative
to the Lucas critique in the sense that the policies that the Federal Reserve con-
siders do not have expectation-formation effects. This empirical finding on the one hand demurs at the Lucas critique, which attributes a lack of invariance to the effect of changing expectations formation on structure, while on the other hand it supports Sims's view that permanent policy regime changes are only rare events. The Cowles Commission simultaneous equations models remain structural in the sense of Hurwicz and still have the merit of being usable in policy analysis. The Lucas critique is merely a "cautionary footnote" (Sims, 1982, p. 108).11
Structural VARs, a mixture of the VARs and the Cowles structural models,
seem to diverge from the VARs and converge to the new classical macroeco-
nomics. Koopmans’s analogy of the Kepler’s and the Newton’s stages of science
to the NBER’s and the Cowles Commission’s methodologies in his Measure-
ment without Theory paper strikes a chord with the new classical economists.
Cooley and LeRoy (1985) argue against the VARs as a retreat to the Kepler
stage since theory plays no role in scientific investigation They also point out
the identification problem in the non-structural VARs that has to be dealt with.
They claim that in order for policy analysis, the VARs must be interpreted as
structural in terms of the theory view.
Structural VARs are not without criticism. The most appealing one is that they seem a retreat to what Sims has forcefully argued against in the Cowles Commission
10 See Sims (1998) for his reevaluation of the Lucas critique.
11 Sims’s view can be envisaged in an analogy to the structures in civil engineering. In analyzing the
seismic resistance of a building structure, it is usually required for those “essential structures” (e.g.,
hospitals, power plants) that must remain operational all the time to resist a much larger seismic
force than other structures. Structure thus is also a relative concept: it is defined by its resistance to
a certain assigned degree of the strength of earthquakes. It would be implausible to define structure
by its capacity of resisting one determining seismic activity that destroys all constructed buildings,
because an earthquake this strong has not happened in the past, and it perhaps would never happen
in the future.
12 A study by Keating (1990) also shows that standard structural VAR models under rational expec-
tations may yield inconsistent parameter estimates.
the previous theories and also explain the novel facts unexplained by the previ-
ous theories. In this sense we say that the new model encompasses the models
built according to existing theories. Take one of the benchmark models in the
LSE approach as an example: the DHSY (an acronym for Davidson et al., 1978)
model of consumption. When the LSE practitioners built the DHSY model, they
considered the existing theories reflected in the permanent income and the life-
cycle hypotheses, and they aimed to encompass both of them. This indicates that
economic theories are not superior to other sorts of information.
11.2.6. Conclusion
The views of structure can be divided into the theory view and the invariance view. The theory view holds that economic theory is capable of specifying the relationships between variables. The invariance view defines
structure as a set of invariant relationships under intervention. The structural
VAR approach aligns itself with the Cowles Commission on the theory view.
The identifying restrictions are based on a priori information or theory. The new
classical school goes further to argue that a macroeconomic model needs to be
derived from the representative agent's optimizing behavior. The VAR and the LSE approaches are empiricist: theory alone does not define structure.
All approaches generally agree with the invariance view, because it would be strange if there were unstable relationships in their models; yet for the VAR and structural VAR approaches the Lucas critique does not apply. Sims's argument is that radical policy changes are rare to non-existent. For the VAR and the LSE
approaches, the invariance is an empirical question. They agree that invariance
is a matter of degree. This marks the great legacy of Frisch and Haavelmo.13
13 Lucas might also agree with the idea of degrees of invariance. See his (1973) work on cross-country
comparisons of the slope of the Phillips curve. (I thank Kevin Hoover for pointing this out to me.)
The “received view”, the name coined by philosopher of science Hilary Put-
nam, stands for the view of the constitution of scientific theories in the eyes
of logical positivists. According to the received view of scientific theories, a
theory consists of a set of theoretical axioms on the one hand, and a set of corre-
spondence rules on the other. Theoretical axioms are constituted by “theoretical
terms” which only exist in the context of theory. The correspondence rules then
relate the axioms to the phenomena expressed in “observational terms”. The
correspondence rules play the central role in the received view. As a part of
the theory, the correspondence rules contain both theoretical and observational
terms, and offer the theory an interpretation by giving the theory proper em-
pirical meanings.14 Logical positivist Rudolf Carnap gave an example of the
correspondence rules as follows: “The (measured) temperature of a gas is pro-
portional to the mean kinetic energy of its molecules.” This correspondence rule
links the kinetic energy of molecules (a theoretical term in molecular theory)
with the temperature of the gas (an observational term). If such rules exist, then
philosophers are confident in deriving empirical laws about observable entities
from theoretical laws (Carnap, 1966, p. 233).
The received view also recommends a deductive method of theorizing, in
which logical analysis is applied to deduce consequences from the statements
that axioms posit. Since the received view puts great emphasis on the theoriza-
tion from axioms, it is also known as the axiomatic approach (cf. van Fraassen,
1980). However, this type of axiomatization only refers to the theories axioma-
tized in first-order logic. The received view is also called the syntactic view,
because axioms are statements of language and have no direct connection to the
world.
The received view may have been dismissed in the methodology of economics
(see Blaug, 1992), yet one can still observe theories and practices that fol-
low such a tradition. In addition to the similarities between various economic
methodologies and logical positivism (see Caldwell, 1994), Koopmans’s (1957)
methodological approach is particularly regarded as similar to the received view
14 In the sense that the theoretical axioms are interpreted by the empirical world, the correspondence
rules can be referred to as “rules of interpretation” or “dictionaries”. See Suppe (1977).
(see Morgan and Morrison, 1999). Koopmans's (1957, pp. 132–135) proposal for the structure of economic theory starts with "postulates" that consist of logical relations between symbols he calls "terms". Terms are interpreted if they are connected with observable phenomena. A set of postulates is then regarded as a theory and can be verified or refuted by observation. Milton Friedman's (1957) permanent income hypothesis may be considered an example of the application of the syntactic view. The theoretical term "permanent income"
is linked with the empirical world by the correspondence rule: “the permanent
income is estimated by a weighted pattern of past income” in which “a weighted
pattern of past income" is an observational term. While it is questionable whether we can derive an empirical law about observable entities (i.e., measured consumption and income) from the permanent income hypothesis in a logical-positivist way, the theory is confirmed as an empirical claim when the de-
duced consequences are tested against both cross-sectional and time-series data
in various respects.15
The received view has been criticized for many reasons, two of which are
particularly salient, and both regard the correspondence rules.16 First, there is
no sharp distinction between theory and observation. Thomas Kuhn (1962) has
pointed out to us that observations are possibly theory-laden. Therefore, corre-
spondence rules do not obtain. Second, the correspondence rules are too naïve to
describe the complex interactions between theory and the world. This point has
motivated many philosophers to reconsider the structure of theory, particularly
the fact that models are used extensively to bridge theory and data in science.
One alternative to the syntactic approach is to develop an account that is inspired
by model theory in mathematics and scientific practices. The semantic view or
model-theoretical approach provides such an alternative.
15 This kind of test can be regarded as what Kim et al. (1995) call “characteristic tests”, that aim
to confirm specific characteristics of empirical models. See Mayer’s (1972) classic book for an
extensive study on testing Friedman’s permanent income hypothesis, and Chao (2003, pp. 87–89)
for interpreting Friedman’s theory in terms of the notion of the characteristic tests.
16 See Suppe (1977, 1989, 2000) for the criticisms of the received view.
An entity that has a structure can be thought of as a model for the theory, which
is a realization of all axioms. There can be many models for the theory if these
models all satisfy the axioms. However, since these models have the same struc-
ture, an isomorphism (one-one mapping) can be constructed between them.
Suppes believes that the notions of models and structure in mathematical logic can also be applied to understand the models used in everyday scientific practice. Putting models at the center stage of science marks a significant contrast with the received view. As Suppes puts it: "A central topic in the philosophy of science is
the analysis of the structure of scientific theories. . .. The fundamental approach
I advocated for a good many years is the analysis of the structure of a theory in
terms of the models of the theory” (Suppes, 2002, p. 51).
In considering the roles and functions of models, Suppes’s seminal article “Mod-
els of Data” (Suppes, 1962) pioneered the attempt to explicate the role models
17 “Theory” here means mathematical theory, like theory of group or theory of ordering.
18 Suppes (1957) shows that ordered n-ary tuples can be reduced to ordered couples.
11.3.4. Two versions of the semantic view: Van Fraassen and Giere
Two of the most discussed versions of the semantic view are offered by van
Fraassen (1980, 1989) and Giere (1988), who differ from each other in ontology.
Van Fraassen’s picture of scientific theories can be depicted as follows. At the
outset, there are structures and models that present the theory. Models include a
subset called empirical substructures that correspond to the actual phenomena.20
The type of correspondence that van Fraassen prefers is isomorphism. In his
account of constructive empiricism, van Fraassen draws a distinction between acceptance of a theory and belief in the truth of a theory. An empiricist does not believe that the theory explains the unobservable parts, but only accepts a theory because of its empirical adequacy – that is, because those parts of the models of the theory called empirical substructures are isomorphic to the observational parts of the object investigated.
Giere’s constructive realism contrasts with van Fraassen’s empirical account
on two aspects. First, Giere holds a realist position that models can represent the
underlying causal structure which may be unobservable. Second, he regards iso-
morphic mappings as rare in science and suggests similarity relations instead.
Models for Giere, such as Watson and Crick’s scale model of DNA and geo-
graphical maps, exhibit a particular similarity of structure between models and
19 This paper influenced Suppe’s version of the semantic view (see Suppe, 1989).
20 Teller (2001) distinguishes many versions of empirical substructures in van Fraassen’s work.
the real system (Giere, 1997, pp. 21–24). In Giere's account a theoretical hypothesis is a statement asserting the relationship between a model and a real system. Models that satisfy the axioms are the means to represent the real world up to similarity. There is no truth relationship of correspondence between the theoretical hypothesis and the real world, as the received view claims. For Giere, a theoretical hypothesis has the general form: "Such-and-such identifiable real system is similar to a designated model in indicated respects and degrees." (Giere, 1988, p. 81). What matters is the similarity relationship between models and the real world, which requires only a "redundancy theory" of truth (Giere, 1988, pp. 78–82).
Similarity offers a weaker interpretation of the relationship between a model and a designated real system than isomorphism, in the sense that isomorphism requires the model to be "perfect" and to exhibit all the details and information contained in the object, whereas similarity requires only selected features.21 The similarity account seems close to Mary Hesse's (1966) analogy account, in that there are positive analogies between the model and the object, yet Giere has been careful to recognize the danger of vacuity in similarity claims, since anything is similar to anything else. He therefore introduces the role of scientists, who are able to specify the relevant degrees and respects of similarity between model and world. As Giere (2004, pp. 747–748) once put it,
Note that I am not saying that the model itself represents an aspect of the world because it
is similar to that aspect. There is no such representational relationship. Anything is similar
to anything else in countless respects, but not anything represents anything else. It is not
the model that is doing the representing; it is the scientist using the model who is doing the
representing. One way scientists do this is by picking out some specific features of the model
that are then claimed to be similar to features of the designated real system to some (perhaps
fairly loosely indicated) degree of fit. It is the existence of the specified similarities that makes
possible the use of the model to represent the real system in this way.
The message that the semantic view delivers is that models structurally represent both the theory and the world. Different versions of the semantic view may shed some interesting light on the methodology of econometric models. Giere's picture of the relations between theory, model, and data is useful for understanding the methodology and practices of economics (see Morgan, 1998). Van Fraassen's constructive empiricism is compatible with the LSE approach. Hendry's empiricist position can be described by his statement: "the proof of empirical puddings lies in their eating, not a priori views" (Hendry, 1997, p. 168).23 Unobservable entities (e.g., the DGP for the LSE approach) exist,
22 The contribution to the issue of meaningfulness in the theory of measurement is mainly due to
Louis Narens. See Falmagne and Narens (1983), Narens (1985), and Luce and Narens (1987).
23 This is a paraphrase of Tinbergen’s famous quote in his reply to Keynes (Tinbergen, 1940, p. 154).
but explaining the existence of the unobservables is not the purpose; rather the purpose is to construct a model that matches the observed data. This matches Hendry's concept of congruence.
The representational theory of measurement asserts that quantitative repre-
sentations (i.e., measurement) are required to satisfy both representation and
uniqueness theorems. In measuring utility, proving a representation theorem is essential for building utility functions, even though the axiomatization does not proceed in an explicit set-theoretical way. In econometrics, a discipline of measurement, can this structural approach to measurement
shed some light on the measurement of structure? In other words, can we ap-
ply the structural approach to measurement to understand the measurement of
structure in econometrics?
At the outset, econometricians do regard their models as representations. Econometricians not only customarily call VAR models the "VAR representation", but also go further and prove theorems to secure the models' status as representations, even though such representation theorems in econometrics are not presented in terms of relational structures. Perhaps the most famous one is the Granger representation theorem of Engle and Granger (1987), in which they prove that co-integrated variables can be represented as an error correction model.
Furthermore, the issue concerning uniqueness theorems apparently coincides with (a certain type of) the identification problem – the central topic in the above discussion on measuring structure in econometrics. The basic idea of identification is based on the observational equivalence problem. Observationally equivalent structures generate the same data or have the same probability density function. Cooley and LeRoy (1985) argue that Sims's VARs are not identified, because we can easily find several observationally equivalent VARs that generate the same probability distribution for the data. Observationally equivalent structures can be related by what Hsiao (1983, p. 231) calls an "admissible transformation", which is equivalent to what we understand by uniqueness theorems. Hence, to find the admissible transformation is to prove the uniqueness theorem for a class of observationally equivalent structures. The rank and order conditions for simultaneous equations models are thus considered a type of uniqueness theorem.
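The (necessary) order condition behind such uniqueness arguments can be checked mechanically. The sketch below does so for a toy two-equation demand and supply system; the variable layout is illustrative and the stronger rank condition is not examined.

# Order condition for identification: for each equation, the number of excluded
# predetermined variables must be at least the number of included endogenous
# variables minus one.  Toy demand-supply layout; purely illustrative.
endogenous = ["q", "p"]
predetermined = ["const", "income", "weather"]

equations = {
    # each equation lists the variables it includes
    "demand": {"endogenous": ["q", "p"], "predetermined": ["const", "income"]},
    "supply": {"endogenous": ["q", "p"], "predetermined": ["const", "weather"]},
}

K = len(predetermined)
for name, eq in equations.items():
    excluded_predetermined = K - len(eq["predetermined"])
    needed = len(eq["endogenous"]) - 1
    status = "satisfied" if excluded_predetermined >= needed else "violated"
    print(f"{name}: excluded predetermined = {excluded_predetermined}, "
          f"needed >= {needed} -> order condition {status}")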
To solve the identification or uniqueness problem, as described in the previous part of this chapter, econometricians usually employ the theory view of
structure. Different sets of identifying restrictions imply different theoretical in-
terpretations. Christ (1966, p. 298) describes the identification problem in the
following way:
It is a truism that any given observed fact, or any set of observed facts, can be explained in
many ways. That is, a large number of hypotheses can be framed, each of which if true would
account for the observance of the given fact or facts.
Because there can be many theories capable of interpreting the world, we can-
not determine the true theory of data by appealing to the data alone. Theory
is thus underdetermined by data. When the theory view is applied to identify
the structure, it implies that econometricians hold strong priors (Hoover, 1988;
Sutton, 2000) on the theory in the face of the underdetermination problem:
identification depends primarily on economic theory; other information, such as pragmatic factors for choosing between models (e.g., mathematical elegance or simplicity), is only secondary in importance. A particular economic theory and its suggested identifying restrictions are taken to be true, and therefore other theories and their restrictions can be ruled out. Again Christ (1966, p. 299)
claims:
The purpose of a model, embodying a priori information (sometimes called the maintained
hypothesis), is to rule out most of the hypotheses that are consistent with the observed facts.
The ideal situation is one in which, after appeal has been made both to the facts and to the
model, only one hypothesis remains acceptable (i.e., is consistent with both). If the “facts”
have been correctly observed and the model is correct, the single hypothesis that is consis-
tent with both facts and model must be correct; . . . . In a typical econometrics problem the
hypothesis we accept or reject is a statement about the relevant structure or a part of it or a
transformation of it.
Yet ruling out many other hypotheses according to the theory regarded as real does not imply that the others are false. Uniqueness theorems merely imply that the accepted theory is more fundamental, so that other theories can be reduced to this fundamental theory (see Suppes, 2002). The transformation (or reduction) to a unique structure (or to models representing theoretical structures) indicates invariance under transformation. In measurement theory, since a uniqueness theorem for a scale asserts transformation among the relational structures up to an isomorphism, the existence of the same structure is a prerequisite for invariance under transformation.
If the purpose of econometric models is to represent something, be it theory or data, then representation theorems are required to determine which model is an acceptable representation. We seldom see econometricians write down specific representation theorems of this sort, but they are implicitly stated. As seen already, the notion of structure is usually defined in a set-theoretical way. So a representation theorem for econometric models may be (loosely) stated as:
or
11.5. Conclusion
One of the major themes in econometrics is the definition and the measurement
of the notion of structure. We have distinguished between two definitions of
structure: the theory view and the invariance view. The theory view asks whether the particular chosen theory is true. But if the theory view is accepted,
we can equally say that the theory view provides a “realism-about-structure”
attitude towards the relationships between theory and data: economic theory is
true to the invariant relations that we call structure.
It is more widely accepted (for both realists and empiricists) that structure is
understood in terms of invariance under intervention. Invariance is a matter of
degree. It provides an arguably good definition of structure, not only in economics
and econometrics, but also in other subjects for which the notion of structure is
an essential prerequisite.
The semantic view is a more appropriate approach to understanding and in-
terpreting econometrics than the received view. It is particularly because the
semantic view stresses the importance of the function and the role of models,
and because models are crucial devices for the aim of econometrics: bridg-
ing theory and data. There are several studies that have attempted to apply the
semantic view to econometrics, for example, Davis (2000), Chao (2002), and
Stigum (1990, 2003).24 Their views are similar to the idea presented in Morgan
and Morrison’s (1999) edited volume. The Morgan–Morrison volume provides
a broader interpretation of models than the semantic view. For them models are
“autonomous agents” in the sense that they have the merit of being not entirely
dependent on theory or data. Representation is one of models’ functions to me-
diate between theory and data. Nonetheless, as far as structure is concerned,
econometric models can involve representation and uniqueness theorems, as the
structural approach to measurement suggests, for representing the structure.
Acknowledgements
I am grateful for the suggestions from Kevin Hoover, Chao-Hsi Huang, Mary
Morgan and the participants of the Handbook’s review workshop held at the
Tinbergen Institute Amsterdam on April 21–22, 2006. I particularly thank Marcel Boumans, the editor, for his invitation and for his detailed comments and suggestions on an early draft of this chapter. Financial support from the Department of Economics, National Tsing Hua University, and from the Taiwan National Science Council under grant 95-2415-H-007-006 is gratefully acknowledged.
24 The origin of Stigum’s work is regarded as connected with Haavelmo’s econometric methodol-
ogy. See Hendry and Morgan (1995, p. 68), and Moene and Rødseth (1991, p. 179n).
References
Attanasio, O.P. (1998). Consumption demand. Working paper No. 6466. NBER.
Balzer, W. (1992). The structuralist view of measurement: An extension of received measurement
theories. In: Savage, C.W., Ehrlich, P. (Eds.), Philosophical and Foundational Issues in Measure-
ment Theory. Lawrence Erlbaum Associates, Hillsdale, NJ, pp. 93–117.
Balzer, W., Hamminga, B. (Eds.) (1989). Philosophy of Economics. Kluwer, Amsterdam.
Blaug, M. (1992). The Methodology of Economics. Or How Economists Explain. second ed. Cam-
bridge Univ. Press, Cambridge.
Bernanke, B.S. (1985). Alternative explanations of the money-income correlation. In: Brunner, K.,
Meltzer, A.H. (Eds.), Real Business Cycles, Real Exchange Rates and Actual Policies. Carnegie-
Rochester Conference Series on Public Policy 25, pp. 49–100.
Blanchard, O.J., Watson, M.W. (1986). Are business cycles all alike? In: Gordon, R. (Ed.), The
American Business Cycle: Continuity and Change. Univ. of Chicago Press, Chicago, pp. 123–
179.
Boumans, M. (2002). Calibration. In: Snowdon, B., Vane, H.R. (Eds.), An Encyclopedia of Macro-
economics. Edward Elgar, Cheltenham, pp. 105–109.
Caldwell, B.J. (1994). Beyond Positivism, revised ed. Routledge, London.
Cao, T.Y. (2003). Structural realism and the interpretation of quantum field theory. Synthese 136,
3–24.
Carnap, R. (1966). Philosophical Foundations of Physics. Basic Books, New York.
Chao, H.-K. (2002). Representation and structure: The methodology of econometric models of con-
sumption. PhD dissertation. Faculty of Economics and Econometrics, University of Amsterdam.
Chao, H.-K. (2003). Milton Friedman and the emergence of the permanent income hypothesis. His-
tory of Political Economy 35, 77–104.
Christ, C.F. (1966). Econometric Models and Methods. Wiley, New York.
Cooley, T.F., LeRoy, S.F. (1985). Atheoretical macroeconometrics: A critique. Journal of Monetary
Economics 16, 283–308.
Davidson, J.E.H., Hendry, D.F., Srba, F., Yeo, S. (1978). Econometric modelling of the aggregate
time-series relationship between consumers’ expenditure and income in the United Kingdom.
Economic Journal 88, 661–692.
Davis, G.C. (2000). A semantic conception of Haavelmo’s structure of econometrics. Economics
and Philosophy 16, 205–228.
Demiralp, S., Hoover, K.D. (2003). Searching for causal structure of a vector autoregression. Oxford
Bulletin of Economics and Statistics 65, 745–767.
Díez Calzada, J.A. (2000). Structuralist analysis of theories of fundamental measurement. In:
Balzer, W., Sneed, J.D., Moulines, C.U. (Eds.), Structuralist Knowledge Representation: Par-
adigmatic Examples. Poznan Studies in the Philosophy of the Sciences and Humanities. 75,
pp. 19–49.
Engle, R.F., Granger, C.W.J. (1987). Co-integration and error correction: Representation, estimation,
and testing. Econometrica 55, 251–276.
Epstein, R.J. (1987). A History of Econometrics. Elsevier, Amsterdam.
Falmagne, J.-C., Narens, L. (1983). Scales and meaningfulness of quantitative laws. Synthese 55,
287–325.
Flavin, M.A. (1981). The adjustment of consumption to changing expectations about future income.
Journal of Political Economy 89, 974–1009.
Friedman, M. (1957). A Theory of the Consumption Function. Princeton Univ. Press, Princeton.
Haavelmo, T. (1944). The probability approach in econometrics. Econometrica 12, 1–118 (Supple-
ment).
Haavelmo, T. (1947). Methods of measuring the marginal propensity to consume. Journal of Amer-
ican Statistical Association 42, 105–122.
Hall, R.E. (1978). Stochastic implications of the life cycle-permanent income hypothesis: Theory
and evidence. Journal of Political Economy 86, 971–987.
Hall, R.E. (1990). Survey of research on the random walk of consumption. In: The Rational Con-
sumer. MIT Press, Cambridge, MA, pp. 131–157.
Hamilton, J.D. (1994). Time Series Analysis. Princeton Univ. Press, Princeton.
Hands, D.W. (1985). The structuralist view of economic theories: A review essay. Economics and
Philosophy 1, 303–335.
Hansen, L.P., Singleton, K.J. (1983). Stochastic consumption, risk aversion and the temporal behav-
ior of asset returns. Journal of Political Economy 91, 249–265.
Hartley, J.E., Hoover, K.D., Salyer, K.D. (1998). The limits of business cycle research. In: Real
Business Cycles: A Reader. Routledge, London, pp. 3–42.
Hendry, D.F. (1995). Dynamic Econometrics. Oxford Univ. Press, Oxford.
Hendry, D.F. (1997). On congruent econometric relations: A comment. Carnegie-Rochester Confer-
ence Series on Public Policy 47, 163–190.
Hendry, D.F. (2000). Econometrics: Alchemy or Science? New ed. Oxford Univ. Press, Oxford.
Hendry, D.F., Mizon, G.E. (1990). Procrustean econometrics: Or stretching and squeezing data. In:
Granger, C.W.J. (Ed.), Modelling Economic Series. Oxford Univ. Press, Oxford.
Hendry, D.F., Mizon, G.E. (2000). On selecting policy analysis models by forecast accuracy. In:
Atkinson, A.B., Glennester, H., Stern, N.H. (Eds.), Putting Economics to Work. Volume in Honor
of Michio Morishima. London School of Economics, London, pp. 71–119.
Hendry, D.F., Morgan, M.S. (1995). Introduction. In: Hendry, D.F., Morgan, M.S. (Eds.), The Foun-
dations of Econometric Analysis. Cambridge Univ. Press, Cambridge, pp. 1–82.
Hesse, M.B. (1966). Models and Analogies in Science. Notre Dame Univ. Press, Notre Dame.
Hoover, K.D. (1988). The New Classical Macroeconomics: A Skeptical Inquiry. Basil Blackwell,
Oxford.
Hoover, K.D. (1994). Econometrics as observation: The Lucas critique and the nature of econometric
inference. Journal of Economic Methodology 1, 65–80.
Hoover, K.D. (2001). Causality in Macroeconomics. Cambridge Univ. Press, Cambridge.
Hoover, K.D., Jordá, O. (2001). Measuring systematic monetary policy. Federal Reserve Bank of St.
Louis Review 113–137.
Hsiao, C. (1983). Identification. In: Griliches, Z., Intriligator, M.D. (Eds.), Handbook of Economet-
rics, vol. 1. Elsevier, Amsterdam, pp. 223–283.
Hurwicz, L. (1962). On the structural form of interdependent systems. In: Nagel, E., Suppes, P.,
Tarski, A. (Eds.), Logic, Methodology and Philosophy of Science: Proceedings of the 1960 Inter-
national Congress. Stanford Univ. Press, Stanford, pp. 232–239.
Giere, R.N. (1988). Explaining Science: A Cognitive Approach. Univ. of Chicago Press, Chicago.
Giere, R.N. (1997). Understanding Scientific Reasoning, fourth ed. Holt, Rinehart and Winston, New
York.
Giere, R.N. (2004). How models are used to represent reality. Philosophy of Science 71, 742–752.
Girshick, M.A., Haavelmo, T. (1947). Statistical analysis of the demand for food: Examples of si-
multaneous estimation of structural equations. Econometrica 15, 79–110.
Keating, J. (1990). Identifying VAR models under rational expectations. Journal of Monetary Eco-
nomics 25, 453–476.
Kim, J., de Marchi, N., Morgan, M.S. (1995). Empirical model particularities and belief in the natu-
ral rate hypothesis. Journal of Econometrics 67, 81–102.
Koopmans, T.C. (1947). Measurement without theory. Review of Economics and Statistics 29, 161–
172.
Koopmans, T.C. (1957). Three Essays on the State of Economic Science. McGraw-Hill, New York.
Krantz, D.H., Luce, R.D., Suppes, P., Tversky, A. (1971). Foundations of Measurement, vol. 1:
Additive and Polynomial Representations. Academic Press, New York.
Kuhn, T.S. (1962). The Structure of Scientific Revolutions. Univ. of Chicago Press, Chicago.
Leeper, E.M., Zha, T. (2003). Modest policy interventions. Journal of Monetary Economics 50,
1673–1700.
Liu, T.-C. (1960). Underidentification, structural estimation and forecasting. Econometrica 28, 855–
865.
Liu, T.-C. (1963). Structural estimation and forecasting: A critique of the Cowles Commission
method. Tsing-Hua Journal 3–4, 152–171.
Lucas, R.E. (1973). Some international evidence on output–inflation tradeoffs. American Economic
Review 63, 326–334.
Lucas, R.E. (1976). Econometric policy evaluation: A critique. In: Brunner, K., Meltzer, A.H. (Eds.),
The Phillips Curve and Labor Markets. Carnegie-Rochester Conference Series on Public Policy
1, pp. 19–46.
Luce, R.D., Krantz, D.H., Suppes, P., Tversky, A. (1990). Foundations of Measurement, vol. 3:
Representation, Axiomatization, and Invariance. Academic Press, San Diego.
Maddala, G.S. (2001). Introduction to Econometrics, third ed. Wiley, New York.
Marschak, J. (1953). Economic measurement for policy and prediction. In: Hood, W.C., Koopmans,
T.C. (Eds.), Studies in Econometric Method. Cowles Commission Monograph 14. Wiley, New
York, pp. 1–26.
Mayer, T. (1972). Permanent Income, Wealth, and Consumption: A Critique of the Permanent In-
come Theory, the Life-Cycle Hypothesis and Related Theories. Univ. of California Press, Berke-
ley.
Moene, K.O., Rødseth, A. (1991). Nobel laureate Trygve Haavelmo. Journal of Economic Perspec-
tives 5, 175–192.
Morgan, M.S. (1990). The History of Econometric Ideas. Cambridge Univ. Press, Cambridge.
Morgan, M.S. (1998). Models. In: Davis, J.B., Hands, D.W., Mäki, U. (Eds.), The Handbook of
Economic Methodology. Edward Elgar, Cheltenham, pp. 316–321.
Morgan, M.S., Morrison, M. (1999). Models as Mediators: Perspectives on Natural and Social Sci-
ence. Cambridge Univ. Press, Cambridge.
Narens, L. (1985). Abstract Measurement Theory. MIT Press, Cambridge, MA.
Narens, L., Luce, R.D. (1987). Meaningfulness and invariance. In: Eatwell, J., Milgate, M., New-
man, P. (Eds.), The New Palgrave: A Dictionary of Economics, vol. 3. Macmillan Reference
Limited, London, pp. 417–421.
Qin, D. (1993). The Formation of Econometrics: A Historical Perspective. Oxford Univ. Press, Ox-
ford.
Scott, D., Suppes, P. (1958). Foundational aspects of theories of measurement. Journal of Symbolic
Logic 23, 113–128.
Sims, C.A. (1980). Macroeconomics and reality. Econometrica 48, 1–48.
Sims, C.A. (1982). Policy analysis with econometric models. Brookings Papers on Economic Activity
107–152.
Sims, C.A. (1986). Are forecasting models usable for policy analysis? Federal Reserve Bank of
Minneapolis Quarterly Review 10, 2–16.
Sims, C.A. (1998). The role of interest rate policy in the generation and propagation of business
cycles: What has changed since the ‘30s? In: Fuhrer, J.C., Schuh, S. (Eds.), Beyond Shocks:
What Causes Business Cycles? Federal Reserve Bank of Boston, Boston, pp. 121–175.
Stegmüller, W., Balzer, W., Spohn, W. (Eds.) (1982). Philosophy of Economics. Springer-Verlag,
Berlin.
Stevens, S.S. (1959). Measurement, psychophysics and utility. In: Churchman, C.W., Ratoosh, P.
(Eds.), Measurement: Definitions and Theories. Wiley, New York, pp. 18–63.
Stigum, B.P. (1990). Toward a Formal Science of Economics. MIT Press, Cambridge, MA.
Stigum, B.P. (2003). Econometrics and the Philosophy of Economics: Theory-Data Confrontations
in Economics. Princeton Univ. Press, Princeton.
Stock, J.H., Watson, M.W. (2001). Vector autoregressions. Journal of Economic Perspectives 15,
101–116.
Suppe, F. (Ed.) (1977). The Structure of Scientific Theories, second ed. Univ. of Illinois Press, Ur-
bana.
Suppe, F. (1989). The Semantic Conception of Theories and Scientific Realism. Univ. of Illinois
Press, Urbana.
Suppe, F. (2000). Understanding scientific theories: an assessment of developments, 1969–1988.
Philosophy of Science 67, S102–S115.
Abstract
We investigate a phenomenon which is well known in applied econometrics and
statistics: an auxiliary parameter (say θ ) is significant in a diagnostic test, but
ignoring it (setting θ = 0) makes very little difference for the parameter of in-
terest (say β). In other words, the estimator for β is not sensitive to variations
in θ . We shall argue that sensitivity analysis is often more relevant than diagnos-
tic testing, and we shall review some of the sensitivity results that are currently
available. In fact, sensitivity analysis and diagnostic testing are both important
in econometrics. They play different and, as we shall see, orthogonal roles.
12.1. Motivation
Suppose we are given a cloud of points, as in Fig. 12.1, and assume that these
points are generated by a linear relationship
yt = β0 + β1 xt + εt (t = 1, . . . , n),
and this estimator would be best linear unbiased. Secondly, and more realisti-
cally, we might know the structure of Ω (for example, an AR(1) process) but
not the values of the parameters. Thus we would know that Ω = Ω(θ ) where
θ is a finite-dimensional parameter vector. Since θ is unknown it needs to be
estimated, say by θ̃ . If θ̃ is a consistent estimator (not necessarily efficient), and
writing Ω̃ := Ω(θ̃), we obtain the feasible GLS estimator

β̃∗ = (X′Ω̃⁻¹X)⁻¹X′Ω̃⁻¹y.
In most cases, however, not even the structure of Ω is known. We could then
sequentially test, thereby adding more and more noise to our estimator for β,
or we might simply set Ω = I . In the latter case, we obtain the ordinary least
squares (OLS) estimator for β:
β̂ = (X′X)⁻¹X′y.
One question is how good or bad this OLS estimator is, in other words: how
sensitive it is to variations in Ω. Let us consider the data plotted in Fig. 12.1.
If we estimate the relationship by OLS we obtain the solid line, labeled “OLS
regression”. In fact, the data have been generated by some ARMA process, and
therefore the OLS estimator is not the “right” estimator. Consider the hypoth-
esis that the {εt } form an AR(1) process, so that the elements of Ω depend on
just one parameter, say θ . We don’t know whether this hypothesis is true (in
fact, it is not), but we can test the hypothesis using a diagnostic, such as the
Durbin–Watson statistic. In the present case, the diagnostic is statistically very
significant, so that we must reject the null hypothesis (no autocorrelation) in fa-
vor of the alternative hypothesis (positive autocorrelation). Given the outcome
of the diagnostic test, we estimate the AR(1) parameter θ from the OLS residu-
als, and estimate β again, this time by feasible GLS. This yields the broken line,
labeled “GLS regression”.
The broken line is hardly visible in Fig. 12.1, because the OLS and GLS
estimates coincide almost exactly. Hence, the OLS estimates are not sensitive to
whether θ is zero or not, even though the diagnostic test has informed us that θ is
significantly different from zero. One may argue that although β̂ is not affected
by the presence of θ, the variance of β̂ will be affected. This is true. In fact, the
variance will be quite sensitive to variance misspecification for reasons that will
become clear in Section 12.3. This is precisely the reason why in current-day
econometrics we usually estimate the variance by some “robust” method such
as the one proposed by Newey and West (1987).
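A self-contained simulation in the spirit of this example (the data-generating process and numbers are invented, not those behind Fig. 12.1) shows the same phenomenon: the Durbin–Watson diagnostic is far from 2, yet the OLS and feasible GLS estimates of β barely differ. The feasible GLS step uses a simple Cochrane–Orcutt style quasi-difference, one of several possible variants.

import numpy as np

rng = np.random.default_rng(3)
n, beta0, beta1, theta = 100, 1.0, 2.0, 0.7

# Data with AR(1) disturbances.
x = np.linspace(0, 10, n)
eps = np.zeros(n)
for t in range(1, n):
    eps[t] = theta * eps[t - 1] + rng.standard_normal()
y = beta0 + beta1 * x + eps
X = np.column_stack([np.ones(n), x])

# OLS.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
u = y - X @ b_ols
dw = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)           # Durbin-Watson diagnostic

# Feasible GLS via a Cochrane-Orcutt style quasi-difference (one illustrative variant).
theta_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
y_star = y[1:] - theta_hat * y[:-1]
X_star = X[1:] - theta_hat * X[:-1]                     # constant column becomes (1 - theta_hat)
b_gls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

print("DW =", round(dw, 2), "(far below 2: autocorrelation is 'significant')")
print("OLS  estimates:", b_ols.round(3))
print("FGLS estimates:", b_gls.round(3))                # typically very close to OLS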
This simple example illustrates a phenomenon which is well known in applied
econometrics and statistics. An auxiliary parameter (like θ ) may show up in a
diagnostic test as significant, but ignoring it (setting θ = 0) makes very little
difference for the parameter of interest (here β). In other words, the estimator
for β is not sensitive to variations in θ . We shall argue that sensitivity analysis
is often more relevant than diagnostic testing, and we shall review some of the
sensitivity results that are currently available. In fact, sensitivity analysis and
diagnostic testing are both important in econometrics. They play different and,
as we shall see, orthogonal roles.
We shall thus be concerned with models containing two sets of parameters:
focus parameters (β) and nuisance parameters (θ ). In such models one often has
a choice between the unrestricted estimator β̃ (based on the full model) and the
restricted estimator β̂, estimated under the restriction θ = 0. Let us introduce the
function β̂(θ ), which estimates β for each fixed value of θ . The unrestricted and
restricted estimators can then be expressed as β̃ = β̂(θ̃ ) and β̂ = β̂(0), respec-
tively.
A Taylor expansion gives

β̃ − β̂ = β̂(θ̃) − β̂(0) = (∂β̂(θ)/∂θ)|θ=0 θ̃ + Op(1/n),
y = Xβ + u, (12.1)
are unbiased and efficient. If θ ≠ 0, then β̂ and ŷ are, in general, no longer effi-
cient. If we know the structure of Ω and the values of the m elements of θ , then
generalized least squares (GLS) is more efficient. If we know the structure Ω
but not the value of θ , then estimated GLS is not necessarily more efficient than
OLS. But in the most common case, where we don’t even know the structure Ω,
we have to determine Ω and estimate θ . The question then is whether the result-
ing estimator for β (or Xβ) is “better” than the OLS estimator β̂. In sensitivity
analysis we don’t ask whether the nuisance parameters (here the θ -parameters)
are significantly different from 0 or not. Instead we ask directly whether the
GLS estimators β̂(θ ) and ŷ(θ ) are sensitive to deviations from the white noise
assumption.
If θ is known, then the parameters β and σ 2 can be estimated by generalized
least squares. Thus,
β̂(θ) = (X′Ω⁻¹(θ)X)⁻¹X′Ω⁻¹(θ)y   (12.2)
and
The OLS estimators are then given by β̂ := β̂(0), σ̂ 2 := σ̂ 2 (0), and ŷ := ŷ(0).
We wish to assess how sensitive (linear combinations of) β̂(θ ) is with respect
to small changes in θ , when θ is close to 0. The predictor is the linear combi-
nation most suitable for our analysis. Since any estimable linear combination of
β̂(θ ) is a linear combination of ŷ(θ ), and vice versa, this constitutes no loss of
generality.
We now define the sensitivity of the predictor ŷ(θ ) (with respect to θs ) as
zs := ∂ŷ(θ)/∂θs |θ=0   (s = 1, . . . , m),   (12.5)
we thus propose
so that, at θ = 0,
Hence, using (12.6) and the fact that σ̂² = y′My/(n − k), we obtain
Bs = u′Wsu/(u′Mu) = u′Wsu/(u′Wsu + u′(M − Ws)u).
The condition 0 < rs < n − k implies that both Ws and M − Ws have rank at least 1. It follows that u′Wsu ∼ σ²χ²(rs), u′(M − Ws)u ∼ σ²χ²(n − k − rs), and the two quadratic forms are independent, because (M − Ws)Ws = 0. Therefore, Bs
follows a Beta-distribution. Summarizing, we have found
THEOREM 1. We have

zs = −Cs y   and   Bs = y′Ws y/(y′My),

and

((n − k − rs)/rs) · (Bs/(1 − Bs)) ∼ F(rs, n − k − rs).
This is a common situation for stationary processes, and the matrix Cs then becomes Cs = (In − M)T^(h)M. Our particular focus – and the most important special case in practice – is As = T^(1), where

T^(1) =
⎛ 0 1 0 · · · 0 0 ⎞
⎜ 1 0 1 · · · 0 0 ⎟
⎜ 0 1 0 · · · 0 0 ⎟
⎜ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⎟
⎜ 0 0 0 · · · 0 1 ⎟
⎝ 0 0 0 · · · 1 0 ⎠ .   (12.7)
In the standard linear model we have two parameters of interest: the mean pa-
rameters β and the variance parameter σ 2 . Having studied the sensitivity of the
mean parameter in the previous section, we now turn our attention to the sensi-
tivity of the variance estimator σ̂ 2 (θ ) with respect to small changes in θ .
It is more convenient to consider log σ̂ 2 instead of σ̂ 2 . Thus we define
Ds := ∂ log σ̂²(θ)/∂θs |θ=0,   (12.8)
and hence, at θ = 0,
(n − k) ∂σ̂²(θ)/∂θs = 2y′MCs y − y′MAsMy = −y′MAsMy,
THEOREM 2. We have

Ds = − y′MAsMy/(y′My) = − v′P′AsPv/(v′v),
Theorem 2 shows that Ds has the same form as the DW-statistic. The most
important special case occurs again when As = T (1) (that is, AR(1) or MA(1)
or ARMA(1, 1)). The corresponding Ds -statistic will be denoted by D1. This
case was considered by Dufour and King (1991, Theorem 1) as a locally best
invariant test of θ = 0 against θ > 0, where θ denotes the AR(1) parameter. Not
surprisingly, D1 is closely related to the DW-statistic, a fact first observed by
King (1981).
An immediate consequence of Theorems 1 and 2, together with the fact that

û′T^(1)û = 2 ∑_{t=2}^{n} ût ût−1 = − ∑_{t=2}^{n} (ût − ût−1)² + ∑_{t=2}^{n} ût² + ∑_{t=2}^{n} ût−1²,

is the following result (Theorem 3):

Bs := B1 = û′W^(1)û/(û′û),   Ds := D1 = − û′T^(1)û/(û′û) = DW − 2 + R/n,

where

W^(1) := C^(1)′(C^(1)C^(1)′)⁻ C^(1),   C^(1) := (I − M)T^(1)M,

DW = ∑_{t=2}^{n} (ût − ût−1)² / ∑_{t=1}^{n} ût²,

and R = (û1² + ûn²)/(∑_t ût²/n) is a remainder term.
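A numerical check of these relations, using the definitions of C^(1), W^(1), B1 and D1 as reconstructed above, can be coded directly; the data are invented, a Moore–Penrose inverse stands in for the generalized inverse, and the identity D1 = DW − 2 + R/n can be verified to machine precision.

import numpy as np

rng = np.random.default_rng(4)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(n)

I = np.eye(n)
M = I - X @ np.linalg.inv(X.T @ X) @ X.T                # residual-maker matrix
u_hat = M @ y                                           # OLS residuals

# Tridiagonal Toeplitz matrix T(1) with ones on the first off-diagonals, as in (12.7).
T1 = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)

C1 = (I - M) @ T1 @ M
W1 = C1.T @ np.linalg.pinv(C1 @ C1.T) @ C1              # generalized inverse via pinv

B1 = (u_hat @ W1 @ u_hat) / (u_hat @ u_hat)             # sensitivity of the predictor
D1 = -(u_hat @ T1 @ u_hat) / (u_hat @ u_hat)            # sensitivity of the variance estimator

DW = np.sum(np.diff(u_hat) ** 2) / np.sum(u_hat ** 2)
R = (u_hat[0] ** 2 + u_hat[-1] ** 2) / (np.sum(u_hat ** 2) / n)
print("B1 =", round(B1, 4), " D1 =", round(D1, 4))
print("DW - 2 + R/n =", round(DW - 2 + R / n, 4))       # equals D1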
The matrix T (1) is equally relevant in the AR(1) and MA(1) case (and in-
deed, the ARMA(1, 1) case). From Theorem 3 we see that B1 and D1 depend
on T (1) , and hence are identical for AR(1) and MA(1). This explains, inter alia,
the conclusion of Griffiths and Beesley (1984) that a pretest estimator based
on an AR and an MA pretest performs essentially the same as a pretest es-
timator based on only an AR pretest. Any likelihood-based test (such as the
Lagrange multiplier test) uses the derivatives of the log-likelihood, in particular
∂Ω(θ )/∂θs . Under the null hypothesis that θ = 0 the test thus depends on As ,
which explains why As plays such an important role in many test statistics. Any
pretest which depends on As = T (1) will not be appropriate to distinguish be-
tween AR(1) and MA(1). A survey of the DW and D1 statistics is given in King
(1987).
x1: constant, 1, 1, 1, . . .
x2: time trend, 1, 2, 3, . . .
x3: normal distribution, E(x3) = 0, var(x3) = 9
x4: lognormal distribution, E(log x4) = 0, var(log x4) = 9
x5: uniform distribution, −2 ≤ x5 ≤ 2
These regressors can be combined in various data sets. We consider five data sets with two regressors (k = 2) and five with three regressors (k = 3).
For each of the ten data sets we calculate B1∗ and D1∗ such that Pr(B1 > B1∗) = Pr(D1 ≤ D1∗) = α, where α = 0.05 and the disturbances are assumed to be white noise. In Fig. 12.2 we calculate Pr(B1 > B1∗) and Pr(D1 ≤ D1∗) under the assumption that the disturbances are AR(1) for values of θ between
0 and 1. Each line in the figure corresponds to one of the ten different data
sets. As noted before, the D1-statistic is essentially the DW-statistic. As a re-
sult, Pr(D1 D1∗ ) can be interpreted as the power of D1 in testing θ = 0
against θ > 0. Alternatively we can interpret Pr(D1 D1∗ ) as the sensitiv-
ity of σ̂ 2 with respect to θ . In the same way, B1 measures the sensitivity of ŷ
(and β̂) with respect to θ . One glance at Fig. 12.2 shows that B1 is quite insen-
sitive, hence robust, with respect to θ , even for values of θ close to 1. The figure
shows the probabilities Pr(B1 > B1∗) and Pr(D1 ≤ D1∗) for n = 25. The main
conclusion is that D1 is quite sensitive to θ but B1 is not. Hence, the D1 or
DW-statistic may indicate that OLS is not appropriate, since θ is "significantly" different from 0, but the B1 statistic shows that the estimates ŷ and β̂ are little affected. This explains and illustrates a phenomenon well known to all applied
econometricians, namely that OLS estimates are “robust” to variance misspeci-
fication (although their distribution may be less robust).
If θ is close to 1, then the limit (or the limiting distribution) can be calculated
from Banerjee and Magnus (1999, Appendix 2). The flatness of the B1-curves
suggests that B1 and D1 are near-independent. This “near-independence” is
based on asymptotic independence, a fact proved in Section 12.8. For n = 25
and θ = 0.5 we would decide in only about 7–10% of the cases that ŷ is sensitive
with respect to θ .
Figure 12.2 gives the sensitivities for one value of n, namely n = 25. To see
how B1 depends on n we calculate for each of our ten data sets Pr(B1 > B1∗ )
for three values of n (n = 10, 25, 50) and one variance specification: AR(1).
The results are given in Table 12.1. Table 12.1 confirms our earlier statements.
In only 5–10% of the cases would we conclude that ŷ and β̂ are sensitive to
AR(1) disturbances. High values of n are needed to get close to the probability
limit, and the larger θ > 0 is, the larger n should be.
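A small Monte Carlo in the spirit of these calculations (with an invented design: the constant and time-trend regressors x1 and x2, n = 25, and 2,000 replications) calibrates B1∗ as the empirical 95% quantile of B1 under white noise and then estimates the exceedance probability under AR(1) disturbances.

import numpy as np

rng = np.random.default_rng(5)
n, reps, alpha, theta = 25, 2000, 0.05, 0.5

X = np.column_stack([np.ones(n), np.arange(1, n + 1)])      # constant and time trend
I = np.eye(n)
M = I - X @ np.linalg.inv(X.T @ X) @ X.T
T1 = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
C1 = (I - M) @ T1 @ M
W1 = C1.T @ np.linalg.pinv(C1 @ C1.T) @ C1

def b1_statistic(u):
    u_hat = M @ u                                           # B1 depends only on the disturbances
    return (u_hat @ W1 @ u_hat) / (u_hat @ u_hat)

def ar1_draw(theta):
    e = rng.standard_normal(n)
    u = np.zeros(n)
    u[0] = e[0] / np.sqrt(1 - theta ** 2)                   # stationary start
    for t in range(1, n):
        u[t] = theta * u[t - 1] + e[t]
    return u

# Calibrate B1* under white noise so that Pr(B1 > B1*) = alpha.
b1_null = np.array([b1_statistic(rng.standard_normal(n)) for _ in range(reps)])
b1_star = np.quantile(b1_null, 1 - alpha)

# Exceedance probability (sensitivity of y-hat) under AR(1) disturbances.
b1_alt = np.array([b1_statistic(ar1_draw(theta)) for _ in range(reps)])
print("Pr(B1 > B1*) under AR(1), theta = 0.5:", np.mean(b1_alt > b1_star))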
Our calculations thus indicate that OLS is very robust against AR(1) (in
fact, ARMA(1, 1)) disturbances. In only about 5–10% of the cases does the
than is necessary, but of course much less frequently than if we were using the
DW-test.
Suppose now that we are interested, not in estimation, but in testing, in particular
testing linear restrictions while we are uncertain about the distribution of the
disturbances. The set-up is the same as in Sections 12.3 and 12.4. We have a
linear regression model y = Xβ + u, where u follows a normal distribution with
mean zero and variance σ 2 Ω(θ ). In this section, to simplify notation, we assume
that θ consists of a single parameter; hence m = 1.
If there are restrictions on β, say Rβ = r0, where R is a q × k matrix of rank q ≥ 1, then the restricted GLS estimator for β is given by

β̃(θ) = β̂(θ) − (X′Ω⁻¹(θ)X)⁻¹R′(R(X′Ω⁻¹(θ)X)⁻¹R′)⁻¹(Rβ̂(θ) − r0),

where

β̂(θ) = (X′Ω⁻¹(θ)X)⁻¹X′Ω⁻¹(θ)y.
If we assume that θ is known, then the usual F -statistic for testing the hypothesis
Rβ = r0 can be written as
or alternatively as
where
Notice that the equality of (12.9) and (12.10) holds whether or not the restriction
Rβ = r0 is satisfied. Of course, under the null hypothesis H0 : Rβ = r0 , F (θ ) is
distributed as F(q, n − k).
Suppose we believe that θ = 0, which may or may not be the case. Then we
would use the OLS estimator β̂(0) or the restricted OLS estimator β̃(0). We now
define the symmetric idempotent n × n matrix
B := X(X′X)⁻¹R′(R(X′X)⁻¹R′)⁻¹R(X′X)⁻¹X′,   (12.11)

so that MB = 0, rk(M) = n − k, rk(B) = q.
THEOREM 4. We have

ϕ = 2 (F(0) + (n − k)/q) (θ̂ − θ̃),   (12.13)

where

θ̂ := û′Aû/(2û′û),   θ̃ := ũ′Aũ/(2ũ′ũ),   (12.14)

F(0) = ((ũ′ũ − û′û)/(û′û)) · ((n − k)/q),   (12.15)

û and ũ denote the unrestricted and restricted OLS residuals, and A is again defined as dΩ(θ)/dθ at θ = 0.
but, since these quadratic forms are not independent, it does not appear feasible
to obtain the density of ϕ in closed form. We shall obtain certain limiting results
in Section 12.6, and also the first two moments of ϕ exactly.
The notation θ̂ and θ̃ in (12.13) and (12.14) suggests that these statistics can
be interpreted as estimators of θ . This suggestion is based on the following ar-
gument. We expand Ω(θ ) as
Ω(θ) = In + θA + (1/2)θ²H + O(θ³),
where A is the first derivative of Ω, and H is the second derivative, both at
θ = 0. Then,
Ω⁻¹(θ) = In − θA + (1/2)θ²(2A² − H) + O(θ³).
If the y-process is covariance stationary, we may assume that the diagonal ele-
ments of Ω are all ones. Then, tr A = tr H = 0 and
tr[(dΩ⁻¹(θ)/dθ) Ω(θ)] = θ tr A² + O(θ²),   (12.16)

and

û′(θ) (dΩ⁻¹(θ)/dθ) û(θ) = −û′Aû + θ(2û′AMAû − û′Hû) + O(θ²).   (12.17)
The maximum likelihood estimator for θ is obtained by equating (12.16) and
(12.17); see Magnus (1978). This gives
θ̂ML ≈ û′Aû/(2û′AMAû − û′Hû − tr A²) = (û′Aû/(2û′û)) (1 + n^(−1/2) δ),
The general results for the F -test can be found in Banerjee and Magnus (2000,
Section 4). If q = 1 then the null hypothesis is written as H0: r′β = r0, where r is a given k × 1 vector. The matrix B has rank one and can be written as B = bb′, where

b = X(X′X)⁻¹r / (r′(X′X)⁻¹r)^(1/2),
using (12.13)–(12.15) and the facts that û = Mu and ũ = (M + bb′)u. Let M = SS′, S′S = In−k, so that S′b = 0. Define the vector v := S′u and the scalar η1 := b′u, so that v and η1 are independent. Then,
and hence
E(ϕ | v) = (R1 − b′Ab)/w
and
We now use Pitman’s lemma, recognizing the fact that R1 and w are indepen-
dent, and, similarly, that R2 and w are independent. Since
E(1/w) = (n − k)/(n − k − 2),   E(1/w²) = (n − k)²/((n − k − 2)(n − k − 4)),
we obtain

E(ϕ) = ((n − k)/(n − k − 2)) (tr AM/(n − k) − b′Ab)

and

E(ϕ²) = ((n − k)/(n − k − 2)) [ 3(n − k)(b′Ab)²/(n − k − 4) + 4b′AMAb
   + (6 tr(AM)² + 2(tr AM)²)/((n − k + 2)(n − k − 4)) − 6(b′Ab)(tr AM)/(n − k − 4) ].
where
Let μ1 denote the largest eigenvalue (in absolute value) of A. Then condition
(ii) guarantees that μ1 remains bounded for all n. As a result,
(b′Ab)² = (b′Ab/b′b)² ≤ μ1²,

b′AMAb = ((Ab)′M(Ab)/((Ab)′(Ab))) · (b′A²b/(b′b)) · b′b ≤ μ1²,

|tr AM| = |tr A(In − M)| ≤ μ1 tr(In − M) = kμ1,
and
E(R1) = tr AM/(n − k) → 0,   E(R1²) = tr(AM)²/((n − k)(n − k + 2)) → 0,
Rule of thumb: The t-statistic is sensitive (at the 50% level) to variance mis-
specification if and only if |ϕ|/c > 0.40.
In practice, we may compute ϕ from (12.13) and c from (12.19) and check
whether |ϕ| > 0.40c. If we know the type of variance misspecification which
could occur, we use the A-matrix corresponding to this type of misspecification.
In most situations we would not know this. Then we use the Toeplitz matrix T (1) ,
defined in (12.7), as our A-matrix. We know from Sections 12.2–12.4 that this
is the appropriate matrix for AR(1), MA(1) and ARMA(1, 1) misspecification.
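A sketch of this computation, using the expressions for F(0), θ̂, θ̃ and ϕ as reconstructed in Theorem 4 with A = T^(1), is given below. The data and the tested restriction are illustrative, and c is left as a placeholder since (12.19), which defines it, is not reproduced here.

import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(n)

# Test the single restriction beta_3 = 0, i.e. r'beta = r0 with r = (0, 0, 1)'.
r, r0, q = np.array([0.0, 0.0, 1.0]), 0.0, 1

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat                                # unrestricted OLS residuals

# Restricted OLS residuals under r'beta = r0.
beta_tilde = beta_hat - XtX_inv @ r * ((r @ beta_hat - r0) / (r @ XtX_inv @ r))
u_tilde = y - X @ beta_tilde

A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)    # A = T(1)

F0 = ((u_tilde @ u_tilde - u_hat @ u_hat) / (u_hat @ u_hat)) * (n - k) / q
theta_hat = (u_hat @ A @ u_hat) / (2 * u_hat @ u_hat)           # as in (12.14) above
theta_tilde = (u_tilde @ A @ u_tilde) / (2 * u_tilde @ u_tilde)
phi = 2 * (F0 + (n - k) / q) * (theta_hat - theta_tilde)        # as in (12.13) above

c = 1.0                                                 # placeholder: c is defined in (12.19)
print("phi =", round(phi, 4), "| sensitive?", abs(phi) > 0.40 * c)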
There is evidence that the probability that |ϕ| > 0.40c is extremely close to
0.50. In other words, 0.40c is an excellent approximation to the exact (finite
sample) median of |ϕ|. Park et al. (2002) use the fact that serial correlation has
little bearing on the robustness of t- and F -tests in a study on the effects of
temperature anomalies and air pressure/wind fluctuations of the sea surface on
the supplies of selected vegetables and melons.
Let us now further consider the relationship between the sensitivity statistic and
a diagnostic test. In Fig. 12.4 we assume for simplicity that k = m = 1, so that
there is one focus parameter β and one nuisance parameter θ . At (β̃, θ̃ ) we
obtain the maximum of the likelihood , ˜ while at (β̂, 0), we obtain the restricted
ˆ
maximum . For every fixed value of θ , let β̂(θ ) denote the value of β which
maximizes the (restricted) likelihood. The locus of all constrained maxima is the
curve
C := (β̂(θ), θ, ℓ(β̂(θ), θ)).
In particular, the points (β̂, 0, ℓ̂) and (β̃, θ̃, ℓ̃) are on this curve.
The β̂(θ )-curve is thus the projection of the curve C onto the (β, θ )-plane; we
shall call this projection the sensitivity curve. In contrast, if we project C onto
the (θ, ℓ)-plane, we obtain the curve ℓ̂ defined as
ℓ̂(θ) := ℓ(β̂(θ), θ),
which we shall call the diagnostic curve. The diagnostic curve ℓ̂ in the (θ, ℓ)-
plane contains all relevant information needed to perform the usual diagnostic
tests. In particular, the LR test is based on ℓ̂(θ̃) − ℓ̂(θ̂), the Wald test is based
on θ̃, and the LM test is based on the derivative of ℓ̂(θ) at θ = 0.
Analogous to the LM test in the (θ, ℓ)-plane, the (local) sensitivity of β̂ is the
derivative of β̂(θ) at θ = 0 in the (β, θ)-plane,
Sβ̂ := ∂β̂(θ)/∂θ |θ=0.
The sensitivity thus measures the effect of small changes in θ on the restricted
ML estimator β̂ and the sensitivity curve contains all restricted ML estimators
β̂(θ ) as a function of θ .
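To illustrate the definition: for the regression model with covariance σ²Ω(θ), the restricted ML estimator for a fixed value of θ is the GLS estimator β̂(θ) = (X′Ω(θ)⁻¹X)⁻¹X′Ω(θ)⁻¹y, and its derivative at θ = 0 can be approximated by a finite difference. The sketch below is only an illustration of the definition of Sβ̂; the AR(1) form of Ω(θ), the simulated data, and the numerical differentiation are assumptions of the example, not formulas taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # a constant and one regressor
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

def ar1_corr(theta, n):
    """AR(1) correlation matrix: Omega(theta)[i, j] = theta ** |i - j|."""
    idx = np.arange(n)
    return theta ** np.abs(idx[:, None] - idx[None, :])

def beta_hat(theta):
    """Restricted ML (= GLS) estimator of beta for a fixed value of the nuisance parameter."""
    Oinv = np.linalg.inv(ar1_corr(theta, n))
    return np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

# Local sensitivity: derivative of beta_hat(theta) at theta = 0 (central difference).
eps = 1e-5
S = (beta_hat(eps) - beta_hat(-eps)) / (2 * eps)
print(S)   # one sensitivity value for each element of beta
```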
One might think that sensitivity and diagnostic – although obviously not the
same – are nevertheless highly correlated. We shall now argue that this is not
the case. In fact, they are asymptotically independent, as demonstrated by Mag-
nus and Vasnev (2007). The fact that the sensitivity curve and the diagnostic
curve in Fig. 12.4 live in different planes suggests this orthogonality result, but
constitutes no proof.
Since this independence result is a crucial aspect of the importance of sen-
sitivity analysis, let us consider first the simplest example, namely the linear
regression model
y = Xβ + Zθ + ε,  ε | (X, Z) ∼ N(0, σ²In),
where (β, σ 2 ) is the focus parameter and θ is the nuisance parameter. We are
interested in the sensitivity of β with respect to θ . The restricted estimator is
β̂ = (X′X)⁻¹X′y and the Lagrange multiplier (LM) test takes the form
LM = [y′MZ(Z′MZ)⁻¹Z′My]/(y′My/n).
and hence the distribution of LM (and W and LR) does not depend on (X, Z).
Thus, for any two measurable functions φ and ψ,
E[φ(LM)ψ(X, Z)] = E{E[φ(LM) | X, Z]ψ(X, Z)} = E[φ(LM)] E[ψ(X, Z)].
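In this example the restricted estimator for a fixed value of θ is β̂(θ) = (X′X)⁻¹X′(y − Zθ), so its derivative, Sβ̂ = −(X′X)⁻¹X′Z, depends on (X, Z) only; this expression is not displayed above, but it follows directly from least squares. A small Monte Carlo sketch (Python, with simulated X, Z and a scalar nuisance parameter; the design is illustrative, not the chapter's experiment) makes the independence of LM and Sβ̂ visible:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 100, 2000
lm_vals, sens_vals = [], []

for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Z = rng.normal(size=(n, 1))
    eps = rng.normal(size=n)
    y = X @ np.array([1.0, 0.5]) + eps               # theta = 0: the null hypothesis is true

    XtX_inv = np.linalg.inv(X.T @ X)
    M = np.eye(n) - X @ XtX_inv @ X.T                # residual-maker matrix for X

    # LM statistic for H0: theta = 0
    My = M @ y
    lm = (My @ Z @ np.linalg.inv(Z.T @ M @ Z) @ Z.T @ My) / (y @ My / n)
    lm_vals.append(lm)

    # Sensitivity of the restricted estimator: d beta_hat(theta)/d theta at 0 = -(X'X)^(-1) X'Z
    S = -(XtX_inv @ X.T @ Z)
    sens_vals.append(S[1, 0])                        # sensitivity of the slope coefficient

print(np.corrcoef(lm_vals, sens_vals)[0, 1])         # close to zero: LM and the sensitivity are independent
```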
Not only are LM and Sβ̂ uncorrelated, but any two measurable functions of LM
and Sβ̂ are uncorrelated as well. Then, by Doob (1953, p. 92), LM and Sβ̂ are
independent, and the same holds for the Wald and LR tests.
In this simple example, the sensitivity and the diagnostic are not only
asymptotically independent, but even independent in finite samples. In the next
example – which is more typical – we only find asymptotic independence.
Consider again the linear regression model
y = Xβ + u,  u | X ∼ N(0, σ²Ω(θ)),
from which we see that the LM test is a quadratic function of u, while the sensi-
tivity is a linear function of u. Hence they are asymptotically independent since
both have finite limiting variances.
A limiting result does not, however, inform us how fast the convergence takes
place. Thus, we perform a Monte Carlo experiment, based on the same set-up as
in Section 12.4. Our assumed alternative is the AR(1) model with parameter θ .
Assuming that the null hypothesis that θ = 0 is true, we calculate critical values
SS∗ and LM∗ such that
the LR and Wald tests will also be asymptotically independent of the scaled sen-
sitivity, but the speed of convergence could be different. All three tests converge
quickly to the 95% line; the Wald test is the slowest. The Wald test and the LR
test are both positively correlated with the scaled sensitivity.
Sensitivity analysis matters. The usual diagnostic test provides only half the
information required to decide whether a restricted estimator suffices to learn
about the focus parameters in the model; the other half is provided by the sensi-
tivity.
What are the implications for the practitioner? If the practical model and esti-
mation environment is covered by the theory and examples in this chapter, then
the sensitivity can be computed and its distribution derived. This will provide
useful additional information. In many cases encountered in practice, however,
the current state of sensitivity analysis does not yet allow formal testing. In those
cases ad hoc methods can be fruitfully employed to assess the sensitivity of
estimates, forecasts, or policy recommendations. After all, sensitivity analysis
simply asks whether the results obtained change “significantly” when one or
more of the underlying assumptions is violated. Each time we perform a diag-
nostic test, we should also ask the corresponding sensitivity question. Suppose,
Acknowledgements
References
Banerjee, A.N., Magnus, J.R. (1999). The sensitivity of OLS when the variance matrix is (partially)
unknown. Journal of Econometrics 92, 295–323.
Banerjee, A.N., Magnus, J.R. (2000). On the sensitivity of the usual t- and F-tests to covariance
misspecification. Journal of Econometrics 95, 157–176.
Chao, H.-K. (2007). Structure. In: this volume, Chapter 12.
Cook, R.D. (1979). Influential observations in linear regression. Journal of the American Statistical
Association 74, 169–174.
Cook, R.D. (1986). Assessment of local influence (with discussion). Journal of the Royal Statistical
Society, Series B 48, 133–169.
Doob, J.L. (1953). Stochastic Processes. John Wiley, New York.
Dufour, J.-M., King, M.L. (1991). Optimal invariant tests for the autocorrelation coefficient in linear
regressions with stationary or nonstationary AR(1) errors. Journal of Econometrics 47, 115–143.
Giere, R.N. (1999). Using models to represent reality. In: Magnani, L., Nersessian, N.J., Thagard, P.
(Eds.), Model-Based Reasoning in Scientific Discovery. Kluwer Academic/Plenum Publishers,
New York.
Godfrey, L.G. (1988). Misspecification Tests in Econometrics. Cambridge Univ. Press, Cambridge.
Griffiths, W.E., Beesley, P.A.A. (1984). The small-sample properties of some preliminary test esti-
mators in a linear model with autocorrelated errors. Journal of Econometrics 25, 49–61.
Harrison, G.W., Johnson, E., McInnes, M.M., Rutström, E.E. (2007). Measurement with experimen-
tal controls. In: this volume, Chapter 4.
Huber, P.J. (2004). Robust Statistics. In: Wiley Series in Probability and Statistics. John Wiley, Hobo-
ken, NJ.
King, M.L. (1981). The alternative Durbin–Watson test. Journal of Econometrics 17, 51–66.
King, M.L. (1987). Testing for autocorrelation in linear regression models: A survey. In: King, M.L.,
Giles, D.E.A. (Eds.), Specification analysis in the linear model, Essays in honour of Donald
Cochrane. Routledge & Kegan Paul, London.
Laha, R.G. (1954). On a characterization of the gamma distribution. The Annals of Mathematical
Statistics 25, 784–787.
Leamer, E.E. (1978). Specification Searches. John Wiley, New York.
Leamer, E. E. (1984). Global sensitivity results for generalized least squares estimates. Journal of
the American Statistical Association 79, 867–870.
Magnus, J.R. (1978). Maximum likelihood estimation of the GLS model with unknown parameters
in the disturbance covariance matrix. Journal of Econometrics 7, 281–312.
Magnus, J.R., Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and
Econometrics. Revised Edition 1999. John Wiley, Chichester/New York.
Magnus, J.R., Vasnev, A. (2007). Local sensitivity and diagnostic tests. Econometrics Journal 10,
166–192.
Newey, W.K., West, K.D. (1987). A simple, positive semi-definite, heteroskedasticity and autocor-
relation consistent covariance matrix. Econometrica 55, 703–708.
Omtzigt, P., Paruolo, P. (2005). Impact factors. Journal of Econometrics 128, 31–68.
Park, J., Mjelde, J.W., Fuller, S.W., Malaga, J.E., Rosson, C.P. (2002). An assessment of the effects
of ENSO events on fresh vegetable and melon supplies. HortScience 37, 287–291.
Pitman, E.J.G. (1937). The “closest” estimates of statistical parameters. Proceedings of the Cam-
bridge Philosophical Society 33, 212–222.
Polasek, W. (1984). Regression diagnostics for general linear regression models. Journal of the
American Statistical Association 79, 336–340.
CHAPTER 13
The Empirical Significance of Econometric Models
Thomas Mayer
Most of the papers in this volume analyze in detail some narrowly specified
problem of economic measurement. This paper takes a more general approach
and surveys a number of problems that limit the empirical evaluations of eco-
nomic models. It takes as given that economic models should have empirical
relevance, so that they need to be empirically tested.
I therefore focus on some difficulties in testing economic models. The prob-
lems being numerous and space being limited, I take up just a few of them,
concentrating on those that are at least to some extent remediable, and ignore
others, such as the problem of inferring causality, the Lucas critique and some
limitations of the available data. This means omitting some important funda-
mental problems in relating data to theory, such as those discussed in the last
chapter of Spanos (1986). Nor do I discuss the problems created by ideological
commitments, loyalty to schools of thought, or the reluctance to admit error.1
At the same time I have not been reluctant to discuss issues that are already well
known – but ignored in practice.
Although this essay focuses on models that make quantitative predictions, let us
first look at models that are intended in the first instance to provide qualitative
understanding. They, too, are empirical since they aim to enhance our under-
standing of observed phenomena.
One type of such models is what Allen Gibbard and Hal Varian (1978) call
“caricature models,” that is models that purposely deal with an extreme case
because that clarifies the operation of a particular factor that in the real world
1 Elsewhere (Mayer, 2001b) I have argued that ideological differences do not explain very much
of the disagreement among economists. (For a contrary conclusion see Fuchs et al., 1998.) In Mayer
(1998) I have presented a case study of how adherence to schools of thought and other personal
obstacles have inhibited the debate about a fixed monetary growth rate rule.
takes a less extreme form. For example, a model may explain price dispersion
by hypothesizing that there are only two types of consumers, those who search
until they find the lowest price for a specific item regardless of the time it takes,
and those who buy from the first seller they encounter. Such a model can teach
us about the importance of search effort. Since that is an empirical issue, it is
an empirical model, even though, standing on its own, it is only loosely related
to empirical prediction. Suppose, for example, that we would find that while the
model predicts that the variance of prices is high for a commodity for which
there is much consumer search, the data show the opposite. We would not treat
this as a contradiction of the model’s claim that consumer search lowers the vari-
ance of prices, but would attribute it to some disturbing factor, such as reverse
causation. Such a caricature model can serve as an input into a more general
model that has less restrictive ceteris paribus clauses. It also provides under-
standing in an informal sense, understanding that Fritz Machlup (1950) has
characterized as a sense of “Ahaness.” This does not mean that the credibility
of a caricature model is entirely independent of its predictive performance. If
the larger model in which it is used fails to predict correctly, and if no conve-
nient disturbing factors can be found, then the alleged insight of the model is
dubious.
Caricature models carry a potential danger. Particularly if the model is elegant
it may be applied over-enthusiastically by ignoring its ceteris paribus conditions,
as was done, for example, when Ricardian rent theory was used to predict that
rents would absorb a rising share of income. Schumpeter called this type of error
the “Ricardian Vice.”
Another type of qualitative model, qualitative in the broad sense that it does
not require the microscope of econometric analysis, is one that derives its ap-
peal from readily observed experience. In some cases the facts stand out starkly.
Thus, the Great Depression showed that prices are not flexible enough to quickly
restore equilibrium given a massive negative demand shock. And the Great In-
flation showed that a Phillips curve that does not allow for the adjustment of
expectations is not a good policy guide. In microeconomics a model that ex-
plains why some used goods whose quality is hard to ascertain sell at a greater
discount than do others is empirically validated by ordinary experience without
the aid of econometrics. This does not mean that narrower subsidiary hypothe-
ses of these models are not tested, and that these tests are not informative. But
our willingness to accept the broad messages of these models does not depend
on t values, etc. (See Summers, 1991.)
Both of these types of qualitative models raise different issues from quantitative
models. For the first type it is whether the feature of reality that the
caricature model has pounced on teaches us enough about the real world, or
whether it distorts our understanding by focusing our attention on something
that may be technically “sweet,” but of little actual relevance.2 That cannot be
2 For example, take the following model: A government can finance its expenditures only by taxing,
borrowing or money creation. Therefore, holding tax receipts and borrowing constant, money
creation depends on government expenditures. We can therefore explain the inflation rate by the
growth rate of government expenditures. This last statement holds if tax receipts and borrowing are
constant, but not if – as seems at least as, if not more, likely – the government finances changes in
its expenditures by changing tax receipts or borrowing.
Let us now look at three of the many serious problems that arise in testing eco-
nomic theories.
The traditional procedure is to select as the regressors the major variables im-
plied by the model, run the regression, and then, if necessary add, or perhaps
eliminate, some regressors until the diagnostics look good. An alternative pro-
cedure coming from LSE econometricians is to use a large number of regressors,
some of which may not be closely tied to the hypothesis being tested, and then
narrow the analysis by dropping those with insignificant coefficients. Such a
search for the data generating process (DGP) usually puts more stress than the
traditional procedure does on meeting the assumptions of the underlying statistical
model; it emphasizes misspecification tests and rejects quick fixes, such as adding
an AR term, though it does not reject the criteria used in the traditional
approach. Thus Spanos (1986, pp. 669–670) cites the following criteria: “the-
ory consistency, goodness of fit, predictive ability, robustness (including nearly
orthogonal explanatory variables), encompassing [the results of previous work
and] parsimony.”
What is at stake here is a more fundamental disagreement than merely a
preference for either starting with a simple model and then adding additional
variables until the fit becomes satisfactory, or else starting with a general model
and then dropping regressors that are not statistically significant. Nobody can
start with a truly general model (see Keuzenkamp and McAleer, 1995), and if
the reduction does not provide a satisfactory solution a LSE econometrician, too,
is likely to add additional regressors at that stage.
The more fundamental disagreement can be viewed in two ways. The first is
as emphasizing economic theory versus emphasizing statistical theory. In the
former case one may approach a data set with strong priors based on the theo-
ry’s previous performance on other tests. One then sees whether the new data
set is also consistent with that theory rather than asking which hypothesized
DGP gives the most satisfactory diagnostics. Suppose that the quantity theory
gives a good fit for the inflation rates of twenty countries. However, for each
of these countries one can estimate a DGP that gives a better fit, but contains
an extra variable that differs from country to country. One may then still prefer
the quantity theory. LSE econometricians would probably agree, but in practice
their method tends to stress econometric criteria rather than the other criteria
relevant to theory selection. This issue is well stated by Friedman and Schwartz
(1991, pp. 39, 49) who wrote in their debate with Hendry and Ericsson (1991)
that one should:
[E]xamine a wide variety of evidence quantitative and nonquantitative. . . ; test results from
one body of evidence on the other bodies, using econometric techniques as one tool in this
process, and build up a collection of simple hypotheses. . . . [R]egression analysis is a good
tool for deriving hypotheses. But any hypothesis must be tested with data or nonquantitative
evidence other than that used in deriving the regression, or available when the regression was
derived. Low standard errors of estimate, high t values and the like are often tributes to the
ingenuity and tenacity of the statistician rather than reliable evidence. . . .
Another serious problem both in testing and in applying a model is that the ce-
teris paribus conditions that define its domain are often insufficiently specified.
If we are not told what they are, and the extent to which they can be relaxed
without significant damage to the model’s applicability, then data cannot be said
to refute it, but only to constrain its domain. In day-to-day work this shows up as
the question of what variables have to be included among the auxiliary regres-
sors. A dramatic illustration is Edward Leamer’s (1978) tabulation of the results
obtained when one includes various plausible auxiliary regressors in equations
intended to measure the effect of capital punishment on the homicide rate. The
results are all over the map. And the same is true in a recent follow-up study
(Donohue and Wolfers, 2006). Similarly, as Thomas Cooley and Stephen LeRoy
(1981) have shown, in demand functions for money the negative interest elastic-
ity predicted by theory does not emerge clearly from the data, but depends on
what other regressors are used.
The ideal solution would be to specify the ceteris paribus conditions of the
theoretical model so precisely that it would not leave any choice about what
auxiliary regressors to include. But we cannot list all the ceteris paribus condi-
tions. New classical theorists claim to have a solution: the selection of auxiliary
regressors must be founded on rational-choice theory. But that is unpersuasive.
In their empirical work the new classicals substitute for utility either income,
or both income and leisure variables, plus perhaps a risk-aversion variable. But
behavioral and experimental economics, as well as neuroscience, provide much
evidence that there is more to utility than that. And the well documented bounds
on rationality open the door to all sorts of additional variables that are not in
the new classicals’ utility function. Similarly, market imperfections complicate
a firm’s decisions.
If theory cannot sufficiently constrain the variables that have to be held
constant by the inclusion of regressors for them, one possible solution could
be: open the floodgates, allow all sorts of plausible variables in, and call the
model confirmed only if it works regardless of which auxiliary regressors are
included. In this spirit Edward Leamer (1978, 1983) has advocated “extreme
bounds analysis,” that is, deciding what regressors are plausible, running re-
gressions with various combinations of them, and then treating as confirmed
only those hypotheses that survive all of these tests. This procedure has been
criticized on technical grounds (see McAleer et al., 1983; Hoover and Perez,
2000). It also has the practical disadvantage that it allows very few hypothe-
ses to survive. Since if economists refuse to answer policy questions they leave
more space for the answers of those who know even less, it is doubtful that they
should become the Trappist monks that extreme bounds analysis would require
of them. However, it may be possible to ameliorate this problem by adopting a,
say, 15 percent significance level instead of the 5 percent level.
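A minimal sketch of this idea is given below: regress the dependent variable on the focus regressor together with every combination of a set of plausible auxiliary regressors, and record the range of the estimated focus coefficient. It is a simplified illustration only; Leamer's extreme bounds analysis also takes the coefficients' standard errors into account, and the data here are simulated rather than drawn from any of the studies just mentioned.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)                        # focus regressor
aux = rng.normal(size=(n, 4))                 # candidate auxiliary regressors
y = 1.0 + 0.5 * x + aux @ np.array([0.3, 0.0, -0.2, 0.0]) + rng.normal(size=n)

coefs = []
for r in range(aux.shape[1] + 1):
    for subset in itertools.combinations(range(aux.shape[1]), r):
        W = np.column_stack([np.ones(n), x, aux[:, list(subset)]])
        b, *_ = np.linalg.lstsq(W, y, rcond=None)
        coefs.append(b[1])                    # coefficient on the focus regressor

print(min(coefs), max(coefs))                 # range of the focus coefficient across specifications
```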
Full-scale extreme bounds analysis has found few adherents. Instead, econo-
mists now often employ an informal and limited version by reporting as robust-
ness tests, in addition to their preferred regressions, also the results of several
alternative regressions that use different auxiliary regressors or empirical defini-
tions of the theoretical variables.3 This can be interpreted along Duhem–Quinean
lines as showing that the validity of the maintained hypothesis does not depend
on the validity of certain specific auxiliary hypotheses. While this is a great im-
provement over reporting just the results of the favored regression it is not clear
that economists test – and report on – a sufficient number of regressors and de-
finitions. Indeed, that is not likely because data mining creates an incentive-
incompatibility problem between authors (agents) and readers (principals).
By the time she runs her regressions a researcher has usually already spent much
effort on the project. Hence, if her initial regressions fail to confirm her hypoth-
esis she has a strong incentive to try other regressions, perhaps with differently
defined variables, different functional forms, different sample periods, differ-
ent auxiliary variables, or different techniques, and to do so until she obtains
favorable results. Such pre-testing makes the t values of the final regression
worthless.4 Just as bad, if not worse, such biased data mining also means that
the final results “confirm” the hypothesis only in the sense of showing that it is
not necessarily inconsistent with the data, that there are some decisions about
auxiliary regressors, etc., that could save the hypothesis. Suppose a researcher
has run, say ten alternative regressions, three of which support his hypothesis
and seven that do not. He will be tempted to present one of his successful regres-
sions as his main one and mention the other two successful ones as robustness
tests, while ignoring the seven regressions that did not support his hypothesis.5
3 I have the impression that this has become much more common in recent years.
4 There is no way of correctly adjusting t values for pre-testing. (See Greene, 2000; Hoover and
Perez, 2000; Spanos, 2000.)
5 It is often far from obvious whether the results of additional regressions confirm or disconfirm
the maintained hypothesis. Suppose that this hypothesis implies that the coefficient of x is positive.
Suppose further that it is positive and significant in the main regression. But in additional regressions
that include certain other auxiliary regressors, though again positive, it is significant only at the
20 percent level. Although taken in isolation these additional regressions would usually be read
as failures to confirm, they should perhaps be read as enhancing the credibility of the maintained
hypothesis, because they suggest that even if the auxiliary hypothesis that these regressors do not
belong in the regression is invalid, there is still only a relatively small likelihood that the observed
results are due merely to sampling error. Good theory choice takes more than attention to t values.
And that deprives readers of information they need to evaluate the hypothe-
sis.
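How much such selective specification search can inflate reported significance is easy to see by simulation. The sketch below is a stylized example of my own construction, not one taken from the literature cited here: the true coefficient is zero, the "researcher" tries ten differently defined (and equally irrelevant) versions of the explanatory variable and keeps the one with the largest |t|, and the realized rejection rate at the nominal 5 percent level is then roughly 1 − 0.95¹⁰ ≈ 0.40.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps, n_tries = 100, 2000, 10
crit = stats.t.ppf(0.975, df=n - 2)           # nominal 5 percent two-sided critical value

def t_stat(y, x):
    """OLS t-statistic on x in a regression of y on a constant and x."""
    W = np.column_stack([np.ones(len(y)), x])
    WtW_inv = np.linalg.inv(W.T @ W)
    b = WtW_inv @ W.T @ y
    resid = y - W @ b
    s2 = resid @ resid / (len(y) - 2)
    return b[1] / np.sqrt(s2 * WtW_inv[1, 1])

rejections = 0
for _ in range(reps):
    y = rng.normal(size=n)
    # Try ten differently defined (here truly irrelevant) explanatory variables
    # and report only the one with the largest |t|.
    best = max(abs(t_stat(y, rng.normal(size=n))) for _ in range(n_tries))
    rejections += best > crit

print(rejections / reps)                      # far above the nominal 0.05 (about 0.40)
```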
Data mining can occur not only in conventional econometric tests, but also in
calibrations, where there may be many diverse microeconomic estimates among
which the calibrator can pick and choose. (Cf. Hansen and Heckman, 1996.)
To be convincing a calibration test requires making a compelling case for the
particular estimate of the coefficient that has been picked out of the often quite
diverse ones in the literature, not just giving a reference to the coefficient found
in some particular paper.
Though much practiced (see Backhouse and Morgan, 2000) data mining is
widely deplored (see for instance Leamer, 1983; Cooley and LeRoy, 1981). But
it has its defenders. Thus Adrian Pagan and Michael Veall (2000) argue that
since economists seem willing to accept the output of data miners they cannot be
all that concerned about it. But what choice do they have? They do not know what
papers have been hyped by biased data mining, and being academic economists
they have to read the journals and refer to them. Pagan and Veall also argue
that data mining does little damage because if a paper seems important but is
not robust, it will be replicated and its fragility will be exposed. But while path
breaking papers are likely to be replicated, by no means all unreplicated papers
are unimportant; much scientific progress results from normal science. And even
when papers are replicated time passes until the erroneous ones are spotted, and
in the meantime they shunt researchers onto the wrong track.
A much more persuasive defense of data mining is that it is needed to ob-
tain as much information as we can from the data, so that the learning that
results from trying many regressions and testing need to coexist (see Greene,
2000; Spanos, 2000). Thus Hoover and Perez (2000), who focus on generating
accurate values for the coefficients of a hypothesis rather than on testing it, ar-
gue (mainly in the context of general-to-specific modeling) that we need to try
many specifications to find the best one, while Keuzenkamp and McAleer (1995,
p. 20) write: “specification freedom is a nuisance to purists, but is an indispens-
able aid to practical econometricians.” (See also Backhouse and Morgan, 2000;
Kennedy, 2002.) Testing, Hoover and Perez argue, should then be done in some
other way, thus separating the task of exploring a data set from the task of draw-
ing inferences from it. That would be the ideal solution, but in macroeconomics
such, multiple independent data sets are generally not available, or if they are
they relate to different countries which may complicate research. In much mi-
croeconomic work with sample surveys or experimental data, it is, in principle,
possible to gather two samples or to divide the sample into two, and to use one
to formulate and the other to test the hypothesis. But in practice, funds are often
too limited for that. Suppose, for example, that your budget allows you to draw
a sample of 1000 responses. Would you feel comfortable using only 500 re-
sponses to estimate the coefficients when another 500 are sitting on your desk?
Moreover, a researcher who has two samples can mine surreptitiously by peeking
at the second sample when estimating the coefficients from the first sample.6
(See Inoue and Kilian, 2002.)
The other polar position on data mining – one usually not stated so starkly but
implicit in much criticism of data mining – is to limit each researcher to testing
only a single variant of her model. But that is a bad rule, not only for the reasons
just mentioned, but also because it leaves too much to luck. A researcher might
just happen on the first try to pick the one variant of twenty equally plausible
ones that provides a good fit. (See Bronfenbrenner, 1972.) Moreover, even if all
data mining by individual researchers were eliminated, it would not put a stop
to the harmful effects of data mining because of a publication bias. Only those
papers that come up with acceptable t values and other regression diagnostics
tend to be printed, so that, at least in the short run, there would still be a bias
in favor of the hypothesis.7 Moreover, it is hard to imagine such a rule of one
regression per researcher being effectively enforced.
A more feasible solution that avoids both extremes is to permit data mining
but only as long as it is done transparently. A basic idea underlying the orga-
nization of research is the division of labor; instead of having every scientist
investigate a particular problem, one scientist does so, and her discoveries be-
come known to all others. This works best if she holds nothing important back,
and not well if she withholds information that detracts from the validity of her
work, for example, that her results require the assumption that the lag is six
months rather than three, nine or twelve months. Hence, a data miner should let
readers know if plausible assumptions other than the ones she used yield results
that are meaningfully different. The reader can then decide whether to accept
the proffered conclusions.
Though I think this is the best of all available alternatives, it, too, has its prob-
lems. One is the difficulty (impossibility?) of ensuring that researchers mention
all their alternative regressions that significantly reduce the credibility of the
maintained hypothesis. Your conscience may urge you to do so, but fear that
your rivals do not, urges you to override your conscience. A second is that a
researcher is likely to run some regressions that she does not take seriously, just
to see what would happen if. . . . Do they have to be reported? And if not, where
does one draw the line? Another problem is that a researcher who intends to
run, say twelve variants of the maintained hypothesis, and happens to get a good
result in say the first two, has a strong incentive to quit while he is ahead, so that
potential knowledge is lost.
6 This is not necessarily dishonest. If a macroeconomist sets a few years' data aside as a second
sample, she knows something about what the data are likely to show simply by having lived through
this period. And someone working with survey data may have inadvertently learned something about
the second sample in the process of splitting the data or in talking to his research assistant.
7 I say the short run because, as Robert Goldfarb (1995) has shown, once a hypothesis is widely
accepted only those papers that test it and disconfirm it tend to be published, because only they
provide “new” information.
8 See for instance, Bruce Thompson (2004), Open Peer Comments (1996). Siu Chow (1996, p. 11),
who, even though he defends the use of significance tests, writes: “the overall assessment of the . . .
[null-hypotheses significance test procedure] in psychology is not encouraging. The puzzle is why
so many social scientists persist in using the process.” He argued persuasively that these criticisms
of significance tests are largely due to researchers trying to read too much into them.
9 McCloskey (1985) also argued that in many cases the sample is, in effect, the whole universe, so
that tests for sampling error are meaningless. Hoover and Perez (2000) respond that the hypothesis
being tested is intended to be general and thus cover actual or potential observations outside the
sample period.
10 In his survey of papers published in the Journal of Economic History and in Explorations in
Economic History Anthony O’Brien (2004) found that of the 185 papers that used regression analy-
sis, 12 percent did so incorrectly, and that in 7 percent of these papers this did matter for the main
conclusions of the paper. The confusion of statistical and substantive significance has also been a
problem in biology (see Pfannkuch and Wild, 2000).
11 For some specific instances see Robertson (2000), Viscusi and Hamilton (1999), Loeb and Page
(2000), McConnell and Perez-Quiros (2000), Papell et al. (2000), Wei (2000). For a further discus-
sion of this problem see Mayer (2001a).
12 Nothing said above conflicts with the philosophy-of-science proposition that failure to be discon-
firmed on a hard test raises the credibility of a hypothesis, because the term “not disconfirmed” is
used in two different senses. In the context of significance testing it means that – using a rigorous
standard for saying that the hypothesis has been rejected – there is not sufficient evidence to say that
it has. In the context of philosophy-of-science failure to be disconfirmed means that the probability
that the proposition is false is less than 50 percent.
As several economists have pointed out (see for instance Leontief, 1971) most
economists show little concern about the quality of their data.13 To be sure, they
make allowance for sampling error, but that’s about it. The standard justifica-
tions for this unconcern are first that the obvious need to quantify and test our
hypotheses forces us to use whatever data we can find, and as long as they are
the best available data, well, that’s all we can be expected to do. Second, previ-
ous researchers have already decided what the best data sets are, so we can just
use these.
Sounds compelling – but isn’t. Yes, empirical testing is important, but in some
cases even the best available data may not be reliable enough to test the model,
and then we should either develop a better data set on our own, or else admit
that our model cannot, at least at present, be adequately tested. Or if the avail-
able data are neither wholly reliable nor totally inadequate you may use them
to test the hypothesis, but inform the reader about the problem, and perhaps do
some robustness testing. That others have used a data set is not an adequate jus-
tification for your using it, not only because of uncertainty about whether the
previous use was successful, but also because, while for some purposes crude
estimates suffice, for others they do not. Don’t assume that the sophistication of
13 Previously Oskar Morgenstern (1950) had provided a long list of errors that resulted from econo-
mists not knowing enough about their data, and Andrew Kamarck (1983) has presented more recent
examples. The appearance of downloadable databases probably exacerbated this problem. In the old
days when economists had to take the data from the original sources they were more likely to read
the accompanying description of the data. Another exacerbating factor is the much greater use of re-
search assistants. A researcher who has to work with data herself is more likely to notice anomalies
in the data than are assistants who tend to follow instructions rather than “waste” time by thinking
about the data.
your econometrics can compensate for the inadequacy of your data. (Cf. Chat-
field, 1991.) Time spent on cleaning up the data, or looking for a data set that
provides a better measure of your model’s variables, may not impress a referee,
but it may improve the results more than the same time spent in learning the
latest technique. As Daniel Hamermesh (2000, p. 365) has remarked: “data may
be dirty, but in many cases the dirt is more like mud than Original Sin.”
In more concrete terms suppose the data seem to disconfirm the hypothesis
because the t value of the critical coefficient is low, or because other regression
diagnostics look poor. Both of these may be due to data errors and not an er-
ror in the hypothesis. To illustrate with an extreme case, albeit one involving
an identity rather than a hypothesis, few would deny that for a particular com-
modity total exports equal total imports, even though the data show them not
to. Conversely, data errors may sometimes favor the hypotheses. For example,
because of a lack of better data the compilers of a series may have estimated an
important component as a simple trend. If the model contains a regressor dom-
inated by a similar trend this data error could provide spurious support for the
model.
Because of the reluctance of economists to get involved in the messy details
of how their data were derived certain standard conventions are used without
question. To illustrate the type of problem frequently swept under the rug con-
sider the savings ratio. How many economists who build models to explain this
ratio discuss whether they should use the savings data given in the National In-
come and Product Accounts (NIPA), or else the very different savings data that
can be derived from the flow-of-funds accounts? The former are generally used
even though they derive saving by subtracting consumption from income, and
are therefore at least potentially subject to large percentage errors.14 (The flow-
of-funds estimates also have their problems.) Moreover, as Reinsdorf (2004) has
pointed out, there are some specific problems with the NIPA savings data. One is
that the personal income data include income received on behalf of households
by pension funds and nonprofit organizations that serve households, that is in-
come that households may not be aware of or take into consideration when
deciding on their consumption. Data on the difference between the NIPA per-
sonal savings rate and the savings rate of households that exclude these receipts
are available since 1992, and while the difference is trivial in 1992–1994, it
amounts to 0.7 percentage points – that is about 30 percent of the savings ratio
14 More precisely, “personal outlays for personal consumption expenditures (PCS), for interest pay-
ments on consumer debt, and for current transfer payments are subtracted from disposable personal
income” (Reinsdorf, 2004, p. 18). The extent to which errors in estimating either income or con-
sumption affect estimates of the savings ratio depends not only on the size of these errors, but also
on their covariance. Suppose income is actually 100, but is estimated to be 101, while consumption
is estimated correctly at 95. Then, saving is estimated to be 6 rather than 5, a 20 percent error. But if
income has been overestimated by 1 because consumption was overestimated by 1, then these errors
lower the estimated savings ratio only by 0.05 percent of income, that is by 1 percent of its actual
value.
in 1999 and 2000. Another problem is that the NIPA data treat as interest income
(and also as interest payments on consumer debt, and hence as a component
of consumption) nominal instead of real interest payments. Using real instead of
nominal interest payments reduces the personal savings rate by 1.5 to 2.4 per-
centage points during 1980–1992, but only by 0.5 to 1.2 percentage points in
1993–2000.
Another problem is the treatment of capital gains and losses. The NIPA data
exclude capital gains from income, and hence from saving, but they deduct the
taxes paid on realized capital gains from disposable personal income, and thus
indirectly from personal saving. Using an alternative measure that includes in
disposable personal income federal taxes on capital gains changes the recorded
savings rate by only 0.5 percentage points in 1991–1992 but by 1.65 percent-
age points in the unusual year, 2000. And then there is the important question
whether at least some of the unrealized capital gains and losses shouldn’t be
counted as saving, since over the long run capital gains are a major component
of the yield on stocks.
Other data sets have other problems. For instance the difficulties of measur-
ing the inflation rate are well known, and since real GDP is derived by deflating
nominal GDP, errors in estimating the inflation rate generate corresponding er-
rors with the opposite sign in estimated real GDP. Moreover, real GDP estimates
are downward biased because of an underground economy that might account
for 10 percent or more of total output. Furthermore, GDP revisions are by no
means trivial, which raises the question of how reliable the final estimates are.
Balance of payments statistics, too, are notoriously bad. The difficulty of defining
money operationally has led to the quip that the demand for money is stable; it
is just the definition of money that keeps changing. And even if one agrees on
the appropriate concept of money, real time estimates of quarterly growth rates
of money are unreliable. The problems besetting survey data, such as misunder-
stood questions and biased answers, are also large. Moreover, in using survey
data it has become a convention in economics not to worry about a possible bias
due to non-response, even when the non-response rate is, say 65 percent.
My point here is not that the available data are too poor to test our models.
That I believe would be an overstatement. It is also not that economists use
wrong data sets, but rather that they tend to select their data sets in a mechanical
way without considering alternatives, or asking whether the data are sufficiently
accurate for the purpose at hand.
There is also a serious danger of errors in data entry, in calculations, and in
the transcription of regression results. Dewald, Thursby and Anderson (1986),
show that such errors were frequent and substantial. Perhaps as a result of this
paper they are now much less common, but perhaps not.15 Downloading data
from a standard database is not a complete safeguard against errors. Without
even looking for them I have twice found a substantial error in a widely used
database.
Moreover, since various popular software packages can yield sharply differ-
ent results, regression programs, too, can generate substantial errors. (See Lovell
1994; McCullough and Vinod, 1999; McCullough, 2000.) In particular, McCul-
lough and Vinod speak of:
the failure of many statistical packages to pass even rudimentary benchmarks for numerical
accuracy. . . . [E]ven simple linear procedures, such as calculation of the correlation coefficient
can be horrendously inaccurate. . . . While all [three popular] packages tested did well on
linear regression benchmarks – gross errors were uncovered in analyses of variance routines.
. . . [There are] many procedures for which we were unable to find a benchmark and for which
we found discrepancies between packages: linear estimation with AR(1) errors, estimation of
an ARMA model, Kalman filtering, . . . and so on (pp. 633, 635, 650, 655).
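The kind of failure McCullough and Vinod describe can be reproduced in schematic form. The sketch below is not one of their benchmarks; it merely contrasts a “textbook” one-pass formula for the correlation coefficient with a numerically stable computation on data with a large common offset, the kind of input on which poorly implemented routines lose most of their accuracy.

```python
import numpy as np

rng = np.random.default_rng(5)
x = 1e8 + rng.normal(size=1000)               # data with a large common offset
y = x + 0.5 * rng.normal(size=1000)           # true correlation is about 0.9

n = len(x)
# Naive one-pass formula: raw sums of squares and cross-products, then subtraction.
num = np.sum(x * y) - n * x.mean() * y.mean()
den = np.sqrt((np.sum(x**2) - n * x.mean()**2) * (np.sum(y**2) - n * y.mean()**2))
r_naive = num / den

# Numerically stable computation on centred data.
r_stable = np.corrcoef(x, y)[0, 1]

print(r_naive, r_stable)   # the one-pass value is badly wrong here (it may not even lie in [-1, 1])
```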
15 Over many years of working first with desk calculators and then with PCs I have found that even
if one checks the data carefully, in any large project mechanical errors do creep in. Calculation errors
may be as common, or even more common, now than they were in the days of desk calculators. One
is more likely to be dividing when one should be multiplying, if one can do so with a single key
stroke, than in the old days when in the tedious hours of using a desk calculator one had plenty of
time to think about what one was doing.
16 I did the search on November 4, 2005 using the Google “scholar” option. It is, of course, possible,
though unlikely, that some errata were published that did not cite the McCullough–Vinod paper but
cited one of the other papers that made a similar point. It is also possible that in their subsequent
papers some economists did check whether other programs gave results similar to the one they
used, though I do not recall ever seeing any indication of this. Also, some economists may have
tried several programs and abandoned their projects when they found that these programs gave
substantially different results. It would be interesting to know whether economists in government
or business, whose errors could result in large losses, recalculated some of their regressions using
different programs.
temperature. If they get similar results then that confirms the original findings,
and if they do not, that can be read either as a limitation of the domain of the
model or as casting doubt on it. If many replications fail to confirm the original
findings these are then treated as, at best, a special case. Such replication is not
common in economics.
This discussion may seem to have struck an unrelievedly pessimistic note. But all
attempts to advance knowledge, not just economic measurements, face obsta-
cles. For example, economic theory has its unrealistic assumption (and impli-
cation) of rational income maximization. All the same, it has greatly advanced
our understanding. Moreover, the large volume of economic modeling over the
last few decades has improved our understanding of the economy and our pre-
dictive ability, think, for example, of asymmetric information theory, modern
finance theory and behavioral economics. And other fields have their problems
too. In medicine a study found that: “16 percent of the top cited clinical research
articles on postulated effective medical interventions that have been published
within the last 15 years have been contradicted by subsequent clinical studies,
and another 16 percent have been found to have initially stronger effects than
subsequent research” (Ioannidis, 2005, p. 223).
The preceding tale of woe is therefore not a plea for giving up, but instead an
argument for modesty in the claims we make. Our papers seem to suggest that
there is at least a 95 percent probability that our conclusions are correct. Such
a claim is both indefensible and unneeded. If an economist takes a proposition
for which the previous evidence suggested a 50:50 probability and shows that
it has a 55:45 probability of being right, she has done a useful job. It is also a
plea to improve our work by paying more attention to such mundane matters as
the quality and meaning of our data, potential computing errors, and the need to
at least mention unfavorable as well as favorable results of robustness tests. To
be sure, that would still leave some very serious problems, such as the transition
from correlation to causation, the Lucas critique, and the limited availability of
reliable data, but that there is some opportunity for improvement is a hopeful
message. Moreover, that some problems we face are insoluble should make us
economists feel good about ourselves, since it suggests that our failure to match
the achievements of most natural sciences is not an indication of intellectual
inferiority.17
None of this would carry much weight if the pessimists are right in saying that
in economics empirical evidence is not taken seriously when it conflicts with
References
Backhouse, R.E. (1992). The significance of replication in econometrics. Discussion paper 92-25.
Economics Department, University of Birmingham.
Backhouse, R.E., Morgan, M.S. (2000). Introduction: Is data mining a methodological problem?
Journal of Economic Methodology 7, June, 173–182.
Bronfenbrenner, M. (1972). Sensitivity analysis for econometricians. Nebraska Journal of Eco-
nomics 2, 57–66, Autumn.
Chatfield, C. (1991). Avoiding statistical pitfalls. Statistical Science 6, 240–252, August.
Chow, S. (1996). Statistical Significance. Sage Publishing, London.
Cooley, T.F., LeRoy, S.F. (1981). Identification and estimation of money demand. American Eco-
nomic Review 71, 825–843, December.
Dewald, W., Thursby, J., Anderson, R. (1986). Replication in economics: The Journal of Money,
Credit and Banking Project. American Economic Review 76, 587–603, September.
Donohue, J., III, Wolfers, J. (2006). Uses and abuses of empirical evidence in the death penalty
debate. Working paper 11982. NBER.
Elliot, G., Granger, C.W.J. (2004). Evaluating significance: Comment on ‘size matters’. Journal of
Socio-Economics 33, 547–550.
Fuchs, V., Krueger, A., Poterba, J. (1998). Economists’ views about parameters, values and policies:
Survey results in labor and public economics. Journal of Economic Literature 36, 1387–1425,
September.
Friedman, M. (2005). A natural experiment in monetary policy covering three periods of growth and
decline in the economy and the stock market. Journal of Economic Perspectives 19, 145–150,
Fall.
Friedman, M., Schwartz, A. (1991). Alternative approaches to analyzing economic data. American
Economic Review 81, 39–49, March.
Gibbard, A., Varian, H. (1978). Economic models. Journal of Philosophy 75, 665–677, November.
Gigerenzer, G. (2004). Mindless statistics. Journal of Socio-Economics 33, 587–606.
Goldfarb, R. (1995). The economist-as-audience needs a plausible model of inference. Journal of
Economic Methodology 2, 201–222, February.
Goldfarb, R., Stekler, H.O. (2000). Why do empirical results change? Forecasts as tests of rational
expectations. History of Political Economy (Annual Supplement), 95–116.
Greene, C. (2000). I am not, nor have I ever been a member of the data-mining discipline. Journal
of Economic Methodology 7, 217–239, June.
Hansen, L.P., Heckman, J. (1996). The empirical foundations of calibration. Journal of Economic
Perspectives 10, 87–104, Winter.
Hamermesh, D. (2000). The craft of labormetrics. Industrial and Labor Relations Review 53, 363–
380, April.
Hausman, D.M. (1992). The Inexact and Separate Science of Economics. Cambridge Univ. Press,
Cambridge.
Hendry, D.F., Ericsson, N. (1991). An econometric analysis of UK money demand, in Friedman, M.,
Schwartz, A.J. (Eds.), Monetary Trends in the United States and the United Kingdom. American
Economic Review 81, 8–39, March.
Hoover, K.D., Perez, S. (2000). Three attitudes towards data mining. Journal of Economic Method-
ology 7, 195–210, June.
Hoover, K.D., Siegler, M. (unpublished). Sound and fury: McCloskey and significance testing in
economics.
Horowitz, J. (2004). Comment on size matters. Journal of Socio-Economics 33, 571–575.
Inoue, A., Kilian, L. (2002). In-sample or out-of-sample tests of predictability: Which should we
use? Working paper No. 195. European Central Bank.
Ioannidis, J. (2005). Contradicted and initially stronger effects in highly cited clinical research.
Journal of the American Medical Association 294, 219–227, July.
Kamarck, A. (1983). Economics and the Real World. Blackwell, Oxford.
Kennedy, P. (2002). Sinning in the basement: What are the rules? The ten commandments of applied
econometrics. Journal of Economic Surveys 16 (4), 569–585.
Keuzenkamp, H.A., Magnus, J.R. (1995). On tests and significance in econometrics. Journal of
Econometrics 67, 5–24.
Keuzenkamp, H.A., McAleer, M. (1995). Simplicity, scientific inference and econometric mod-
elling. Economic Journal, January, 1–21.
Kim, J., de Marchi, N., Morgan, M.S. (1995). Empirical model peculiarities and belief in the natural
rate hypothesis. Journal of Econometrics 67, 81–102.
Leamer, E.E. (1978). Specification Searches: Ad Hoc Inference with Nonexperimental Data. John
Wiley, New York.
Leamer, E. (1983). Let’s take the con out of econometrics. American Economic Review 73, 31–43,
March.
Leontief, W. (1971). Theoretical assumptions and nonobserved facts. American Economic Review
61, 1–7, March.
Loeb, S., Page, M. (2000). Examining the link between teacher wages and student outcomes.
Review of Economics and Statistics 82, 393–408, August.
Lovell, M. (1994). Software reviews. Economic Journal 104, 713–726, May.
Machlup, F. (1950). Methodology of Economics and Other Social Sciences. Academic Press, New
York.
Makridakis, S., Hibon, M. (2000). The M3-competition: Results, conclusions and implications.
International Journal of Forecasting 16, 451–476.
Mayer, T. (1998). Monetarists versus Keynesians on central banking. In: Backhouse, R., Hausman,
D., Mäki, U., Salanti, A. (Eds.), Economics and Methodology. MacMillan, London.
Mayer, T. (2001a). Misinterpreting a failure to disconfirm as a confirmation. http://www.econ.
ucdavis.edu/working.
Mayer, T. (2001b). The role of ideology in disagreements among economists: A quantitative analysis.
Journal of Economic Methodology 8, 253–274, June.
McAleer, M., Pagan, A., Volker, P.A. (1983). What will take the con out of econometrics? American
Economic Review 73, 293–307, June.
McCloskey, D.N. (1985). The Rhetoric of Economics. Univ. of Wisconsin Press, Madison.
McConnell, M., Perez-Quiros, G. (2000). Output fluctuations in the United States: What has changed
since the early 1980s? American Economic Review 90, 1464–1476, December.
McCullough, B.D. (2000). Is it safe to assume that software is accurate? International Journal of
Forecasting 16, 349–357.
McCullough, B.D., Vinod, H.D. (1999). The numerical reliability of econometric software. Journal
of Economic Literature 37, 633–665, June.
Mirowski, P., Skilivas, S. (1991). Why econometricians don’t replicate although they do reproduce.
Review of Political Economy 3 (2), 146–163.
Morgenstern, O. (1950). On the Accuracy of Economic Observations. Princeton Univ. Press, Prince-
ton.
O’Brien, A. (2004). Why is the standard error of regression so low using historical data? Journal of
Socio-Economics 33, 565–570, November.
Open Peer Comments (1996). Brain and Behavioral Research. 19, 188–228, June.
Pagan, A., Veall, M. (2000). Data mining and the econometrics industry: Comments on the papers
by Mayer and of Hoover and Perez. Journal of Economic Methodology 7, 211–216, June.
Papell, D., Murray, C., Ghiblawi, H. (2000). The structure of unemployment. Review of Economics
and Statistics 82, 309–315, May.
Pfannkuch, M., Wild, C. (2000). Statistical thinking and statistical practice: Themes gleaned from
professional statisticians. Statistical Science 15 (2), 132–152.
Reinsdorf, M. (2004). Alternative measures of personal saving. Survey of Current Business 84, 17–
27, September.
Robertson, R. (2000). Wage shocks and North American labor-market integration. American Eco-
nomic Review 9, 742–764, September.
Rosenberg, A. (1978). The puzzle of economic modeling. Journal of Philosophy 75, 679–683, No-
vember.
Spanos, A. (1986). Statistical Foundations of Econometric Modeling. Cambridge Univ. Press, Cam-
bridge.
Spanos, A. (2000). Revisiting data mining: ‘Hunting’ with or without a license. Journal of Economic
Methodology 7, 231–264, June.
Summers, L. (1991). The scientific illusion in empirical macroeconomics. Swedish Journal of Eco-
nomics 93, 129–148, March.
Thompson, B. (2004). The ’significance’ crisis in psychology and education. Journal of Socio-
Economics 33, 607–613.
Thorbecke, E. (2004). Economic and statistical significance. Comments on ‘size matters’. Journal
of Socio-Economics 33, 571–575.
Viscusi, W.K., Hamilton, J.T. (1999). Are risk regulators rational? Evidence from hazardous waste
cleanup decisions. American Economic Review 89, 210–227, September.
Ward, B. (1972). What’s Wrong with Economics. Basic Books, New York.
Wei, S.-J. (2000). How taxing is corruption on international investment? Review of Economics and
Statistics 82, 1–11, February.
Zellner, A. (1992). Statistics, science and public policy. Journal of the American Statistical Associ-
ation 87, 1–6, March.
Ziliak, S., McCloskey, D.N. (2004). Size matters: The standard error of regressions in the “American
Economic Review.” Journal of Socio-Economics 33 (5), 527–546.
PART IV
Precision
CHAPTER 14
Precision
Theodore M. Porter
Department of History, UCLA, Los Angeles, CA, USA
E-mail address: tporter@history.ucla.edu
from Newton (who wanted them for his study of the chronology of ancient king-
doms) until they were perfect shows how obsessive the drive for precision could
be. His amendments to Newton’s edition, copies of which he hunted down and
burned, were few and minor. In the 1790s, Pierre-André Méchain was driven to
distraction by an error of a few seconds of arc (implying a discrepancy of about
a hundred meters) in his survey of a line of meridian from France through the
Pyrenees to Barcelona (Alder, 2002; Gillispie, 2004). These three or four sec-
onds poisoned the remainder of his life, which he spent desperately trying to
correct the inconsistency and covering it up, until his more level-headed collab-
orator Jean-Baptiste Delambre discovered it in his papers after he died.
The role of the metric system in the history of precision goes beyond
Méchain’s measurement-induced madness. The aspiration to universal mea-
sures, based on a meter that would be one ten-millionth part of a quarter merid-
ian, the distance from the equator to the North Pole, was part of a revolutionary
ambition to remake the world. The tangle of locally variable units of length,
weight, area, and volume given by history was to be replaced by a simple and
rational system, removing also those ambiguities that were so often exploited
by the powerful and promoting free commerce among the nations of the world
(Kula, 1986). A culture of precise, standardized measures would be more open
and legible than one based on locality, tacit understanding, and social power. It
would facilitate the free movement of knowledge as well as of merchandise.
The late eighteenth century, when the Enlightenment came to fruition, initi-
ated the heyday of precision in the sciences, which has gained momentum in
the succeeding centuries. The antique sciences of astronomy, mechanics, and
geometrical optics, known to the early modern period as natural philosophy,
had long traditions of mathematical precision. Now, chemistry, meteorology,
geodesy, and the study of heat, light, electricity, and magnetism, were made ex-
perimental and subjected to precise measurement, in most cases before there
was much in the way of mathematical theory. The rising standard of precision
was made possible by new instruments and the experimental practices that went
with them, but the impulse behind it all was not simply a matter of scientific
goals. Rather, it grew from a new alliance of scientific study with the forces of
state-building, technological improvement, and global commerce and industrial
expansion. The metric system, linked as it was to measurements of the earth,
depended on techniques of land surveying now deployed on a large scale. The
French state, for example, undertook in the early eighteenth century to map the
kingdom, the better to administer it, while the British surveyed North America in
an effort to regulate settlement and to allow clearer property holdings (Linklater,
2002). The scientific controversy over the shape of the Earth, often explained in
terms of Cartesian opposition to Newtonian mechanics, arose in fact from the
empirical findings of a French land survey which seemed to show that the Earth
was narrower at the equator than a perfect sphere would be (Terrall, 2002). The
systematic pursuit of precision extended to many technological domains, includ-
ing mining and metallurgy, agriculture, forestry, and power production by water
wheels as well as steam engines (Frängsmyr et al., 1990). Precise measurement
became important also in the human sciences, such as medical studies of small-
pox inoculation. In the early 1790s, savants and administrators led by Lavoisier
and Lagrange undertook to demonstrate the successes of the Revolution and mo-
bilize the nation for war with a systematic accounting of the French economy.
The drive for improved population statistics was stimulated in equal measure by
scientific, administrative, and ideological ambitions, to the extent that these can
even be distinguished (Rusnock, 2002; Brian, 1994).
Kathryn Olesko argues that the proliferation of instruments of exactitude in
the late eighteenth century did not suffice to support a shared understanding of
what “precise” measurement must mean (Olesko, 1995, p. 104). The looseness
of this concept may be illustrated by Lavoisier’s notorious practice of giving
many superfluous decimal places, as for example in a paper of 1784 where
0.86866273 pounds of vital air are combined with 0.13133727 pounds of inflam-
mable gas to give 1.00000000 pounds of water (Golinski, 1995, p. 78). Olesko
suggests, very reasonably, that the idea of probability, as incorporated into the
method of least squares early in the new century, supported a novel concept of
precision as the tightness of a cluster of measurements, one that indicates the
reliability of the measuring system in the inverse form of the magnitude of ex-
pected error. Significantly, the method of least squares was first published in
1805 by Adrien-Marie Legendre with specific reference to the metric surveys.
Carl Friedrich Gauss, however, had already incorporated this method of mini-
mizing the squares of errors into his work on planetary astronomy, where the
notion of error to be expected with a given measuring apparatus was already
familiar. The method of least squares was first applied routinely in these two al-
lied fields, astronomy and geodesy. It was subsequently taken up in experimental
physics, first of all in Germany, and much more slowly than in astronomy.
Yet precision as a concept, if not necessarily as a word, was reasonably famil-
iar in the late eighteenth century, before the systematization of least squares for
reducing data and fitting curves. Laplace deployed the paired concepts of preci-
sion and probability in the 1780s to estimate population. He inferred a measure
of population from the number of births – these were systematically recorded
and gathered up nationally – using a multiplier, the number of inhabitants per
birth. He made an estimate of the multiplier based on samples involving com-
plete population tallies in a few selected towns. He was aware that the individuals
sampled could not be independent and random, as if drawn with equal probabil-
ity from the whole of France, yet proceeded as if they were. He then explained
just how many individuals must be counted for the determination of the multi-
plier, 771,649, in order to have sufficiently high odds – a thousand to one – that
the error of the population estimate for the kingdom would be less than half a
million. The dubious exactitude of the proposed sample size then vanished as he
moved on to a recommendation that the précision demanded by the importance
of the subject calls for a census of 1,000,000 to 1,200,000. Using similar math-
ematics he calculated the probability (very low) that the difference between the
ratio of male to female births in London compared to Paris could be due merely
to chance. He could not replicate the data or the measures, but implicitly he
was comparing the actual results with others that might be anticipated based on
his customary probability model of drawing balls from an urn, supposing the
underlying chances to be the same as those given by the statistics (Bru, 1988;
Gillispie, 1997).
The earliest effort to create a system of mass production and interchangeable
parts, undertaken in France in the last decades before the French Revolution,
also involved a sense of precision as something measurable, though this was not
necessarily given an explicitly probabilistic form. In contrast to the American
system of mass production, achieved by entrepreneurs and practical engineers,
savants had a large role in the early French version. Ken Alder (1997) shows
what fundamental changes in systems of manufacture and the organization of
labor would have followed from this initiative, had it succeeded. Quantita-
tive standards were to provide the discipline according to which the workmen
labored, and the skilled craftsman who worked according to his own, possi-
bly high, individualized standards could not survive. There was no immediate
prospect in 1780 of manufacturing weapons more cheaply with interchangeable
parts, since the precision required took much labor. The advantage, rather, was
a work force under tighter control and a manufacturing process less dependent
on special skills monopolized and kept secret by guildsmen. There was also
the potential, with standardization, to repair weapons more easily in the field.
Precision here took the form of “tolerance,” and could be gauged either with a
measuring stick or by comparing the piece with a standard. And this effort to
create interchangeability provides a standard for us with which to think about
the meaning of precision.
Already in this eighteenth-century initiative, precision was about interchange-
ability and standardization. On one of Janus’s faces we see scientific accuracy
based on technical knowledge and methods, and on the other, a cultural and eco-
nomic system of highly disciplined work. Precision instruments by themselves
achieve rather little, for the system depends on skilled or highly standardized
operation of them, and also on the reliability, hence uniformity, of their con-
struction. A work force of technicians and savants that can produce and operate
such instruments presupposes a well-organized system of training and appren-
ticeship. If the validity of science is to be independent of place, as scientists (and
the rest of us) commonly suppose, some of those instruments, and with them the
work practices and the institutions that make them possible, must be replicated
in new locations. Precision cannot be merely technical, but depends on and helps
to create a suitable culture. Such a culture is not indissolubly bound to capital-
ism or socialism, democracy or oligarchy, yet forms of state and of economy are
part of this system as well.
among other things, a means by which knowledge can more readily travel. He
explains (p. 6): “While qualities do not travel well beyond the local communities
where they are culturally valued, quantities seem to be more easily transportable,
and the more precise the better.” To be sure, a system of precision and objectiv-
ity would not survive in an alien world. Its life under such conditions would
resemble that of the Connecticut Yankee who tries to raise King Arthur’s Court
up to the technological standard of nineteenth-century New England, and is de-
feated by cultural backwardness and superstition. Precision necessarily includes
a capacity to replicate itself, to recreate, within a certain tolerance, the social
and economic conditions through which it was formed. The universal validity of
knowledge is the precondition as well as the outcome of modern science, which
manages somehow to transform local knowledge and personal skill, passed on
in specific locations from master to apprentice, into truths that are recognized
all over the world, if not quite everywhere. Michael Polanyi (1958), who fa-
mously emphasized “personal knowledge” and the “tacit dimension” of science,
also compared science to a liberal economy of free enterprise. Far from being
machinelike and impersonal, socialism incarnate, science for Polanyi was neces-
sarily a spontaneous and highly decentralized cultural form. Against J.D. Bernal
and the British enthusiasts for Soviet-style scientific planning, Polanyi insisted
that scientists must be left free to follow their intuitions in choosing research
problems and methods of solution.
Polanyi’s vision depended on a highly idealized version of capitalism as well
as of science. One might just as well say that effective (humane, tolerant) so-
cialism depends or would depend on a capacity to nurture rather than to squash
or rationalize away local initiative and the expert knowledge of small commu-
nities. The question of the scale of quantitative precision has no simple answer.
Polanyi’s insights regarding skill and locality are as valid for the pursuit of pre-
cision as for other aspects of science. The last decimal place of precision in a
measurement is often purchased at very high cost, and the laboratory that can
achieve it may have to be correspondingly large. But such a laboratory will be
permeated by non-replicable skills of many sorts, joining forces to combat the
sources of error that multiply relentlessly as the scale of the variability at issue
becomes ever smaller. Often a program of precision measurement will incorpo-
rate also the power of numerous repetitions, thus joining brute force to exquisite
craft in the clocklike regularity of the statistical recorder. Finally, the statistical
design itself may be, in a subtle way, unique, fitted to the special circumstances
of the observations, and it may have to be adapted when things don’t work
out quite the way they were planned. All of this applies a fortiori to therapeu-
tic experiments, measuring the medical effectiveness of pharmaceuticals, where
tightly-organized large-scale experiments are necessary even to detect the effect
of a valuable new treatment, and where measurement to two significant figures
would be a miracle.
A many-layered precision measurement in a physics or engineering laboratory
does not travel easily or carry conviction from the sheer force of the evidence
it supplies, but depends for its credibility on trust. Although much of the work
may be shielded from the vision of those who are interested in the result, their
trust need not be blind. Specialists in the same field will be familiar with the
instruments and their limits, and perhaps also with the particular scientists and
technicians who carried out the work. Published papers include a description of
experimental methods, enough to be illuminating to cognoscenti from the same
area of science if not to just any technically-literate person. The data, even as
filtered for publication, give indications of the limits of the procedures and of
things that might have gone wrong. Also, scientists often have other indications
of what the result ought to be, based on a model or on measurements carried
out in a somewhat different way, which can be compared with the new one. And
they may well try to incorporate some aspects of a new procedure into their own
work, and in this limited sense to replicate it.
Moreover, the last word is scarcely ever spoken in science. Brilliantly orig-
inal but quirky and unreliable techniques get their rough spots sanded down
and the conditions within which they work more closely defined. Skilled prac-
tices become routine or are automated, and may in effect be incorporated into a
manufactured instrument. The cutting-edge precision to which scientists of one
generation devote every waking moment will often, in the next, be purchased off
the shelf from a supply house and incorporated unthinkingly into work in quite
different disciplines. In this way, the most glorious triumphs of precision, pur-
sued often for their own sake, are transformed into instruments and procedures
to simplify tasks or improve reliability in the achievement of some other task.
Scientific precision, especially in the form that can most easily travel, thus con-
tributes to and depends on that other basic form of precision, manufacturing with
standardized, interchangeable parts. Precision machinery forms the nucleus of a
system of standardization that has spread over much of the world, an artificial
world within which travel is relatively unproblematical. As with Anne Tyler’s
Accidental Tourist, who leaves home with reluctance and would like every for-
eign location to be as much like his neighborhood in Baltimore as possible, you
can eat a salade niçoise, replace the battery in your watch, and pick up email on
your Blackberry almost anywhere you go.
In a similar fashion, precision and standardization help to make the world
administrable, especially by creating the conditions for information to travel. In-
formation, as Yaron Ezrahi points out, is knowledge “flattened and simplified.” It
should require little or no interpretation, and thus presume no deep intellectual
preparation, but be immediately available to almost anyone for do-it-yourself
use (Ezrahi, 2004). The existence of such information presumes much about the
world, which should contain an abundance of self-similar objects, and about its
inhabitants, who should be familiar with them: in short, a world of standardized
objects and, to a degree, standardized subjects as well. Such a world was not
made in a day, and while precision has greatly assisted the Weberian project of
rational bureaucracy, it cannot figure in this great drama as the deus ex machina.
As Wise (1995, p. 93) sagely puts it, “precision comes no more easily than
centralized government.” The pursuit of precision cannot, unassisted, create
a system of legibility and control that makes bureaucracy possible. Rather, sys-
and discernment, or information and wisdom. Here, once again, are numbers
performing a legal or bureaucratic function. However, it often is not within the
capacity of a particular agency to pronounce authoritatively on what these num-
bers should be. In the United States, for example, they have been subject to
challenge in Congressional committee hearings and sometimes in the courts.
At times the numbers are transparently corrupt, but good intentions provide no
guarantee that they will hold up as valid or even that they should. Cost–benefit
analysis is often defended as bringing the methods and hence the efficiencies
of business to government, but it was from its beginning a technology for pub-
lic decisions, involving the quantification of effects that would never appear on
a balance sheet of a private business. This mighty project of commensuration,
which began as a somewhat loose and informal method for analyzing public con-
struction projects, was more and more strictly codified beginning in the 1930s.
By 1965 it had emerged as an ideal for the analysis of government expenditures
and regulatory actions of all kinds, a way of purging (or pretending to purge)
the corrupt play of interests from the decisions of government, which should in-
stead be objective and rational. That is, they should be turned into a problem of
measurement and calculation.
Precision in these cost–benefit studies never pretended to absolute exactitude.
The engineers who, through the 1950s, normally performed them for such agen-
cies as the Army Corps of Engineers and the Bureau of Reclamation were not
always consistent in their use of rounding, but they would rarely claim more than
two significant figures, and when the politics shifted or matured, it was quite
possible for a dismal benefit–cost ratio of 0.37 to 1 to rise above 1.0 by adding,
say, hydroelectric generation facilities that, despite these dazzling economic ad-
vantages, had somehow not at first been inserted into the plans (Porter, 1995,
p. 160). Only the pressure of powerful opponents, some of them from private
industry such as electric utilities and railroads but the most effective ones from
rival agencies, caused the rules of measurement to be spelled out more clearly.
Even after this, the decision process continued to depend as much on forming
an alliance of supporters as on the “objective” economic considerations. Still,
when in the 1940s the Bureau of Reclamation and the Corps of Engineers found
themselves embattled over projects on the Missouri River or in the epic contest
to build the Pine Flat Dam on the Kings River in California, the issue of objec-
tivity in the calculations rose to the surface and had to be defended in a battle of
experts.
At times these collisions inspired challenges directed specifically to ques-
tions of accuracy. Was the increased revenue to movie theaters in areas where
agriculture was promoted by new cheap supplies of irrigation water properly in-
cluded among the benefits of a dam built by the Bureau of Reclamation? Often,
however, precision was the great desideratum. Were the methods devised by the
Corps in the 1950s for assigning a value to recreation on reservoirs sufficiently
strict so as to exclude manipulation of the calculation and to avoid decisions
made for corrupt political reasons instead of rational bureaucratic ones? And the
issue of special preferences was, after all, the crucial concern that had stimulated
As science merges more and more with technology, there is a tendency for ac-
curacy to give way to precision. In a way, this can already be seen in the early
history of the metric system. The savants of the 1790s hoped to create a natural
unit of measurement, based on the circumference of the Earth. In more recent
times, scientists and historians have thought the choice of unit arbitrary, a mere
convention, but they forget the ties that bound systems of measurement with the
administration of the land and the rationalization of the economy. The makers
of the metric system envisioned, as Gillispie (1997, p. 152) points out,
a universal decimal system, embracing not only ordinary weights and measures but also
money, navigation, cartography, and land registry. . . . In such a system, it would be possi-
ble to move from the angular observations of astronomy to linear measurements of the earth’s
surface by a simple interchange of units involving no numerical conversions; from these linear
units to units of area and capacity by squaring and cubing; from these to units of weight by
taking advantage of the specific gravity of water taken as unity; and finally from weight to
price by virtue of the value of gold and silver in alloys held invariant in composition through
a rigorous fiscal policy.
Inevitably there were errors; the meter was not, and of course could not possi-
bly have been, exactly one forty-millionth part of the longitudinal circumference
of the Earth. The Euro-doubting Guardian newspaper, inspired by Méchain’s
concealment and by a title advertising a “secret error” in the founding of the
metric system, reviewed Alder’s book on the topic as evidence of corruption at
its very foundation. To modern users of the system, the inaccuracy scarcely mat-
ters. Metric measurements are a system of precision, justified by their internal
coherence, widespread adoption, and ease of use rather than by any relationship
to quantities in nature.
Absolute quantities, of course, still matter to science, and it is difficult not to
believe that accuracy is advancing along with precision in the measurement of
nature. By now the meter is again defined in terms of a natural quantity, though
as an unnatural multiple with many decimal places. Precision is of fundamen-
tal economic importance, crucial for the standardization that makes possible
not only mass production, but also the interconnection of vast grids of power,
transportation, and communication. The field of metrology is presided over by
bureaus of standards, of which the prototype was founded in newly-unified Ger-
many in 1871 (Cahan, 1989). Metrology is an engineering science that serves
as infrastructure for all the sciences and an indispensable aid to scientific com-
munication. It is concerned less directly with accurate measures of nature than
Acknowledgements
This paper draws extensively from my essay “Speaking Precision to Power: The
Modern Political Role of Social Science,” Social Research 73 (4) (2006) 1273–
1294.
References
Ackerman, F., Heinzerling, L. (2004). Priceless: On Knowing the Price of Everything and the Value
of Nothing. The New Press, New York.
Alder, K. (1997). Engineering the Revolution: Arms and Enlightenment in France. Princeton Univ.
Press, Princeton, NJ.
Alder, K. (2002). The Measure of All Things: The Seven-Year Odyssey and Hidden Error that Trans-
formed the World. Free Press, New York.
Brian, E. (1994). La mesure de l’État: Administrateurs et géomètres au XVIIIe siècle. Albin Michel,
Paris.
Bru, B. (1988). Estimations laplaciennes. In: Mairesse, J. (Ed.), Estimations et sondages. Editions
Albatross, Paris, pp. 7–46.
Cahan, D. (1989). An Institute for an Empire: The Physikalisch-Technische Reichsanstalt, 1871–
1918. Cambridge Univ. Press, Cambridge, UK.
Ezrahi, Y. (2004). Science and the political imagination in contemporary democracies. In: Jasanoff,
S. (Ed.), States of Knowledge. Routledge, New York, pp. 254–273.
Frängsmyr, T., Heilbron, J.L., Rider, R. (Eds.) (1990). The Quantifying Spirit in the Enlightenment.
Univ. of California Press, Berkeley.
Gigerenzer, G., Swijtink, Z., Porter, T.M., Daston, L., Beatty, J., Krüger, L. (1989). The Empire
of Chance: How Probability Changed Science and Everyday Life. Cambridge Univ. Press, New
York.
Gillispie, C.C. (1997). Pierre Simon de Laplace, 1749–1827: A Life in Exact Science. Princeton
Univ. Press, Princeton, NJ.
Gillispie, C.C. (2004). Science and Polity in France: The Revolutionary and Napoleonic Years.
Princeton Univ. Press, Princeton.
Golinski, J. (1995). The nicety of experiment: Precision of measurement and precision of reasoning
in late eighteenth-century chemistry. In: Wise, M.N. (Ed.), The Values of Precision. Princeton
Univ. Press, Princeton, NJ, pp. 72–91.
Grier, D. (2005). When Computers Were Human. Princeton Univ. Press, Princeton, NJ.
Heilbron, J.L. (1999). The Sun in the Church: Cathedrals as Solar Observatories. Harvard Univ.
Press, Cambridge, MA.
Kula, W. (1986). Measures and Men. Princeton Univ. Press, Princeton, NJ (translated by Richard
Szreter).
Linklater, A. (2002). Measuring America: How an Untamed Wilderness Shaped the United States
and Fulfilled the Promise of Democracy. Walker & Company, New York.
Olesko, K.M. (1995). The meaning of precision: The exact sensibility in early nineteenth-century
Germany. In: Wise, M.N. (Ed.), The Values of Precision. Princeton Univ. Press, Princeton, NJ,
pp. 103–134.
Polanyi, M. (1958). Personal Knowledge: Towards a Post-Critical Philosophy. Univ. of Chicago
Press, Chicago.
Porter, T.M. (1986). The Rise of Statistical Thinking, 1820–1900. Princeton Univ. Press, Princeton,
NJ.
Porter, T.M. (1992). Objectivity as standardization: The rhetoric of impersonality in measurement,
statistics, and cost–benefit analysis. In: Megill, A. (Ed.), Rethinking Objectivity, Annals of Schol-
arship 9, 19–59.
Porter, T.M. (1995). Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Prince-
ton Univ. Press, Princeton, NJ.
Porter, T.M. (2001). Economics and the history of measurement. In: Klein, J.L., Morgan, M.S.
(Eds.), The Age of Economic Measurement. Annual supplement to History of Political Economy.
Duke Univ. Press, Durham, NC, pp. 4–22.
Rusnock, A. (2002). Vital Accounts: Quantifying Health and Population in Eighteenth-Century Eng-
land and France. Cambridge Univ. Press, Cambridge.
Stigler, S.M. (1986). The History of Statistics: The Measurement of Uncertainty before 1900. Har-
vard Univ. Press, Cambridge, MA.
Terrall, M. (2002). The Man Who Flattened the Earth: Maupertuis and the Sciences in the Enlight-
enment. Univ. of Chicago Press, Chicago.
Wise, M.N. (Ed.) (1995). The Values of Precision. Princeton Univ. Press, Princeton, NJ.
Youden, W.J. (1972). Enduring values. Technometrics 14, 1–11.
CHAPTER 15
Optimal Experimental Design in Models of Decision and Choice
P.G. Moffatt
Abstract
When dichotomous choice problems are used as a means of eliciting individual
preferences, an important issue is how the choice problems should be chosen in
order to allow maximal precision in the estimation of the parameters of inter-
est. This issue is addressed by appealing to the theory of optimal experimental
design which is well-established in the statistical literature, but has yet to break
into most areas of economics. Two examples are provided of situations in which
such techniques are applicable: Willingness To Pay (WTP) for an environmen-
tal good; and degree of risk aversion. The determinant of Fisher’s information
matrix is chosen as the criterion for optimal design.
15.1. Introduction
Consider the linear regression model:
y_i = θ_1 + θ_2 x_i + ε_i,   i = 1, . . . , n,   ε_i ∼ N(0, σ^2).   (15.1)
Here, we imagine that we are in a position to choose the values taken by the
explanatory variable xi in the sample. We set out to make this choice in a way
that maximises the precision with which the unknown parameters θ1 and θ2 can
be estimated. “Precision” is a subject that is discussed in more detail and in a
more general context in Chapter 14 of this volume (Porter, this volume).
Here, we refer to precision in terms of the variation of an estimator around the
true value of the parameter being estimated. Precision in this sense is conven-
tionally measured by the standard error of an estimate. For example, it is well
known to anyone who has taken an introductory course in Econometrics that the
standard error of the ordinary least squares estimator of the slope parameter θ2
in (15.1) is:
se(θ̂_2) = \sqrt{ \frac{σ^2}{\sum_i (x_i − x̄)^2} } ≈ \sqrt{ \frac{Var(ε_i)}{n × Var(x_i)} }.   (15.2)
From (15.2), we clearly see that the standard error of the slope estimator, and
therefore the precision with which the slope parameter is being estimated, de-
pends on three separate factors: the variance of the equation error, the variance
of the explanatory variable, and the sample size. The first of these factors is al-
ways outside of our control. The other two are within our control, and both can
therefore be used to improve precision. These two factors appear to be equally
important: a doubling of the variance of x would appear to have the same effect
on precision as a doubling of the sample size. It therefore appears that a design
that is optimal for a given sample size is one that maximises the variance in
the explanatory variable x. For obvious reasons, this is a sensible optimisation
problem only if bounds on x are determined at the outset. Such bounds might
be determined by the range of values taken by an analogous variable in a real
setting. It is often assumed without loss of generality that x can only take values
between −1 and +1. Given such an assumption, the optimal design becomes
one in which half of the observations in the sample are allocated to the design
point x = −1, and the other half are allocated to x = +1, since this is the com-
bination of x values that maximises the variance of x given the constraint on
the range of the variable. In the jargon of experimental design, we are allocat-
ing all observations to the “corners of the design space”. This simple example
illustrates the selection of an optimal design in a highly intuitive way.
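To see the point numerically, a small sketch along the following lines evaluates (15.2) for the corner design and for an evenly spread design; the error variance, the sample size and the comparison design are arbitrary choices of the illustration rather than values taken from any study.

```python
import numpy as np

# Illustrative sketch: comparing two designs for x restricted to [-1, 1] via the
# standard error of the slope estimator in (15.2); sigma and n are assumptions.
sigma, n = 1.0, 100

def slope_se(x):
    return np.sqrt(sigma**2 / np.sum((x - x.mean())**2))

x_corners = np.repeat([-1.0, 1.0], n // 2)   # half at -1, half at +1
x_even = np.linspace(-1.0, 1.0, n)           # evenly spread over [-1, 1]

print(slope_se(x_corners))   # 0.10: the corner design
print(slope_se(x_even))      # about 0.17: smaller Var(x), hence less precision
```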
A more general approach is to consider all of the model’s parameters simul-
taneously. The information matrix is a square symmetric matrix representing
the potency of the data in respect of estimating the model’s parameters. The
most popular criterion of experimental design is D-optimality, in which a de-
sign is sought to maximise the determinant of this information matrix. Since
the variance of the estimated vector is the inverse of the information matrix,
D-optimality is seen to be equivalent to minimising the volume of the “confi-
dence sphere” surrounding the parameter estimates, and the D-optimal design
can hence be interpreted as one that maximises the precision with which the
parameters (taken together) are estimated.
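For the linear model the criterion is easy to evaluate directly. In the sketch below, which again uses arbitrary design values and assumes unit error variance, the information matrix is simply X′X and the corner design gives the larger determinant.

```python
import numpy as np

# D-optimality sketch for the linear model (15.1) with unit error variance:
# designs are ranked by the determinant of the information matrix X'X.
n = 100

def det_information(x):
    X = np.column_stack([np.ones_like(x), x])   # columns for theta1 and theta2
    return np.linalg.det(X.T @ X)

x_corners = np.repeat([-1.0, 1.0], n // 2)
x_even = np.linspace(-1.0, 1.0, n)

print(det_information(x_corners))   # 10000: the D-optimal two-point design
print(det_information(x_even))      # about 3434: a larger confidence ellipsoid
```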
A feature of linear models such as (15.1) is that the information matrix only
involves the values of x appearing in the sample, and does not involve the para-
meters. In this situation, finding the D-optimal design is easy. In this Chapter, we
are more interested in non-linear models, and for these, the information matrix
depends on the parameter values. Hence, knowledge of the parameter values is
necessary in order to design an optimal experiment. This is sometimes referred
to as the “chicken and egg” problem. At first sight this problem appears quite
damning: the experiment cannot be designed without knowledge of the parame-
ters whose values the experiment’s purpose is to find! However, there are ways
of circumventing this problem, as we shall see later.
The type of non-linear models in which we are interested are binary data
models. This interest is motivated as follows. There are situations in Economic
research in which a certain continuously measurable quantity representing indi-
vidual preferences is of interest, but the preferred way of eliciting this quantity
for a given individual is to present them with a dichotomous choice problem,
rather than directly to ask them to state the quantity. Two important examples
are contingent valuation studies (e.g. Green et al., 1998), in which Willingness
To Pay (WTP) for a public good is elicited by means of a hypothetical referen-
dum, and studies of risk attitude (e.g. Holt and Laury, 2002), in which subjects
are asked to choose between pairs of lotteries. In each case, there are reasons,
some more convincing than others, for preferring this elicitation method. When
dichotomous choice is used to elicit preferences, the resulting variable is binary,
calling for non-linear models such as logistic regression or probit in its analysis.
Given the wide acceptance of dichotomous choice as a means of eliciting
preferences, it is important for researchers to have guidance on appropriate de-
sign. In particular, it is desirable to have a clear framework for choosing the
payment levels in referendum questions, and for choosing the parameters of the
lottery pairs. Such a framework comes from the statistical literature on optimal
experimental design.
While a vast literature exists on the problem of optimal experimental design,
most of this work has been applied to linear models (Silvey, 1980; Fedorov,
1972) and most of the seminal papers were highly theoretical. Atkinson (1996)
provides a useful review of developments in optimal experimental designs, in-
cluding more recent, non-linear designs, with particular reference to their prac-
ticality. Ford et al. (1992) summarise developments in non-linear experimental
design.
A problem that has already been raised is the “chicken and egg” problem:
in non-linear settings, the parameters need to be known in advance in order to
find the D-optimal design. A possible solution to the problem is to design an
“interactive” experiment, in which subjects’ choices are continually monitored,
and all choices made up to a particular stage in the experiment are used in con-
structing a design which is locally optimal for the next stage of the experiment.
This approach was adopted by Chaudhuri and Mykland (1993, 1995). There is
a problem with this approach: it violates the requirements of incentive compati-
bility. Intelligent subjects have a tendency to alter their behaviour if they believe
that their choices may have an influence on the future course of an experiment. For
this reason, we restrict our attention to the search for a design which can be com-
pletely determined before the start of the experiment, although we acknowledge
that interactive experiments have recently been performed in ways that avoid
the incentive compatibility problem (e.g. Eckel et al., 2005). Such methods are
described in Section 15.5.
In theory, the problem of unknown parameter values could be approached by
taking expectations of the determinant of the information matrix over a prior dis-
tribution for the parameters. However, the algebra involved can be problematic
and such a technique would normally rely heavily on numerical routines.
Ponce de Leon (1993) adopted a Bayesian approach for generalised linear
models. The approach was then adapted by Müller and Ponce de Leon (1996)
to discriminate between two competing pairwise choice models. Although the
problem they analysed is similar in spirit to ours, our models are somewhat more
complex, with potentially more parameters, which tend to make the Bayesian
approach seem less attractive.
Instead we adopt an approach which takes parameter estimates from a past
study (or possibly a pilot study), and treats these estimates as if they were true
parameter values in the computation of the D-optimal design criterion.
Section 15.2 describes situations in which the dichotomous choice elicitation
methods have become popular, and attempts to justify the use of the method in
these contexts. Section 15.3 presents a brief introduction to experimental design
theory, covering first linear models and then the more relevant non-linear bi-
nary data models. Section 15.4 applies the optimal design results introduced in
Section 15.3 to the economic models of Section 15.2. Section 15.5 contains dis-
cussion of a number of issues relating to the framework developed in the chapter.
Section 15.6 concludes.
It follows that:
y_i ∼ N(x_i′ β, σ^2).   (15.5)
where Φ(·) is the standard normal c.d.f. Equation (15.6) is the definition of a
binary probit model, with the suggested amount si included as an explanatory
variable along with the variables contained in the vector xi . Note that the coef-
ficient of si is necessarily negative, and an estimate of the parameter σ can be
deduced from knowledge of it. In turn, an estimate of the vector β could then
be deduced from the coefficient of xi . A further technical issue is that since the
structural parameters, β and σ , are non-linear functions of the reduced form
parameters estimated using the probit model, the delta-method (Greene, 2003,
p. 913) is required in order to compute standard errors.
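A hedged sketch of this delta-method step is given below. It assumes the usual referendum formulation in which the probit index is x_i′(β/σ) − s_i/σ, so that the coefficient on s_i is −1/σ; the argument names, the ordering of the coefficients (s_i last) and the covariance matrix are placeholders of the illustration, not the output of any particular software.

```python
import numpy as np

# Sketch: structural parameters from reduced-form probit coefficients in the
# referendum setting.  'coef' holds the coefficients on x_i followed by that on
# s_i; 'vcov' is the corresponding covariance matrix.  Both are assumed inputs.

def structural_parameters(coef, vcov):
    *gamma, delta = coef                       # delta is the coefficient of s_i
    gamma = np.asarray(gamma)
    sigma = -1.0 / delta                       # sigma deduced from the coefficient of s_i
    beta = -gamma / delta                      # beta deduced from the coefficients of x_i
    grad_sigma = np.append(np.zeros_like(gamma), 1.0 / delta**2)   # d sigma / d coef
    se_sigma = np.sqrt(grad_sigma @ vcov @ grad_sigma)             # delta-method s.e.
    return sigma, beta, se_sigma
```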
In the present chapter, we mainly restrict attention to situations in which there
are no explanatory variables; all respondents have the same expected WTP, μ.
So, instead of (15.3) we have simply:
yi = μ + εi (15.7)
Our ultimate objective will be to choose values of the explanatory variable si that
will allow the two structural parameters μ and σ to be estimated with greatest
precision.
U(x) = \frac{x^{1−r}}{1−r},   r ≠ 1   (15.9)
then we can deduce that the subject’s coefficient of relative risk aversion is r =
0.186. If we are interested in measuring attitudes to risk over the population, this
might therefore seem an obvious way to proceed.
It is not obvious how to elicit a subject’s certainty equivalent in a way that
is incentive compatible. One popular method is the Becker–DeGroot–Marschak
(BDM, Becker et al., 1964) mechanism, which operates as follows. It is ex-
plained to the subject that when they have reported their valuation of a given
lottery, a random price will be drawn from a uniform distribution. If the random
price is less than the subject’s reported valuation, the subject will play the lot-
tery; if the random price exceeds the valuation, the subject receives that price
instead of playing the lottery. It is easily verified that this mechanism has the
virtue of incentive compatibility. However, a common criticism of it is that it is
hard for subjects to comprehend sufficiently for the incentive compatibility to
take hold.
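A short simulation sketch of the BDM rule for a single two-outcome lottery may make the mechanism concrete; the payoffs, the price range and the random seed are assumptions of the illustration.

```python
import numpy as np

# Sketch of the BDM mechanism for the lottery (p, high; 1-p, 0): the subject
# reports a valuation, a price is then drawn uniformly, and the rule below
# determines the payoff.  Reporting the true certainty equivalent maximises
# expected utility under this rule, which is the incentive compatibility claim.
rng = np.random.default_rng(0)

def bdm_payoff(reported_valuation, high=10.0, p=0.5, price_max=10.0):
    price = rng.uniform(0.0, price_max)        # random price, drawn after the report
    if price < reported_valuation:             # price below the report: play the lottery
        return high if rng.random() < p else 0.0
    return price                               # otherwise receive the price itself
```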
An alternative to BDM which also claims to be Incentive Compatible is the
ordinal pay-off scheme (Tversky et al., 1990; Cubitt et al., 2004). In this scheme,
subjects are presented with a sequence of lotteries, for each of which they are
asked to state a certainty equivalent. They are informed that when they have
completed the experiment, two of the lotteries will be chosen at random from the
sequence, and the subject will play out the one to which they assigned a higher
value. Like the BDM scheme, this scheme claims to be incentive compatible.
A commonly reported problem with obtaining certainty equivalents, which-
ever of the above schemes is used, is that subjects have a tendency to report the
expected value of a lottery, that is, they tend to report a risk-neutral certainty
equivalent.
Fig. 15.1: A histogram of the monetary valuations of 112 subjects of the lottery (0.50, £10).
Source: Cubitt et al. (2004).
To verify this, in Fig. 15.1 we show data from Cubitt et al. (2004), who, as
previously noted, use the ordinal pay-off scheme. Shown in Fig. 15.1 is the
distribution over the sample of 112 subjects of the certainty equivalents of the
lottery (0.50, £10). By this notation, we mean a 50% chance of £10 and a 50%
chance of nothing. We see that more than half (58) of the 112 subjects report a
valuation of exactly £5.00, which is, of course, the expected value of this lottery.
Furthermore, the mean over the sample is 5.07, which is certainly not signif-
icantly different from 5.00 (p = 0.63). This goes against the widely-accepted
belief that the vast majority of people are risk averse.
The important point here is that there are strong reasons for believing that
choices between lotteries are a more reliable source of information on risk atti-
tudes than reported valuations of lotteries. In fact, the tendency to use expected
values for certainty equivalents is an obvious explanation for the reversal phe-
nomenon (Grether and Plott, 1979) – the tendency to value the riskier lottery
more highly but to choose the safer lottery when asked to choose between them.
Assume that the subject is asked to choose between two lotteries. Numerical
techniques can be used to compute the value of r for which a subject is indiffer-
ent between the two lotteries in question. We shall refer to this as the threshold
risk aversion parameter for the choice problem, and denote it as r ∗ .
Assume that subject i is presented with a choice problem with threshold risk
level ri∗ . Let yi = 1 if the safer of the two lotteries is chosen, and yi = −1 if the
riskier is chosen. The probability of the safe choice is:
P(y_i = 1) = P(r_i > r_i^*) = Φ\left( \frac{μ}{σ} − \frac{1}{σ} r_i^* \right).   (15.11)
Once again we have a probit model, identical in form to that developed in the
context of referendum contingent valuation in Section 15.2.1. Once again we are
interested in choosing values of the explanatory variable, this time r ∗ , that allow
us to estimate the two structural parameters μ and σ as precisely as possible.
Consider a model in which the scalar dependent variable is y, the single ex-
planatory variable is x, and the probability, or probability density, associated
with a particular observation (yi , xi ) is f (yi | xi ; θ), where θ is a k × 1 vector
of parameters. Assume that there are a total of n independent observations. The
log-likelihood function for this model is:
Log L = \sum_{i=1}^{n} \ln f(y_i | x_i ; θ).   (15.12)
The maximum likelihood estimate (MLE) of the parameter vector θ is the value
that maximises Log L. The information matrix is given by:
2
∂ Log L ∂ Log L ∂ Log L
I =E − =E . (15.13)
∂θ∂θ ∂θ ∂θ
The variance of the MLE is given by the inverse of the Information matrix.
Hence standard errors of individual estimates are obtained from the square roots
of the diagonal elements of I −1 .
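Numerically this is a one-line operation; the matrix in the sketch below is purely illustrative.

```python
import numpy as np

# Standard errors from the inverse of an information matrix (values illustrative).
I = np.array([[50.0, 10.0],
              [10.0, 30.0]])
standard_errors = np.sqrt(np.diag(np.linalg.inv(I)))
```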
The principle of D-optimal design is simply to select values of xi , subject
to the specified constraints, that maximise the determinant of the information
matrix. This is equivalent to minimising the volume of the “confidence ellipsoid”
of the parameters contained in θ , that is, estimating the entire set of parameters
with maximal overall precision.
Clearly, the information matrix and its determinant increase with the sample
size n. Often, when we are comparing designs, we need to adjust for the num-
ber of observations, so we divide the information matrix by n to obtain the per
observation information matrix.
y_i = θ_1 + θ_2 x_i + ε_i,   i = 1, . . . , n,   ε_i ∼ N(0, 1),   −1 ≤ x_i ≤ +1   ∀i.   (15.14)
Assume that the investigator has control over the values taken by the explana-
tory variable xi , subject only to a lower and an upper bound, which we assume
without loss of generality to be −1 and 1. Note that the error term is assumed
to be normally distributed, and, for the sake of further simplicity, to have unit
variance. Given these assumptions concerning the error term, we may construct
the log-likelihood function for this model as:
log L = k − \sum_{i=1}^{n} (y_i − θ_1 − θ_2 x_i)^2   (15.15)
where k is a constant. It is easily verified that in this model the MLEs of the two
parameters are the same as the estimates from a least squares regression of y
on x. Differentiating twice with respect to the two parameters θ1 and θ2 we find
the information matrix to be:
I = \begin{pmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{pmatrix}.   (15.16)
From (15.18) it is clear that the differences between different x-values must be
as great as possible. For this, half of the values must be set to the maximum
allowed, and the other half to the minimum. In experimental design jargon, we
are choosing all design points from the “corners of the design space”.
yi∗ = θ1 + θ2 xi + εi , i = 1, . . . , n,
εi ∼ N (0, 1) (15.19)
but all that is observed is whether y ∗ is positive or negative. That is, we observe
y where:
y_i = \begin{cases} 1 & \text{if } y_i^* > 0, \\ −1 & \text{if } y_i^* ≤ 0. \end{cases}   (15.20)
Log L = \sum_{i=1}^{n} \ln Φ\big( y_i × (θ_1 + θ_2 x_i) \big).   (15.21)
The information matrix takes the form:
I = \sum_{i=1}^{n} w_i \begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix},   (15.22)
where
w_i = \frac{[φ(θ_1 + θ_2 x_i)]^2}{Φ(θ_1 + θ_2 x_i)[1 − Φ(θ_1 + θ_2 x_i)]}.
The determinant of the information matrix may be written as:
|I| = \sum_{i=1}^{n} \sum_{j=i+1}^{n} w_i w_j (x_i − x_j)^2.   (15.23)
Fig. 15.2: Determinant of information matrix against percentile of larger design point; probit and
logit.
If the required number of design points is odd, the optimal design is to place
one design point exactly in the centre, and to divide the remaining points equally
between the 13th and 87th percentiles.
A well-known alternative to probit for the modelling of binary data is the logit
model, defined by:
P(y_i = 1) = \frac{\exp(θ_1 + θ_2 x_i)}{1 + \exp(θ_1 + θ_2 x_i)} ≡ P_i.   (15.24)
The information matrix has the same form as (15.22) above, with weights given
by:
wi = Pi (1 − Pi ). (15.25)
With (15.25) in (15.23), it is found, again numerically, that the design points
that maximise |I | are the 18th and 82nd percentiles of the underlying response
function. The broken line in Fig. 15.2 shows |I | against the percentile of the
upper design point for the logit model.
We are often led to believe there are no major differences between probit and
logit. For example, according to Greene (2003, p. 667), “in most applications,
the choice between these two seems not to make much difference”. It therefore
seems surprising that the optimal design points under probit are five percentiles
further into the tails than under logit.
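Both optima are easy to reproduce with a one-dimensional grid search. The sketch below fixes θ1 = 0 and θ2 = 1, which is harmless here because the optimal percentiles do not depend on the parameter values, and it drops constants of proportionality in |I|.

```python
import numpy as np
from scipy.stats import norm

# Grid search over the percentile p of the upper design point for a symmetric
# two-point design (lower point at 1 - p), using |I| from (15.23).

def det_probit(p):
    x = norm.ppf(p)                                            # upper design point
    w = norm.pdf(x)**2 / (norm.cdf(x) * (1.0 - norm.cdf(x)))   # probit weight
    return w * w * (2.0 * x)**2

def det_logit(p):
    x = np.log(p / (1.0 - p))                                  # upper design point
    w = p * (1.0 - p)                                          # logit weight, (15.25)
    return w * w * (2.0 * x)**2

grid = np.linspace(0.51, 0.99, 4801)
print(grid[np.argmax([det_probit(p) for p in grid])])   # about 0.87 under probit
print(grid[np.argmax([det_logit(p) for p in grid])])    # about 0.82 under logit
```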
Note that in order to find these optimal design points, the parameters of the
underlying distribution (i.e. θ1 and θ2 ) must be known in advance, since ob-
viously these are needed in order to recover a point on the distribution from
knowledge of its percentile. This is a manifestation of the “chicken and egg”
problem referred to in Section 15.1.
The “slope” referred to in this quote is that of the line connecting two lotter-
ies in the Marschak–Machina triangle, and is analogous to our “threshold risk
EU(S) = p_2 \frac{x_2^{1−r}}{1−r} + p_3 \frac{x_3^{1−r}}{1−r},
EU(R) = p_1 \frac{x_1^{1−r}}{1−r} + p_4 \frac{x_4^{1−r}}{1−r}.   (15.27)
Assuming that individuals obey Expected Utility (EU) Theory, choice S is made
if and only if:
EU(S) > EU(R),   which holds if and only if   r > r^*.   (15.28)
That is, the Safer choice is made if the individual’s risk aversion parameter ex-
ceeds a “threshold” level of risk aversion, r ∗ , this being a function of all four
outcomes and all four probabilities. r ∗ can be computed numerically for a given
choice problem.
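The following sketch carries out that numerical step for one lottery pair of the Holt–Laury type listed below, solving EU(S) = EU(R) under the CRRA utility (15.9) by root-finding; the search bracket is an assumption of the illustration.

```python
import numpy as np
from scipy.optimize import brentq

# Threshold r* at which EU(S) = EU(R) under CRRA utility, for the pair
# S = (p, $2.00; 1-p, $1.60) and R = (p, $3.85; 1-p, $0.10).

def crra(x, r):
    return np.log(x) if np.isclose(r, 1.0) else x**(1.0 - r) / (1.0 - r)

def eu_difference(r, p):
    eu_s = p * crra(2.00, r) + (1.0 - p) * crra(1.60, r)
    eu_r = p * crra(3.85, r) + (1.0 - p) * crra(0.10, r)
    return eu_s - eu_r

r_star = brentq(eu_difference, -3.0, 0.99, args=(0.5,))   # the problem with p = 0.5
print(round(r_star, 2))                                    # approximately 0.15
```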
Let r have the following distribution over the population:
r ∼ N(μ, η^2).   (15.29)
First let us assume that each subject (i) solves only one choice problem with
threshold ri∗ . Let yi = 1(−1) if subject i chooses S(R). The log-likelihood con-
tribution for subject i is:
Log L_i = \ln Φ\left( y_i × \left( \frac{μ}{η} − \frac{1}{η} r_i^* \right) \right)   (15.30)
Problem   Lottery S                     Lottery R                     r∗       P(S)
 1        (0.1, $2.00; 0.9, $1.60)      (0.1, $3.85; 0.9, $0.10)      −1.72    1.00
 2        (0.2, $2.00; 0.8, $1.60)      (0.2, $3.85; 0.8, $0.10)      −0.95    0.99
 3        (0.3, $2.00; 0.7, $1.60)      (0.3, $3.85; 0.7, $0.10)      −0.49    0.98
 4        (0.4, $2.00; 0.6, $1.60)      (0.4, $3.85; 0.6, $0.10)      −0.15    0.92
 5        (0.5, $2.00; 0.5, $1.60)      (0.5, $3.85; 0.5, $0.10)       0.15    0.66
 6        (0.6, $2.00; 0.4, $1.60)      (0.6, $3.85; 0.4, $0.10)       0.41    0.40
 7        (0.7, $2.00; 0.3, $1.60)      (0.7, $3.85; 0.3, $0.10)       0.68    0.17
 8        (0.8, $2.00; 0.2, $1.60)      (0.8, $3.85; 0.2, $0.10)       0.97    0.04
 9        (0.9, $2.00; 0.1, $1.60)      (0.9, $3.85; 0.1, $0.10)       1.37    0.01
10        (1.0, $2.00; 0.0, $1.60)      (1.0, $3.85; 0.0, $0.10)       ∞       0.00
and we would divide the remaining subjects equally between the two problems:
Fig. 15.3: The (imputed) distribution of risk aversion parameters from the Holt Laury experiment.
Normal density super-imposed.
and:
The optimal design problem solved in Section 15.4 was built on the assumption
that only one choice problem would be solved by each subject. It is more usual in
experiments of this type for each subject to solve a sequence of choice problems.
The Random Lottery Incentive (RLI) system is commonly implemented: at the
end of the sequence, one of the chosen lotteries is selected at random and played
for real. Under reasonable assumptions, this guarantees that subjects treat each
lottery as if it were the only lottery.
The resulting data set is a panel, containing a set of T choices for each of n
subjects. To accommodate the multiple observations per subject, within-subject
variation needs to be incorporated into the model. One approach is to follow
Loomes et al. (2002) and apply the Random Preference assumption, namely
that an individual i’s risk aversion parameter varies randomly between the T
problems according to:
r_{it} ∼ N(m_i, σ^2),   t = 1, . . . , T   (15.31)
and the mean risk aversion of each subject varies across the population according
to:
m_i ∼ N(μ, η^2).   (15.32)
This leads to the random effects probit model (Avery et al., 1983). The log-
likelihood contribution for a single subject is given by:
Log L_i = \ln \left[ \int_{−∞}^{∞} \prod_{t=1}^{T} Φ\left( y_{it} × \frac{m − r_t^*}{σ} \right) \frac{1}{η}\, φ\left( \frac{m − μ}{η} \right) dm \right].   (15.33)
The random effects model defined in (15.33) has three parameters: the “be-
tween” parameters, μ and η; and the “within” parameter, σ .
To obtain the information matrix requires differentiation that is quite de-
manding, and it is not expressible in closed form, as it was in the examples
in Section 15.3. Maximising the determinant of the information matrix is there-
fore an awkward problem. It is also intuitively obvious that the optimal design
will have a more complicated structure than the designs of Section 15.3. Given
that there are two sources of randomness, two distinct design points would not
be sufficient to identify the parameters separately, and it is not clear what the
D-optimal number of design points would be. These are matters for future re-
search.
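Although the information matrix is awkward, the likelihood contribution (15.33) itself is straightforward to evaluate numerically, for example by Gauss–Hermite quadrature after the change of variable m = μ + √2 η u. In the sketch below the data and parameter values are purely illustrative.

```python
import numpy as np
from scipy.stats import norm

# Sketch: evaluating the random effects probit contribution (15.33) for one
# subject by Gauss-Hermite quadrature.  Inputs below are illustrative only.

def log_lik_subject(y, r_star, mu, eta, sigma, n_nodes=32):
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    m = mu + np.sqrt(2.0) * eta * nodes                           # integration points for m
    inner = norm.cdf(y[None, :] * (m[:, None] - r_star[None, :]) / sigma)
    return np.log((weights / np.sqrt(np.pi)) @ inner.prod(axis=1))

y = np.array([1.0, 1.0, -1.0, 1.0])          # choices on T = 4 problems (+1 = safe)
r_star = np.array([-0.5, 0.0, 0.4, 0.2])     # thresholds of the four problems
print(log_lik_subject(y, r_star, mu=0.3, eta=0.4, sigma=0.2))
```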
Each subject has a different risk attitude so a choice problem which induces
indifference for one subject may induce a clear preference for another. A use-
ful approach is therefore to “tailor” problems to individual subjects, using their
choices in early problems to identify their risk attitude, and then set later prob-
lems that apply an optimal design rule for that subject.
As mentioned in Section 15.1, the obvious criticism of this approach is the po-
tential violation of incentive compatibility: subjects may manipulate the experi-
ment by deliberately making false responses in an effort to “steer” the problem
sequence in the direction of the most desirable problem types.
The problem has been addressed by Eckel et al. (2005) who apply a modified
version of RLI. A universal set of choice problems is determined at the outset.
Then a non-random sequence of problems is drawn from the universal set, with
each one chosen in the light of previous responses in order to locate indifference.
But, it is made clear at the outset that the problem that is played for real is drawn
randomly from the universal set, and not just from the subset of problems solved.
If one of the solved problems is drawn, the chosen lottery is played; if one of
the unsolved problems is drawn, the subject is asked to solve that problem as an
additional task, and then it is immediately played for real.
The crucial feature of this modified RLI system is that the choices made by
the subject have no effect whatsoever on the set of problems over which the ran-
domisation is performed, or on the probabilities of each problem being drawn.
It is this feature that guarantees incentive compatibility.
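A schematic sketch of this payment rule is given below; the universal set, the adaptive rule and the subject's choice function are placeholder arguments of the illustration, not details of Eckel et al.'s implementation.

```python
import numpy as np

# Schematic sketch of the modified RLI payment rule: the problem played for
# real is drawn from the full universal set, regardless of which problems the
# subject was actually given, so responses cannot steer the randomisation.
rng = np.random.default_rng(0)

def modified_rli(universal_set, next_problem, choose, n_presented):
    choices = {}
    for _ in range(n_presented):                  # adaptive sequence of problems
        k = next_problem(choices)                 # chosen in light of past responses
        choices[k] = choose(universal_set[k])
    j = int(rng.integers(len(universal_set)))     # payment draw over the FULL set
    if j not in choices:                          # unsolved problem: solve it now,
        choices[j] = choose(universal_set[j])     # then it is played for real
    return j, choices[j]
```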
15.6. Conclusion
The use of dichotomous choice problems in economic research calls for a thor-
ough analysis of the issue of optimal design of such experiments. The main
objective of this chapter has been to bring some well-developed ideas concerning
optimal design into the mainstream of economic research. The chosen criterion
has been the popular D-optimal design criterion, under which the determinant
of the model’s information matrix is maximised. The key ideas are that when
the model is linear, the design points should be as far apart as possible, at the
“corners of the design space”, but for binary data models, this requirement is
countered by the requirement of “utility balance” – that the design points are in
the middle of the underlying distribution. The net effect of these counteracting
requirements is, somewhat intriguingly, that the optimal design points in binary
data models are at identifiable percentiles of the distribution, fairly near to the
tails. The optimal percentiles depend on which model is assumed.
Another issue is that, while in linear models, the optimal design points can
be found, in non-linear models such as the binary data models considered in
this Chapter, the parameters of the underlying distribution need to be known
in order for the optimal design points to be found. This is a problem that can
be addressed by using results from a previous study in designing an optimal
experiment. It was in this spirit that the example on estimating the distribution
of risk attitudes over the population in Section 15.4 was presented.
References
Alberini, A. (1995). Optimal designs for discrete choice contingent valuation surveys: Single-bound,
double-bound and bivariate models. Journal of Environmental Economics and Management 28,
287–306.
Atkinson, A.C. (1996). The usefulness of optimum experimental designs. Journal of the Royal Stat-
istical Society, Series B 58, 59–76.
Avery, R.B., Hansen, L.P., Hotz, V.J. (1983). Multiperiod probit models and orthogonality condition
estimation. International Economic Review 24, 21–35.
Becker, G., DeGroot, M., Marschak, J. (1964). Measuring utility by a single-response sequential
method. Behavioural Science 9, 226–236.
Bishop, R.C., Heberlein, T.A. (1979). Measuring values of extra-market goods: Are indirect mea-
sures biased? American Journal of Agricultural Economics 61, 926–930.
Camerer, C.F. (2003). Behavioral Game Theory. Princeton Univ. Press, Princeton, NJ.
Chaudhuri, P., Mykland, P.A. (1993). Non-linear experiments: Optimal design and inference based
on likelihood. Journal of the American Statistical Association 88, 538–546.
Chaudhuri, P., Mykland, P.A. (1995). On efficient designing of non-linear experiments. Statistica
Sinica 5, 421–440.
Cubitt, R.P., Munro, A.A., Starmer, C.V. (2004). Testing explanations of preference reversal. Eco-
nomic Journal 114, 709–726.
Eckel, C., Engle-Warnick, J., Johnson, C. (2005). Adaptive elicitation of risk preferences. Mimeo.
CIRANO.
Fedorov, V.V. (1972). Theory of Optimum Experiments. Academic Press, New York.
Ford, I., Torsney, B., Wu, C.F.J. (1992). The use of a canonical form in the construction of locally
optimal designs for non-linear problems. Journal of the Royal Statistical Society, Series B 54,
569–583.
Green, D., Jacowitz, K.E., Kahneman, D., McFadden, D. (1998). Referendum contingent valuation,
anchoring, and willingness to pay for public goods. Resource and Energy Economics 20, 85–116.
Greene, W.H. (2003). Econometric Analysis, fifth ed. Prentice Hall, New York.
Grether, D.M., Plott, C.R. (1979). Economic theory of choice and the preference reversal phenom-
enon. American Economic Review 69, 623–638.
Hanemann, M., Kanninen, B.J. (1998). The statistical analysis of discrete-response CV data. In:
Bateman, I.J., Willis, K.G. (Eds.), Valuing Environmental Preferences: Theory and Practice of
the Contingent Valuation Method in the US, EU and Developing Countries. OUP, Oxford.
Hey, J.D., di Cagno, D. (1990). Circles and triangles: An experimental estimation of indifference
lines in the Marschak–Machina triangle. Journal of Behavioural Decision Making 3, 279–306.
Holt, C.A., Laury, S.K. (2002). Risk aversion and incentive effects. American Economic Review 92,
1644–1655.
Huber, J., Zwerina, K. (1996). The importance of utility balance in efficient choice designs. Journal
of Marketing Research 23, 307–317.
Kanninen, B.J. (1993a). Design of sequential experiments for contingent valuation studies. Journal
of Environmental Economics and Management 25, S1–S11.
Kanninen, B.J. (1993b). Optimal experimental design for double-bounded dichotomous choice con-
tingent valuation. Land Economics 69, 138–146.
Loomes, G., Moffatt, P.G., Sugden, R. (2002). A microeconometric test of alternative stochastic
theories of risky choice. Journal of Risk and Uncertainty 24, 103–130.
Louviere, J.J., Hensher, D.A., Swait, J.D. (2000). Stated Choice Methods: Analysis and Application.
Cambridge Univ. Press, Cambridge.
Müller, W.G., Ponce de Leon, A.C.M. (1996). Optimal design of an experiment in economics. Eco-
nomic Journal 106, 122–127.
Ponce de Leon, A.C.M. (1993). Optimum experimental design for model discrimination and general-
ized linear models. PhD thesis. London School of Economics and Political Science, Department
of Mathematical and Statistical Sciences, London.
Silvey, S.D. (1980). Optimum Design. Chapman and Hall, London.
Tversky, A., Slovic, P., Kahneman, D. (1990). The causes of preference reversal. American Eco-
nomic Review 80, 204–217.
CHAPTER 16
Abstract
This chapter provides an introduction to smoothing methods in time series analy-
sis, namely local polynomial regression and polynomial splines, that developed
as an extension of least squares regression and result in signal estimates that
are linear combinations of the available information. We set off exposing the
local polynomial approach and the class of Henderson filters. Very important
issues are the treatment of the extremes of the series and real time estimation,
as well as the choice of the order of the polynomial and of the bandwidth. The
inferential aspects concerning the choice of the bandwidth and the order of the
approximating polynomial are also discussed. We next move to semiparametric
smoothing using polynomial splines. Our treatment stresses their relationship
with popular stochastic trend models proposed in economics, which yield expo-
nential smoothing filters and the Leser or Hodrick–Prescott filter. We deal with
signal extraction filters that arise from applying best linear unbiased estimation
principles to the linear mixed model representation of spline models and
establish the connection with penalised least squares. After considering several
ways of assessing the properties of a linear filter both in time and frequency do-
main, the chapter concludes with a discussion of the main measurement issues
raised by signal extraction in economics and the accuracy in the estimation of
the latent signals.
Every valley shall be exalted, and every mountain and hill made low,
and the crooked shall be made straight, and the rough places plain.
And the glory of the Lord shall be revealed, and all flesh shall see it together
for the mouth of the Lord hath spoken it.
Isaiah 40:4-5
16.1. Introduction
The smoothing problem has a long and well established tradition in statistics and
has a wide range of applications in economics. In its simplest form it aims at pro-
viding a measure of the underlying tendency from noisy observations, and takes
the name of signal extraction in engineering, trend estimation in econometrics,
and graduation in actuarial sciences. This chapter provides an introduction to
smoothing methods, namely local polynomial regression and spline smoothing,
that developed as an extension of least squares regression and result in signal
estimates that are linear combinations of the available information. These lin-
ear combinations are often termed filters and the analysis of the filter weights
provides useful insight into what the method does.
Although the methods can be applied to cross-sectional data, we shall deal
with time series applications. In particular, for a time series yt we assume an
additive model of the form:
$$y_t = \mu_t + \epsilon_t, \qquad t = 1, \ldots, n, \qquad (16.1)$$
where μt is the trend component, also termed the signal, and εt is the noise, or
irregular, component. We assume throughout that E(εt ) = 0, whereas μt can be
a random or deterministic function of time. If the observations are not equally
spaced, or the model is defined in continuous time, then we shall change our
notation to y(t) = μ(t) + ε(t).
The smoothing problem deals with the estimation of μt . If μt is random,
the minimum mean square estimator of the signal is E(μt |Yn ), where Yt =
{y1 , . . . , yt } denotes the information set at time t. Estimation is said to be car-
ried out in real time if it concerns E(μt |Yt ), using the available information up
to and including time t. If the model (16.1) is Gaussian, these inferences are
linear in the observations. Why is such a set of problems relevant? Essentially, in
economics we seek to separate the permanent movements in the series from the
transitory ones. A related objective is forecasting future values.
The simplest and historically oldest approach to signal extraction is using a
global polynomial model for μt , which amounts to regressing yt on a polyno-
mial of time, where global means that the coefficients of the polynomial are
constant across the sample span and it is not possible to control the influence
of the individual observations on the fit. In fact, it turns out that global poly-
nomials are amenable to mathematical treatment, but are not very flexible: they
can provide poor local approximations and behave erratically at the beginning
and at the end of the sample period, which is inconvenient for forecasting purposes.
Fig. 16.1: Industrial Production Index, Manufacture and Assembly of Motor Vehicles, seasonally adjusted, Italy, January 1990–October 2005.
This point is illustrated by the first panel of Fig. 16.1, which plots the
original series, representing the industrial production index for the Italian Auto-
motive sector (monthly data, 1990.1–2005.10; source: Istat), and the estimate of
the trend arising from fitting cubic (P3) and quintic (P5) polynomials of time. In
particular, it can be seen that a high order is needed to provide a reasonable fit
(the cubic fit being very poor), and that extrapolations would be troublesome at
the very least.
The subsequent panels illustrate the smoothed estimates of the trend arising
from methods that aim at overcoming the limitations of the global approach,
while still retaining its simplicity (due to the linearity). The top right picture
plots the smoothed estimates of the trend resulting from the unweighted mov-
ing average of three consecutive observations, (yt−1 + yt + yt+1 )/3, which
arises from fitting a local linear trend to three consecutive observations, using
a uniform kernel (this is also known as a Macaulay’s moving average). Little
smoothing has taken place.
The bottom left panel plots the estimates resulting from the Henderson fil-
ter, which results from fitting a cubic polynomial to 23 consecutive observations
centred at t (11 on each side). The plot illustrates the advantages of local poly-
nomial fitting over the traditional global polynomial approach: the degree of the
approximating polynomials can be chosen of low order to produce a reasonable
fit. Finally, the last panel displays the estimates of the smoothing cubic spline
trend with smoothness parameter 1600, which yields results indistinguishable
from the popular Hodrick–Prescott filter (Hodrick and Prescott, 1997).
The chapter is structured as follows. We begin by presenting the local polynomial
approach (Section 16.2) and the class of Henderson filters (Section 16.3). Very
important issues are the treatment of the extremes of the series and real time es-
timation, which are dealt with in Section 16.4, and the choice of the order of the
polynomial and the bandwidth, which are the topic of Section 16.5. Section 16.6
presents some generalisations. We next move to an alternative fundamental ap-
proach, provided by polynomial splines (Section 16.7). Our treatment stresses
its relationships with popular stochastic trend models proposed in economics,
which yield exponential smoothing filters and the Leser (1961), also known as
the Hodrick–Prescott, filter. In Section 16.8 we deal with several ways of as-
sessing the properties of a linear filter. We conclude with a discussion of the
main issues raised by signal extraction in economics and the accuracy in the
estimation of the latent signals.
The literature on smoothing methods is very large and our review cannot be
but incomplete. For instance, we deal neither with the related flexible regression
approach proposed by Hamilton (2001), based on the notion of a random field,
nor with wavelets, which have a range of applications in economics, and fre-
quency domain methods based on band-pass filters (see Pollock, 1999). Early
references on moving average filters and (local) polynomial time series regres-
sion are Kendall et al. (1983) and Anderson (1971). For local polynomial regres-
sion essential references are Loader (1999) and Fan and Gijbels (1996). A book
on spline smoothing is Green and Silverman (1994), whereas Hastie and Tibshi-
rani (1990), Fahrmeir and Tutz (1994) and Ruppert et al. (2003) are excellent
references on semiparametric regression. Polynomial spline models are related
to the time series literature on unobserved components models, trend estimation,
and state space methods, exposed in Harvey (1989) and Durbin and Koopman
(2001). An excellent review of graduation is Boumans (2004, Section 3).
In the estimation of the unknown level, we would like to weight the observa-
tions differently according to their distance from time t. In particular, we may
want to assign larger weight to the observations that are closer to t. For this
purpose we introduce a kernel function κj , j = 0, ±1, . . . , ±h, which we as-
sume known, such that κj ≥ 0 and κj = κ−j . Hence, the κj 's are non-negative
and symmetric with respect to j . As a result, the influence of each individual
observation is controlled not only by the bandwidth h but also by the kernel.
Provided that p ≤ 2h, the p + 1 unknown coefficients βk , k = 0, . . . , p, can
be estimated by the method of weighted least squares (WLS), which consists of
minimising with respect to the βk 's the objective function:
$$S(\hat\beta_0, \ldots, \hat\beta_p) = \sum_{j=-h}^{h} \kappa_j \bigl(y_{t+j} - \hat\beta_0 - \hat\beta_1 j - \cdots - \hat\beta_p j^p\bigr)^2. \qquad (16.3)$$
Defining X as the (2h + 1) × (p + 1) matrix whose rows are [1, j, . . . , j p ] for j = −h, . . . , h, K = diag(κ−h , . . . , κh ) and y as the vector collecting the observations yt−h , . . . , yt+h , the WLS estimate of the trend is
$$\hat m_t = e_1'\hat\beta = e_1'(X'KX)^{-1}X'Ky = w'y = \sum_{j=-h}^{h} w_j y_{t-j},$$
which expresses the estimate of the trend as a linear combination of the observations with coefficients
$$w = KX(X'KX)^{-1}e_1. \qquad (16.4)$$
The trend estimate is local since it depends only on the subset of the ob-
servations that belong to the neighbourhood of time t. The linear combination
yielding our trend estimate is often termed a (linear) filter, and the weights wj
constitute its impulse responses. The latter are time invariant and carry essen-
tial information on the nature of the estimated signal; their properties will be
discussed in Section 16.8. For the time being we state two important ones: sym-
metry and reproduction of pth degree polynomials.
Symmetry (wj = w−j ) follows from the symmetry of the kernel weights κj
and the assumption that the available observations are equally spaced. Concern-
ing the second, from (16.4) we have that X w = e1 , or equivalently,
h
h
wj = 1, j l wj = 0, l = 1, . . . , p.
j =−h j =−h
The filter arising for K = I (uniform kernel) has w = X(X'X)−1 e1 and is
known as Macaulay's filter. In the case of a local constant polynomial, that
is p = 0 and κj = 1, ∀j , the signal extraction filter is the arithmetic moving
average: wj = w = 1/(2h + 1), j = 0, ±1, . . . , ±h. The same weights arise in
the case p = 1 i.e. for a local linear fit, but this is true only of the central weights
for equally spaced observations.
A very popular choice in time series applications is the Henderson filter, whose central weights are
$$w_j = \kappa_j\,\frac{S_4 - S_2 j^2}{S_0 S_4 - S_2^2}, \qquad \kappa_j = \bigl[(h+1)^2 - j^2\bigr]\bigl[(h+2)^2 - j^2\bigr]\bigl[(h+3)^2 - j^2\bigr],$$
j = 0, ±1, . . . , ±h, where $S_k = \sum_{j=-h}^{h} \kappa_j j^k$. Therefore, the Henderson filters
emerge from WLS estimation of a local cubic polynomial using the particular
kernel given above. Table 16.1 reports the filter weights for different values of
the bandwidth parameter.
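As a purely illustrative aside, the weights are easily computed from the formulas above. The following minimal Python sketch (the function name henderson_weights is ours and not part of the chapter) evaluates the kernel and the central weights for a given bandwidth and checks the polynomial reproduction properties stated above.

import numpy as np

def henderson_weights(h):
    # Central weights of the Henderson filter of length 2h+1 (local cubic fit).
    j = np.arange(-h, h + 1)
    kappa = ((h + 1)**2 - j**2) * ((h + 2)**2 - j**2) * ((h + 3)**2 - j**2)
    S = lambda k: np.sum(kappa * j.astype(float)**k)   # kernel moments S_k
    return j, kappa * (S(4) - S(2) * j**2) / (S(0) * S(4) - S(2)**2)

j, w = henderson_weights(8)                      # the 17-term Henderson filter
print(w.sum(), (j * w).sum(), (j**2 * w).sum())  # approximately 1, 0, 0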
16.4. The Treatment of the Extremes of the Series – Real Time Estimation
Table 16.1: Weights wj of the Henderson filter for bandwidths h = 4, 6, 8 and 11.
When approaching the end of the sample, the last h trend estimates cannot be computed with the symmetric two-sided filter. Two strategies are commonly adopted:
1. The construction of asymmetric filters that result from fitting a local polynomial to the available observations yt , t = n − h + 1, n − h + 2, . . . , n.
The approximate model yt+j = mt+j + εt+j is assumed to hold for j = −h, −h + 1, . . . , n − t, and the estimators of the coefficients β̂k , k = 0, . . . , p, minimise
$$S(\hat\beta_0, \ldots, \hat\beta_p) = \sum_{j=-h}^{n-t} \kappa_j \bigl(y_{t+j} - \hat\beta_0 - \hat\beta_1 j - \cdots - \hat\beta_p j^p\bigr)^2.$$
Hence, the trend estimates for the last h data points, m̂n−h+1 , . . . , m̂n , use
respectively 2h, 2h − 1, . . . , h + 1 observations.
2. Apply the symmetric two-sided filter to the series extended by h forecasts
ŷn+l|n , l = 1, . . . , h, (and backcasts ŷ1−l|n ).
In the sequel we shall denote by m̂t|t+r the estimate of the signal at time t
using the information available up to time t + r, with 0 ≤ r ≤ h; m̂t|t is usually
known as the real time estimate since it uses only the past and current infor-
mation. Figure 16.2 displays the central and asymmetric filters for computing
m̂t|t+r of the Henderson filter with h = 8.
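The first strategy can be implemented directly. The sketch below (our own illustrative code; asymmetric_weights is a hypothetical name, and the use of the Henderson kernel on the truncated support mirrors the description above) computes the weights of the filter obtained by fitting a local cubic polynomial to the observations available when only r ≤ h future data points have been observed.

import numpy as np

def asymmetric_weights(h, r, p=3):
    # Local polynomial filter weights when only r <= h leads are available;
    # r = 0 gives the real time filter, r = h the symmetric two-sided filter.
    j = np.arange(-h, r + 1)
    kappa = (((h + 1)**2 - j**2) * ((h + 2)**2 - j**2) * ((h + 3)**2 - j**2)).astype(float)
    Xa = np.vander(j.astype(float), p + 1, increasing=True)   # rows [1, j, ..., j^p]
    e1 = np.zeros(p + 1); e1[0] = 1.0
    # w_a = K_a X_a (X_a' K_a X_a)^{-1} e_1
    return j, (kappa[:, None] * Xa) @ np.linalg.solve(Xa.T @ (kappa[:, None] * Xa), e1)

for r in (0, 4, 8):
    j, wa = asymmetric_weights(8, r)
    print(r, round(wa.sum(), 6))     # the asymmetric weights still sum to one

As noted below, these weights change considerably as r increases, which is the source of the revisions affecting the most recent trend estimates.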
Both strategies imply that the final h estimates of the trend will be subject
to revision as new observations become available. An intuitive and easily estab-
lished fact is that if the forecasts ŷn+l|n are optimal in the mean square error
sense, then the variance of the revision is a minimum. The two strategies coin-
cide only when the future observations are generated according to a polynomial
function of time of degree p, so that the optimal forecasts are generated by the
same polynomial model.
To prove this result let us start by partitioning the matrices X, K and the
vector y as follows:
$$X = \begin{bmatrix} X_a \\ X_m \end{bmatrix}, \qquad y = \begin{bmatrix} y_a \\ y_m \end{bmatrix}, \qquad K = \begin{bmatrix} K_a & 0 \\ 0 & K_m \end{bmatrix},$$
where the blocks with subscript a refer to the available observations and those with subscript m to the missing (future) ones. Fitting the local polynomial to the available observations alone gives
$$\hat m_{t|t+r} = e_1'(X_a'K_aX_a)^{-1}X_a'K_a y_a = w_a'y_a,$$
which is the estimate of the intercept of the polynomial that uses only the available information. Hence, the asymmetric filter weights are given by
$$w_a = K_aX_a(X_a'K_aX_a)^{-1}e_1.$$
Suppose that we add an observation to the current set ya , yt+r+1 , and denote
by xt+r+1 = [1, (r + 1), (r + 1)2 , . . . , (r + 1)p ] the (r + 1)st row of the ma-
trix X. If the first strategy is adopted, then we can express the estimate m̂t|t+r+1 ,
which uses the newly available observation, in terms of the previous estimate,
plus a revision term which depends on a fraction of the one-step-ahead forecast
error; the explicit updating expression follows from the matrix inversion lemma
$$(A \pm b\,uv')^{-1} = A^{-1} \mp \frac{b}{1 \pm b\,v'A^{-1}u}\,A^{-1}uv'A^{-1}. \qquad (16.6)$$
As illustrated by Fig. 16.2 the asymmetric filter weights of the Henderson filter
change rapidly, as new observations are added. This adds to the variability of the
estimates of the trend, and is detrimental to their reliability, as they are subject
to larger revisions as new observations become available.
16.5. Inference
The filter (16.4) depends on three characteristics: the degree of the approxi-
mating polynomial, the shape of the kernel function and the bandwidth h (or,
equivalently, the length of the filter, H = 2h + 1). All these factors jointly contribute to
balance the trade-off between variance and bias, which is discussed in the
following subsection.
16.5.1. Bias and variance
Since E(m̂t ) = Σj wj μt−j , the estimator m̂t is biased unless μt+j = mt+j , j = 0, ±1, . . . , ±h, i.e. the true signal
is a polynomial of order p. The bias arises from neglecting higher order terms
in the Taylor expansion:
$$\mu_t - E(\hat m_t) = \mu_t - \sum_j w_j\mu_{t-j} = d_t - \sum_j w_j d_{t-j},$$
where $d_{t-j} = \mu_{t-j} - m_{t-j} = \sum_{k=p+1}^{\infty} \frac{1}{k!}\,\mu^{(k)}_{t-j}\,j^k$ is the remainder of the Taylor approximation, $\mu^{(k)}_{t-j}$ being the kth derivative of the trend at t − j .
The bias is inversely related to p and positively related to h. As a matter
of fact, the higher p, the more reliable the polynomial approximation (i.e.
the size of dt is smaller); also, the local polynomial approximation is more
suitable the smaller the neighbourhood of time t that is considered. Hence,
in order to minimise the bias we ought to take p high and h low. On the other
hand, higher degree polynomials also have more coefficients to estimate, result-
ing in higher variability. Also, if h is small the estimates use few observations,
so that by increasing h we decrease their variance.
As far as the variance is concerned,
$$\mathrm{Var}(\hat m_t) = E\bigl[\hat m_t - E(\hat m_t)\bigr]^2 = E\Bigl[\sum_j w_j(y_{t-j} - \mu_{t-j})\Bigr]^2 = \sigma^2\sum_j w_j^2,$$
where σ 2 is the variance of the noise, which can be estimated using a variety of methods (see e.g. Fan and Gijbels,
1996).
Usually, p = 1, 2 are adequate choices for the degree of the fitting polynomial,
although the Henderson filter (p = 3) is fairly popular in time series applica-
tions.
16.5.2. Cross-validation
Let m̂t\t denote the two-sided estimate of the signal at time t which does not
use yt . Using (16.5) and the matrix inversion lemma (16.6),
$$\begin{aligned}
\hat m_{t\setminus t} &= e_1'(X'KX - \kappa_0 e_1e_1')^{-1}(X'Ky - \kappa_0 y_t e_1) \\
&= e_1'\Bigl[(X'KX)^{-1} + \frac{\kappa_0}{1 - \kappa_0 e_1'(X'KX)^{-1}e_1}(X'KX)^{-1}e_1e_1'(X'KX)^{-1}\Bigr](X'Ky - \kappa_0 y_t e_1) \\
&= \frac{1}{1-w_0}\,e_1'(X'KX)^{-1}(X'Ky - \kappa_0 y_t e_1) \\
&= \frac{1}{1-w_0}\,\hat m_t - \frac{w_0}{1-w_0}\,y_t,
\end{aligned}$$
so that
$$y_t - \hat m_{t\setminus t} = \frac{1}{1-w_0}(y_t - \hat m_t).$$
The cross-validation score is then
$$CV = \sum_{t=1}^{n}\bigl(y_t - \hat m_{t\setminus t}\bigr)^2 = \sum_t \frac{(y_t - \hat m_t)^2}{(1 - w_{0t})^2},$$
where the subscript t in w0t signifies that the filter weights are different at the
extremes of the sample, so that the leverage varies with t.
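Once the n × n smoothing matrix W collecting the filter weights (with the appropriate asymmetric weights in the first and last h rows) has been formed, the score is computed with the leave-one-out shortcut derived above. The following NumPy sketch, with our own function name, is purely illustrative.

import numpy as np

def cv_score(W, y):
    # CV = sum_t [(y_t - m_hat_t) / (1 - w_{0t})]^2, with w_{0t} = W[t, t]
    m_hat = W @ y
    return np.sum(((y - m_hat) / (1.0 - np.diag(W)))**2)

Evaluating cv_score over a grid of bandwidths and retaining the minimiser reproduces the kind of exercise underlying Fig. 16.3.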
Figure 16.3 plots CV for different values of the bandwidth parameter and
the trend estimates corresponding to h = 9, for which the cross-validation
score is a minimum, along with its 95% confidence bounds computed using the
standard error estimates obtained as indicated in the next section.
Fig. 16.3: Henderson filter: cross-validation scores and interval estimates of the signal.
The estimation of σ 2 can be done using the residuals from the local polynomial fit. In fact,
$$\begin{aligned}
E(RSS) &= E\Bigl\{\sum_{t=1}^{n}\Bigl(y_t - \sum_j w_{jt}y_{t-j}\Bigr)^2\Bigr\}
= E\Bigl\{\sum_{t=1}^{n}\Bigl(y_t - m_t - \sum_j w_{jt}(y_{t-j} - m_{t-j})\Bigr)^2\Bigr\} \\
&= E\Bigl\{\sum_{t=1}^{n}\Bigl(\epsilon_t - \sum_j w_{jt}\epsilon_{t-j}\Bigr)^2\Bigr\}
= E\Bigl\{\sum_{t=1}^{n}\Bigl(\epsilon_t^2 - 2\sum_j w_{jt}\epsilon_t\epsilon_{t-j} + \Bigl(\sum_j w_{jt}\epsilon_{t-j}\Bigr)^2\Bigr)\Bigr\} \\
&= \sigma^2\Bigl(n - 2\sum_{t=1}^{n} w_{0t} + \sum_{t=1}^{n}\sum_j w_{jt}^2\Bigr).
\end{aligned}$$
This suggests that we can estimate the error variance by correcting the RSS:
$$\hat\sigma^2 = \frac{RSS}{\,n - 2\sum_{t=1}^{n} w_{0t} + \sum_{t=1}^{n}\sum_j w_{jt}^2\,}.$$
This estimate can be used in turn to compute interval estimates of the signal; e.g.
an approximate 95% confidence interval for μt is
$$\hat m_t \pm 2\Bigl(\hat\sigma^2\sum_j w_{jt}^2\Bigr)^{1/2}.$$
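In matrix form, with W denoting the n × n smoothing matrix, the correction of the RSS and the interval estimates can be sketched as follows (illustrative code of ours, not from the chapter; the function name is hypothetical).

import numpy as np

def interval_estimates(W, y, z=2.0):
    # sigma2_hat = RSS / (n - 2 tr(W) + tr(W W')); bands m_hat +/- z*sqrt(sigma2_hat*sum_j w_jt^2)
    m_hat = W @ y
    rss = np.sum((y - m_hat)**2)
    sigma2 = rss / (len(y) - 2.0 * np.trace(W) + np.trace(W @ W.T))
    half = z * np.sqrt(sigma2 * np.sum(W**2, axis=1))
    return m_hat - half, m_hat + half, sigma2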
In the previous sections we have focussed on the simplified case when equally
spaced observations are available, the bandwidth is fixed and the support of the
kernel is discrete. The generalisation to unequally spaced observations and con-
tinuous kernels proceeds as follows. Assuming that n observations y(ti ) are
made at the time points ti , i = 1, . . . , n, the estimate of the signal at time
t ∈ (t1 , tn ) is computed by minimising the WLS criterion function:
$$S(\hat\beta_0, \ldots, \hat\beta_p) = \sum_{i=1}^{n}\kappa\!\Bigl(\frac{t_i - t}{b}\Bigr)\bigl[y(t_i) - \hat\beta_0 - \hat\beta_1(t_i - t) - \cdots - \hat\beta_p(t_i - t)^p\bigr]^2,$$
where κ(z) is the kernel function, which is symmetric and non-negative. The
smoothing parameter b > 0 determines the bandwidth of the kernel, since
κ(z) = 0 for |z| > 1. If b tends to zero then m̂(ti ) = y(ti ). On the other hand, if
b tends to infinity, then all the observations will receive weight equal to 1/n and
the estimation gives the ordinary least squares solution.
The estimate of the trend is m̂(t) = β̂0 = w_t'y, where y = [y(t1 ), . . . , y(tn )]',
and w_t' = e1'(Xt'Kt Xt )−1 Xt'Kt , where Xt is an n × (p + 1) matrix with ith row
[1, (ti − t), . . . , (ti − t)p ], and Kt = diag[κ((t1 − t)/b), . . . , κ((tn − t)/b)] is n × n.
The case p = 0 (local constant fit) yields the well-known Nadaraya–Watson
estimator (Nadaraya, 1964; Watson, 1964):
$$\hat m(t) = \frac{1}{\sum_{i=1}^{n}\kappa\bigl(\frac{t_i-t}{b}\bigr)}\sum_{i=1}^{n}\kappa\!\Bigl(\frac{t_i - t}{b}\Bigr)\,y(t_i),$$
where the weights for signal extraction are provided by the normalised kernel
coefficients.
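A direct, unoptimised implementation of the estimator for unequally spaced data is sketched below; the function name and the default choice of an Epanechnikov-type kernel (r = 2, s = 1 in the Beta class discussed next) are our own illustrative assumptions.

import numpy as np

def local_poly(t, times, y, b, p=1, kernel=lambda z: np.maximum(1.0 - z**2, 0.0)):
    # m_hat(t) = e_1'(X_t' K_t X_t)^{-1} X_t' K_t y; p = 0 is the Nadaraya-Watson estimator.
    k = kernel(np.abs((times - t) / b))
    keep = k > 0                                   # observations inside the bandwidth
    Xt = np.vander((times[keep] - t).astype(float), p + 1, increasing=True)
    XtK = Xt.T * k[keep]                           # X_t' K_t (the kernel need not be normalised)
    beta = np.linalg.solve(XtK @ Xt, XtK @ y[keep])
    return beta[0]                                 # the estimated intercept is the trend estimate

Since the weights are invariant to a rescaling of the kernel, the normalising constant of the kernel is irrelevant here.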
There is a large literature on kernels and their properties. An important class,
embedding several widely used kernels, is the class of Beta kernels:
$$\kappa(z) = k_{rs}\bigl(1 - |z|^r\bigr)^s\,I\bigl(|z| \le 1\bigr), \qquad k_{rs} = \frac{r}{2B(s + 1, \tfrac{1}{r})},$$
where
$$B(a, b) = \int_0^1 z^{a-1}(1 - z)^{b-1}\,dz,$$
with a, b > 0, is the Beta function. The pair (r = 1, s = 0) gives the
uniform kernel (yielding the Macaulay filters in the discrete case), and r = s = 1
the triangular kernel.
$$\mu(t) = \beta_0 + \beta_1(t - t_1) + \cdots + \beta_p(t - t_1)^p + \sum_{i=1}^{k}\eta_i(t - t_i)_+^p, \qquad (16.7)$$
where the ti , i = 1, . . . , k, are the knots, so that, collecting the observations,
$$y = \mu + \epsilon = X\beta + Z\eta + \epsilon, \qquad (16.8)$$
where the tth row of X is [1, (t − 1), . . . , (t − 1)p ], and Z is a known matrix whose ith column contains the impulse response signature of the shock ηi , (t − ti )p+ .
In the sequel we shall assume that observations are available at discrete times,
y(ti ) = yi , i = 1, . . . , n, and that the knots are placed at the times at which
observations are made (ti = i). Hence, each new observation carries “news”,
which produce the structural change.
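The design matrices of the regression (16.8) are straightforward to construct. The following sketch (our own illustration, with a hypothetical function name) builds X and Z for knots placed at the observation times.

import numpy as np

def truncated_power_basis(n, p, knots):
    # X has rows [1, (t-1), ..., (t-1)^p]; column i of Z holds (t - t_i)_+^p,
    # with (t - t_i)_+^0 read as the indicator of t > t_i.
    t = np.arange(1, n + 1, dtype=float)
    knots = np.asarray(knots, dtype=float)
    X = np.vander(t - 1.0, p + 1, increasing=True)
    Z = np.where(t[:, None] > knots[None, :], (t[:, None] - knots[None, :])**p, 0.0)
    return X, Z

X0, Z0 = truncated_power_basis(8, 0, np.arange(1, 8))  # p = 0: step functions with unit jumps at the knots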
The simplest truncated power basis arises for p = 0 and consists of step func-
tions with jumps of size 1 at the knots. The corresponding zero degree spline
is
$$\mu(t) = \beta_0 + \sum_{i=1}^{n-1}\eta_i(t - i)_+^0 = \beta_0 + \sum_{i=1}^{t-1}\eta_i, \qquad t = 1, \ldots, n, \qquad (16.9)$$
where ηt ∼ NID(0, ση2 ) and (t − i)0+ = 1 for t > i, and zero otherwise. Equation
(16.9) defines a random walk and can be reformulated as a stochastic difference
equation: μt+1 = μt + ηt , t = 1, . . . , n − 1, with starting value μ1 = β0 . Thus,
a shock ηt occurring at time t is accumulated into the future values of the level
and has unit long run impact. The shock signature is constant and is displayed
in the upper left panel of Fig. 16.4.
The model yt = μt + εt , with μt given above and εt ∼ NID(0, σε2 ), is known
as the local level model and plays an important role in the time series literature,
since the forecasts are an exponentially weighted moving average (EWMA) of
the current and past observations, and the smoothed estimates of μt are given
by a two-sided EWMA.
If a linear term is added to the zero degree spline,
$$\mu(t) = \beta_0 + \beta_1(t - 1) + \sum_{i=2}^{n-1}\eta_i(t - i)_+^0 = \beta_0 + \beta_1(t - 1) + \sum_{i=2}^{t-1}\eta_i, \qquad t = 1, \ldots, n,$$
the trend becomes a random walk with drift and can be represented by the sto-
the trend becomes a random walk with drift and can be represented by the sto-
chastic difference equation μt+1 = μt + β1 + ηt , t = 2, . . . , n − 1, with starting
values μ1 = β0 , μ2 = μ1 + β1 .
Another important trend model, known as the local linear trend model, arises for
p = 1:
$$\mu(t) = \beta_0 + \beta_1(t - 1) + \sum_{i=2}^{n-1}\eta_i(t - i)_+^1 = \beta_0 + \beta_1(t - 1) + \sum_{i=2}^{t-1}(t - i)\eta_i, \qquad t = 1, \ldots, n. \qquad (16.10)$$
Notice that, in order to enhance the identifiability of the parameters, the equally
spaced knots range from time 2 to n − 1. Equation (16.10) defines an inte-
grated random walk (IRW) and can be reformulated as a stochastic difference
equation: μt+1 − 2μt + μt−1 = ηt , t = 2, . . . , n − 1, with starting values μ1 = β0 ,
μ2 = β0 + β1 . The impulse response function is linear and a shock is doubly
accumulated (integrated) in the future values of the level (see the upper right
panel of Fig. 16.4).
It should be noticed that μt is continuous at each time point t (whereas the
first derivative, β1 + Σi ηi (t − i)0+ , is discontinuous at t = i). To allow for a
discontinuity at each t = i we introduce a linear combination of the (t − i)0+ :
$$\mu(t) = \beta_0 + \beta_1(t - 1) + \sum_{i=2}^{n-1}\eta_i(t - i)_+^1 + \sum_{i=1}^{n-1}\omega_i(t - i)_+^0
= \beta_0 + \beta_1(t - 1) + \sum_{i=2}^{t-1}(t - i)\eta_i + \sum_{i=1}^{t-1}\omega_i, \qquad (16.11)$$
where we take ωi ∼ NID(0, σω2 ), uncorrelated with ηi . The trend model (16.11)
can be rewritten as a random walk with stochastic drift, δt , evolving as a random
walk: μt+1 = μt + δt + ηt , δt+1 = δt + ωt . See Harvey (1989) for more details.
Obviously, if σω2 = 0, then the model reduces to (16.9).
Consider the cubic spline model, which arises from setting p = 3 in (16.7):
$$\mu(t) = \sum_{j=0}^{3}\beta_j(t - 1)^j + \sum_{i=1}^{n}\eta_i(t - i)_+^3. \qquad (16.12)$$
The response signature of a shock is a cubic function of time (see the bottom-
left panel of Fig. 16.4) and the signal follows a third degree polynomial outside
the observation interval. This trend model displays too much flexibility for
economic time series, which is paid for with excess variability, especially at the
beginning and at the end of the sample period. Out-of-sample forecasts tend not
to be very reliable, as they are subject to large revisions as new observations
become available. This is the reason why it is preferable to impose the so called
natural boundary conditions, which constrain the spline to be linear outside the
boundary knots. Similar considerations were made for local polynomial smooth-
ing, see Section 16.4.2.
The original cubic spline model (16.12) has 4 + n parameters. The natural
boundary conditions require that the second and the third derivatives are zero for
t ≤ 1 and t ≥ n. As we shall see shortly they impose 4 restrictions (2 zero restrictions and 2 linear restrictions) on the parameters of the cubic spline. The second
and third derivatives are respectively μ''(t) = 2β2 + 6β3 (t − 1) + 6 Σi ηi (t − i)+
and μ'''(t) = 6β3 + 6 Σi ηi (t − i)0+ . For μ'''(t) to be equal to zero for t ≤ 1 and
t ≥ n we need β3 = 0 and Σi ηi = 0; moreover, μ''(t) = 0 for t ≤ 1 and t ≥ n
requires also β2 = 0 and Σi iηi = 0.
Defining x(t) = [1, (t − 1)]', β = [β0 , β1 ]', z(t) = [(t − 1)3+ , . . . , (t − n)3+ ]',
and η = [η1 , . . . , ηn ]', the natural cubic spline can be represented as μ(t) =
x(t)'β + z(t)'η. If we further collect in the vector μ the values of the spline at
the data points i = 1, . . . , n, μ = [μ1 , . . . , μn ]', and define X = [x1 , . . . , xn ]',
with xi = [1, (i − 1)]', and Z = [z1 , . . . , zn ]', with zi = [(i − 1)3+ , . . . , (i − n)3+ ]', we
can write μ = Xβ + Zη, where η satisfies the constraints X'η = 0.
Also, the second derivative can be written as a linear combination of the
elements of η, μ''(t) = v(t)'η, where v(t) = 6[(t − 1)+ , . . . , (t − n)+ ]'. Denoting by γi = μ''i the value of the second derivative at the ith data point, i =
2, . . . , n − 1 (γ1 = γn = 0 for a natural spline), and defining the vector γ =
[γ2 , . . . , γn−1 ]', the boundary conditions X'η = 0 imply that we can write
η = Dγ , γ = (D'D)−1 D'η, where D is the n × (n − 2) matrix
$$D = \begin{bmatrix}
1 & 0 & \cdots & \cdots & 0 \\
-2 & 1 & \ddots & & \vdots \\
1 & -2 & \ddots & \ddots & \vdots \\
0 & 1 & \ddots & \ddots & 0 \\
\vdots & 0 & \ddots & \ddots & 1 \\
\vdots & \vdots & \ddots & \ddots & -2 \\
0 & 0 & \cdots & 0 & 1
\end{bmatrix}. \qquad (16.13)$$
Replacing into the expressions for the spline and the second derivative gives
μ(t) = x(t)'β + z(t)'Dγ and μ''(t) = v(t)'Dγ = r(t)'γ ,
where we have set r(t) = D'v(t). The generic element of the vector r(t) is
6[(t − i)+ − 2(t − i − 1)+ + (t − i − 2)+ ], a triangular function which is nonzero
in the interval (i, i + 2) and peaking at i + 1, where it takes the value 6.
The integral of the squared second derivative between t = 1 and t = n is
$$\int_1^n \bigl[\mu''(t)\bigr]^2\,dt = \eta'\Bigl[\int v(t)v(t)'\,dt\Bigr]\eta = \gamma'\Bigl[\int r(t)r(t)'\,dt\Bigr]\gamma = 6\gamma'R\gamma,$$
which implicitly defines the (n − 2) × (n − 2) matrix R. The smoothing spline arises from minimising the penalised least squares (PLS) objective function
$$\sum_{t=1}^{n}(y_t - \mu_t)^2 + \lambda\int_1^n \bigl[\mu''(t)\bigr]^2\,dt,$$
where λ ≥ 0 is the smoothness parameter, ∫[μ''(t)]2 dt = 6γ'Rγ , and D'μ =
Rγ . In the next section we argue that minimising the PLS objective function
with respect to μ is equivalent to maximising the posterior density f (μ|y), assuming the prior density γ ∼ N(0, σγ2 R−1 ), for a given scalar σγ2 .
The shock signature of the smoothing spline can be obtained as ZDR−1/2 ,
where R1/2 is the Cholesky factor of R. The bottom right panel of Fig. 16.4
shows that this is cubic between knot i and knot i + 2, after which it reverts to a
linear function of time.
16.7.4. Inference
Estimation of the fixed and random effects in (16.8) can be based on minimising the penalised criterion
$$(y - X\beta - Z\eta)'\Sigma_\epsilon^{-1}(y - X\beta - Z\eta) + \eta'\Sigma_\eta^{-1}\eta \qquad (16.15)$$
with respect to β and η, where Σε = Var(ε) and Ση = Var(η). The first term measures the fit and the second can be
seen as a penalisation term. It is perhaps worthwhile to stress that here β is fixed
but unknown.
Differentiating (16.15) with respect to β and η and equating to zero, we obtain:
$$\hat\beta = (X'\Sigma_y^{-1}X)^{-1}X'\Sigma_y^{-1}y, \qquad \hat\eta = (\Sigma_\eta^{-1} + Z'\Sigma_\epsilon^{-1}Z)^{-1}Z'\Sigma_\epsilon^{-1}(y - X\hat\beta),$$
where Σy = Var(y) = ZΣη Z' + Σε . Equivalently,
$$\hat\eta = \Sigma_\eta Z'\Sigma_y^{-1}(y - X\hat\beta) = E(\eta) + \mathrm{Cov}(\eta, y)\bigl[\mathrm{Var}(y)\bigr]^{-1}\bigl[y - E(y)\bigr]. \qquad (16.16)$$
The corresponding estimate of the signal is
$$\hat\mu = X\hat\beta + \Sigma_\mu\Sigma_y^{-1}(y - X\hat\beta) = E(\mu) + \mathrm{Cov}(\mu, y)\bigl[\mathrm{Var}(y)\bigr]^{-1}\bigl[y - E(y)\bigr], \qquad (16.17)$$
where Σμ = ZΣη Z' = Cov(μ, y).
Posterior mode estimation yields the optimal estimator of the trend in the
sense that MSE(μ̂) is a minimum. In particular, for a Gaussian model the
mode is coincident with the mean, and thus μ̂ = E(μ|y), the estimation error
has zero mean, E(μ̂ − μ|y) = 0, and
$$E\bigl[(\mu - \hat\mu)(\mu - \hat\mu)'\,|\,y\bigr] = \Sigma_\mu\Sigma_y^{-1}\Sigma_\epsilon + \Sigma_\epsilon\Sigma_y^{-1}X(X'\Sigma_y^{-1}X)^{-1}X'\Sigma_y^{-1}\Sigma_\epsilon$$
is a minimum. If Gaussianity is not assumed,
the previous expressions provide the best linear unbiased estimators of the fixed
and random effects.
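A direct matrix implementation of these formulas is given below for illustration (our own code, with a hypothetical function name; for the spline models this is an O(n³) computation, and in practice the state space methods mentioned below are preferable).

import numpy as np

def mixed_model_estimates(y, X, Z, Sigma_eta, Sigma_eps):
    # GLS estimate of beta and BLUP of eta in y = X beta + Z eta + eps, as in (16.16)-(16.17).
    Sigma_y = Z @ Sigma_eta @ Z.T + Sigma_eps
    Siy_X = np.linalg.solve(Sigma_y, X)
    beta = np.linalg.solve(X.T @ Siy_X, Siy_X.T @ y)
    eta = Sigma_eta @ Z.T @ np.linalg.solve(Sigma_y, y - X @ beta)
    return beta, eta, X @ beta + Z @ eta           # mu_hat = X beta_hat + Z eta_hat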
If β is taken as a diffuse vector, β ∼ N(0, Σβ ), Σβ−1 → 0, the solution is
unchanged. As a matter of fact, posterior mode estimation entails the maximisation of the joint density f (β, η|y) ∝ f (η)f (β)f (y|η, β), but the prior f (β)
does not depend on β (β'Σβ−1 β → 0).
An alternative equivalent characterisation of the trend estimates proceeds
from the following argument. Let A be a matrix whose rows span the null space of X',
so that AX = 0. If the columns of X span a polynomial of degree k, A is
a matrix performing (k + 1)st order differences. Then, premultiplying both
sides of the trend equation by A yields Aμ = AZη. A rank n − k − 1 linear transformation is performed to annihilate the regression effects, and thus
Aμ ∼ N(0, AZΣη Z'A'). The prior distribution of μ can be singular, as it occurs for all the spline models discussed in the previous section, for which the
rank of ZΣη Z' is n minus the column rank of the X matrix, but Aμ has a
nonsingular normal distribution.
1 This result generalises the matrix inversion lemma (16.6), see Henderson and Searle (1981).
Let us consider the problem of choosing μ so as to minimise
$$(y - \mu)'\Sigma_\epsilon^{-1}(y - \mu) + \mu'A'\bigl(AZ\Sigma_\eta Z'A'\bigr)^{-1}A\mu. \qquad (16.18)$$
The above objective function is a penalised least squares criterion. The optimal estimate of a signal arises from minimising a composite criterion function
which has two components, the first measuring the closeness to the data, and
the second the departure from zero of the differences Aμ (i.e. a measure of
roughness). Penalised least squares is among the most popular criteria for designing filters and has a long and well established tradition in actuarial sciences
and economics (see Whittaker, 1923; Henderson, 1916; Leser, 1961, and, more
recently, Hodrick and Prescott, 1997).
Differentiating with respect to μ and equating to zero, we obtain
$$\begin{aligned}
\hat\mu &= \bigl[\Sigma_\epsilon^{-1} + A'(AZ\Sigma_\eta Z'A')^{-1}A\bigr]^{-1}\Sigma_\epsilon^{-1}y \\
&= \bigl[I - \Sigma_\epsilon A'(AZ\Sigma_\eta Z'A' + A\Sigma_\epsilon A')^{-1}A\bigr]y \\
&= \bigl[I - \Sigma_\epsilon A'(A\Sigma_y A')^{-1}A\bigr]y \\
&= y - \hat\epsilon,
\end{aligned}$$
with $A\Sigma_y A' = A(Z\Sigma_\eta Z' + \Sigma_\epsilon)A' = \mathrm{Var}(Ay)$, and $\hat\epsilon = \Sigma_\epsilon A'(A\Sigma_y A')^{-1}Ay = \mathrm{Cov}(\epsilon, Ay)\bigl[\mathrm{Var}(Ay)\bigr]^{-1}Ay$.
The equivalence with the expression (16.17) is demonstrated as follows. We
start by defining the projection matrices
$$Q_X = X(X'\Sigma_y^{-1}X)^{-1}X'\Sigma_y^{-1}, \qquad Q_A = \Sigma_y A'(A\Sigma_y A')^{-1}A, \qquad (16.19)$$
which satisfy Q_X + Q_A = I. Then
$$\begin{aligned}
\hat\mu &= Q_X y + \Sigma_\mu\Sigma_y^{-1}Q_A y \\
&= \bigl[I - (I - \Sigma_\mu\Sigma_y^{-1})Q_A\bigr]y \\
&= \bigl[I - \Sigma_\epsilon\Sigma_y^{-1}Q_A\bigr]y \\
&= \bigl[I - \Sigma_\epsilon A'(A\Sigma_y A')^{-1}A\bigr]y,
\end{aligned}$$
which coincides with the solution obtained above.
For the stochastic trend models introduced in the previous section, when Σε = σε2 I and Ση = ση2 I the criterion is proportional to
$$(y - \mu)'(y - \mu) + \lambda\,\mu'A'A\mu,$$
where λ = σε2 /ση2 is the reciprocal of the signal–noise ratio and provides a measure of the smoothness of the fit. The solution simplifies to μ̂ = (I + λA'A)−1 y.
If λ = 0, the smoothing matrix is the identity matrix, μ̂ = y, and no smoothing
occurs. On the contrary, when λ → ∞, μ̂ = Xβ̂, a polynomial trend.
For the local level model Aμ yields the first differences μt+1 − μt , and
it can be shown (Harvey and Koopman, 2000) that if a doubly infinite sample is available, the estimate of the trend in the middle of the sample is
$$\hat\mu_t = \frac{1-\theta}{1+\theta}\sum_j \theta^{|j|}\,y_{t-j}, \qquad \theta = \frac{\lambda^{-1} + 2 - \sqrt{\lambda^{-2} + 4\lambda^{-1}}}{2},$$
and the real time estimate is an exponentially weighted moving average of the available observations,
$$\hat\mu_{t|t} = (1-\theta)\sum_{j=0}^{\infty}\theta^j y_{t-j}.$$
The corresponding filter is known as an exponential smoothing (ES) filter. See Gardner (1985) for a review.
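The real time estimates are trivially computed by the recursive form of the EWMA. In the sketch below (our own code; the initialisation with the first observation is a simple ad hoc choice) the smoothing constant is derived from λ as above.

import numpy as np

def es_filter(y, lam):
    # theta = (1/lam + 2 - sqrt(1/lam^2 + 4/lam)) / 2;  m_t|t = (1-theta)*y_t + theta*m_{t-1|t-1}
    q = 1.0 / lam
    theta = (q + 2.0 - np.sqrt(q**2 + 4.0 * q)) / 2.0
    m = np.empty(len(y))
    m[0] = y[0]                      # crude initialisation of the recursion
    for t in range(1, len(y)):
        m[t] = (1.0 - theta) * y[t] + theta * m[t - 1]
    return m, theta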
The Leser (1961) and Hodrick–Prescott (Hodrick and Prescott, 1997, HP)
filter arises in the special case Σε = σε2 I, Ση = ση2 I, λ = σε2 /ση2 , and A' equal
to the matrix D given in (16.13), so that A performs second order differences. Figure 16.5 displays the middle and the last
row of the smoothing matrices (I + λA'A)−1 for the ES and Leser–HP filters.
row of the smoothing matrices (I + λ
Fig. 16.5: Two-sided and one-sided filter weights for the ES and Leser–HP filters for different
values of the smoothness parameter λ.
For the cubic smoothing spline, the penalised criterion can be written as (y − μ)'(y − μ) + λμ'DR−1 D'μ, and the solution yields μ̂ = (I + λDR−1 D')−1 y = [I − λD(R + λD'D)−1 D']y. Notice that D'μ ∼ N(0, σγ2 R) is the matrix formulation of an ARIMA(0,2,1) model
for the trend, μt+1 − 2μt + μt−1 = ξt + ϑξt−1 , where ϑ/(1 + ϑ 2 ) = 1/4.
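For moderate sample sizes the Leser–HP trend can be computed directly from the penalised least squares normal equations, exploiting the sparsity of the difference matrices. The following SciPy sketch (illustrative only, with a hypothetical function name) solves (I + λA'A)μ = y, with A the second order difference matrix.

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def leser_hp(y, lam=1600.0):
    # A is the (n-2) x n second difference matrix (A' equals D in (16.13))
    n = len(y)
    A = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    I = sparse.identity(n)
    return spsolve((I + lam * (A.T @ A)).tocsc(), np.asarray(y, dtype=float))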
Suitable algorithms are available for the efficient computation of μ̂; for
smoothing splines, the Reinsch algorithm (Green and Silverman, 1994) exploits
the banded structure of R and D'D. If the polynomial spline models are cast
in the state space form, the computations can be carried out efficiently via the
Kalman filter and smoother (KFS, see Harvey, 1989, and Durbin and Koopman,
2001). The cross-validation residuals are also computed by KFS (de Jong, 1988).
The use of state space methods is advantageous also because the evaluation
of the likelihood, the computation of forecasts and the time series innovations,
along with other diagnostic quantities, are produced as a by-product of the KFS
calculations.
The smoothness parameter λ, and more generally the covariance matrices Ση
and Σε , play an essential role in the estimation of μ, determining the bandwidth
of the smoothing filter. As Fig. 16.5 illustrates, when λ increases the weights
pattern becomes more smeared across adjacent time points. The estimation of
the variance parameters can be performed by cross-validation or by maximum
likelihood estimation (MLE), where the log-likelihood is given by
$$\ell(y; \Sigma_\eta, \Sigma_\epsilon, \beta) = -\tfrac{1}{2}\bigl\{\ln\bigl|Z\Sigma_\eta Z' + \Sigma_\epsilon\bigr| + (y - X\beta)'\bigl(Z\Sigma_\eta Z' + \Sigma_\epsilon\bigr)^{-1}(y - X\beta)\bigr\}.$$
Restricted maximum likelihood estimation is based instead on
$$\ell_R(y; \Sigma_\eta, \Sigma_\epsilon) = \ell_{\hat\beta}(y; \Sigma_\eta, \Sigma_\epsilon) - \tfrac{1}{2}\ln\bigl|X'\bigl(Z\Sigma_\eta Z' + \Sigma_\epsilon\bigr)^{-1}X\bigr|,$$
where $\ell_{\hat\beta}$ denotes the log-likelihood with β replaced by its generalised least squares estimate. The restricted likelihood can be interpreted as arising when β is assigned a diffuse prior
distribution with a mean of zero and an arbitrarily large variance matrix. This is
suitable if the stochastic process for the trend has started in the indefinite past;
then the diffuse assumption is a reflection of the nonstationarity of μt .
16.8. Assessing the Properties of a Linear Filter
Local polynomial smoothing and polynomial splines yield estimates that can be
written μ̂ = Wy, where W is the smoothing matrix. The plot of the (reversed)
rows of W provides useful information; Fig. 16.2 plots the central and the final h
rows of the Henderson smoothing matrix with h = 8, whereas Fig. 16.5 displays
the central and real time weights of the ES and Leser–HP filters.
Let us denote the spectral decomposition of the smoothing matrix by $W = \sum_{i=1}^{n}\rho_i v_i v_i'$, where $v_i$ denotes the eigenvector corresponding to the ith eigenvalue $\rho_i$, so that $Wv_i = \rho_i v_i$, $\rho_1 \ge \cdots \ge \rho_n$, and $v_i'v_j = I(i = j)$. The decom-
position can be used to characterise the nature of the filter, as the eigenvectors
illustrate what sequences are preserved or compressed via a scalar multiplica-
tion. If ρi = 1 then the sequence vi is preserved with no modification, if ρi > 1
then it is amplified, otherwise it is damped. If ρi = 0 then it is annihilated.
The rank of W quantifies the computational complexity of the smoother, in
the sense that low rank smoothers use considerably fewer than n basis components (eigenvectors) whereas full-rank smoothers use approximately the same
number of basis components as the sample size. A related measure is the number of equivalent degrees of freedom, which is often used for measuring the
smoothness (on an inverted scale) of the filter. Developing a notion first introduced by Cleveland (1979), Hastie and Tibshirani (1990) define df = tr(W) as
the degrees of freedom of a smoother, which corresponds to the total influence of
the observations. In local polynomial smoothing df increases with the order of
the polynomial and decreases as the bandwidth increases; for polynomial spline
models, it is inversely related to the smoothness parameter. The maximum value
is df = n, which occurs for W = I.
The residual degrees of freedom of a smoother are defined as dfres = n −
2 tr(W) + tr(WW'). This measure has already been used when we corrected the
residual sum of squares in polynomial regression to produce an estimate of the
error variance (see Section 16.5.3).
A different and perhaps more informative approach in a time series setting is
the analysis of the filter in the frequency domain. A comprehensive treatment of
filtering in the frequency domain is provided by Pollock (1999). Given a filter,
e.g. any one of the rows of the matrix W, we can investigate the effect of the
filter by measuring the effects induced on particular sequences, yt = cos(ωt),
where ω is the frequency in radians, that describe a regular periodic pattern with
unit amplitude and periodicity equal to 2π/ω. As the frequency ω increases, the
period reduces and for ω = π , cos(πt) = (−1)t describes a cycle with a period
of two observations.
Applying a filter with weights wj to yt = cos(ωt) yields
$$\sum_j w_j\cos\bigl(\omega(t-j)\bigr) = \alpha(\omega)\cos(\omega t) + \alpha^*(\omega)\sin(\omega t), \qquad \alpha(\omega) = \sum_j w_j\cos(\omega j), \quad \alpha^*(\omega) = \sum_j w_j\sin(\omega j).$$
The function $G(\omega) = \bigl[\alpha(\omega)^2 + \alpha^*(\omega)^2\bigr]^{1/2}$
is the gain of the filter and measures how the amplitudes of the periodic components that make up a signal are modified by the filter. If the gain is 1 at a
particular frequency, this implies that the periodic component defined at that
frequency is preserved; vice versa, fluctuations with periodicity at which the
gain is less than one are compressed. The function θ (ω) = arctan[α ∗ (ω)/α(ω)]
is the phase function and measures the displacement of the periodic function
along the time axis. For symmetric filters the phase function is zero, since
$\sum_j w_j\sin(\omega j) = 0$.
Figure 16.6 plots the gain G(ω) versus the frequency ω for the central weights
of local polynomial and spline filters. The top panels refer to the local cubic fit
using a uniform kernel (Macaulay) and the Henderson filters for two values of
the bandwidth parameter. The filters preserve the low frequency components
in the original series to a different extent. In particular, increasing h yields
smoother estimates, as the amplitude of the high frequency components is fur-
ther reduced. The zeros in the gain imply that certain cycles are annihilated.
The example also illustrates that the choice of the kernel does matter when one
considers the effects on the amplitude of periodic components. The bottom pan-
els exemplify the role of the smoothness parameter on the properties of the ES
and Leser–HP filters. In particular, increasing λ enhances the smoothness of
the filtered estimates as the filter retains to a greater extent the low frequency
components, corresponding to fluctuations with a long period. Also, it can be
anticipated that the Henderson filter with h = 11 will produce a rougher esti-
mate of the signal compared to Leser–HP with λ = 1600.
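The gain of any filter is easily evaluated numerically. The short sketch below (our own code) computes G(ω) for a vector of symmetric weights, such as those returned by the henderson_weights function sketched earlier, over a grid of frequencies in [0, π].

import numpy as np

def gain(w, omegas):
    # G(omega) = | sum_j w_j exp(-i*omega*j) |, j = -h, ..., h
    h = (len(w) - 1) // 2
    j = np.arange(-h, h + 1)
    return np.array([abs(np.sum(w * np.exp(-1j * om * j))) for om in omegas])

omegas = np.linspace(0.0, np.pi, 200)
# e.g. gain(np.full(5, 1/5), omegas) for the 5-term arithmetic moving average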
16.9. Discussion
This chapter has dealt with signal extraction methods which originate from dif-
ferent approaches and which yield linear filters, that is linear combinations or
moving averages of the observations, to extract the feature of interest. Filter-
ing has a long tradition in economics and actuarial sciences (Anderson, 1971,
Chapter 3). Some methods (e.g. smoothing by polynomial splines) originated in
other fields and were later imported into economics, where data are observational
rather than experimental (see Spanos, 1999). In this process the components of
the measurement model were somewhat reified, by attaching specific
economic content to them. In fact, in economics the decomposition yt = μt + εt has
been assigned several meanings. The first is of course coincident with the original meaning, where εt is a pure measurement error. Indeed, errors in variables
have a long tradition in econometrics: the case typically investigated is that in which a response
variable (e.g. consumption) is functionally related to μt (e.g. income), but only
a contaminated version, yt in our notation, is observable. Also, the component
εt can originate from survey sampling errors (see Scott and Smith, 1974, and
Pfeffermann, 1991, and the references therein).
Quite often εt is interpreted as a behavioural component, such as a stochas-
tic cycle (a deviation or growth cycle) or transitory component, whereas μt is
the trend, or permanent component. The underlying idea is that trends and cy-
cles can be ascribed to different economic mechanisms and an understanding
of their determinants helps to define policy targets and instruments. Needless to
say, the formulation of dynamic models for the components turns out to be a
highly controversial issue, due to the fact that there are several observationally
equivalent decompositions consistent with the observations, yielding the same
forecasts and the same likelihood. This final section discusses a few open issues
concerning signal extraction in economics and the accuracy in the estimation of
the latent signals.
16.9.1. Accuracy
16.9.1.1. Validity
The validity (bias) of a smoother is usually difficult to ascertain, as it is related to
the appropriateness of μ∗t as a model for the signal. This is a complex assessment,
involving many subjective elements, such as any a priori available information
and the original motivation for signal extraction. Goodness of fit measures can
be used along with cross-validation, but one needs to take into consideration the
well-known risk of overfitting, which takes place when too much variation of
the observed data is explained by the model.
In local polynomial and spline smoothing B(μ̂t ) arises from misspecifying
the degree of the polynomial. Increasing the degree is beneficial for the bias at
the cost of an inflated variance (less precision). This issue is related to overdif-
ferencing and to that of spuriousness, which shall be considered shortly. In the
analysis of economic time series a great deal of research has been attracted by
making inference on the order of integration. The trade-offs arising when high
order trends are entertained, e.g. Δd μ∗t = ηt , where ηt is a purely random dis-
turbance and d ≥ 2, are discussed in Proietti (2007).
Recently, there has been a surge of interest in model uncertainty and in
model averaging. Typically, several methods μ̂it are compared (e.g. for mea-
suring the trend in output we may compare structural vector autoregressive
models with alternative model-based and nonparametric approaches).
16.9.1.2. Reliability
A measurement method is reliable (precise) if repeated measurements of the
same quantity are in close agreement. Loosely speaking, reliability and precision
are inversely related to the uncertainty of an estimate. In the measurement of
immaterial constructs the sources of uncertainty would include:
(i) parameter uncertainty, due to the fact that the core parameters of the representation μ∗t , such as the variances of the disturbances driving the components, are unknown and have to be estimated;
(ii) estimation error: the latent components are estimated with a positive variance even if a doubly infinite sample on yt is available;
(iii) statistical revision: as new observations become available, the estimates of
a signal are updated so as to incorporate the new information.
The first source can be assessed by various methods both in the classical (Ans-
ley and Kohn, 1986) and the Bayesian approach (Hamilton, 1986; Quenneville
and Singh, 2000). In an unobserved component framework the Kalman filter
and smoother provide all the relevant information for assessing (ii) and (iii); for
nonparametric filters such as X-11, sliding span diagnostics and revision histories have been proposed (Findley et al., 1990).
Staiger et al. (1997) and Laubach (2001) find that estimates of the NAIRU, ob-
tained from a variety of methods, are highly imprecise, in that if one attempted to
construct confidence intervals around the point estimates, he/she would realise
that they are too wide to be of any practical use for policy purposes; similar
findings are documented in Smets (2002) and Orphanides and van Norden (2002).
Somewhat different conclusions are reached by Planas and Rossi (2004) and
Proietti, Musso and Westermann (2007). The implications of the uncertainty sur-
rounding the output gap estimates for monetary policy are considered in Smets
(2002).
The sources (ii) and (iii) typically arise due to the fact that the individual
components are unobserved and they have a dynamic representation. The avail-
ability of additional time series observations helps to improve the estimation of
an unobserved component, apart from degenerate cases, such as the Beveridge–
Nelson (1981) decomposition and those arising from structural vector autore-
gressions, for which the latent variable is actually measurable with respect to
past and current information.
Recently, large dimensional dynamic factor models have become increas-
ingly popular in empirical macroeconomics. The essential idea is that the pre-
cision by which the common components are estimated can be increased by
bringing in more information from related series: suppose for simplicity that
yit = θi μt + εit , where the ith series, i = 1, . . . , N , depends on the same com-
mon factor, which is responsible for the observed comovements of economic
time series, plus an idiosyncratic component, which includes measurement er-
ror and local shocks. Generally, multivariate methods provide more reliable
measurements provided that a set of related series can be viewed as repeated
measures of the same underlying latent variable. Stock and Watson (2002a,
2002b) and Forni et al. (2000) discuss the conditions on μt and εit under which
dynamic or static principal components yield consistent estimates of the under-
lying factor μt as both N and n tend to infinity.
An additional source of uncertainty is data revision, which concerns yt .
Timely economic data are only provisional and are revised subsequently with
the accrual of more complete information. Data revision is particularly relevant
for national accounts aggregates, which require integrating statistical informa-
tion from different sources and balancing it so as to produce internally consistent
estimates (see Chapter 8 of this volume).
The characterisation of trends and cycles has always been at the core of the
econometric analysis of time series, since it involves an assessment of the role
of supply and demand shocks. A first issue is whether the kind of nonstationary
behaviour displayed by economic time series is best captured by deterministic
or stochastic trends. In the former case it is also said that the series is trend-
stationary, implying that it can be decomposed into a deterministic function of
time (possibly subject to a few large breaks) and a stationary cycle; in the second
the series can be made stationary after suitable differencing and so it is said to
be difference-stationary or integrated of order d (or I(d)), where d denotes
the power of the stationarity-inducing transformation, (1 − L)d .
The characterisation of the nature of the series was addressed in a very in-
fluential paper by Nelson and Plosser (1982), who adopted the (augmented)
Dickey–Fuller test for testing the hypothesis that the series is I(1) versus the
alternative that it is trend-stationary. Using a set of annual US macroeconomic
time series they are unable to reject the null for most series and discuss the im-
plications for economic interpretation. Another approach is to test stationarity
against a unit root alternative; see Kwiatkowski et al. (1992).
A second issue deals with the specification of a time series model for the trend
component for difference-stationary processes and the correlation between the
components μt and t . References on this issue are Watson (1986), Morley et al.
(2002) and Proietti (2006).
Another view is that any decomposition yt = μt + t is just an approximation
to a true trend cycle decomposition, but still it may yield sound inferences for
a given purpose, such as forecasting more than one step ahead, provided that
the parameters are estimated according to a criterion that is consistent with that
purpose (e.g. multistep or adaptive estimation; see Cox, 1961, and Tiao and Xu,
1993, for the local level model).
According to Klein (1997) one of the first uses of moving averages was to dis-
guise statistical information, rather than to unveil a hidden signal. The smooth-
ing properties of arithmetic moving averages were reportedly exploited by the
Bank of England in order to conceal the true level of gold reserves, which was
falling steeply, whereas the filtered series gave a much more optimistic view.
However, this episode just illustrates a bad practice in data publication rather
than the inherent limitations of filters: it is the data supplier that has to be blamed
and not the instrument. The latter has well known properties, which can be bent
to particular needs, but are independent of its uses. Indeed the publication and
availability of filtered series is a service to the scientific community, provided
that the raw observations are also made available and the methods employed are
made transparent. Economic analysts, policy makers and the general public do
make widespread use of filtered information: the availability of, and the resources
devoted to, seasonal adjustment testify to this. The same considerations apply: the
original unadjusted data should be available along with the adjusted series.
In the analysis of economic time series, there is great concern about the sta-
tistical “artifacts” concerning the economy that could emerge from the application of
ad hoc filters (i.e. filters applied regardless of the properties of the series un-
der investigation), such as the Hodrick–Prescott filter (King and Rebelo, 1993;
Harvey and Jaeger, 1993; Cogley and Nason, 1995). This issue has particular
relevance with respect to the measurement of the business cycle.
The Slutzky–Yule effect is concerned with the fact that a moving average
repeatedly applied to a purely random series can introduce artificial cycles (Slut-
sky, 1937). As such it is a rather natural phenomenon; as a matter of fact, the
squared gain |G(ω)|2 of a filter w(L) (see Section 16.8) can be viewed as the
spectral density of the series resulting from the application of the filter w(L)
to a white noise sequence. Nevertheless, the application to nonstationary series,
such as the detrending of integrated series by the Hodrick–Prescott filter, can
generate spurious cyclical behaviour, which is precisely the concern raised by
the studies cited above.
References
Anderson, T.W. (1971). The Statistical Analysis of Time Series. Wiley, New York.
Ansley, C., Kohn, R. (1986). Prediction mean square error for state space models with estimated
parameters. Biometrika 73, 467–473.
Baxter, M., King, R.G. (1999). Measuring business cycles: Approximate band-pass filters for eco-
nomic time series. Review of Economics and Statistics 81, 575–593.
Beveridge, S., Nelson, C.R. (1981). A new approach to decomposition of economic time series into
permanent and transitory components with particular attention to measurement of the ‘business
cycle’. Journal of Monetary Economics 7, 151–174.
Boumans, M. (2004). The reliability of an instrument. Social Epistemology 18, 215–246.
Boumans, M. (2005). Economics, strategies in social sciences. In: Encyclopedia of Social Measure-
ment, vol. 1. Elsevier, pp. 751–760.
Camba-Mendez, G., Rodriguez-Palenzuela, D. (2003). Assessment criteria for output gap estimates.
Economic Modelling 20, 529–562.
Canova, F. (1998). Detrending and business cycle facts. Journal of Monetary Economics 41, 475–
512.
Cleveland, W.S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of
the American Statistical Association 74, 829–836.
Cogley, T., Nason, J.M. (1995). Effects of the Hodrick–Prescott filter on trend and difference sta-
tionary time series: Implications for business cycle research. Journal of Economic Dynamics and
Control 19, 253–278.
Cox, D.R. (1961). Prediction by exponentially weighted moving averages and related methods. Jour-
nal of the Royal Statistical Society, Series B 23, 414–422.
Dagum, E.B. (1982). The effects of asymmetric filters of seasonal factor revisions. Journal of the
American Statistical Association 77, 732–738.
de Jong, P. (1988). A cross-validation filter for time series models. Biometrika 75, 594–600.
Durbin, J., Koopman, S.J. (2001). Time Series Analysis by State Space Methods. Oxford Univ. Press,
New York.
Fahrmeir, L., Tutz, G. (1994). Multivariate Statistical Modelling Based on Generalized Linear Models.
Springer-Verlag, New York.
Fan, J., Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman and Hall,
New York.
Findley, D.F., Monsell, B.C., Shulman, H.B., Pugh, M.G. (1990). Sliding spans diagnostic for sea-
sonal and related adjustments. Journal of the American Statistical Association 85, 345–355.
Findley, D.F., Monsell, B.C., Bell, W.R., Otto, M.C., Chen, B. (1998). New capabilities and methods
of the X12-ARIMA seasonal adjustment program. Journal of Business and Economic Statistics
16, 2.
Forni, M., Hallin, M., Lippi, F., Reichlin, L. (2000). The generalized dynamic factor model: Identi-
fication and estimation. Review of Economics and Statistics 82, 540–554.
Friedman, J.H. (1984). A variable span smoother. Technical report LCS 05. Department of Statistics,
Stanford University, USA.
Gardner, E.S. (1985). Exponential smoothing: The state of the art. Journal of Forecasting 4, 1–28.
Granger, C.W.J., Newbold, P. (1974). Spurious regressions in econometrics. Journal of Econometrics
2, 111–120.
Green, P.J., Silverman, B.W. (1994). Nonparametric Regression and Generalized Linear Models:
A Roughness Penalty Approach. Chapman & Hall, London.
Gómez, V. (2001). The use of Butterworth filters for trend and cycles estimation in economic time
series. Journal of Business and Economic Statistics 19, 365–373.
Hamilton, J.D. (1986). A standard error for the estimated state vector of a state space model. Journal
of Econometrics 33, 387–397.
Hamilton, J.D. (2001). A parametric approach to flexible nonlinear inference. Econometrica 69,
537–573.
Harvey, A.C. (1989). Forecasting, Structural Time Series and the Kalman Filter. Cambridge Univ.
Press, Cambridge, UK.
Harvey, A.C., Jaeger, A. (1993). Detrending, stylised facts and the business cycle. Journal of Applied
Econometrics 8, 231–247.
Harvey, A.C., Koopman, S.J. (2000). Signal extraction and the formulation of unobserved compo-
nents. Econometrics Journal 3, 84–107.
Harville, D. (1977). Maximum likelihood approaches to variance components and related problems.
Journal of the American Statistical Association 72, 320–340.
Hastie, T.J., Tibshirani, R.J. (1990). Generalized Additive Models. Chapman and Hall, London.
Henderson, R. (1916). Note on graduation by adjusted average. Transactions of the Actuarial Society
of America 17, 43–48.
Henderson, H.V., Searle, S.R. (1981). On deriving the inverse of a sum of matrices. SIAM Review
23, 53–60.
Hodrick, R., Prescott, E.C. (1997). Postwar US business cycles: An empirical investigation. Journal
of Money, Credit and Banking 29 (1), 1–16.
Kaiser, R., Maravall, A. (2005). Combining filter design with model-based filtering (with an appli-
cation to business-cycle estimation). International Journal of Forecasting 21, 691–710.
Kendall, M., Stuart, A., Ord, J.K. (1983). The Advanced Theory of Statistics, vol. 3. C. Griffin.
Kenny, P.B., Durbin, J. (1982). Local trend estimation and seasonal adjustment of economic and
social time series. Journal of the Royal Statistical Society, Series A 145 (I), 1–41.
King, R.G., Rebelo, S. (1993). Low frequency filtering and real business cycles. Journal of Economic
Dynamics and Control 17, 207–231.
Klein, J.L. (1997). Statistical Visions in Time: A History of Time Series Analysis, 1662–1938. Cam-
bridge Univ. Press. Cambridge, UK.
Kwiatkowski, D., Phillips, P.C.B., Schmidt, P., Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a
unit root? Journal of Econometrics 54, 159–178.
Leser, C.E.V. (1961). A simple method of trend construction. Journal of the Royal Statistical Society,
Series B 23, 91–107.
Laubach, T. (2001). Measuring the NAIRU: Evidence from seven economies. Review of Economics
and Statistics 83, 218–231.
Loader, C. (1999). Local regression and likelihood. Springer-Verlag, New York.
Morley, J.C., Nelson, C.R., Zivot, E. (2002). Why are Beveridge–Nelson and unobserved-component
decompositions of GDP so different? Review of Economics and Statistics 85, 235–243.
Musgrave, J. (1964). A set of end weights to end all end weights. Working paper. Census Bureau,
Washington.
Nadaraya, E.A. (1964). On estimating regression. Theory of Probability and its Applications 10,
186–190.
Nelson, C.R., Plosser, C.I. (1982). Trends and random walks in macroeconomic time series: Some
evidence and implications. Journal of Monetary Economics 10, 139–162.
Orphanides, A., van Norden, S. (2002). The unreliability of output gap estimates in real time. Review
of Economics and Statistics 84, 569–583.
Patterson, H.D., Thompson, R. (1971). Recovery of inter-block information when block sizes are
unequal. Biometrika 58, 545–554.
Pfeffermann, D. (1991). Estimation and seasonal adjustment of population means using data from
repeated surveys. Journal of Business and Economic Statistics 9, 163–177. Reprinted in: Har-
vey, A.C., Proietti, T. (Eds.), Readings in Unobserved Components Models. Oxford Univ. Press,
Oxford, UK, 2005.
Poirier, D.J. (1973). Piecewise regression using cubic splines, Journal of the American Statistical
Association 68, 515–524.
Pollock, D.S.G. (1999). Handbook of Time Series Analysis, Signal Processing and Dynamics. Aca-
demic Press.
Proietti, T. (2006). Trend-cycle decompositions with correlated components. Econometric Reviews
25, 61–84.
Proietti, T. (2007). On the model based interpretation of filters and the reliability of trend-cycle
estimates. Econometric Reviews. In press.
Proietti, T., Musso, A., Westermann, T. (2007). Estimating potential output and the output gap for
the euro area: A model-based production function approach. Empirical Economics. In press.
Quenneville, B., Singh, A.C. (2000). Bayesian prediction mean squared error for state space models
with estimated parameters. Journal of Time Series Analysis 21, 219–236.
Robinson, G.K. (1991). That BLUP is a good thing: The estimation of random effects. Statistical
Science 6 (1), 15–32.
Rossi, A., Planas, C. (2004). Can inflation data improve the real-time reliability of output gap esti-
mates? Journal of Applied Econometrics 19, 121–133.
Ruppert, D., Wand, M.P., Carroll, R.J. (2003). Semiparametric Regression. Cambridge Univ. Press.
Scott, A.J., Smith, T.M.F. (1974). Analysis of repeated surveys using time series methods. Journal
of the American Statistical Association 69, 647–678.
Slutsky, E. (1937). The summation of random causes as the source of cyclic processes. Econometrica
5, 105–146.
Smets, F.R. (2002). Output gap uncertainty: Does it matter for the Taylor rule? Empirical Economics
27, 113–129.
Spanos, A. (1999). Probability Theory And Statistical Inference: Econometric Modeling With Ob-
servational Data. Cambridge Univ. Press, Cambridge, UK.
Staiger, D., Stock, J.H., Watson, M.W. (1997). The NAIRU, unemployment and monetary policy.
Journal of Economic Perspectives 11, 33–50.
Stock, J.H., Watson, M.W. (2002a). Macroeconomic forecasting using diffusion indexes. Journal of
Business and Economic Statistics 20, 147–162.
Stock, J.H., Watson, M.W. (2002b). Forecasting using principal components from a large number of
predictors. Journal of the American Statistical Association 97, 1167–1179.
Tiao, G.C., Xu, D. (1993). Robustness of maximum likelihood estimates for multi-step predictions:
The exponential smoothing case. Biometrika 80, 623–641.
Wand, M.P., Jones, M.C. (1995). Kernel Smoothing. In: Monographs on Statistics and Applied Prob-
ability 60. Chapman & Hall.
Watson, G.S. (1964). Smooth regression analysis. Sankhya Series A 26, 359–372.
Watson, M.W. (1986). Univariate detrending methods with stochastic trends. Journal of Monetary
Economics 18, 49–75. Reprinted in: Harvey, A.C., Proietti, T. (Eds.). Readings in Unobserved
Components Models. Oxford Univ. Press, Oxford, UK, 2005.
Whittaker, E. (1923). On a new method of graduation. Proceedings of the Edinburgh Mathematical Society 41, 63–75.
Yule, G.U. (1926). Why do we sometimes get nonsense-correlations between time series? Journal of the Royal Statistical Society 89, 1–63.
CHAPTER 17
Timeliness and Accuracy
D. Fixler
Abstract
Government, business and households require timely information to make current period decisions and to cast future plans. Producers of national economic statistics seek to provide their users with timely and accurate estimates of economic activity, where the former refers to the release of estimates "close" to the end of the reference period and the latter refers both to the validity of the measure and to its reliability. Their ability to do so is constrained by the flow of data and thereby necessitates a flow of vintage estimates that encompass revisions. Though a trade-off between accuracy and timeliness is somewhat intuitive, the paper explains that care should be taken in assessing the impact of the trade-off. In the US, there does not seem to be a trade-off between timeliness and accuracy save perhaps between the first and second current quarterly vintages of GDP estimates for the reference period. This finding does not negate the usefulness of having a sequence of revisions, because the sequence of revisions incorporates new information which enables users to view the later estimates as closer approximations of the truth, with the latest as being the closest.
17.1. Introduction
estimates on a quarterly basis, they too are interested in at least quarterly fre-
quencies. Households’ demand for timely data is more difficult to assess because
decisions are on-going. Investors would like daily information if it were possi-
ble.
From the perspective of the producers of economic statistics, timeliness re-
quires the ability to collect, edit, process, analyze and publish data at a time
period “close” to the reference period. In other words, they are interested in
minimizing the time between the end of the reference period and the publishing
of statistics describing the economic activity in the reference period. Thus the
chief metric for measuring timeliness is the number of days between the end of
the reference period and the release of statistics. The determination of the op-
timal number of days should take into account the needs of both suppliers and
demanders of the estimates, including the demanders’ need for accuracy of the
estimates. The demand for quarterly data would disappear if it were impossible
to produce accurate quarterly estimates.
The availability of data on aggregate economic activity can affect the nature
of decision-making. Monetary policy-makers often debate the merits of adopt-
ing policy rules in lieu of discretionary policy. This debate has been ongoing
since Milton Friedman (1968) advocated a non-discretionary monetary policy
rule and continues today in regard to the merits of inflation targeting. Because
the purpose of policy rules is to reduce the uncertainty of future policy actions,
reliance on a rule requires a high degree of accuracy for the measures of eco-
nomic performance. In the US, monetary policy is decided at the meetings of
the Federal Open Market Committee that occur about every 6 weeks, with daily
intervention in the money markets to assure that the policy is carried out. Thus
if the rule were that some money aggregate should increase at some multiple of
Gross Domestic Product (GDP) growth then volatile estimates of GDP growth
would produce volatility in the growth in the money aggregate. In contrast, a
discretionary policy implicitly assumes that the estimates are “nearly” accurate
and allows for corrections or anticipations of corrections. In principle the imple-
mentation of rules could also anticipate revisions and accordingly adjust a point
of time estimate. Alan Greenspan (2003) expressed the idea succinctly: "Rules by their nature are simple, and when significant and shifting uncertainties exist in the economic environment, they cannot substitute for risk-management paradigms, which are far better suited to policymaking."1
Paradigms of the economy are not only useful to policy-making, they are also
the basis for formulating measures of economic activity. That is to say, national
income accountants must first define an interesting and useful measure of aggre-
gate economic activity, and second, they must design methods for estimating the
value of that measure, taking the definition as fixed. One cannot address the ac-
curacy of those measures without first establishing their conceptual foundations.
In the case of GDP, the most widely used measure of economic activity, the con-
ceptual foundation began to be formed in the 17th century, as described
in Stone (1986) and in den Butter (this volume), and has undergone changes as
economies have changed.
The existence of a trade-off between timeliness and accuracy is somewhat intuitive and has been recognized for quite a while. If one desires to provide estimates of economic activity near the end of the reference period (or even before) then the amount and quality of the available data are going to be less than would be the case if the estimates were to be released later. In the first issue of the Survey of Current Business, in July 1921, the introduction stated:2
"In preparing these figures every effort is made to secure accuracy and completeness. On the other hand, it is realized that timeliness is often of more value than extreme accuracy. In certain cases it is necessary to use preliminary figures or advance estimates in order to avoid too great delay in publication after the end of each month."
2 The Survey of Current Business is the journal of record for the US estimates of national output
and income.
3 See Rytan (1997).
4 The analysis will not address price indexes because they are typically provided on a monthly
basis and are not the subject of the discussions about the trade-off between timeliness and accuracy.
Economic Analysis has set the following sequence of vintages: the initial current quarterly estimate of GDP is made about 30 days after the end of the reference period, two more estimates are made about 30 and 60 days later, and annual and comprehensive revisions then follow even later.5 The estimates that are released in the first 90 days after the end of the reference period receive most of the attention of policy-makers and decision-makers.
There have been considerable efforts by other countries to produce more
timely data as well. Several studies have compared the differences between the
US and other countries regarding the timeliness of the data.6 The general finding was that the US was able to provide estimates sooner after the end of the reference period for a variety of economic measures. A topic of concern in these studies was the need for decision makers to have timely data, with the caveat that the data be useful to the decision process – thus release dates cannot be independent of the attendant accuracy of the estimates.
17.3. Accuracy
5 See Grimm and Weadock (2006) for a discussion of the flow of data into the early estimates of
US GDP.
6 See, for example, Richard McKenzie (2005).
7 Hand (2004, p. 129).
statistical theory and an assessment of the source data – the respondents to sur-
veys, regulatory forms (administrative data) and so on – as well as an assessment
of the methodologies used to produce the estimates. In the case of survey-based
data collections, the attention is on the total survey error, which is comprised
of both sampling and non-sampling error. Examples of the evaluations for total
survey error include examining how respondents interpret the questions, errors in reviewing and editing responses, and errors in imputing missing data.
In this paper, the focus will be on the attribute “aggregate economic activity,”
though the discussion could apply to any economic measure. The notion of
measuring economic activity requires the use of economic theories to form the
conceptual foundation for the measurement concept and the production bound-
ary for the economy. Both provide the context for defining the “true” value of
economic activity that serves as the basis for gauging accuracy. The production
boundary limits the set of activities that are deemed admissible to the measure.
For example, though household production is clearly an important economic
activity, it is treated as outside the boundary of aggregate economic activity mea-
sures such as Gross Domestic Product (GDP). Such exclusions apply to a host
of non-market transactions.
A prerequisite for a measure of economic activity to achieve the first aspect of accuracy, validity, is the use of a classification system grounded in economic principles. For example, to measure the output of all industries in the economy one needs a classification system that organizes firms into industries according to an economics-based guideline. The International Standard Industrial Classification of All Economic Activities (ISIC) organizes industries according to similarity in firms' activities, while the North American Industry Classification System (NAICS) organizes industries according to similarity in firms' production processes.
In addition, national economic accounts are designed to provide a system
yielding a measure of aggregate production or aggregate economic activity, as
well as ways of measuring the component parts. The acceptance of these mea-
sures, which is based on a perception of their accuracy, relies in turn on the
acceptance of the system.8 If decision makers had no confidence in the concep-
tual foundations of the system upon which the estimates are based then it would
be meaningless to talk of accuracy – regardless of the statistical properties of
the estimates. Manuals such as the one for the United Nations System of Na-
tional Accounts and other standardizations of techniques provide imprimaturs
of general acceptance that in turn provide the aura of objectivity necessary to
perceptions of accuracy and confidence in the estimates. In addition, they are
8 Porter (1995) similarly maintains that it is the creation of rules and their adoption that give rise
to the confidence in the quantification of economic activity.
9 Katzner (1991).
10 Koopmans (1947, p. 162).
11 Morgan (2001).
between the income and expenditure measure of GDP is labeled the statistical
discrepancy and is often cited as such an indicator.12
The measurement of the second aspect of accuracy, reliability, focuses on the
revisions to the vintages of estimates that arise from the flow of source data. It
is this dimension of accuracy that is tied to timeliness.
The relationship between timeliness and accuracy partly depends on the production process for the estimates. Estimates compiled from survey-based data have a number of sources of error: inadequate sampling, concealment and falsification by respondents, inadequately trained collectors and a host of other sources that are broadly classified into the category of total survey error. For all estimates there is also potential error in the reconciliation of the available data and the measurement objective. In other words, how are the requirements of the measure satisfied when there are gaps in the underlying data? The uncertainty surrounding the production of information and the validity of the measures bears on the usefulness of the data released at any period of time. The intuition underlying the trade-off between timeliness and uncertainty is that the longer one waits, the more the error in the data arising from these sources is reduced. As stated in
Bier and Ahert (2001):
“The main reason is that improving timeliness forces the producers to compile the indicators
from incomplete source data. As more data become available afterwards, a so called recompi-
lation process produced different results and so revisions. The ECB (European Central Bank)
considers it necessary to balance timeliness and accuracy. To determine the optimal balance
is not straightforward.”13
Rytan (1997) suggested that the balance between timeliness and accuracy
can be conceptualized by adhering to the intuition of optimization theory: con-
sumers of data choose the combination of timeliness and accuracy according to
their preference structure, thereby balancing the benefits and costs of different
combinations while producers of data choose the combination of timeliness and
accuracy subject to a budget constraint. Oberg (2002) postulated that there can
be improvements to timeliness without any cost in accuracy when the organiza-
tion is operating inefficiently.14 For example, if the process yields a first estimate
60 days after the end of the reference period, it may be possible to reduce the lag
to 30 days without any erosion in accuracy. Indeed such has been the discussion
within the OECD about the production of “flash” estimates of GDP as discussed
in Shearing (2003).15 The point is that the trade-off may not be continuous and
depends on the efficiency of the production process.
12 Many countries allocate the difference across the sectors of the economy and do not publish the
difference. In the US accounts, the sum of industry value added is constrained to equal GDP.
13 Bier and Ahert (2001, p. 4).
14 Oberg (2002).
15 Shearing (2003).
As Bowman (1964) put it: “Revisions in the data reflect the needs for timeliness,
frequency of reporting and accuracy. Timeliness can only be obtained by using
partial information.”16 Here, revision patterns are used to discuss the timeliness
and accuracy dimensions for US GDP estimates for which the latest estimates
are treated as the most accurate – the latest estimates are viewed as closest to
the true measure. Thus it is the second aspect of accuracy, reliability, that is the
focus of concern when evaluating revision magnitudes.17
The study of revision patterns in US GDP estimates has a long history – start-
ing with Jaszi (1965). Revisions primarily come from 5 sources: (1) Replace-
ment of early source data with later, more comprehensive data; (2) Replacement
of judgmental estimates with estimates based on source data; (3) Introductions
of changes in definitions and estimating procedures; (4) Updating of seasonal
adjustment factors; and (5) Corrections of errors in source data or computations.
Aside from examining the statistical properties of revisions, such as the mean revision, the mean absolute revision and the standard deviation of revisions (a minimal calculation of these statistics is sketched after the list below), one can use the revisions to assess the quality of the GDP estimates in the context of how well those estimates perform with respect to four basic questions:18
1. Do the estimates provide a reliable indication of the direction in which real
aggregate economic activity is moving?
2. Do they provide a reliable indication of whether the change in real aggregate
economic activity is accelerating or decelerating?
3. Do they provide a reliable indication of whether the change in real aggregate
economic activity differs significantly from the longer run?
4. Do they provide a reliable indication of cyclical turning points?
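The summary statistics referred to above can be computed directly from any two vintages of the growth estimates. The following is a minimal sketch in Python; the arrays are hypothetical placeholders, not actual BEA data, and the revision is simply defined as the later vintage minus the earlier one.

```python
# Minimal sketch: revision summary statistics for two vintages of quarterly
# GDP growth. The numbers below are hypothetical, not actual BEA estimates.
import numpy as np

advance = np.array([2.1, 3.4, 0.8, 4.0, 2.5, 1.2])  # hypothetical advance-vintage growth rates (percent, annual rate)
latest = np.array([2.3, 3.1, 1.0, 4.4, 2.2, 1.5])   # hypothetical latest-vintage growth rates

revision = latest - advance  # revision = later vintage minus earlier vintage

print("mean revision:              %5.2f" % revision.mean())
print("mean absolute revision:     %5.2f" % np.abs(revision).mean())
print("std. deviation of revision: %5.2f" % revision.std(ddof=1))
```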
To answer these questions one must pick the standard of measure and, as mentioned, that is usually chosen to be the latest estimates. The latest estimates provide the most informed picture of aggregate economic activity for the time period of the current quarterly estimates. In the US, the latest estimates embody additions to the source data that were unavailable when the current quarterly estimates were made and have also undergone comprehensive revisions – in these revisions the US not only incorporates improved source data, typically data from the Economic Censuses, but also updates the definitions of the measures. Changes in definitions are made to adapt the measures to a chang-
ing economy. Comprehensive revisions are performed about every 5 years and
historical series are revised as far back as possible. For example, the 1996 com-
prehensive revision changed the name of the government component of GDP
16 Bowman (1964).
17 A caveat should be kept in mind when evaluating revisions to aggregate statistics: a zero revision
does not imply the absence of error. One way of thinking about this is to consider that an aggregate
estimate can have a zero revision while the components can have large but offsetting revisions.
18 See Grimm and Parker (1998).
17.4.2. The case of the GDP flash estimate and the perceived trade-off
To satisfy the needs of policy makers, an estimate of the Gross National Product (the preferred measure of aggregate US economic activity before 1992) was made 15 days before the end of the reference period and provided to policy makers – the Council of Economic Advisers, the Office of Management and Budget, the Federal Reserve Board, and the Treasury and Commerce Departments – and was not publicly released.19 The estimates were first produced
in the mid 1960s and the confidentiality of the estimates did not become an is-
sue until the early 1980s when these estimates somehow leaked into the public
domain. Widespread interest in these early estimates resulted in BEA deciding,
in September 1983, to release them to the public as the minus 15-day estimate
– the moniker “flash” became attached afterward. The remaining vintages of
the estimates were released about 15 days, 45 days, and 75 days after the end
of the reference period.20 There was considerable discussion about the public
release of the flash estimate – in particular whether the subsequent revisions
would be confusing to the public. In an attempt to minimize such confusion,
the flash estimates were characterized as projections because they were being
formed on partial data. Nevertheless, the concern persisted and it was reinforced
when revisions to the flash received much attention. Characteristic of these con-
cerns is the following excerpt from a New York Times editorial on September 9,
198521 :
According to the Commerce Department’s “flash” report, the economy grew at an annual rate
of 2.8 percent in the third quarter. Don’t bet your savings on it. For the last 3 years, this
quarterly report has been at least one-half point off the mark every time; once it was three
points off. A statistic so dependably wrong is one to do without.
19 More specifically these agencies were provided with estimates of current and constant dollar GNP
as well as the related measures of price change, and charges against GNP and its components. These
estimates were produced by the Department of Commerce first in the Office of Business Economics
and then the Bureau of Economic Analysis as a result of a re-organization.
20 The estimates that were released as part of the flash were: GNP in current and constant dollars,
GNP fixed weight price index and GNP implicit price deflator – not exactly the full set that was
provided under limited distribution.
21 Note that the quote focuses on the magnitude of specific revisions and not on statistical measures
of reliability such as mean and standard deviation.
Such reviews prompted much discussion about whether the flash should con-
tinue.
Given the fact that revisions are a necessary part of the process of providing
timely estimates, the central issue concerned the performance of the flash esti-
mate with respect to the 15-day estimate. In other words, as the quote from the
New York Times suggests, would users be better off if the flash estimate were
dropped and the first estimate was released 15 days after the end of the quarter?
The validity of the flash estimate was first assessed on three non-statistical
metrics: accuracy as an indicator of whether GNP is increasing or decreasing,
by approximately how much, and whether the change is larger or smaller than
the change in the previous quarter. These metrics convey the idea that although
the flash estimate was expressed in terms of a point estimate it was not intended
to provide such an estimate of GNP growth but rather to provide – on the basis
of incomplete data – a perception of how the economy was performing relative
to the estimates of the previous quarter. An internal BEA study found that, with respect to the 3 metrics, the flash estimate – evaluated against the later vintage estimates, especially the 15-day estimate – performed in a way that was not significantly different from the 15-day estimate. Despite this evi-
dence, the perception that the flash estimate was providing policy makers and
decision makers with erroneous information about the economy persisted and in
January 1986 the flash estimate was discontinued by BEA at the direction of the
Commerce Department.
In an unrelated study, Mankiw and Shapiro (1986) examined the revision pattern of BEA estimates including the flash estimate for the period 1976, Quarter
1 to 1982, Quarter 4. Their central emphasis was whether the revisions were due
to the availability of new information or due to measurement error. If the flash
estimate of GNP was an efficient estimate, in the sense that it incorporated all
available information, then the standard deviation of the 15-day estimate should
be higher. In addition the correlation between the revision from the flash to the
15-day estimate and the 15-day estimate should be significant. Table 17.1 is part
of their findings.
Note that the standard deviation of the 15-day estimate is higher than that
of the flash. They also found that the standard deviation in the revision of the
growth rates from the flash estimate to the 15-day estimate was 1.2 percentage
Table 17.1: Summary Statistics, GNP Growth Rates, 1976 Q1 to 1982 Q4, Percent at annual rate

                        Mean   Standard deviation   Correlation of revision with 15-day estimate
Nominal GNP
  Flash (−15 day)        9.0   4.0
  15-Day                 9.0   4.6                  0.57 (significant at 1% level)
Constant Dollar GNP
  Flash                  1.7   3.8
  15-Day                 2.0   4.0                  0.35 (not significant)
points at an annual rate, for nominal GNP, and 1.0 percentage point for constant
dollar GNP. This revision was significantly correlated with the 15-day estimate
in the case of nominal GNP and not significantly correlated in the case of con-
stant dollar GNP (the third column in the table above).22 These findings suggest
that the accuracy of the flash estimates was about the same as the 15-day esti-
mates and that the revision was due to the availability of new information and not
measurement error – that is to say, the flash estimates were efficient estimates.
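A minimal sketch of this news-versus-noise logic is given below, with hypothetical vintages standing in for the flash and 15-day GNP growth estimates; the arrays are illustrative placeholders, and the calculation is a plain sample comparison rather than Mankiw and Shapiro's own procedure. If the revision is news, the later vintage has the larger standard deviation and the revision is correlated with the later estimate; if it is noise, the revision is correlated with the earlier estimate.

```python
# Minimal sketch of a news-vs-noise check on two vintages of GNP growth.
# The data are hypothetical placeholders, not the 1976-1982 BEA estimates.
import numpy as np
from scipy.stats import pearsonr

flash = np.array([8.1, 10.2, 7.5, 9.8, 11.0, 8.6, 9.4, 10.9])   # hypothetical flash (-15 day) estimates
day15 = np.array([8.4, 10.0, 8.1, 9.5, 11.6, 8.2, 9.9, 11.3])   # hypothetical 15-day estimates
revision = day15 - flash

print("std dev, flash:  %.2f" % flash.std(ddof=1))
print("std dev, 15-day: %.2f" % day15.std(ddof=1))               # larger if revisions are news
r_news, p_news = pearsonr(revision, day15)                       # significant if revisions are news
r_noise, p_noise = pearsonr(revision, flash)                     # significant if revisions are noise
print("corr(revision, 15-day): %.2f (p = %.2f)" % (r_news, p_news))
print("corr(revision, flash):  %.2f (p = %.2f)" % (r_noise, p_noise))
```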
The case of the flash estimate illustrates that the presumption of a trade-off
between timeliness and accuracy in conjunction with routine revisions can lead
to an inaccurate assessment of an estimate’s performance. In this case the mis-
perception of inaccuracy undermined its usefulness and led to its discontinuance.
It turns out that even with BEA's current estimates and procedures, there does not seem to be a trade-off between timeliness and accuracy save perhaps between
the first and second current quarterly vintages of the estimates. Table 17.2 il-
lustrates the mean revisions for different vintages of GDP growth in the period
1983–2002. The current quarterly vintages are now labeled Advance, Prelimi-
nary and Final estimates, which are respectively released about 30, 60 and 90
days after the end of the reference period. Note that if the Advance estimate
were to be eliminated so that the first estimate would be the Preliminary estimate
then there would be a modest drop in the average revision. Instead of observing
a 0.09 percentage point revision from the Advance to the Final, users would
see a −0.01 percentage point revision in the Preliminary estimate.23 It thus ap-
pears that new information is received in the time between the Advance and
Preliminary estimate and that for these two vintages there is a trade-off between
timeliness and accuracy.24 This result also holds when one looks at the revision
to the current quarterly estimates using the 1st annual revision as the standard –
the magnitude of the revision is greatest for the Advance estimate. The revisions
with respect to the Latest estimates are large because these estimates contain all
available information – this includes new source data and changes in methodol-
ogy and definitions occurring with comprehensive revisions. None of the mean
revisions, however, are statistically significantly different from zero. This result
does not negate the usefulness of having a sequence of revisions. Because the
sequence of revisions incorporates new information, mostly new data but also,
especially for the later revisions, new estimation techniques and new definitions,
22 The revision is not correlated with the flash estimate in either case.
23 The magnitudes of the revision with respect to the latest estimates reflect the definitional changes
that occur in comprehensive revisions. These changes have often increased the level and rate of
increase of GDP. See for example Fixler and Grimm (2005).
24 More than half of the advance estimates are based at least in part on trend-based estimates. A large
majority of these are replaced with annual frequency data in the first annual estimates. See Grimm
and Weadock (2006, p. 12).
Table 17.2: Mean Revisions to Successive Vintages of Estimates of Quarterly Changes in Real
GDP to Later Vintages of Estimates, 1983–2002/1/
[Percentage points]
Table 17.3: Mean Absolute Revisions to Successive Vintages of Estimates of Quarterly Changes
in Real GDP to Later Vintages of Estimates, 1983–2002/1/
[Percentage points]
users can view the later estimates as closer approximations of the truth, with the
latest as being the closest.
Similar findings hold true for the mean absolute revisions, as can be seen in
Table 17.3. These revisions are computed without regard to sign and they pro-
vide an idea of the dispersion of the revision. Note that once again elimination
of the Advance estimate – in other words, not releasing the Advance estimate
and waiting until all of the data used in the Preliminary estimate are available
– would result in substantially lower mean absolute revisions; with the Final as
the standard the revision would fall to 0.26 percentage points, and with the first
annual estimate the reduction is from 1.12 percentage points to 0.94 percent-
age points. Note that with the latest estimates as the standard there is very little
difference in the revision between the Advance, Preliminary and Final estimates.
As mentioned, the revisions to the vintages of the estimates can also be eval-
uated in terms of the four questions listed earlier. For the period 1983–2002,
the quarterly estimates of constant dollar GDP: successfully indicated the direc-
tion of change in real GDP 98 percent of the time; correctly indicated whether
real GDP was accelerating or decelerating 74 percent of the time; indicated
whether real GDP growth was high relative to trend about two-thirds of the time
and whether it was low relative to the trend about three-fifths of the time; and
successfully indicated the 2 cyclical troughs in the period but only one of the
cyclical peaks.25
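The first two of these qualitative checks can be expressed as simple agreement rates between an early vintage and the latest vintage. The sketch below is illustrative only, using hypothetical quarterly growth series rather than the 1983–2002 BEA data.

```python
# Minimal sketch: how often an early vintage agrees with the latest vintage on
# (i) the direction of real GDP growth and (ii) acceleration vs. deceleration.
# The quarterly growth rates below are hypothetical.
import numpy as np

early = np.array([1.2, 2.5, -0.4, 3.1, 2.0, 0.8, 4.2, 3.5])
latest = np.array([1.5, 2.2, -0.1, 3.4, 1.8, 1.1, 3.9, 3.8])

direction_agree = np.mean(np.sign(early) == np.sign(latest))
accel_agree = np.mean(np.sign(np.diff(early)) == np.sign(np.diff(latest)))

print("agreement on direction of change:           %3.0f%%" % (100 * direction_agree))
print("agreement on acceleration vs. deceleration: %3.0f%%" % (100 * accel_agree))
```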
Producers of national economic statistics seek to provide their users with timely
and accurate estimates of economic activity, especially aggregate economic ac-
tivity. Their ability to do so is constrained by the flow of data. Thus the problem
for national statistical offices is to determine how close the release dates for estimates can be to the end of the reference period while at the same time providing accurate estimates. Because the data come from a variety of sources, many of
which are not based on probabilistic samples, it is not possible to determine how
close the estimates are to the true value in a statistical sense. Furthermore, the
concept of the true value depends on a host of definitions of economic activity
as well as the underlying conceptual framework of the economy.
Intuitively, there should be a trade-off between timeliness and accuracy be-
cause early estimates are based on less data than later estimates. Yet if the
methods used to project the missing data are efficient so that the projections
have small error, then it need not be true that there is a meaningful trade-off.
This inference follows from the BEA experience with the flash estimates. How-
ever, this is not to say that the early estimates are forecasts, although the distinction between forecasts and early actual values is not sharp because each is simply an estimate based on partial, incomplete information.26 If forecasts are admitted
into the competition for accurate estimates of the “true” nature of economic ac-
tivity, then an assessment of the trade-off ought to include the benefits to users
of having information before the reference period ends.
An analysis of data on revisions to US GDP estimates yields the following
conclusions about the trade-off between timeliness and accuracy. First, there are
modest average revisions from the advance to the two later current quarterly
estimates. Second, revisions to the latest estimates largely reflect definitional re-
visions that adapt the US National Income and Product Accounts to a changing
economy. Third, relative to the latest estimates the three current quarterly vin-
tages of GDP estimates have about the same average revision without regard to
sign. One can therefore conclude that there is little cost, in terms of accuracy, of making advance estimates that contain a relatively large number of trend-based projec-
tions. Waiting a year to publish estimates of GDP would reduce the average
revision without regard to sign of real GDP by only 0.1 to 0.2 percentage points.
Research is continuing on how to improve the efficiency of the estimates. Re-
cently, in the US, real time data (original and unrevised data that are available at the time the estimates are made) have been examined to see if they
25 The last recession in the period occurred from March 2001 (peak) to November 2001 (trough)
and the short duration probably played a role in the mis-estimation of the peak.
26 McNees (1986).
can predict revisions or provide better estimates. This research focuses on data
that are available at the time that the estimate is being made but not included in
the set of data upon which the estimates are based. In effect, such studies seek
to examine whether the estimates can be considered as rational – in the sense
that all of the available information is used to formulate the estimate.27 Fixler
and Grimm (2006) examined whether the revisions to GDP estimates are ratio-
nal with respect to real time data and found that while such data can somewhat
predict revisions, thereby making the GDP estimates irrational, there may not be
much advantage in incorporating the additional data. Relatedly, Fixler and Nale-
waik (2006) use the difference between the income and expenditure approaches
to GDP measurement as a measure of differences in available data, along with the attendant revisions to the estimates, to provide a better estimate of the "true" GDP.
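One common way to operationalize such a rationality check is a regression of the revision on information that was available in real time: a statistically significant slope means the early estimate did not exploit all of the available information. The sketch below is illustrative only, with hypothetical data, a single hypothetical real-time indicator, and a plain ordinary least squares fit rather than the specific procedures of the studies cited above.

```python
# Minimal sketch of a rationality (forecast-efficiency) test for GDP revisions:
# regress the revision on a real-time indicator; a significant slope suggests
# the early estimate did not use all information available at the time.
# Both series are hypothetical placeholders.
import numpy as np
import statsmodels.api as sm

revision = np.array([0.3, -0.2, 0.5, 0.1, -0.4, 0.6, 0.0, 0.2])       # hypothetical revisions (pct. points)
realtime_info = np.array([0.8, -0.5, 1.1, 0.2, -0.9, 1.4, 0.1, 0.4])  # hypothetical real-time indicator

X = sm.add_constant(realtime_info)   # constant plus the real-time regressor
result = sm.OLS(revision, X).fit()
print(result.params)                 # intercept and slope
print(result.pvalues)                # rationality is rejected if the slope is significant
```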
Acknowledgements
I would like to thank Adam Copeland, Bruce Grimm, Steve Landefeld, and
Marshall Reinsdorf for their valuable comments. The views expressed do not
represent those of the Bureau of Economic Analysis or the Department of Com-
merce.
References
Bier, W., Ahert, H. (2001). Trade-off between timeliness and accuracy. ECB Requirements for Gen-
eral Economic Statistics. Article published in Dutch in Economisch Statistische Berichten (ESB)
15 March, 4299.
Bowman, R. (1964). Comments on Qui Numerare Incipit Errare Incipit by Oskar Morgenstern.
American Statistician, June, 10–20.
Fixler, D., Grimm, B. (2005). Reliability of the NIPA estimates of US economic activity. Survey of Current Business, February, 8–19.
Fixler, D., Grimm, B. (2006). GDP estimates: Rationality tests and turning point performance. Journal of Productivity Analysis 25, 213–229.
Fixler, D., Nalewaik, J. (2006). News, noise and estimates of the "true" unobserved state of the economy. Unpublished paper, February.
Friedman, M. (1968). The role of monetary policy. The American Economic Review 58 (1), 1–17.
Greenspan, A. (2003). Opening remarks in Monetary Policy and Uncertainty: Adapting to a Chang-
ing Economy. Federal Reserve Bank of Kansas City Symposium, Jackson Hole, Wyoming,
August.
Grimm, B., Parker, R. (1998). Reliability of the quarterly and annual estimates of GDP and gross
domestic income. Survey of Current Business, December, 12–21.
Grimm, B., Weadock, T. (2006). Gross domestic product: Revisions and source data. Survey of
Current Business, February, 11–15.
Hand, D. (2004). Measurement Theory and Practice. Arnold, London.
27 The notion comes from the rational expectations literature. Expectations are rationally formed if
an economic agent uses all of the available information.
Jaszi, G. (1965). The quarterly income and product accounts of the United States: 1942–1962. In: Goldberg, S., Deane, P. (Eds.), Short-Term National Accounts and Long-Term Economic Growth. Studies in Income and Wealth. Bowes & Bowes, London, for the International Association for Research in Income and Wealth, pp. 100–187.
Katzner, D. (1991). Our mad rush to measure: How did we get into this mess? Methodus, December.
Koopmans, T.C. (1947). Measurement without theory. The Review of Economic Statistics 29 (3),
August, 161–172.
Mankiw, G., Shapiro, M. (1986). News or noise: An analysis of GNP revisions. Survey of Current Business, May, 20–25.
McKenzie, R. (2005). Improving the timeliness of short-term economic statistics. Working paper
No 5. OECD Statistics.
McNees, S. (1986). Estimating GNP: The trade-off between timeliness and accuracy. New England Economic Review, January, 3–10.
Morgan, M.S. (2001). Making measurement instruments. In: Klein, Morgan (Eds.), The Age of Eco-
nomic Measurement, Annual Supplement to Volume 23, History of Political Economy. Duke Univ.
Press, pp. 235–251.
Oberg, S. (2002). Quality and timeliness of statistics: Is it really a trade-off? Paper presented at 88th
DGINS Conference Palermo, 19 and 20 September.
Porter, T.M. (1995). Trust in Numbers. Princeton Univ. Press, Princeton, NJ.
Rytan, J. (1997). Timeliness and reliability: A necessary trade-off. In: Kenessey, Z. (Ed.), Economic Statistics, Accuracy, Timeliness and Relevance. US Department of Commerce, Bureau of Economic Analysis, Washington, DC, June.
Shearing, M. (2003). Producing flash estimates of GDP. Recent Developments and the Experiences
of Selected OECD countries. Paper prepared for UN Economic Commission of Europe for dis-
cussion at OECD Meeting of National Accounts Experts, Paris, 10 October.
Stone, R. (1986). Nobel memorial lecture 1984: The accounts of society. Journal of Applied Econo-
metrics 1, 5–28.
Author Index
Numbers in italics indicate page numbers when the names appear in the reference list.
Fan, J., 380, 388, 409 Goldfarb, R., 147, 152, 328, 337, 338
Farhmeir, L., 380, 409 Golinski, J., 346, 355
Fechner, G.T., 35, 37 Gómez, V., 409, 410
Federal Open Market Committee, 123, 124, Gordon, R.J., 257, 266
131 Gorter, C.N., see Bloem, A.M., 228
Fedorov, V.V., 360, 375 Gould, J.P., 127, 131
Ferger, W.F., 154, 187 Granger, C.W.J., 257, 261, 262, 266, 267,
Ferguson, A., 23, 25, 37 409, 410
Findley, D.F., 383, 406, 409 Granger, C.W.J., see Elliot, G., 338
Finkelstein, L., 7, 17, 60, 77, 106, 130, 131, Granger, C.W.J., see Engle, R.F., 266, 291
238, 247 Green, D., 360, 361, 375
Fisher, I., 115–119, 131, 158, 161, 167, Green, P.J., 380, 397, 401, 410
176, 177, 187 Greene, C., 326, 327, 329, 338
Fisher, W., 113, 131 Greene, W.H., 362, 369, 375
Fisher, W.C., 154, 187 Greenspan, A., 414, 426
Fixler, D., 11, 16, 17, 126, 215, 343, 351, Greenstein, B., 261, 267
423, 426, 426 Grether, D.M., 85, 86, 103, 364, 375
Flavin, M.A., 275, 291 Grier, D., 344, 355
Ford, I., 360, 375 Griffiths, W.E., 303, 318
Forni, M., 262, 266, 407, 410 Griliches, Z., 257, 267
Foulloy, L., see Benoit, E., 76 Grimm, B., 416, 420, 423, 426
Frängsmyr, T., 345, 355 Grimm, B., see Fixler, D., 426
Franklin, A., 246, 247 Guild, J., see Ferguson, A., 37
Frege, G., 21, 37 GUM, 4, 15, 17
Freyens, B., see Ackland, R., 131
Friedman, B.M., 123, 124, 126, 131 Haavelmo, T., 9, 17, 18, 234, 240, 247, 253,
Friedman, J.H., 392, 410 254, 264, 267, 272, 273, 291
Friedman, M., 144, 150, 151, 152, 243, 247, Haavelmo, T., see Girshick, M.A., 292
282, 291, 324, 338, 414, 426 Haberler, G., 162, 187
Frisch, R., 155, 162, 187, 251, 253, 266 Hacker, G., see Funke, H., 187
Fuchs, V., 321, 338 Hall, R.E., 274, 275, 291, 292
Fudenberg, D., 82, 103
Fuller, S.W., see Park, J., 319
Funke, H., 165, 187
Hallin, M., see Forni, M., 410
Hamerling, L., see Ackerman, F., 355
Hamermesh, D., 333, 338
Hamilton, J.D., 262, 267, 272, 292, 380, 406, 410
Gardner, E.S., 400, 410 Hamilton, J.T., see Viscusi, W.K., 340
Garter, C.N., see de Boo, A.J., 228 Hamminga, B., see Balzer, W., 291
Geary, R.C., 166, 187 Hand, D., 416, 426
Georgescu-Roegen, N., 144, 152 Hands, D.W., 281, 292
Gerlagh, R., 223, 228 Hanemann, M., 358, 369, 375
Ghiblawi, H., see Papell, D., 339 Hansen, A., 123, 131
Gibbard, A., 150, 152, 321, 338 Hansen, L.P., 275, 292, 327, 338
Giere, R.N., 284, 285, 292, 298, 318 Hansen, L.P., see Avery, R.B., 374
Gigerenzer, G., 331, 338, 344, 355 Harbaugh, W.T., 82, 103
Gilbert, C.L., 12, 145, 253, 255, 260, 266, Harless, D.W., 85, 103
271, 272 Harrison, G.W., 10, 18, 80, 83, 85, 86, 88,
Gilbert, C.L., see Qin, D., 268 90, 96, 101, 102, 103, 104, 298, 318, 365
Gillispie, C.C., 345, 347, 354, 355 Harrison, G.W., see Botelho, A., 103
Gini, C., 162, 187 Harrison, G.W., see Coller, M., 103
Girshick, M.A., 273, 292 Hartley, J.E., 275, 292
Gjibels, I., see Fan, J., 409 Harvey, A.C., 380, 395, 400, 401, 408, 409,
Godfrey, L.G., 315, 318 410
Goldberger, A.S., see Klein, L.R., 267 Harville, D., 401, 410
Hastie, T.J., 380, 402, 410 Irwin, J.C., see Ferguson, A., 37
Hatanaka, M., see Granger, C.W.J., 266 IVM, 236, 248
Hausman, D.M., 135, 137, 142, 145, 152, IVM, see VIM
263, 264, 267, 337, 338
Hausman, J.A., 257, 267 Jacowitz, K.E., see Green, D., 375
Heath, T.L., 20, 37 Jaeger, A., see Harvey, A.C., 410
Heberlein, T.A., see Bishop, R.C., 374 Jaszi, G., 420, 427
Heckman, J., see Hansen, L.P., 338 Jenkins, G.M., see Box, G.E.P., 265
Heidelberger, M., 35, 37, 234, 235, 247, 248 Jevons, W.S., 111, 116, 131, 253, 267
Heilbron, J.L., 344, 355 Jick, T.D., 14, 18
Heilbron, J.L., see Frängsmyr, T., 355 Johansen, S., 260, 267
Helliwell, J.F., 222, 228 Johnson, C., see Eckel, C., 375
Henderson, H.V., 386, 398, 410 Johnson, E., see Harrison, G.W., 10, 103,
Henderson, R., 382, 399, 410 104, 298, 318, 365
Hendry, D.F., 253, 254, 258, 260, 267, 271, Johnson, H.M., 23, 37
272, 276, 277, 279, 287, 290, 292, 324, Johnston, J., 254, 267
338 Jones, M.C., see Wand, M.P., 411
Hendry, D.F., see Davidson, J.E.H., 291 Jonung, L., see Bordo, M.D., 131
Hensher, D.A., see Louviere, J.J., 375 Jordá, O., see Hoover, K.D., 292
Hesse, M.B., 285, 292
Heston, A., see Summers, R., 229 Kahneman, D., see Green, D., 375
Hey, J.D., 85, 104, 358, 369, 375 Kahneman, D., see Tversky, A., 375
Hill, R.J., 167, 168, 187 Kaiser, R., 409, 410
Hodrick, R., 379, 399, 400, 410 Kamarck, A., 332, 339
Hofkes, M.W., see Gerlagh, R., 228 Kanninen, B.J., 369, 375
Hölder, O., 19, 20, 35, 37, 38 Kanninen, B.J., see Hanemann, M., 375
Holt, C.A., 82, 88, 90, 104, 360, 371, 375 Kapetanios, G., see Camba-Mendez, G.,
Holtrop, M.W., 120, 131 266
Hood, W., 253, 267 Karlan, D., 80, 104
Hoover, K.D., 12, 18, 141, 145, 147, 152, Katzner, D., 418, 427
253, 267, 272, 274, 289, 292, 325–327, Kaye, G.W.C., see Ferguson, A., 37
329, 330, 332, 338 Keating, J., 279, 292
Hoover, K.D., see Demiralp, S., 291 Kemmerer, E.W., 114, 131
Hoover, K.D., see Hartley, J.E., 292 Kendall, M., 380, 410
Hope, C., 223, 228 Kendrick, J.W., 195, 228
Horowitz, J., 329, 338 Kenessey, Z., 199, 228
Hotz, V.J., see Avery, R.B., 374 Kennedy, P., 325, 327, 339
Houstoun, R.A., see Ferguson, A., 37 Kenny, P.B., 383, 410
Hsiao, C., 288, 292 Keuning, S.J., 217, 223, 229
Huber, J., 368, 375 Keuning, S.J., see Bloem, A.M., 228
Huber, P.J., 298, 318 Keuning, S.J., see de Boo, A.J., 228
Hueting, R., 223, 228 Keuning, S.J., see de Haan, M., 228
Hull, J., 262, 267 Keuzenkamp, H.A., 150, 152, 323, 325,
Hulten, C.R., 182, 187 327, 331, 332, 339
Humphrey, T.M., 126, 131 Keynes, J.M., 154, 187
Hurwicz, L., 182, 187, 273, 274, 277, 292 Khamis, S.H., 166, 187
Killian, L., see Inoue, A., 339
Inoue, A., 328, 339 Kim, J., 282, 292, 331, 339
International Labor Organization, 167, 187 King, M.L., 303, 318
International Organization for King, M.L., see Dufour, J.-M., 318
Standardization (ISO), 46, 56, 59, 62, 64, King, R.G., 408, 410
65, 67, 69, 73, 77 King, R.G., see Baxter, M., 409
Ioannides, J., 336, 339 Kinley, D., 114, 118, 132
Klein, J.L., 114, 131, 132, 132, 253, 267, Loomes, G., 85, 104, 372, 375
408, 410 Louviere, J.J., 358, 375
Klein, L.R., 253, 256, 261, 267 Lovell, M., 335, 339
Klep, P.M.M., 199, 200, 229 Luati, A., see Proietti, T., 6, 16
Kline, P., 26, 37 Lucas, R.E., 149, 152, 244–246, 248, 257,
Kocher, M., 80, 104 267, 273, 274, 280, 293
Kohn, R., see Ansley, C., 409 Luce, R.D., 26, 30–32, 37, 38, 252, 268,
Konus, A.A., 162, 187 286, 293
Koopman, S.J., see Durbin, J., 409 Luce, R.D., see Krantz, D.H., 18, 37, 132,
Koopman, S.J., see Harvey, A.C., 410 292
Koopmans, T.C., 253, 255, 260, 261, 267, Luce, R.D., see Narens, L., 38, 293
273, 279, 281, 282, 292, 418, 427 Luce, R.D., see Suppes, P., 38, 294
Koopmans, T.C., see Hood, W., 267
Köves, P., see Eltetö, Ö., 187 Maas, H., 115, 132
Krantz, D.H., 6, 7, 11, 18, 26, 27, 29–31, Mach, E., 235, 248
35, 37, 106, 132, 286, 292 Machlup, F., 322, 339
Krantz, D.H., see Luce, R.D., 38, 268, 293 Maddala, G.S., 276, 293
Krantz, D.H., see Suppes, P., 38, 294 Magnus, J.R., 13, 82, 104, 215, 229, 259,
Krause, K., see Harbaugh, W.T., 103 298, 300, 309, 315, 318, 319
Kroner, K.F., see Bollerslev, T., 265 Magnus, J.R., see Banerjee, A.N., 318
Krtscha, M., 165, 187 Magnus, J.R., see Keuzenkamp, H.A., 152,
Krueger, A., see Fuchs, V., 338 339
Krüger, L., see Gigerenzer, G., 355 Makridakis, S., 325, 339
Kuhn, T.S., 64, 77, 282, 292 Malaga, J.E., see Park, J., 319
Kula, W., 345, 355 Mäler, K.-G., 222, 229
Kuyatt, C.E., see Taylor, B.N., 77 Malinvaud, E., 254, 268
Kwiatkowski, D., 408, 410 Mallin, M., see Forni, M., 266
Kydland, F.E., 245, 248 Malmendier, U., see Lazear, E.P., 104
Mankiw, G., 422, 427
Manser, M., 172, 187
Laha, R.G., 310, 318 Maravall, A., see Kaiser, R., 410
Lau, M.I., see Harrison, G.W., 104 Marcellino, M., see Banerjee, A., 265
Laubach, T., 406, 410 Marget, A.W., 121, 132
Laury, S.K., see Holt, C.A., 104, 375 Mari, L., 4, 8, 12, 14, 33, 38, 48, 49, 60, 61,
Layard, R., 222, 229 67, 68, 77, 107
Lazear, E.P., 80, 104 Mari, L., see Carbone, P., 77
Leamer, E.E., 82, 104, 258, 265, 267, 298, Marschak, J., 254, 268, 273, 293
318, 319, 325, 327, 339 Marschak, J., see Becker, G., 374
Leaning, M., see Finkelstein, L., 77 Marshall, A., 127, 132
Leeper, E.M., 278, 292 Masten, I., see Banerjee, A., 265
Leontief, W., 332, 339 Mauris, G., see Benoit, E., 76
Lerner, A.P., 162, 187 Mauris, G., see Mari, L., 77
LeRoy, S.F., see Cooley, T.F., 291, 338 Mayer, T., 13, 17, 18, 82, 104, 140, 145,
Leser, C.E.V., 380, 399, 400, 410 146, 217, 261, 282, 293, 321, 330, 339
Levine, D.K., see Fudenberg, D., 103 McAleer, M., 325, 339
Linklater, A., 345, 355 McAleer, M., see Keuzenkamp, H.A., 339
Lippi, F., see Forni, M., 266, 410 McCloskey, D.N., 140, 152, 329, 337, 339
Lipsey, R.G., 135, 138, 144, 152 McCloskey, D.N., see Ziliak, S., 340
Lira, I., 72, 77 McConnell, M., 330, 339
List, J.A., see Harrison, G.W., 18, 103 McCullough, B.D., 335, 339
Liu, T.-C., 257, 267, 275, 292, 293 McDonald, R., see Manser, M., 187
Loader, C., 380, 383, 410 McFadden, D., see Green, D., 375
Locke, J., 24, 37 McInnes, M.M., see Harrison, G.W., 10,
Loeb, S., 330, 339 103, 104, 298, 318, 365
Plosser, C.I., see Nelson, C.R., 411 Rothbarth, E., 162, 188
Plott, C.R., see Grether, D.M., 103, 375 Rothenberg, T.J., 258, 268
Poirier, D.J., 393, 411 Rubinstein, A., 82, 104
Polanyi, M., 348, 355 Rudebusch, G.D., see Diebond, F.X., 266
Polasek, W., 298, 319 Ruppert, D., 380, 411
Pollak, R., 170, 188 Rusnock, A., 346, 356
Pollock, D.S.G., 380, 402, 411 Russell, B., 19, 24, 38, 69, 77
Ponce de Leon, A.C.M., 361, 375 Russell, B., see Whitehead, A.N., 39
Ponce de Leon, A.C.M., see Müller, W.G., Rutström, E.E., see Botelho, A., 103
375 Rutström, E.E., see Coller, M., 103
Porter, R., see Orphanides, A., 132 Rutström, E.E., see Harrison, G.W., 10,
Porter, T.M., 15, 17, 18, 107, 108, 132, 344, 103, 104, 318, 298, 365
347, 351, 352, 354, 356, 358, 417, 427 Rytan, J., 415, 419, 427
Porter, T.M., see Gigerenzer, G., 355
Poterba, J., see Fuchs, V., 338 Sadiraj, V., see Cox, J.C., 103
Prelec, D., 82, 104 Salyer, K.D., see Hartley, J.E., 292
Prescott, E.C., see Cooley, T.F., 247 Samuelson, P.A., 162, 188
Prescott, E.C., see Hodrick, R., 410 Sargan, J.D., 260, 268
Prescott, E.C., see Kydland, F.E., 248 Sargent, T.J., 259, 262, 268
Primont, D., see Blackorby, C., 186 Sato, K., 164, 188
Proietti, T., 6, 16, 405–409, 411 Schlimm, D., 8, 18
Pugh, M.G., see Findley, D.F., 409 Schmidt, P., see Kwiatkowski, D., 410
Qin, D., 253, 254, 256, 258, 259, 268, 271, 272, 293
Qin, D., see Gilbert, C.L., 12, 145, 266, 271, 272
Quenneville, B., 406, 411
Quiggin, J., see Dowrick, S., 186
Quine, W.V.O., 62, 77
Schumpeter, J., 251, 268
Schut, C.M., see Pannekoek, J., 229
Schwartz, A., see Friedman, M., 338
Scott, A.J., 404, 411
Scott, D., 286, 293
Searle, S.R., see Henderson, H.V., 410
Seater, J., 149, 152
Selden, R.T., 122, 127, 132
Sent, E.-M., 261, 268
Serlitis, A., see Barnett, W., 131
Rabin, M., 81, 82, 104 Shannon, C.E., 46, 77
Rabin, M., see Charness, G., 103 Shapiro, M., see Mankiw, G., 427
Ramsay, J.O., 32, 38 Sharma, A.R., see Davison, M.L., 37
Rebelo, S., see King, R.G., 410 Shaxby, J.H., see Ferguson, A., 37
Reichlin, L., see Forni, M., 266, 410 Shearing, M., 419, 427
Reiersøl, O., see Koopmans, T.C., 267 Shephard, N., 262, 269
Reinsdorf, M.B., 11, 27, 38, 109, 110, 136, Shin, Y., see Kwiatkowski, D., 410
164, 167, 179, 188, 219, 333, 340 Shulman, H.B., see Findley, D.F., 409
Richardson, L.F., see Ferguson, A., 37 Siegler, M., see Hoover, K.D., 338
Richter, M., see Hurwicz, L., 187 Silverman, B.V., see Green, P.J., 410
Rider, R., see Frängsmyr, T., 355 Silvey, S.D., 360, 375
Robbins, L.C., 144, 152 Simon, H.A., 245, 248, 256, 269
Robertson, R., 330, 340 Sims, C.A., 259, 269, 275–278, 293
Robinson, G.K., 397, 411 Sims, C.A., see Sargent, T.J., 268
Rodenburg, P., 10, 18, 106, 130, 132 Singh, A.C., see Quenneville, B., 411
Rodriguez-Palenzuela, D., see Singleton, K.J., see Hansen, L.P., 292
Camba-Mendez, G., 409 Skilivas, S., see Mirowski, P., 339
Rødseth, A., see Moene, K.O., 293 Slovic, P., see Tversky, A., 375
Rosenberg, A., 336, 340 Slutsky, E., 408, 411
Rossi, A., 407, 411 Smets, F.R., 406, 407, 411
Rossi, G.B., 65, 77 Smith, A., 253, 269
Rosson, C.P., see Park, J., 319 Smith, T., see Ferguson, A., 37
Smith, T.M.F., see Scott, A.J., 411 Thouless, R.H., see Ferguson, A., 37
Smith, V.L., see Cox, J.C., 103 Thursby, J., see Dewald, W., 338
Spanos, A., 321, 323, 326, 327, 337, 340, Tiao, G.C., 408, 411
404, 411 Tibshirani, R.J., see Hastie, T.J., 410
Spohn, W., see Stegmüller, W., 293 Tinbergen, J., 198, 202, 203, 229, 287, 294
Srba, F., see Davidson, J.E.H., 291 Törnqvist, L., 167, 188
Staehle, H., 162, 188 Torsney, B., see Ford, I., 375
Staiger, D., 406, 411 Triplett, J.E., 179, 188
Stamhuis, I.H., 201, 229 Trivedi, P.K., 178, 188
Stamhuis, I.H., see Klep, P.M.M., 229 Tucker, W.S., see Ferguson, A., 37
Starmer, C.V., see Cubitt, R.P., 375 Tutz, G., see Farhmeir, L., 409
Stegmüller, W., 280, 293 Tversky, A., 363, 375
Stekler, H.O., see Goldfarb, R., 338 Tversky, A., see Krantz, D.H., 18, 37, 132,
Stevens, S.S., 24–26, 30, 38, 232, 248, 286, 292
293 Tversky, A., see Luce, R.D., 38, 268, 293
Stigler, S.M., 344, 356 Tversky, A., see Suppes, P., 38, 294
Stigum, B.P., 290, 293
Stock, J.H., 262, 269, 276, 279, 293, 407, van Ark, B., 219, 229
411 van den Bergh, J.C.J.M., 221, 229
Stock, J.H., see Staiger, D., 411 van den Bogaard, A., 202, 203, 229
Stone, R., 262, 269, 415, 427 van der Eyden, J.A.C., see den Butter,
Strauß, S., see Kocher, M., 104 F.A.G., 228
Stuart, A., see Kendall, M., 410 Van Fraassen, B., 281, 284, 294
Stuvel, G., 164, 174, 188 van IJzeren, J., 164, 174, 188
Sugden, R., see Loomes, G., 104, 375 van Norden, S., see Orphanides, A., 411
Sullivan, M.B., see Harrison, G.W., 104 van Tongeren, J.W., see Magnus, J.R., 229
Summers, L., 147, 152, 322, 340 van Zanden, J.L., 202, 229
Summers, R., 197, 229 Varian, H.R., 172, 188, 283, 294
Suppe, F., 281, 282, 284, 293
Suppes, P., 26, 28, 30, 31, 38, 106, 132, 236, 248, 282, 283, 286, 289, 294
Suppes, P., see Krantz, D.H., 18, 37, 132, 292
Suppes, P., see Luce, R.D., 38, 268, 293
Suppes, P., see Scott, D., 293
Sutter, M., see Kocher, M., 104
Sutton, J., 147, 152, 240, 248, 289, 294
Swait, J.D., see Louviere, J.J., 375
Swamy, S., 163, 188
Swamy, S., see Samuelson, P.A., 188
Swijtink, Z., see Gigerenzer, G., 355
Varian, H.R., see Gibbard, A., 152, 338
Vartia, Y.O., 164, 169, 188
Vasnev, A., see Magnus, J.R., 319
Veall, M., see Pagan, A., 339
Verbruggen, H., see den Butter, F.A.G., 228
Verbruggen, H., see Gerlagh, R., 228
Verbruggen, J.P., see Don, F.J.H., 228
Vesterlund, L., see Harbaugh, W.T., 103
Ville, J., 172, 182, 188
VIM, 4, 14, 18
VIM, see IVM
Vinod, H.D., see McCullough, B.D., 339
Swijtink, Z., see Gigerenzer, G., 355 Viscusi, W.K., 330, 340
Swoyer, C., 33, 39 Voeller, J., see Eichhorn, W., 186
Sydenham, P.H., 106, 132, 242, 248 Voeller, J., see Funke, H., 187
Szulc, B., 166, 188 Vogt, A., 165, 188
Volcker, P., see McAleer, M., 339
Taylor, B.N., 69, 77 von Helmholtz, H., 22, 39
Teller, P., 284, 285, 294 Von Neumann, J., 11, 18
Teräsvirta, T., see Granger, C.W.J., 267
Terrall, M., 345, 356 Wakker, P.P., 102, 104
Theil, H., 257, 269 Walker, J.M., see Cox, J.C., 103
Thompson, B., 329, 340 Walsh, C.M., 157, 161, 188
Thompson, R., see Patterson, H.D., 411 Wand, M.P., 392, 411
Thorbecke, E., 329, 340 Wand, M.P., see Ruppert, D., 411
Ward, B., 324, 340 Wilcox, N.T., see Ballinger, T.P., 103
Watson, G.S., 391, 411 Wild, C., see Phannkuch, M., 339
Watson, G.S., see Durbin, J., 266 Wise, M.N., 349, 356
Watson, M.W., 408, 411 Wold, H., 256, 269
Watson, M.W., see Blanchard, O.J., 291 Wolfers, J., see Donohue, J., III, 338
Watson, M.W., see Staiger, D., 411 Woodward, J., 232, 233, 248, 274, 294
Watson, M.W., see Stock, J.H., 269, 293, Wu, C.F.J., see Ford, I., 375
411
Waugh, F.V., 256, 262, 269 Xu, D., see Tiao, G.C., 411
Weadock, T., see Grimm, B., 426
Weber, R.A., see Lazear, E.P., 104 Yeo, S., see Davidson, J.E.H., 291
Wei, S.-J., 330, 340 Youden, W.J., 344, 356
Weitzman, M.L., 218, 229 Yule, G.U., 409, 411
West, K.D., see Newey, W.K., 319
Westermann, T., see Proietti, T., 411 Zellner, A., 324, 325, 340
Wetenschappelijke Raad voor het Zha, T., see Leeper, E.M., 292
Regeringsbeleid, 224, 229 Ziliak, S., 329, 340
Weymark, J.A., see Vartia, Y.O., 188 Zingales, G., see Mari, L., 77
White, A., see Hull, J., 267 Zinman, J., see Karlan, D., 104
White, K.P., 247, 248 Zinnes, J.L., see Suppes, P., 38, 248, 294
Whitehead, A.N., 21, 39 Zivot, E., see Morley, J.C., 410
Whittaker, E., 399, 411 Zwerina, K., see Huber, J., 375
Subject Index
accounting 110, 129, 346, 350, 351 Becker–DeGroot–Marschak (BDM) mecha-
macro 109 nism 363
national 190, 191, 194, 196–199, 202, behavioral economics 11, 144, 336, 337
207, 208, 212–214, 218, 220–223 bias 6, 65, 81, 157, 158, 161, 172, 177, 179,
expenditure approach 191, 214, 426 224, 233, 243, 272, 326–329, 334,
income approach 191, 214, 426 387, 388, 405, 406
output approach 191 anchoring 362
national income 110, 218 black box model (of measurement) 41, 246
accounting principle 110 bootstrap 115, 118, 350
accounting system 11, 17, 110, 130, 192, Boskin commission 109, 179, 220, 351
194, 207, 208, 216 Box–Jenkins’ methodology 257, 259, 261
accuracy 4–6, 8, 11–15, 17, 62, 66, 69, 109, Brookings Institution 210
238–242, 246, 247, 256, 275, 343, Bundesbank, Germany 211
347, 351, 352, 354, 355, 377, 380, Bureau of Economic Analysis (BEA), US
405, 406, 413–417, 419, 420, 422, 209, 421–423, 425
423, 425 bureaucracy 107, 126, 349, 350
accuracy of national accounts, see national bureau of standards 354
accounts, accuracy business cycle study 260, 261
accurate representation, see representation,
accurate calculation error, see error, calculation
aggregation problem 227 calibration 8, 12, 49, 54–56, 61, 62, 64, 65,
anchoring bias, see bias, anchoring 70, 81, 102, 107, 118, 223, 234, 236,
approximation 65, 68, 154, 178, 240, 244, 239, 242, 244–247, 275, 327, 406
344, 380, 408, 413 capital gain 334
associative measurement, see measurement, caricature model, see model, caricature
associative causality test, see test, causality
astrology 344 census 118, 206, 346, 350, 420
astronomy 344–346 Census Bureau, US 209
autonomy Central Bureau of Statistics (CBS), Nether-
of an empirical relation 123–125, 240, lands 193, 196, 198, 201–203, 205,
273, 274, 278, 279 215, 217, 223, 224
of a model 263, 290 Central Planning Bureau (CPB), Nether-
autoregressive-integrated-moving average lands 148, 202–204, 207, 210, 212,
(ARIMA) model, see model, auto- 225
regressive-integrated-moving average Central Statistical Office (CSO), UK 206,
auxiliary parameter, see parameter, auxiliary 218
axiom, see index formula, axiom certainty equivalent 83, 363, 364
axiomatic approach 7, 11, 12, 25, 153, 162, ceteris absentibus 8, 239
163, 179, 194, 281 ceteris neglectis 239
axiomatic index number theory, see index ceteris paribus 5, 8, 238, 239, 264, 284, 322,
number theory, axiomatic 325
axiomatization 7, 8, 281, 282, 288 characteristic test, see test, characteristic
characterization for an index, see index for-
Bank of England 148, 151, 207 mula, characterization
barometer 16, 117, 118, 260 checking 14, 32, 335
Bayesian approach 80, 258, 259, 298, 361, checking device 114, 115, 119, 121, 123,
406 129
quasi-Bayesian method 258 chemistry 144, 273, 345, 346
“chicken and egg” problem 359, 360, 369
Chow test, see test, Chow
circularity test, see index formula, test, circularity
classical approach, see metrology, classical approach
Cobb–Douglas index, see index, Cobb–Douglas
cointegration 260
combined standard uncertainty, see uncertainty, combined standard
commensurability axiom, see index formula, axiom, commensurability
commensuration 352, 353
common ratio test, see test, common ratio
comparability of national accounts, see national accounts, comparability
comparative proportionality test, see index formula, test, comparative proportionality
compatibility 73
compound property of money 121, 126
computational experiment, see experiment, computational
conceptual issue 113, 121
confidence interval 69, 81, 82, 92, 102, 140, 330, 406
conformance 45, 73, 74
Congressional Budget Office, US 210
congruence 276, 277, 288
consensus view 148, 203, 205
consistency 11, 17
consistency in aggregation, see index formula, test, consistency in aggregation
consistency of national accounts, see national accounts, consistency
constant basket test, see index formula, test, constant basket
constant relative risk aversion (CRRA), see risk aversion, constant relative
constructive empiricism 284, 287
constructive realism, see realism, constructive
consumer price index (CPI), see index, consumer price
consumption model, see model, consumption
core module system (national accounts) 215, 216
correctness of a model 298
correlative interpretation of measurement, see measurement, correlative interpretation
correspondence rule 7, 281, 282, 284, 285
cost of living index, see index, cost of living
cost–benefit analysis 108, 351–353
Council of Economic Advisers (CEA), US 209, 210, 421
coverage factor 71
Cowles Commission (CC) 253, 272–274, 277, 280
CPB Netherlands Bureau for Economic Policy Analysis, see Central Planning Bureau, Netherlands
cross-spectral analysis, see also spectral analysis 261
cubic spline 379, 395–397

D-optimal design, see experimental design, D-optimal design
data 9, 11–13, 16, 114, 232–234, 240, 243–245
   before theorizing, see also data mining, measurement without theory, statistics 151
   quality of 332–336
data mining 14, 146, 151, 258, 261, 326–329
data perturbation, see perturbation
data-generating process (DGP) 80, 276, 277, 287, 323, 324, 357
   local 276, 277
data-instigated model, see model, data-instigated
database 332, 334, 335
deep parameter, see parameter, deep
derived measurement, see measurement, derived
description 140–142
descriptive statistics, see statistics, descriptive
design of measuring instrument, see measuring instrument, design principle
Deutsches Institut für Wirtschaftsforschung (DIW) 211, 212
DHSY model of consumption, see model, consumption
diagnostic 296, 297, 315, 323, 328, 333, 401
diagnostic and sensitivity 297, 298, 314–318
diagnostic curve 314, 315
diagnostic test, see test, diagnostic
direct measurement, see measurement, direct
direct measurement scale, see scale of measurement, direct
Direction de Prévision (DP), France 213
Divisia index, see index, Divisia
domestic product 160, 190–193, 216
Durbin and Watson (DW) test, see test, Durbin and Watson
dynamic factor model (DFM), see model, dynamic factor
dynamic system 12, 44

eco-circ system 207
econometric methodology, see also London School of Economics econometric approach 194, 201, 258, 271, 287, 290
econometrics 3, 6, 8, 12, 13, 16, 145–147, 149–151, 233, 234, 271
economic approach to index numbers, see index number theory, economic approach
economic theory 142, 143, 150
   versus statistical theory 323
empirical adequacy 284, 285
empirical model, see model, empirical
empirical relational structure, see relational structure, empirical
empirical sensitivity, see sensitivity, empirical
empirical substructure, see structure, empirical substructure
empirical system 19, 26, 27, 30–32, 34, 36
Enlightenment 345
environmental degradation 190, 217, 222
error 5, 6, 14–16, 81, 85, 86, 94, 98, 102, 232, 233, 238, 242, 243, 344, 348
   calculation 334, 335
   measurement 6, 60, 178, 234, 242, 404, 407, 418, 422, 423
   observational 232, 233, 238, 240
   prediction 96, 243
   random 63, 64, 330, 350
   sampling 81, 326, 329, 331, 332, 404, 417
   systematic 63, 64, 70, 344
   type 1 vs. type 2 74
error correction 260, 264, 288
error rate 94
error term 5, 6, 16, 127, 238, 242, 243, 256
estimate 113, 122
   flash 215, 419, 421–423, 425
   revision 413, 414, 416, 419–426
estimation 5, 12, 139, 147, 244, 245, 253–256, 258, 260, 261, 263, 275, 378, 380, 381, 383, 389, 401, 405–407
estimator 6, 13, 256, 257, 297, 298, 358
European Union 190, 192, 217
evaluation, measurement as 41, 56
expanded uncertainty, see uncertainty, expanded
expected utility theory (EUT) 79, 80–83, 85, 86, 88, 90, 92–94, 96, 98, 101, 102
expected value of a lottery, see lottery, expected value
expenditure approach of national accounting, see accounting, national, expenditure approach
experience 11, 25, 45, 69, 70, 106, 118, 136–138, 140, 260, 322, 323, 343
experiment 9, 10, 15, 17, 80, 82, 83, 86, 88, 90, 101, 102, 106, 142, 144, 151, 239, 240, 264
   computational 245, 246
   field 10, 79, 83
   laboratory 8, 10, 79, 80, 83
   natural 9, 10
   stated choice 358
   thought 107, 130, 144
experimental design 5, 15, 80, 81, 83, 233, 238, 284, 357–375
   between-subjects 81
   D-optimal design 81, 359–361, 365–368, 373, 374
   factorial
      fractional 358
      full 358
   multiple price list 88
   sequential 373
   within-subjects 79, 81–83
experimental design theory 358, 361, 365
experimental procedure 7, 80, 82, 88, 98, 101
experimetrics 357
extreme bounds analysis 258, 259, 325, 326

factor reversal test, see index formula, test, factor reversal
factual influence, see influence, factual
falsificationism 135, 137
feasible general least squares method, see least squares method, feasible general
Federal Reserve Board (Fed), US 111, 113, 121, 123–125, 209, 210, 257, 278, 421
field experiment, see experiment, field
filter 5, 15, 16, 377–411
   Henderson 377, 379, 380, 382–384, 386–388, 392, 404
   Hodrick and Prescott 400, 402, 404, 409
   Kalman 401, 406
filtering 107, 402, 405
Fisher’s ideal index, see index, Fisher’s ideal
flash estimate, see estimate, flash
flexibility of national accounts, see national accounts, flexibility
formalization 68, 253, 255
fractional factorial design, see experimental design, factorial, fractional
full factorial design, see experimental design, factorial, full
full-information maximum likelihood (FIML), see maximum likelihood, full-information
fundamental measurement, see measurement, fundamental

game theory 11
‘garbage in, garbage out’ principle 45
gross domestic product (GDP) 111, 122, 125, 158, 160, 215, 219, 334, 414–420, 424, 425
   per capita 216, 219
general equilibrium theory 264
generalized autoregressive conditional heteroscedasticity (GARCH) model, see model, generalized autoregressive conditional heteroscedasticity
general-to-specific methodology 277, 327
generic measuring instrument, see measuring instrument, generic
graduation 16, 378, 380
Granger representation theorem 260, 288
Granger-causality test, see test, causality, Granger
‘green’ GNP, see sustainable gross national product
guesstimate 112, 113, 128
guesswork 114

happiness 222
hedonic method 220
Henderson filter, see filter, Henderson
history 3, 11, 136, 138, 140, 142, 143
   of economics 107, 109
   of measurement 8, 10, 41, 47, 105, 111, 114, 120, 130, 265, 354
   of national accounting 190, 194, 196, 199
   of science 32, 106, 111, 343
   of the social sciences 107
homomorphism 7, 8, 11, 19, 26, 27, 28, 48, 56–59, 106, 231, 232, 245–247
human development index (HDI), see index, human development
hypothetico-deductive method 135, 138, 139, 143, 337

idealized entities 10, 127
idealized model, see model, idealized
identifiability 105, 113, 122–126, 128, 129, 395
identification 12, 252, 254, 255, 257, 259, 261, 263, 274–276, 278, 279, 288, 289
ideological commitment 321, 346
implicit price index, see index, implicit price
implicit quantity index, see index, implicit quantity
impulse response function 264, 382, 393, 395
income approach of national accounting, see accounting, national, income approach
independent data set 205, 327, 331
index, see also price index
   Cobb–Douglas 163, 176, 179, 184
   consumer price 155, 173, 179, 220, 224
   cost of living 120, 153, 154, 167–170, 172, 177, 181, 184, 351
   Divisia 110, 127, 172, 178, 181, 182, 184, 185
   Fisher’s ideal 108, 109, 161, 162, 164–168, 174, 178, 179
   human development 216, 219
   implicit price 159, 174, 184
   implicit quantity 158–160, 167, 169, 170, 176, 177, 182, 183
   Laspeyres 109, 110, 156, 158–161, 164, 168–175, 177, 178, 182–185
   Paasche 110, 156, 158–161, 164, 168–175, 177, 178, 182–185
   Sato–Vartia 165, 167, 168, 176, 178, 185
   Törnqvist 167, 174, 176, 178, 179, 185
index formula 11, 108–110, 153, 154, 156, 157, 159, 161, 162, 165, 166, 177, 184
   axiom 108, 109, 153
      commensurability 155–157, 159–163, 165, 175–177, 179–181, 183
      linear homogeneity in comparison period prices 175–177, 183
      local monotonicity 168, 176
      monotonicity 163, 165–168, 175–177, 179, 183
      ordinal circularity 171
      price dimensionality 165, 175, 177, 183
      proportionality 157, 159–163, 165, 168, 174, 175, 179–181, 183, 184
      weak axiom of revealed preference 169, 171, 184
      weak monotonicity 176, 177
   characterization 163, 165, 167, 173, 179, 185
   test
      circularity 156, 157, 160–163, 166, 170, 178–183
      comparative proportionality 160, 175, 184
      consistency in aggregation 164, 173–175, 177, 183
      constant basket 157–159, 167
      factor reversal 160–162, 164, 166, 173, 174, 177, 179, 183, 185
      homogeneity of degree zero 164, 165, 183
      mean value 165, 167, 177, 181, 183
      product 158, 159, 171, 174, 177, 184
      strong proportionality 157, 184
      time reversal 157, 160–162, 164, 168, 174, 177, 184, 185
index number 5, 11, 16, 109, 110, 114, 116, 351
index number theory 3, 11, 116, 220
   axiomatic 11, 153–188
   economic approach 153, 162, 163, 167, 168, 172, 175, 177–179, 181
   stochastic approach 153, 154, 166, 179
indicator 123, 189–229, 260, 262, 351, 419
   interpretation 227
indirect least squares method, see least squares method, indirect
indirect measurement, see measurement, indirect
inference (measurement as a tool for) 43, 49, 50
influence 4, 5, 9, 10, 237–241, 243, 244
   factual 9
   potential 9, 239, 240, 243
influence quantity 67
information 3, 4, 8, 14, 41–43, 46–48, 50–52, 57, 58, 62, 63, 67, 68, 76, 81, 86, 349, 352
information matrix 359, 360, 365, 366, 373, 374
input/output analysis 109–111, 208
input/output tables 197, 208, 219
Institut National de la Statistique et des Études Économiques (INSEE), France 212, 213
institutional set-up of policy preparation 190, 202–205, 227
instrument measurement, see measurement, instrument
interpretation of indicator, see indicator, interpretation
intersubjectivity 42, 43, 48, 50, 53, 54, 58, 76
interval regression 92, 96, 98, 101
invariance 4, 5, 9, 11–13, 17, 107, 111, 122, 126, 213, 214, 216, 233, 239, 240, 244, 271, 273, 274, 278, 280, 287
   under intervention 273, 289, 290
   under transformation 30, 286, 289, 290
invariance view on structure, see structure, invariance view
isomorphism 25, 31, 33, 34, 283–287, 289

joint hypothesis about risk attitudes and consistent behavior 86, 88

Klein–Goldberger model, see model, Klein–Goldberger
knowledge 135–138, 143, 147, 148, 151

laboratory experiment, see experiment, laboratory
land survey 345
Laspeyres index, see index, Laspeyres
latent process 80, 82, 98
law of nature 7, 8, 235, 236, 238
leading indicator model, see model, leading indicator
League of Nations 199
least squares method 16, 256, 344, 346, 366, 378
   feasible general 256
   generalized 295, 299
   indirect 272
   ordinary 272, 296, 299, 359, 391
life insurance 353
limited-information maximum likelihood (LIML), see maximum likelihood, limited-information
linear homogeneity in comparison period prices, see index formula, axiom, linear homogeneity in comparison period prices
Liu critique 275, 276
local data-generating process (LDGP), see data-generating process, local
local monotonicity axiom, see index formula, axiom, local monotonicity
local polynomial regression 377, 378, 380, 392, 405
logical positivism 7, 25, 26, 281, 282
logit model, see model, logit
London School of Economics (LSE) econometric approach 258, 260, 262, 272, 276, 277, 279, 280, 287, 323, 324
lottery
   choice 81, 83, 85, 86, 88, 90, 92–94, 98, 101, 142, 360, 363–365, 370, 372, 374
   expected value 86, 88, 364
Lucas critique 149, 244, 245, 257, 273, 274, 277–280, 321, 336

Maastricht criteria 192
macro model, see model, macro
macro-accounting, see accounting, macro
macro-aggregate 121
magnitude 20–22, 27, 33, 34, 343
Marschak–Machina triangle 369
mathematical statistics, see statistics, mathematical
maximum likelihood (ML) 255, 256, 272
   full-information 255, 273
   limited-information 255, 273
mean value test, see index formula, test, mean value
measurability 20, 41, 42, 48, 60, 68, 76, 105, 107, 108, 113, 117, 120, 121, 123, 125, 126, 128, 129, 253
measurand 3–6, 8, 10, 12–15, 17, 46, 48, 50–52, 55, 56, 60–71, 73, 76, 235, 237, 238, 240–242
   definition 65
measurement 114, 135, 141, 343, 348
   associative 235, 236
   correlative interpretation 35, 234–238
   derived 22, 23, 45, 51, 71, 114, 121, 130, 235
   direct 10, 45, 51, 59, 71, 119, 233, 235, 236
   fundamental 22, 23, 235, 286
   indirect 4, 10, 45, 51, 59, 235, 243
   instrument 235–238
   model 8, 10, 241
   physical 23, 31, 32, 43
   pointer 236
measurement error, see error, measurement
measurement method 4, 14, 59, 65, 344, 406
measurement principle 10, 14
measurement result 4, 5, 10, 11, 13–16, 42, 46, 48, 49, 52–55, 60, 61, 63, 67–71, 73–76, 233, 240–242, 247
measurement science, see also metrology 50, 61, 64, 65, 68
measurement strategy 4, 6, 8, 10, 272
measurement system 15, 68
measurement theory 33, 34, see also representational theory of measurement
measurement without theory 252, 261, 262, 273, 278
measuring formula, see also index formula 113, 121, 122
measuring instrument, see also barometer, thermometer 10, 14, 15, 56, 105, 107–111, 113, 115, 118–123, 125–130, 140, 233, 234, 236, 238, 239, 242, 246, 265, 416
   design principle 108–111, 125, 128
   generic 109, 110, 127
mechanical balance 106, 115, 116
medicine 336, 346, 348
meta-analysis 147
metaphysical assumption 11, 136, 137
metric system 345, 354
metrology, see also measurement science 4, 8, 12–14, 106, 107, 111, 236, 242, 275, 354
   classical approach 4, 14
   uncertainty approach 4, 14, 15
microfoundation 274
misspecification test, see test, misspecification
mistake 85, 148, 169
mixture model, see model, mixture
model, see also representation 5, 7, 10–13, 15, 17, 61, 67, 107, 108, 117, 119, 127, 231–233, 240–247, 282–286, 288, 343, 351
   autoregressive-integrated-moving average 257, 387
   caricature 150, 321–323
   complex vs. simple 324, 325
   consumption 280
   data-instigated 257, 263
   dynamic factor 262, 407
   empirical 139, 140, 143, 145–150, 191, 256, 263, 264, 277, 282, 284, 322
   generalized autoregressive conditional heteroscedasticity 262
   idealized 105, 128, 129
   Klein–Goldberger 256, 261
   leading indicator 262
   logit 141, 369
   macro 198, 202, 337
   mixture 80
   of data 283, 284, 285
   of theory 285
   probit 360, 362, 363, 365, 367–370, 373
   random preference 372
   reduced form 255, 257, 276
   reduced form vector autoregressive 259, 276
   regime-switching 262
   simultaneous-equations 254, 255, 259, 260, 272–274, 277–279, 288

126, 135, 151, 162, 194, 231, 232, 234, 235, 271, 286, 288
reproducibility 14
residual heteroscedasticity 256
residual serial correlation 252, 256
resolution 44, 65, 68
revealed preference, see preference, revealed
reversal phenomenon 364
revision of estimate, see estimate, revision
revision of national accounts, see national accounts, revision
risk 80, 353
   attitude 79–83, 86, 90, 92, 96, 98, 101, 102
   aversion 79–83, 85, 86, 88, 90, 93, 94, 96, 98, 101, 102
      constant relative 83, 85, 86, 88, 90, 92, 94, 96, 98, 101, 102, 275, 363, 365, 370
      loving 88, 92
      neutral 86, 88, 92, 101
robustness test, see test, robustness
Royal Netherlands Economic Association 201

Sachverständigenrat, Germany 210, 211
sample survey 113, 119–121, 127, 128, 327
sampling error, see error, sampling
sampling strategy 119
sampling system 110
Sato–Vartia index, see index, Sato–Vartia
savings ratio 333, 334
scale of measurement, see also reference scale 25–28, 30, 31, 48, 56–58, 231, 235–237, 286, 287, 289
   direct 235
scale unit 53
scarcity 144
semantic view 7, 271, 280–282, 284–287, 290
sensitivity
   analysis 13, 295–319
   coefficient 13, 72, 298
   curve 314
   definition 297, 299, 302, 308, 315
   empirical 82
   of F-test 307–309
   of OLS predictor 298–302
   of OLS variance estimator 302, 303
   of t-test 309–313
   of t-test rule of thumb 313
   relationship with diagnostic test 314–317
sensor (as input transducer) 55, 56, 61, 62
sequential experimental design, see experimental design, sequential
significance test, see test, significance
similarity 25, 28, 284–286, 298
simultaneous-equations model (SEM), see model, simultaneous-equations
smoothing 378–380, 388, 400
social accounting matrix (SAM) 217
Social Economic Council (SER), Netherlands 204, 205, 225
social studies of science 106, 111
software package 146, 335, 418
specification, see model specification
specification search 258
spectral analysis 261
stability 4, 49, 65, 107, 124, 125
Stability and Growth Pact (SGP) of the EU 192
standard 14, 15, 55, 56, 65, 122, 207, 217, 223, 227, 236, 242, 244, 246, 275, 347, 354
standard deviation 70, 72, 343, 421
standard uncertainty, see uncertainty, standard
standardization 11, 17, 242, 246, 345, 347, 349, 354, 355, 417
standardization of national accounts, see national accounts, standardization
standardized quantitative rule 107–109, 111, 119, 126, 128
stated choice experiment, see experiment, stated choice
statistical office 107, 189, 193, 227
statistics 136, 138, 139, 142, 145, 343, 348
   descriptive 200
   mathematical 200
Statistics Norway 207, 209
Statistische Bundesamt, Germany 210
stochastic approach to index numbers, see index number theory, stochastic approach
stochastic volatility (SV) 262
strong proportionality test, see index formula, test, strong proportionality
structural approach in econometrics 252, 254, 255, 257, 261, 272
structural approach to measurement 271, 272, 280, 288, 290
structural parameter, see parameter, structural
structural representation, see representation, structural
structural vector autoregressive (SVAR) model, see model, structural vector autoregressive