(Spanos) Statistical Foundations of Econometric Modelling

Contents

Foreword by David Hendry
Preface
Acknowledgements
List of symbols and abbreviations

Part I  Introduction

1   Econometric modelling, a preliminary view
    1.1  A preliminary view
    1.2  A sketch of a methodology
2   Descriptive study of data
    2.1  Histograms and their numerical characteristics
    2.2  Frequency curves
    2.3  Looking ahead

Part II  Probability theory

3   Probability
    3.1  The notion of probability
    3.2  The axiomatic approach
    3.3  Conditional probability
4   Random variables and probability distributions
    4.1  The concept of a random variable
    4.2  The distribution and density functions
    4.3  The notion of a probability model
    4.4  Some univariate distributions
    4.5  Numerical characteristics of random variables
5   (sections 5.1-5.4)
6   (sections 6.1, 6.2*, 6.3; summary; looking ahead)
    Appendix 6.1  The normal and related distributions
7   (sections 7.1-7.3)
8*  Stochastic processes
    8.1  The concept of a stochastic process
    8.2  Restricting the time-heterogeneity of a stochastic process
    8.3-8.4  ...
    8.5  Summary
9   Limit theorems
    9.1  The early limit theorems
    9.2  The law of large numbers
    9.3  The central limit theorem
    9.4  Limit theorems for stochastic processes
10* (sections 10.1-10.6)

Part III  Statistical inference

11  (sections 11.1-11.5)
12  (sections 12.1-12.3)
13  Estimation methods: least-squares, moments, the likelihood method
    (sections 13.1-13.3)
14  (sections 14.1-14.6)
15* The multivariate normal distribution
    15.1  Multivariate distributions
    15.2  The multivariate normal distribution
    15.3  Quadratic forms related to the normal distribution
    15.4  Estimation
    15.5  Hypothesis testing
16* Test procedures (sections 16.1-16.2)

Part IV  Statistical models

17  (sections 17.1-17.5)
18  (sections 18.1-18.5: Specification; Estimation; Hypothesis testing and confidence intervals; Experimental design; Looking ahead)
19  The linear regression model I: specification, estimation and testing
    19.1  Introduction
    19.2  Specification
    19.3  Discussion of the assumptions
    19.4  Estimation
    19.5  Specification testing
    19.6  Prediction
    19.7  The residuals
    19.8  Summary and conclusion
    Appendix 19.1  A note on measurement systems
20  (includes: Weak exogeneity; Restrictions on the statistical parameters of interest; 20.5 Collinearity; 20.6 'Near' collinearity)
21  (sections 21.1-21.6; transformations)
22  (sections 22.1-22.4; conditional expectation)
23  (sections 23.1-23.6)
24  The multivariate linear regression model
    (sections 24.1-24.6: Introduction; Specification; Estimation; Misspecification testing; Specification testing; Prediction; The multivariate dynamic linear regression (MDLR) model; Looking back)
    Appendix 24.1  The Wishart distribution
    Appendix 24.2  Kronecker products and matrix differentiation
25  (sections 25.1-25.10: Specification; Maximum likelihood estimation; Least-squares estimation; Instrumental variables; Misspecification testing; Specification testing; Prediction; restrictions)
26  Epilogue: towards a methodology of econometric modelling
    (sections 26.1-26.3; Conclusion)

References
Index
PART I

Introduction

CHAPTER 1

Econometric modelling

1.1  A preliminary view

Econometrics is concerned with the systematic study of economic phenomena using observed data.
(i) ...
(ii) ...
(iii) ... developments in statistics: the development ...
It was rather unfortunate that the last two lines of thought developed largely independently of each other for the next two centuries. Their slow convergence during the second half of the nineteenth and early twentieth centuries in the hands of Galton, Edgeworth, Pearson and Yule, inter alia, culminated in the Fisher paradigm, which was to dominate statistical theory to this day.
The development of the calculus of probability emanating from Graunt's work began with Halley (1656-1742) and continued with De Moivre (1667-1754), Daniel Bernoulli (1700-82), Bayes (1702-61), Lagrange (1736-1813), Laplace (1749-1827), Legendre (1752-1833) and Gauss (1777-1855), inter alia. In the hands of De Moivre the main line of the calculus of probability emanating from Jacob Bernoulli (1654-1705) was joined up with Halley's life tables to begin a remarkable development of probability theory (see Hacking (1975), Maistrov (1974)).
The most important of these developments can be summarised under the following headings:
(i) manipulation of probabilities (addition, multiplication);
(ii) families of distribution functions (..., exponential);
(iii)-(vii) ...

One early product of this tradition was the 'empirical' demand schedule attributed to King and Davenant, expressed explicitly as

p = 2.33q + 0.05q^2 - 0.0017q^3.
Apart from this demand schedule, King and Davenant extended the line of thought related to the population and death rates in various directions, thus establishing a tradition in Political Arithmetik, 'the art of reasoning by figures upon things relating to government'. Political Arithmetik was to stagnate for almost a century without any major developments in the descriptive study of data apart from grouping and calculation of tendencies. From the economic theory viewpoint Political Arithmetik played an important role in classical economics, where numerical data on money stock, prices, wages, public finance, exports and imports were extensively used as important tools in their various controversies. The best example of the tradition established by Graunt and Petty is provided by Malthus' 'Essay on the Principle of Population'. In the bullionist and currency-banking schools controversies numerical data played an important role (see Schumpeter (1954)). During the same period the calculation of index numbers made its first appearance.
With the establishment of the Statistical Society in 1834 began a more coordinated activity in most European countries for more reliable and complete data. During the period 1850-90 a sequence of statistical congresses established a common tradition for collecting and publishing data on many economic and social variables, making very rapid progress on this front. In relation to statistical techniques, however, the progress was much slower. Measures of central tendency (arithmetic mean, median, ...) ... (1980)).
y = Xβ + u,    (1.2)

relating y to the variables x_1, x_2, ..., x_k by a coefficient vector β, where β is unknown. A number T of observations on y can be made, corresponding to T different sets of (x_1, ..., x_k), i.e. we obtain a data set (y_t, x_t1, x_t2, ..., x_tk), t = 1, 2, ..., T, but the readings on y_t are subject to error. (See Heyde and Seneta (1977).)
Minimising the squared error

u'u = (y - Xβ)'(y - Xβ)

with respect to β leads to

β̂ = (X'X)⁻¹X'y    (1.4)

(see Seal (1967)). The problem, as well as the solution, had nothing to do with probability theory as such. The probabilistic arguments entered the problem as an afterthought in the attempts of Gauss and Laplace to justify the method of least-squares. If the error terms u_t, t = 1, 2, ..., T, are assumed to be independent and identically distributed (IID) according to the normal distribution, i.e.

u_t ~ N(0, σ²),  t = 1, 2, ..., T.    (1.5)

... The linear regression model, on the other hand, is based on

E(y_t | X_t = x_t) = β'x_t,  t = 1, 2, ..., T,    (1.6)
which together with u_t = y_t - E(y_t | X_t = x_t), t = 1, 2, ..., T, can be written in matrix form as in (1.2), and the two models become indistinguishable in terms of notation. From the modelling viewpoint, however, the two models are very different. The Gauss linear model describes a 'law-like' relationship where the x_t's are known constants. On the other hand, the linear regression model refers to a 'predictive-like' relationship where y_t is related to the observed values of the random vector X_t (for further discussion see Chapter 19). This important difference went largely unnoticed by Galton, Pearson and the early twentieth-century applied econometricians. Galton in particular used the linear regression model to establish 'law-like' causal relationships in support of his theories of heredity in the then newly established discipline of eugenics.
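To make (1.4) concrete, here is a minimal numerical sketch, not from the book: the data are synthetic and the use of numpy is an illustrative assumption.

```python
# Minimal sketch of the least-squares formula (1.4) on simulated data.
# All numbers below are hypothetical, chosen only to illustrate the formula.
import numpy as np

rng = np.random.default_rng(0)
T, k = 100, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # known constants
beta = np.array([1.0, 2.0, -0.5])                               # 'true' parameters
y = X @ beta + rng.normal(scale=0.3, size=T)                    # readings subject to error

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)                    # (X'X)^(-1) X'y
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```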
The Gauss linear model was initially developed by astronomers in their attempt to determine 'law-like' relationships for planetary orbits, using a large number of observations with less than totally accurate instruments. The nature of their problem was such as to enable them to assume that their theories could account for all the information in the data apart from a white-noise (see Chapter 8) error term u_t. The situation being modelled resembles an 'experimental design' situation because of the relative constancy of the phenomena in question, with nature playing the role of the experimenter.
... never pointed out that his method of moments was developed for a different statistical paradigm where the probability model is not postulated a priori (see Chapter 13). The distinction between the population and the sample was initially raised during the last decade of the nineteenth century and the early twentieth century in relation to higher order approximations of the central limit theorem (CLT) results emanating from Bernoulli, De Moivre and Laplace. These limit theorems were sharpened considerably by the Russian school (Chebyshev (1821-94), Liapounov (1857-1922), ...
Wold (1938) provided the foundations for time series modelling by relating the above models to the mathematical theory of probability established by Kolmogorov (1933). These developments in time series modelling were to have only a marginal effect on mainstream econometric modelling until the mid-70s, when a slow but sure convergence of the two methodologies began. One of the main aims of the present book is to complete this convergence in the context of a reformulated methodology.
With the above developments in probability theory and statistical inference in mind, let us consider the history of econometric modelling in the early twentieth century. The marginalist revolution of the 1870s, with Walras and Jevons the protagonists, began to take root and with it a change of attitude towards mathematical and statistical techniques and their role in studying the economy. In classical economics observed data were used mainly to 'establish' tendencies in support of theoretical arguments or as 'facts' to be explained. The mathematisation of economic theory brought about by the marginalist revolution contributed towards a purposeful attempt to quantify theoretical relationships using observed data. The theoretical relationships formulated in terms of equations such as demand and supply functions seemed to offer themselves for quantification using the newly established techniques of correlation and regression.
The early literature in econometric modelling concentrated mostly on two general areas, business cycles and demand curves (see Stigler (1954)). This can be explained by the availability of data and the influence of the marginalist revolution. The statistical analysis of business cycles took the form of applying correlation as a tool to separate long-term secular movements, periodic movements and short-run oscillations (see Hooker (1905), Moore (1914) inter alia). The empirical studies in demand theory concentrated mostly on estimating demand curves using the Gauss linear model disguised as regression analysis. The estimation of such curves was treated as 'curve fitting' with any probabilistic arguments being coincidental. Numerous studies of empirical demand schedules, mostly of agricultural products, were published during the period 1910-30 (see Stigler (1954), Morgan (1982), Hendry and Morgan (1986)), seeking to establish an empirical foundation for the 'law of demand'. These studies purported to estimate demand schedules of the simple form

q_t^D = b_0 + b_1 p_t,    (1.10)

where q_t^D denotes the quantity demanded and p_t the price at time t,
...    (1.11)
Problems raised by these early studies include:
(i) statistical model specification;
(ii) misspecification testing;
(iii) statistical specification testing;
(iv) reparametrisation, identification;
(v) the relation to theoretical models.
By the late 1920s there was a deeply felt need for a more organised effort to face the problems raised by the early applied econometricians such as Moore, Mitchell, Schultz, Clark, Working, Wallace, Wright, inter alia. This led to the creation of the Econometric Society in 1930. Frisch, Tinbergen and Fisher (Irving) initiated the establishment of 'an international society for the advancement of economic theory in its relation to statistics and mathematics'. The decade immediately after the creation of the Econometric Society can be characterised as the period during which the foundations of modern econometrics were laid, mainly by posing some important and insightful questions.
An important attempt to resolve some of the problems raised by the theoretical versus empirical (estimated) relationships distinction was made by Frisch (1928), (1934). Arguing from the Gauss linear model viewpoint, Frisch suggested the so-called errors-in-variables formulation, where the theoretical relationships defined in terms of the theoretical variables ξ_t ≡ (ξ_1t, ..., ξ_kt)' are given by the system of k linear equations

A ξ_t = 0,

and the observed variables x_t ≡ (x_1t, ..., x_kt)' are related to ξ_t via

x_t = ξ_t + ε_t.
... This is, in my opinion, unsatisfactory. In a work of this sort the connection between statistical and theoretical relations must be thoroughly understood and the nature of the information which the statistical relations furnish - although they are not identical with the theoretical relations - should be clearly brought out.
(See Frisch (1938), pp. 2-3.)
... the Fisher paradigm (the design of experiments approach) ... to econometric modelling ... He went on to argue: ...
1.2  A sketch of a methodology

A sketch of the methodology of econometric modelling adopted by the 'textbook' approach is given in Fig. 1.1.

Fig. 1.1. The 'textbook' approach to econometric modelling: theory → theoretical model → econometric model → estimation and testing → prediction (forecasting) and policy evaluation, with the data entering through statistical inference.
rate, i.e.

M^D = f(Y, P, I).    (1.15)

Most theories of the demand for money can be accommodated in some variation of (1.15) by attributing different interpretations to Y. The theoretical model is a mathematical formulation of a theory. In the present case we expressed the theory directly in the functional form (1.15) in an attempt to keep the discussion to a minimum. Let the theoretical model be an explicit functional form for (1.15), say

M^D = A Y^α1 P^α2 I^α3,    (1.16)

or in log-linear form

ln M^D = α_0 + α_1 ln Y + α_2 ln P + α_3 ln I,    (1.17)

with α_0 = ln A being a constant.
The next step in the methodological scheme represented by Fig. 1.1 is to transform the theoretical model (1.17) into an econometric model. This is commonly achieved in an interrelated sequence of steps which is rarely explicitly stated. Firstly, certain data series, assumed to represent measurements of the theoretical variables involved, are chosen. Secondly, the theoretical variables are assumed to coincide with the variables giving rise to the observed data chosen. This enables us to respecify (1.17) in terms of these observable variables, say M_t, Y_t, P_t and I_t:

ln M_t = α_0 + α_1 ln Y_t + α_2 ln P_t + α_3 ln I_t.    (1.18)
Thirdly, an error term u_t is attached in order to capture the effects of ...; adding it to (1.18) yields

ln M_t = α_0 + α_1 ln Y_t + α_2 ln P_t + α_3 ln I_t + u_t,  t = 1, 2, ..., T.    (1.20)

...
Consider the theoretical demand schedule

q_t^D = α_0 + α_1 p_t,    (1.21)

and its estimated counterpart

y_t = α̂_0 + α̂_1 p_t + û_t,  t = 1, 2, ..., n.    (1.22)
... q_t^D ... for all t.    (1.23)

Such a condition ... Estimating the theoretical model in the context of the Gauss linear model, however,
will give rise to very misleading estimates for the theoretical parameters of interest α_0 and α_1 (see Chapter 19 for the demand for money). This is because the GM represented by the Gauss linear model bears little, if any, resemblance to the actual DGP which gave rise to the observed data (q_t, p_t), t = 1, 2, ..., T. In order to account for this some alternative statistical model should be specified in this case (see Part IV for several such models). Moreover, in this case the theoretical model (1.21) might not be estimable. A moment's reflection suggests that without any additional information the estimable form of the model is likely to be an 'adjustment' process (price or/and quantity). If the observed data have a distinct time dimension this should be taken into consideration in deciding the estimable form of the model as well as in specifying the statistical model in the context of which the latter will be analysed. The estimable form of the model is directly related to the observable phenomenon of interest which gave rise to the data (the actual DGP). More often than not the intended scope of the theory in question is not the demand schedule itself but the explanation of changes in prices and quantities of interest. In such a case a demand or/and a supply schedule are used as a means to explain price and quantity changes, not as the intended scope of the theory.
The question which naturally arises at this stage is whether we can tackle some of the problems raised above in the context of an alternative methodological framework. In view of the apparent limitations of the textbook methodology, any alternative framework should be flexible enough so as to allow the modeller to ask some of the questions raised above even though readily available answers might not always be forthcoming. With this in mind such a methodological framework should attribute an important role to the actual DGP in order to widen the intended scope of econometric modelling. Indeed, the estimable model should be interpreted as an approximation to the actual DGP. This brings the nature of the observed data to the centre of the scene, with the statistical model being defined directly in terms of the random variables giving rise to the data and not the error term. The statistical model should be specified as a generalised description of the mechanism giving rise to the data, in view of the estimable model, because the latter is going to be analysed in its context. A sketch of such a methodological framework is given in Fig. 1.2. An important feature of this framework is that it can include the textbook methodology as a special case under certain conditions. When the actual DGP is 'designed' to resemble the conditions assumed by the theory in question (Haavelmo type one observed data) then the theoretical and estimable models could coincide and the statistical model could differ from these by a white-noise error. In general, however, we need to distinguish between them even though the estimable model might not be readily available in some cases such as the case of the transactions demand for money (see Chapter 23).

In order to be able to turn the above skeleton of a methodology into a fully fleshed framework we need to formulate some of the concepts involved in more detail and discuss its implementation at length. Hence, a more detailed discussion of this methodology is considered in the epilogue, where the various components shown in Fig. 1.2 are properly defined and their role explained. In the meantime the following working definitions will suffice for the discussion which follows:

Theory: a conceptual construct ...
Fig. 1.2. An alternative approach to econometric modelling: theory → theoretical model → estimable model, and observed data → statistical model; estimation, misspecification testing, reparametrisation and model selection then lead to the empirical econometric model (prediction, ...).

Estimable model: a particular form of the theoretical model which is estimable in view of the actual DGP and the observed data chosen.

Empirical econometric model: a reformulation (reparametrisation/...) of the statistical model ...
Looking ahead

As the title of the book exemplifies, its main aim is the statistical foundations of econometric modelling. In relation to Fig. 1.2 the book concentrates mainly on the part within the dotted rectangle. The specification of a statistical model in terms of the variables giving rise to the observed data, as well as the related statistical inference results, will be the subject matter of Parts II and III. In Part IV various statistical models of interest in econometric modelling and the related statistical inference results will be considered in some detail. Special attention will be given to the procedure from the specification of the statistical model to the 'design' of the empirical econometric model. The transactions demand for money example considered above will be used throughout Part IV in an attempt to illustrate the 'dangers' awaiting the unaware in the context of the textbook methodology, as well as to compare this methodology with the alternative methodology formalised in the present book.

Parts II and III form an integral part of econometric modelling and should not be viewed as providing a summary of the concepts and definitions to be used in Part IV. A sound background in probability theory and statistical inference is crucial for the implementation of the approach adopted in the present book. This is mainly because the modeller is required to specify the 'appropriate' statistical model taking into consideration the nature of the data in hand as well as the estimable model. This entails making decisions about characteristics of the random variables which gave rise to the observed data chosen, such as normality, independence, stationarity, mixing, before any estimation is even attempted. This is one of the most crucial decisions in the context of econometric modelling because an inappropriate choice of the statistical model renders the related statistical inference conclusions invalid. Hence, the reader is advised to view Parts II and III as an integral part of econometric modelling and not as reference appendices. In Part IV the reader is encouraged to view econometric modelling as a thinking person's activity and not as a sequence of technique recipes. Chapter 2 provides a very brief introduction to the Pearson paradigm in an attempt to motivate the Fisher paradigm which is the subject matter of Parts II and III.
Additional references
CHAPTER 2

Descriptive study of data

2.1  Histograms and their numerical characteristics
In order to make the discussion more specific let us consider the after-tax personal income data of 23 000 households for 1979-80 in the UK. These data in raw form constitute 23 000 numbers between £1000 and £50 000. This presents us with a formidable task in attempting to understand how income is distributed among the 23 000 households represented in the data. The purpose of descriptive statistics is to help us make some sense of such data. A natural way to proceed is to summarise the data by allocating the numbers into classes (intervals). The number of intervals is chosen a priori and it depends on the degree of summarisation needed. In the present case the income data are allocated into 15 intervals, as shown in Table 2.1 below (see National Income and Expenditure (1983)). The first column of the table shows the income intervals, the second column shows the number of incomes falling into each interval and the third column the relative frequency for each interval. The relative frequency is calculated by dividing the number of observations in each interval by the total number of observations. Summarising the data in Table 2.1 enables us to get some idea of how income is distributed among the various classes. If we plot the relative frequencies in a bar graph we get what is known as the histogram,
Fig. 2.1. The histogram and frequency polygon of the personal income data.
The mean, as a measure of location, is defined by

z̄ = Σ_{i=1}^{m} φ_i z_i,

where φ_i and z_i refer to the relative frequency and the midpoint of interval i. The mode as a measure of location refers to the value of income that occurs most frequently in the data set. In the present case the mode belongs to the first interval £1.0-1.5. Another measure of location is the median, referring to the value of income in the middle when incomes are arranged in an ascending (or descending) order according to the size of income. The best way to calculate the median is to plot the cumulative frequency graph, which is more convenient for answering questions such as 'How many observations fall below a particular value of income?' (see Fig. 2.2).

Fig. 2.2. The cumulative frequency graph of the personal income data.

From the cumulative frequency graph we can see that the median belongs to the interval £3.0-3.5. Comparing the three measures of location we can see that mode < median
confirming
mean,
the
obvious
asymmetry
ol' the
'2
(zf pli
4.85,
15
mk
=
)
=
(zf
-
z'lksi
3 4
is known as the
defining what are known as hiqber central rrlf?rntanrs. These higher moments
can be used to get a better idea of the shape of the histogram. For example,
the standardised form of the third and fourth moments defined by
SK
?n a
X. an d
-
n
--7.4
u?
(2
.4)
SK
1.43
and
7.33,
which confirms the asymmetry of the histogram (skewed to the right). The
above numerical characteristics referring to the location, dispersion and
shape were calculated for the data set as a whole. lt was argued above,
however, that it may be preferable to separate the data into two larger
groups and study those separately. Let us consider the groups f 1.0-4.5 and
f4.0-20.0 separately. The numerical characteristics for the two groups are
and
2.5,
(721=
0.996,
SKL
0,252,
6.18,
:22
3.8 14,
SKz
2.55,
Kz
11.93,
respectively.
Looking at these measures we can see that although the two subsets of the
income data seemed qualitatively rather similar they actually differ
substantially. The second group has much bigger dispersion, skewness and
kurtosis coefficients.
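These grouped-data calculations are mechanical; a minimal sketch follows, using hypothetical interval midpoints and relative frequencies (Table 2.1 itself is not reproduced here, so the numbers below are illustrative only, not the book's data).

```python
# Sketch of the grouped-data moments of Section 2.1: mean, variance,
# skewness (SK) and kurtosis (K) from midpoints z_i and frequencies phi_i.
z   = [1.25, 1.75, 2.25, 2.75, 3.25]   # interval midpoints (illustrative)
phi = [0.30, 0.25, 0.20, 0.15, 0.10]   # relative frequencies (illustrative)

zbar = sum(p * x for p, x in zip(phi, z))                    # mean
m = lambda k: sum(p * (x - zbar)**k for p, x in zip(phi, z)) # central moment
var = m(2)
SK, K = m(3) / var**1.5, m(4) / var**2
print(round(zbar, 3), round(var, 3), round(SK, 3), round(K, 3))
```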
Returning to the numerical characteristics of the data set as a whole, we can see that these seem to represent an uneasy compromise between the above two subsets. This confirms our first intuitive reaction based on the histogram that it might be more appropriate to study the two larger groups separately.

Another form of graphical representation for time-series data is the time graph (z_t, t), t = 1, 2, ..., T. The temporal pattern of an economic time series is important not only in the context of descriptive statistics but also plays an important role in econometric modelling in the context of statistical inference proper; see Part IV.
2.2  Frequency curves

Although the histogram can be a very useful way to summarise and study observed data, it is not a very convenient descriptor of data. This is because m - 1 parameters φ_1, φ_2, ..., φ_{m-1} (m being the number of intervals) are needed to describe it. Moreover, the histogram is analytically a cumbersome step function of the form

h(z) = Σ_{i=1}^{m} φ_i · 1_{[z_i, z_{i+1})}(z),

where [z_i, z_{i+1}) is the i-th interval and 1(·) is the indicator function.
If we were to be able to 'describe' the two subsets of the data separately, we could conceivably express a smoothed version of the frequency polygons in polynomial form with one or two parameters. This line of reasoning led statisticians in the second part of the nineteenth century to suggest various such families of frequency curves with various shapes for describing observed data.
The Pearson family of frequency curves is defined by the differential equation

dφ(z)/dz = (z + a) φ(z) / (b_0 + b_1 z + b_2 z²),    (2.7)

which satisfies the condition that the curve touches the z-axis at φ(z) = 0 and has an optimum at z = -a, that is, the curve has one mode. Clearly, the solution of the above equation depends on the roots of the denominator. By imposing different conditions on these roots and choosing different values for a, b_0, b_1 and b_2 we can generate numerous frequency curves, such as

(i) ...    (2.8)
(ii) ...    (2.9)
(iii) φ(z) = A z^{-(a+1)}, J-shaped.    (2.10)
In the case of the income data above we can see that the J-shaped (iii) frequency curve seems to be our best choice. As can be seen it has only one parameter a and it is clearly a much more convenient descriptor (if appropriate) of the income data than the histogram. For x_0 equal to the lowest income value this is known as the Pareto frequency curve. Looking at Fig. 2.1 we can see that for incomes greater than £4.5 the Pareto frequency curve seems a very reasonable descriptor.

An important property of the Pearson family of frequency curves is that the parameters a, b_0, b_1 and b_2 are completely determined from knowledge of the first four moments. This implies that any frequency curve can be fitted to the data using these moments (see Kendall and Stuart (1969)). At this point, instead of considering how such frequency curves can be fitted to observed data, we are going to leave the story unfinished, to be taken up in Parts III and IV, in order to look ahead to probability theory and statistical inference proper.
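A small sketch of the J-shaped curve (iii) in its Pareto form follows; the values of z_0 and a are hypothetical illustrations, not estimates from the income data.

```python
# Sketch of the J-shaped Pearson curve (iii), phi(z) = A * z**-(a+1),
# with A fixed so that the curve integrates to one above z0 (the Pareto case).
z0, a = 4.5, 1.5                 # hypothetical lower bound and shape parameter
A = a * z0**a                    # normalising constant over [z0, infinity)

phi = lambda z: A * z**-(a + 1)
print([round(phi(z), 4) for z in (5.0, 7.5, 10.0, 15.0)])  # decreasing, J-shaped
```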
2.3  Looking ahead
Additional references
PART II

Probability theory

CHAPTER 3

Probability

3.1  The notion of probability
The theory of probability had its origins in gambling and games of chance in the mid-seventeenth century, and its early history is associated with the names of Huygens, Pascal, Fermat and Bernoulli. This early development of probability was rather sporadic and without any rigorous mathematical foundations. The first attempts at some mathematical rigour, and a more sophisticated analytical apparatus than just combinatorial reasoning, are credited to Laplace, De Moivre, Gauss and Poisson (see Maistrov (1974)). Laplace proposed what is known today as the classical definition of probability:

Definition 1
If a random experiment can result in N mutually exclusive and equally likely outcomes, and if N_A of these outcomes result in the occurrence of the event A, then the probability of A is defined by

P(A) = N_A / N.
To illustrate the definition let us consider the random experiment of tossing a fair coin twice and observing the face which shows up. The set of all equally likely outcomes is

S = {(HH), (HT), (TH), (TT)}.

... 'equally likely' ...
The frequency approach defines the probability of an event A as the limit of its relative frequency of occurrence, that is,

P(A) = lim_{n→∞} (n_A / n),

where n_A is the number of occurrences of A in n trials. Fig. 3.1 illustrates this notion for the case of P(A) = 1/2 in a typical example of 100 trials. As can be seen, although there are some 'wild fluctuations' of the relative frequency for a small number of trials, as these increase the relative frequency tends to 'settle' (converge) around 1/2.

Despite the fact that the frequency approach seems to be an improvement over the classical approach, giving objective status to the notion of probability by rendering it a property of real world phenomena, there are some obvious objections to it. 'What is meant by "the limit as n goes to infinity"?' 'How can we generate infinite sequences of trials?' 'What happens to phenomena where repeated trials are not possible?'

The subjective approach to probability renders the notion of probability a subjective status by regarding it as 'degrees of belief' on behalf of individuals assessing the uncertainty of a particular situation. The ...
Fig. 3.1. Observed relative frequency of an experiment in 100 coin tossings.
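The behaviour shown in Fig. 3.1 is easy to reproduce; here is a minimal simulation sketch (the seed and the sample points printed are arbitrary choices, not from the book).

```python
# Sketch of the frequency idea: the relative frequency of 'heads' in n
# fair-coin tosses settles around 1/2 as n grows.
import random

random.seed(1)
heads = 0
for n in range(1, 101):
    heads += random.random() < 0.5   # one toss; True counts as 1
    if n in (10, 50, 100):
        print(n, heads / n)          # relative frequency after n trials
```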
3.2  The axiomatic approach

Definition 2
A random experiment, denoted by ℰ, is an experiment which satisfies the following conditions:
(a) all possible distinct outcomes are known a priori;
(b) in any particular trial the outcome is not known a priori; and
(c) it can be repeated under identical conditions.

Although at first sight this might seem very unrealistic, even as a model of a chance mechanism, it will be shown in the following chapters that it can be extended to provide the basis for much more realistic probability and statistical models.
The axiomatic approach to probability theory can be viewed as a formalisation of the concept of a random experiment ℰ. In an attempt to formalise condition (a), 'all possible distinct outcomes are known a priori', Kolmogorov devised the set S which includes 'all possible distinct outcomes' and has to be postulated before the experiment is performed.
Definition 3
The sample space, denoted by S, is the set of all possible outcomes of the experiment ℰ; its elements are called elementary events.

Example
Consider the random experiment ℰ of tossing a fair coin twice and observing the faces turning up. The sample space of ℰ is

S = {(HT), (TH), (HH), (TT)},

with (HT), (TH), (HH), (TT) being the elementary events belonging to S.
The second ingredient of ℰ to be formulated relates to (b), and in particular to the various forms events can take. A moment's reflection suggests that there is no particular reason why we should be interested in elementary outcomes only. For example, in the coin experiment we might be interested in such events as A_1, 'at least one H', and A_2, 'at most one H', and these are not elementary events; in particular

A_1 = {(HT), (TH), (HH)}  and  A_2 = {(HT), (TH), (TT)}.
Two special events are S itself, called the sure event, and the impossible event ∅, defined to contain no elements of S, i.e. ∅ = { }; the latter is defined for completeness.
A third ingredient of ℰ associated with (b) which Kolmogorov had to formalise was the idea of uncertainty related to the outcome of any particular trial of ℰ. This he formalised in the notion of probabilities attributed to the various events associated with ℰ, such as P(A_1), P(A_2), expressing the 'likelihood' of occurrence of these events. Although attributing probabilities to the elementary events presents no particular mathematical problems, doing the same for events in general is not as straightforward. The difficulty arises because if A_1 and A_2 are events, Ā_1, Ā_2, A_1 ∪ A_2, A_1 ∩ A_2, etc., are also events, because the occurrence or non-occurrence of A_1 and A_2 implies the occurrence or not of these events. This implies that for the attribution of probabilities to make sense we have to impose some mathematical structure on the set of all events, say ℱ, which reflects the fact that whichever way we combine these events, the end result is always an event. The temptation at this stage is to define ℱ to be the set of all subsets of S, called the power set; surely this covers all possibilities! In the above example the power set of S takes the form
ℱ = {S, ∅, {(HT)}, {(TH)}, {(HH)}, {(TT)}, {(HT),(TH)}, {(HT),(HH)}, {(HT),(TT)}, {(TH),(HH)}, {(TH),(TT)}, {(HH),(TT)}, {(HT),(TH),(HH)}, {(HT),(TH),(TT)}, {(HT),(HH),(TT)}, {(TH),(HH),(TT)}}.
It turns out that in most cases where the power set does not lead to any inconsistencies in attributing probabilities we define the set of events ℱ to be the power set of S. But when S is infinite or uncountable (it has as many elements as there are real numbers), or we are interested in some but not all possible events, inconsistencies can arise. For example, if S = {s_1, s_2, ...} and the events A_i ⊂ S are such that A_i ∩ A_j = ∅ (i ≠ j), i, j = 1, 2, ..., with P(A_i) = a > 0, where P(A_i) refers to the probability assigned to the event A_i, then P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) = Σ_{i=1}^∞ a > 1 (see below), which is an absurd probability, being greater than one; similar inconsistencies arise when S is uncountable. Apart from these inconsistencies, sometimes we are not interested in all the subsets of S. Hence, we need to define ℱ independently of the power set by endowing it with a mathematical structure which ensures that no inconsistencies arise. This is achieved by requiring that ℱ has a special mathematical structure: it is a σ-field related to S.
Definition 4
Let ℱ be a set of subsets of S. ℱ is called a σ-field if:
(i) if A ∈ ℱ then Ā ∈ ℱ (closure under complementation); and
(ii) if A_i ∈ ℱ, i = 1, 2, ..., then ⋃_{i=1}^∞ A_i ∈ ℱ (closure under countable union).

Note that:
(iii) S ∈ ℱ (since A ∪ Ā = S);
(iv) ∅ ∈ ℱ (from (iii), S̄ = ∅ ∈ ℱ); and
(v) if A_i ∈ ℱ, i = 1, 2, ..., then ⋂_{i=1}^∞ A_i ∈ ℱ.

These suggest that a σ-field is a set of subsets of S which is closed under complementation and countable unions and intersections. That is, any of these operations on the elements of ℱ will give rise to an element of ℱ. It can be checked that the power set of S is indeed a σ-field, and so is the set

ℱ_1 = {S, ∅, {(HT)}, {(TH), (HH), (TT)}}.
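A small sketch follows, under the simplifying assumption of a finite S (so countable unions reduce to finite ones), checking that the power set and ℱ_1 above satisfy the closure conditions of Definition 4. The function names are illustrative, not from the book.

```python
# Sketch: verify the sigma-field closure conditions on a finite sample space.
from itertools import combinations

S = frozenset(['HT', 'TH', 'HH', 'TT'])

def power_set(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def is_sigma_field(F, S):
    F = set(F)
    closed_compl = all(S - A in F for A in F)           # complementation
    closed_union = all(A | B in F for A in F for B in F) # (finite) unions
    return S in F and closed_compl and closed_union

F1 = [S, frozenset(), frozenset({'HT'}), frozenset({'TH', 'HH', 'TT'})]
print(is_sigma_field(power_set(S), S))  # True: the power set is a sigma-field
print(is_sigma_field(F1, S))            # True: so is F1
```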
Let us consider an example of such a σ-field where S is uncountable.
Example
Let S be the real line ℝ = {x: -∞ < x < ∞} and let J = {B_x: x ∈ ℝ}, where B_x = (-∞, x], i.e. B_x = {z: z ≤ x, x ∈ ℝ}.
This is an educated choice, which will prove to be very useful in the sequel. How can we construct a σ-field on ℝ? The definition of a σ-field suggests that if we start from the events B_x, x ∈ ℝ, then extend this set to include B̄_x and take countable unions of B_x and B̄_x, we should be able to define a σ-field on ℝ, σ(J), the minimal σ-field generated by the events B_x, x ∈ ℝ. By definition B_x ∈ σ(J). If we take complements of B_x: B̄_x = (x, ∞) ∈ σ(J). Taking countable unions of B_x: ⋃_{n=1}^∞ (-∞, x - (1/n)] = (-∞, x) ∈ σ(J). These imply that σ(J) is indeed a σ-field. In order to see how large a collection σ(J) is, we can show that events of the form (x, ∞), [x, ∞), {x} and (x, z) for x < z also belong to σ(J), using set theoretic operations as follows:

(x, ∞) = S - (-∞, x] ∈ σ(J);
[x, ∞) = ⋂_{n=1}^∞ (x - 1/n, ∞) ∈ σ(J);
{x} = ⋂_{n=1}^∞ (x - 1/n, x] ∈ σ(J);
(x, z) = (-∞, z) - (-∞, x] ∈ σ(J),  x < z.
This shows not only that σ(J) is a σ-field, but that it includes almost every conceivable subset (event) of ℝ; that is, it coincides with the σ-field generated by the open intervals of ℝ, which we denote by ℬ, i.e. σ(J) = ℬ. The σ-field ℬ will play a very important role in the sequel; we call it the Borel field on ℝ.
Definition 5
A probability function P(·): ℱ → [0, 1] is a set function satisfying the following axioms:
Axiom 1: P(A) ≥ 0 for every A ∈ ℱ;
Axiom 2: P(S) = 1; and
Axiom 3: if {A_i, i = 1, 2, ...} is a sequence of mutually exclusive events in ℱ (that is, A_i ∩ A_j = ∅ for i ≠ j), then P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i) (called countable additivity).
The first two axioms seem rather self-evident and are satisfied by both the classical as well as the frequency definitions of probability. Hence, in some sense, the axiomatic definition of probability 'overcomes' the deficiencies of the other definitions by making the interpretation of probability dispensable for the mathematical model to be built. The third axiom is less obvious, stating that the probability of the union of unrelated events must be equal to the addition of their separate probabilities. For example, since {(HT)} ∩ {(HH)} = ∅,

P({(HT)} ∪ {(HH)}) = P({(HT)}) + P({(HH)}) = 1/4 + 1/4 = 1/2.

Again this coincides with the 'frequency interpretation' result. To summarise the argument so far, Kolmogorov formalised the conditions (a) and (b) of the random experiment ℰ in the form of the trinity (S, ℱ, P(·)) comprising the set of all outcomes S (the sample space), a σ-field ℱ of events related to S, and a probability function P(·) assigning probabilities to events in ℱ.
For the coin example, if we choose ℱ_1 = {{(HT)}, {(TH), (HH), (TT)}, ∅, S} to be the σ-field of interest, P(·) is defined by

P(S) = 1,  P(∅) = 0,  P({(HT)}) = 1/4,  P({(TH), (HH), (TT)}) = 3/4.

The trinity (S, ℱ, P(·)) is given a name:

Definition 6
A sample space S endowed with a σ-field ℱ and a probability function P(·) satisfying axioms 1-3 is called a probability space.
Having defined the basic axioms of the theory we can now proceed to derive more properties for the probability set function using these axioms and mathematical logic. Although such properties will not be used directly in constructing what we called a probability model, they will be used indirectly. For this reason some of these properties will be listed here for reference without any proofs:

(P1) P(Ā) = 1 - P(A), for every A ∈ ℱ.
(P2) P(∅) = 0.
(P3) If A_1 ⊂ A_2, then P(A_1) ≤ P(A_2), A_1, A_2 ∈ ℱ.
(P4) P(A_1 ∪ A_2) = P(A_1) + P(A_2) - P(A_1 ∩ A_2).
(P5) If {A_n, n ≥ 1} is a monotone sequence of events in ℱ, then P(lim_{n→∞} A_n) = lim_{n→∞} P(A_n).
A monotone sequence of events in ℱ can be either increasing (expanding), i.e. A_1 ⊂ A_2 ⊂ ... ⊂ A_n ⊂ ..., or decreasing (contracting), i.e. A_1 ⊃ A_2 ⊃ ... ⊃ A_n ⊃ .... For an increasing sequence lim_{n→∞} A_n = ⋃_{n=1}^∞ A_n, and for a decreasing sequence lim_{n→∞} A_n = ⋂_{n=1}^∞ A_n. P5 is known as the continuity property of the set function P(·) and plays an important role in probability theory. In particular it ensures that the distribution function (see Section 4.2) satisfies certain required conditions; see also Section 8.4 on martingales.
3.3  Conditional probability

One important extension of the above formalisation of the random experiment ℰ in the form of the probability space (S, ℱ, P(·)) is in the direction of conditional probabilities. So far we have considered probabilities of events on the assumption that no information is available relating to the outcome of a particular trial. Sometimes, however, additional information is available in the form of the known occurrence of some event A. For example, in the case of tossing a fair coin twice we might know that in the first trial it was heads. What difference does this information make to the original triple (S, ℱ, P(·))? Firstly, knowing that the first trial was a head, the set of all possible outcomes now becomes

S_A = {(HT), (HH)},

since (TH), (TT) are no longer possible. Secondly, the σ-field taken to be the power set now becomes

ℱ_A = {S_A, ∅, {(HT)}, {(HH)}},

and thirdly, the probability set function becomes

P_A(S_A) = 1,  P_A(∅) = 0,  P_A({(HT)}) = 1/2,  P_A({(HH)}) = 1/2.
More generally, for any two events A_1 and A in ℱ with P(A) > 0, the conditional probability of A_1 given A is defined by

P(A_1 | A) = P(A_1 ∩ A) / P(A).    (3.7)

In the above example, for A_1 = {(HT)} and A = {(HT), (HH)},

P(A_1 | A) = P({(HT)}) / P(A) = (1/4) / (1/2) = 1/2,

as above. Note that P(A) > 0 is needed for the conditional probabilities to be defined. Using the above rule of conditional probability we can deduce that

P(A_1 ∩ A_2) = P(A_1 | A_2) P(A_2) = P(A_2 | A_1) P(A_1),    (3.8)

for A_1, A_2 ∈ ℱ. When knowing that A_2 has occurred does not change the probability of A_1, i.e.

P(A_1 | A_2) = P(A_1),    (3.9)

we say that A_1 and A_2 are independent; in that case P(A_1 ∩ A_2) = P(A_1) P(A_2).
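A brief enumeration sketch of (3.7)-(3.9) on the two-coin sample space follows; the event names B1 and B2 are illustrative choices, not from the book.

```python
# Sketch: conditional probability and independence by direct enumeration,
# with probability 1/4 on each of the four equally likely outcomes.
from fractions import Fraction

S = ['HT', 'TH', 'HH', 'TT']
P = lambda event: Fraction(len(event), len(S))

A  = {'HT', 'HH'}                     # 'heads on the first toss'
A1 = {'HT'}
print(P(A1 & A) / P(A))               # 1/2, as in the text

B1 = {'HT', 'HH'}                     # heads on the first toss
B2 = {'HH', 'TH'}                     # heads on the second toss
print(P(B1 & B2) == P(B1) * P(B2))    # True: B1 and B2 are independent
```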
Important concepts
random experiment; classical, frequency and subjective definitions of probability; sample space, elementary events; σ-field, minimal σ-field generated by events, Borel field; probability set function, probability space (S, ℱ, P(·)); conditional probability, independent events, mutually exclusive events.
Questions
1. Why do we need probability theory in analysing observed data?
2. What is the role of a mathematical model in attempting to explain real phenomena?
3. Compare and contrast the classical and frequency definitions of probability. How do they differ from the axiomatic definition?
4. Explain how the axiomatic approach formalises the concept of a random experiment ℰ to that of a probability space (S, ℱ, P(·)).
5. Why do we need the concept of a σ-field in the axiomatisation of probability? Explain the concept intuitively.
6. Explain the concept of the minimal σ-field generated by some events using the half-closed intervals (-∞, x], x ∈ ℝ, on the real line as an example.
Exercises
1. Consider the random experiment of throwing a die where you stand to lose money if the number of dots is odd. Derive a σ-field which will enable you to consider your interests probabilistically. Explain your choice.
2. Consider the random experiment of tossing two indistinguishable fair coins and observing the faces turning up.
(i) Derive the sample space S and the σ-field of the power set ℱ, and define the probability set function P(·).
(ii) Derive the σ-field generated by the events {HH} and {TT}.
(iii) If you stand to lose a pound every time a coin turns up 'heads', what is the σ-field of interest?
(iv) Consider the events {HH} and {TT} and show whether they are mutually exclusive and/or independent.
3. Consider the random experiment of tossing a coin until it turns up 'heads'. Define the sample space and discuss the question of defining a σ-field associated with it.
4. Consider the random experiment of selecting a card at random from an ordinary deck of 52 cards. Find the probability of
(i) A_1, the card is an ace;
(ii) ...;
and compare ... derived in (ii) ... (independent; not independent).
Additional references
Barnett (1976).
CHAPTER 4

Random variables and probability distributions

4.1  The concept of a random variable

The aim of this chapter is to simplify the probability space (S, ℱ, P(·)) by mapping it into a much more flexible one using the concept of a random variable.

The basic idea underlying the construction of (S, ℱ, P(·)) was to set up a framework for studying probabilities of events as a prelude to analysing problems involving uncertainty. The probability space was proposed as a formalisation of the concept of a random experiment ℰ. One facet of ℰ which can help us suggest a more flexible probability space is the fact that when the experiment is performed the outcome is often considered in relation to some quantifiable attribute, i.e. an attribute which can be represented by numbers. Real world outcomes are more often than not expressed in numbers. It turns out that assigning numbers to qualitative outcomes makes possible a much more flexible formulation of probability theory. This suggests that if we could find a consistent way to assign numbers to outcomes we might be able to change (S, ℱ, P(·)) to something more easily handled. The concept of a random variable is designed to do just that without changing the underlying probabilistic structure of (S, ℱ, P(·)).
Fig. 4.1. The relation between sample space, σ-field and set function for the random variable X(·): S → ℝ_X, where X is the number of 'heads' in the coin-tossing example.

In particular,

X⁻¹(0) = {(TT)},  X⁻¹(1) = {(HT), (TH)},  X⁻¹(2) = {(HH)}.
For X to preserve the event structure of ℱ, for each subset N of ℝ_X the inverse image X⁻¹(N) must be an event in ℱ. Looking at X as defined above we can see that X⁻¹(0) ∈ ℱ, X⁻¹(1) ∈ ℱ, X⁻¹(2) ∈ ℱ, X⁻¹({0} ∪ {1}) ∈ ℱ, X⁻¹({0} ∪ {2}) ∈ ℱ, X⁻¹({1} ∪ {2}) ∈ ℱ; that is, X(·) does indeed preserve the event structure of ℱ. On the other hand, the function Y(·): S → ℝ_Y defined by Y({(HT)}) = Y({(HH)}) = 1, Y({(TH)}) = Y({(TT)}) = 0 does not preserve the event structure of ℱ_1, since Y⁻¹(0) ∉ ℱ_1 and Y⁻¹(1) ∉ ℱ_1. This prompts us to define a random variable X to be any such function satisfying this event preserving condition in relation to some σ-field defined on ℝ_X; for generality we always take the Borel field ℬ on ℝ.
Definition 1
A random variable X with respect to the σ-field ℱ is a real-valued function X(·): S → ℝ which preserves the event structure of ℱ, i.e.

X⁻¹(B) = {s: X(s) ∈ B, s ∈ S} ∈ ℱ  for every B ∈ ℬ.

Note that:
(i) ℬ is the Borel field on ℝ;
(ii) in deciding whether some function Y(·): S → ℝ is a random variable we proceed from the elements of the Borel field ℬ to those of the σ-field ℱ and not the other way around;
(iii) a random variable is neither 'random' nor 'a variable'.

Let us consider these important features in some more detail in order to enhance our understanding of the concept of a random variable, undoubtedly the most important concept in the present book.
... X_1({(HH)}) = X_1({(TT)}) = 1, ...    (4.2)

... σ(X_1) ⊂ ℱ, the minimal σ-field generated by X_1 ... σ(X) ...

X = X_1 + X_2 + X_3,    (4.5)

where X_2({(HH)}) = 1, X_2({(HT)}) = X_2({(TH)}) = X_2({(TT)}) = 0, ...

'How do we decide that some function X(·): S → ℝ is a random variable relative to a given σ-field ℱ?' From the above discussion of the concept of a
random variable, it suffices to check the event preserving condition for the half-closed intervals (-∞, x] only: if

X⁻¹((-∞, x]) = {s: X(s) ≤ x, s ∈ S} ∈ ℱ  for all (-∞, x] ∈ ℬ,

then

X⁻¹(B) = {s: X(s) ∈ B, s ∈ S} ∈ ℱ  for all B ∈ ℬ.

For the random variable X, the number of 'heads', we have

X⁻¹((-∞, x]) = ∅ for x < 0;  {(TT)} for 0 ≤ x < 1;  {(TT), (HT), (TH)} for 1 ≤ x < 2;  S for x ≥ 2.    (4.9)
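The preimages in (4.9) can be checked mechanically; here is a minimal sketch (the dictionary encoding of X is an illustrative assumption, not the book's notation).

```python
# Sketch of (4.9): the preimage of (-inf, x] under X = 'number of heads'
# for representative values of x.
X = {'TT': 0, 'HT': 1, 'TH': 1, 'HH': 2}

def preimage(x):
    return {s for s, v in X.items() if v <= x}

for x in (-0.5, 0.5, 1.5, 2.5):
    print(x, preimage(x))
# -0.5 -> set()               0.5 -> {'TT'}
#  1.5 -> {'TT','HT','TH'}    2.5 -> the whole sample space
```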
Probability enters the picture after the random variable has been defined, in an attempt to complete the mathematical model induced by X.

Table 4.1. A random variable ...
The set function P_X(·) is defined by

P_X(B) = P(X⁻¹(B))  for all B ∈ ℬ.

For example, in the case illustrated in Table 4.1,

P_X({0}) = 1/4,  P_X({1}) = 1/2,  P_X({2}) = 1/4,    (4.10)

P_X({0} ∪ {1}) = P_X({0}) + P_X({1}) = 3/4,  P_X({0} ∩ {1}) = 0,  etc.
The question which arises is whether, in order to define the set function P_X(·), we need to consider all the elements of the Borel field ℬ. The answer is that we do not need to do that because, as argued above, any such element of ℬ can be expressed in terms of the semi-closed intervals (-∞, x]. This implies that by choosing such semi-closed intervals 'intelligently' we can define P_X(·) with the minimum of effort. For example, P_X(·) for X, as defined in Table 4.1, can be defined as follows:

P_X((-∞, 0]) = 1/4,  P_X((-∞, 1]) = 3/4,  P_X((-∞, 2]) = 1.

As we can see, the semi-closed intervals were chosen to divide the real line at the points corresponding to the values taken by X. This way of defining the semi-closed intervals is clearly non-unique but it will prove very convenient in the next section.
The discerning reader will have noted that since we introduced the concept of a random variable X(·) on (S, ℱ, P(·)) we have in effect changed the original probability space to a new one, (ℝ_X, ℬ, P_X(·)), induced by X.

Fig. 4.3. The change from (S, ℱ, P(·)) to (ℝ_X, ℬ, P_X(·)) induced by X.
4.2  The distribution and density functions

The random variable X has changed (S, ℱ, P(·)) to (ℝ, ℬ, P_X(·)), which has a much more convenient mathematical structure. The latter probability space, however, is not as yet simple enough because P_X(·) is still a set function, albeit on real line intervals. In order to simplify it we need to transform it into a point function (a function from a point to a point) with which we are so familiar.

The first step in transforming P_X(·) into a point function comes in the form of the result discussed in the previous section, that P_X(·) need only be defined on semi-closed intervals (-∞, x], x ∈ ℝ, because the Borel field ℬ can be viewed as the minimal σ-field generated by such intervals. With this in mind we can proceed to argue that, in view of the fact that all such intervals have a common starting point (-∞), we could conceivably define a point function

F(·): ℝ → [0, 1],

which is, seemingly, only a function of x. In effect, however, this function will do exactly the same job as P_X(·). Heuristically, this is achieved by defining F(·) as a point function by

F(x) = P_X((-∞, x])  for all x ∈ ℝ,

and assigning the value zero to F(-∞). Moreover, given that as x increases the interval it implicitly represents becomes bigger, we need to ensure that F(x) is a non-decreasing function with one being its maximum value (i.e. F(∞) = 1). For these reasons we adopt the following definition:
Definition 2
Let X be a r.v. defined on (S, ℱ, P(·)). The point function F(·): ℝ → [0, 1] defined by

F(x) = P_X((-∞, x]) = Pr(X ≤ x),  for all x ∈ ℝ,    (4.14)

is called the distribution function (DF) of X, and satisfies:
(i) F(x) is non-decreasing;
(ii) F(-∞) = lim_{x→-∞} F(x) = 0 and F(∞) = lim_{x→∞} F(x) = 1; and
(iii) F(x) is continuous from the right.
It can be shown (see Chung (1974)) that this defines a unique point function for every set function P_X(·). The great advantage of F(·) over P(·) and P_X(·) is that the former is a point function and can be represented in the form of an algebraic formula, the kind of function we are so familiar with in elementary mathematics.
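For the coin-tossing r.v. the step-function character of F(x) can be made concrete; a minimal sketch follows, using the probabilities 1/4, 1/2, 1/4 from the example above.

```python
# Sketch of the distribution function F(x) = Pr(X <= x) for the discrete
# r.v. X = 'number of heads' in two fair tosses.
from fractions import Fraction

f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    return sum(p for v, p in f.items() if v <= x)

print([F(x) for x in (-1, 0, 0.7, 1, 1.9, 2, 5)])
# [0, 1/4, 1/4, 3/4, 3/4, 1, 1] -- a right-continuous step function
```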
Definition 3
A random variable X is called discrete if ...

Definition 4
A random variable X is called (absolutely) continuous if its distribution function F(x) is continuous for all x ∈ ℝ and there exists a non-negative function f(·) on the real line such that

F(x) = ∫_{-∞}^{x} f(u) du.    (4.19)

Definition 5
For a discrete r.v. X the DF takes the form

F(x) = Σ_{x_i ≤ x} f(x_i),  for all x ∈ ℝ,    (4.20)

where f(·) is the (discrete) density function of X.
In order to compare F(x) and f(x) for a discrete with those of a continuous r.v., let us consider the coin-tossing case, where f(0) = 1/4, f(1) = 1/2, f(2) = 1/4, and the case where X takes values in the interval [a, b] and all the values z are attributed the same probability; we express this by saying that X is uniformly distributed in the interval (a, b), and we write X ~ U(a, b). The density of X takes the form

f(x) = 1/(b - a)  for a ≤ x ≤ b,  and 0 elsewhere.
Comparing Figs. 4.4 and 4.5 with 4.6 and 4.7 we can see that in the case of a discrete random variable the DF is a step function and the density function attributes probabilities at discrete points. On the other hand, for a continuous r.v. the density function cannot be interpreted as attributing probabilities because, by definition, if X is a continuous r.v., P(X = x) = 0 for all x ∈ ℝ. This can be seen from the definition of f(x) at every continuity point of F(x):
f(x) = lim_{h→0} [F(x + h) - F(x)] / h.    (4.23)

The density function satisfies:
(i) f(x) ≥ 0 for all x ∈ ℝ, i.e. f(·): ℝ → [0, ∞);    (4.24)
(ii) ∫_{-∞}^{∞} f(x) dx = 1;    (4.25)
(iii) Pr(a < X ≤ b) = F(b) - F(a) = ∫_a^b f(x) dx;    (4.26), (4.27)
(iv) f(x) = (d/dx) F(x).    (4.28)

In cases where the distribution function F(x) is continuous but no integrating function f(x) exists, i.e. (d/dx)F(x) = 0 for some x ∈ ℝ, then F(x) is said to be a singular distribution. Singular distributions are beyond the scope of this book (see Chung (1974)).
4.3  The notion of a probability model

In terms of the original probability space (S, ℱ, P(·)),

P({s: X(s) ∈ (-∞, x], s ∈ S}) = P_X((-∞, x]) = F(x).    (4.30)

... The set of densities indexed by the unknown parameter(s) θ,

Φ = {f(x; θ), θ ∈ Θ},

defines a parametric family of density functions: a probability model. Fig. 4.8 depicts such a parametric family for different values of θ:

f(x; θ) = θ x_0^θ x^{-(θ+1)},  x ≥ x_0,  θ ∈ Θ,

with x_0 a known number and Θ = ℝ₊ the positive real line. For each value of θ in Θ, f(x; θ) represents a different density (hence the term parametric family), as can be seen from Fig. 4.8.
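A small sketch of this parametric family follows; the values of x_0 and θ are arbitrary illustrations, not taken from the book or from Fig. 4.8.

```python
# Sketch of the Pareto parametric family: one density f(x; theta) for each
# theta in the parameter space (x0 is a known lower bound).
x0 = 4.5  # e.g. a lowest income level, in thousands of pounds (hypothetical)

def pareto_density(x, theta):
    return theta * x0**theta / x**(theta + 1) if x >= x0 else 0.0

for theta in (0.5, 1.0, 2.0):
    print(theta, [round(pareto_density(x, theta), 4) for x in (5.0, 10.0, 20.0)])
```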
When such a probability model is postulated it is intended as a description of the chance mechanism generating the observed data. For example, the model in Fig. 4.8 is commonly postulated in modelling personal incomes exceeding a certain level x_0. If we compare the above graph with the histogram of personal income data in Chapter 2 for incomes over £4500, we can see that postulating a Pareto probability density seems to be a reasonable model. In practice there are numerous such parametric families of densities we can choose from, some of which will be considered in the next section. The choice of one such family, when modelling a particular real phenomenon, is usually determined by previous experience in modelling similar phenomena or by a preliminary study of the data.

When a particular parametric family of densities Φ is chosen as the appropriate probability model for modelling a real phenomenon, we are in effect assuming that the observed data available were generated by the 'chance mechanism' described by one of those densities in Φ. The original uncertainty relating to the outcome of a particular trial of the experiment
has now been transformed into the uncertainty relating to the choice of one θ in Θ, say θ*, which determines uniquely the one density, that is, f(x; θ*), which gave rise to the observed data. The task of determining θ* or testing some hypothesis about θ* using the observed data lies with statistical inference in Part III. In the meantime, however, we need to formulate a mathematical framework in the context of which the probability model Φ can be analysed and extended. This involves not only considering a number of different parametric families of densities, appropriate for modelling different real phenomena, but also developing a mathematical apparatus which enables us to describe, compare, analyse and extend such models. The reader should keep this in mind when reading the following chapters to enable him/her not to lose sight of the woods for the trees. The woods comprise the above formulation of the probability model and its various generalisations and extensions; the trees are the various concepts and techniques which enable us to describe and analyse the probability model in its various formulations.
4.4  Some univariate distributions
(1)  Discrete distributions

Consider a random experiment with two possible outcomes, which we call 'success' and 'failure'. Define the r.v. X to take the value 1 for a 'success' and 0 for a 'failure', with Pr(X = 1) = p. Its density function is

f(x; p) = p^x (1 - p)^{1-x}  for x = 0, 1,  and 0 otherwise.    (4.33)

In practice p is unknown and the probability model takes the form

Φ = {f(x; p), p ∈ [0, 1]}.    (4.34)

Such a probability model might be appropriate in modelling the sex of a newborn baby, boy or girl, or whether the next president of the USA will be a Democrat or a Republican.
If we consider n independent trials of the above experiment and define the r.v. X to be the number of 'successes', then X ~ B(n, p); '~' reads 'is distributed as'. The probability that X takes the value x involves the binomial coefficient

(n choose x) = n! / ((n - x)! x!),  where n! = n(n - 1)(n - 2) ... 2 · 1,

so that

f(x; n, p) = (n choose x) p^x (1 - p)^{n-x},  x = 0, 1, ..., n.

... '≈' reads 'approximately equal'.
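A minimal sketch of the Bernoulli density (4.33) and of the binomial probabilities built from the coefficients above follows; p = 0.5 and n = 4 are arbitrary illustrative choices.

```python
# Sketch of the Bernoulli density (4.33) and the B(n, p) probabilities.
from math import comb

def bernoulli(x, p):
    return p**x * (1 - p)**(1 - x)            # x = 0 or 1

def binomial(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

p = 0.5
print([bernoulli(x, p) for x in (0, 1)])       # [0.5, 0.5]
print([binomial(x, 4, p) for x in range(5)])   # 1/16, 4/16, 6/16, 4/16, 1/16
```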
(2)  Continuous distributions

(i) The normal distribution. A random variable X is normally distributed if its density function is given by

f(x; θ) = (1/(σ√(2π))) exp{ -(1/(2σ²))(x - μ)² },

where θ ≡ (μ, σ²).

Fig. 4.9. Density plots for p = 0.05 and p = 0.5 and various values of n.

We denote this by X ~ N(μ, σ²). The parameters μ and σ² will be studied in more detail when we consider mathematical expectation. At this stage we will treat them as the parameters determining the location and flatness of the density. For a fixed σ² the normal density for three different values of μ is given in Fig. 4.10.
Random variables
-4.0
Jz
Jl
4.0
0.40
0.30
:i o,2o
<
0.10
0.00
-8
-6
-7
-4
-5
-3
-2
-1
1.00
0.90
0.80
0.70
0,60
Sil(Lso
<
0.40
0.30
0.20
0, 10
0.00
J=
2.5
-8
-7
-6
-4
-5
-3
-2
-1
Fig. 4.1 1 represents the graph of the normal density for p 0 and three
alternative values of c; as can be seen, the greater the value of c the flatter
the graph of the density. As far as the shape of the normal distribution and
density functions are concerned we note the following characteristics:
=
fp
=
k)
Pry
1 expt /(2,
2c
cx/(2a)
x x/t+ k) Pry
-k
,v
.ytjj
(4.39)
.k),
(4.40)
G.. <yj,
x)
1 - F(x + 2p).
(4.4 1)
4.4 Some
univariate distributions
67
d./-txl ftx)
dx =
'
2(x
=0
2cz
'
at x
=p,
-p)
=
'
and
ftuj
-'
'
(2zr)
(4.42)
(iii)
=p
+ c,
Thus, c not only measures the flatness of the graph of the pdf but it
determines the distance of the points of inflection from p. Fig. 4.12
represents the graphs of the normal DF and pdf in an attempt to
give the reader some idea about the concentration
of these
functions in terms of the standard deviation parameter c around
the mean p.
10
.
W)
0.84 r----0.50
I
I
---u 1 c.:6
-c
f (x)
Shaded area
I
I
I
f
.-
+ (z
2c
1
(J,s)
0.9545
o 65
tV(27r1
0.1353
(V(2F1
-2g
p
Fig. 4. 12. lmportant
functions.
y-o
Jz + c
g + 2c
68
1
/.(y -t;jaalexp g
.
x.'
..--j.:r2
c is
(-Y-p)
---
which does not depend on the unknown parameters p, c. This is called the
standard normal distribution, which we write in the form Z N(0, 1). This
shows that any normal nv. can be transformed to a standard normal nv.
when p and c are known. The implication of this is that using the tables for
and F(x) using the transformation Z
J(z)and F(z) we can deduce
(.Y -p) c. For example, if we want to calculate P6X G 1.5) for A' N( 1, 4)
/:-4z) F(0.25) 0.5987, that is, F(x)
we form J (x 1) 2 0.25
x
./'(x)
F(1.5)
uc>
0.5987.
(ii) Expon
(.w tial
.?'l'll'/-J'
t?/' (listributions
characteristics
Numerical
of random
Yariables
'priori.
(1)
Mathematial
expectation
Let Ar be a random
respectively. The
F(.Y)
mean
.ylxl
n%
'
r.v.
Characteristics
4.5
69
of random vayiables
and
F(A-)
xf./'t-Yf
fOr
a discrete r.v.,
(4.47)
used. We sacrifice a certain generality by not going directly to the
Lebesque integral which is tailor-made for probability theory. This is done,
however, to moderate the mathematical difficulty of the book.
The mean can be interpreted as the centre q/' gravitv of the unit mass as
distributed by the density function. lf we denote the mass located at a
from the origin by m(xf) then the centre of gravity is
distance x, i 1, 2,
can be
located
at
-vfnyt-vj)
Zf-
.-
(4.48)
?Fl(.Y)
1.
If we identify nt-vsl with ptxf) then f)A-)= jf xptxf), given that
f Jx'f)=
provides a measure of location (orcentral
In this sense the mean of the r.v.
tendency) for the density function of X.
'
If A- is a Bernoulli distributed
r.v. (X
'v
b 1, p)) then
A'
/'(x) (1 - p)
lf X
'v
distributed
U((I,
'(A-)
,x?
.Vtxldx
j
x
b-
(1
dx
1
=
-.
then
r.v.,
x2
---
2 b
a +. /)
70
Random uriables
lf X
'v
F(Ar)
(2,c)
+. p )
e .-ya (ja
(2zr)
.x;
'
--
cc
1 x u
2
c
-
exo
(Jz
for
e-izz
(jz
(r
:c
dx,
(2z:)
-.jc2 (j Z
(27:)
= 0 + p 1 p,
since the first term is an odd function, i.e. h - x)= - /1(x).Thus, the
P arameter p for X Np, c2) represents its mean.
.
'w
ln the above examples the mean of the nv. A- existed. The condition which
guarantees the existence of '(Ar) is that
X
dx <
Ixl.(x)
cc.
< w.
)(gIxfl/txf)
or
vo
(4.49)
One example where the mean does not exist is the case of a Cauchy
distributed r.v. with a pdf given by
flxs
1
zr(1 + x 2 )
R.
.X 6
x,
-X
dx
Ixl/txl
lxl1 + x
=-
z: - x
=- olimzc 2
zr
-+
Ctl
1
--
zr
x,
oxc
dx
by synlnxetry
dx
c
1
x
dx=- lim logell
: a .
1+ x a
+J2)
-+
That is, '(Ar) does not exist for the Cauchy distribution.
c, (' c is a constant.
ftzArj + bxz) tkEtxjl + lxEl-fzl for Jn-p
'(c)
lwt? r.,.'s
ArI and
of random
4.5 Cllaracteristics
variables
'w
(Z)1-
::::>'
'v'
Z)'=
Z7=
bsuccess'.
such that
(4.51)
mean
median
(i.e.it has
=mode,
assuming that the mean exists. On the other hand, if the pdf is not unimodal
this result is not necessarily valid, as Fig. 4. 13 exemplifies. ln contrast to the
mean the median always exists, in particular for the Cauchy distribution
x,n
=0.
I
I
l
l
I
I
I
I
I
I
1
I
j
1
1
I
I
I
'G
mode
mode.
mean
median
mode
(2)
variables
Random
The varlance
Vart#l
SIIA- f'tA-ljz
(x - f(-))2/(x)
F(.Y))2/'(xf)
= il (xf-
(4.52)
dx - continuous
(4.53)
- discrete.
The variance
is referred to as
deviation.
standard
Exkrnp/t?s
(i)
VarlA-l
(0
Sf
Var(A-)
X-
p, thus
-p)2p=p(
a+
+ (1
-p)
-p).
-f.'/)2
(?
dx
=
-a
12
(verify).
where
v2./'(x)dx.
for z
-..-
Characteristics of random
4.5
Yariables
.fr
( PrI)
lS2)
.jr
(
( Iz'3)
-lt?lwtlt?n
probabilities.
(3)
Higher moments
Continuing
plr
(-x-- p)r./-(x)dx,
2, 3,
14
:3
--
and
lt *
-:
(T
pr- j=1 (
l
.
1/
p'ipr-j.
,
(4.58)
74
Random variables
function defined by
J
Eteilx)
kx-
eirxdz-txl,
/x(l)
v''- 1.
(4.59)
eA
., (jrlr
p'r.
F. --r!
1+
(4.60)
Fr
dr/
x
(jjr
(r)
=
A function related to
Chapter 10) is
loge/xtr)
()
/xtrl of
1+
(4.6j)
particular
x) (ir)r&r
r 1 r!
)2
(see
(4.62)
where
sr, r= 1, 2,
Example
Let A- Np, c2), the characteristic
'w
-r2c2),
1d/xlll
'i- dl
(M=expirp
=
o 1-
,-
1 d24 (r)
7 dr ,
-.ir
exptirp
()
a
= p +
Gz =
pa
J,
)(ip
(r a
rc c)
p,
()
,-
=0,
l1
=p,
l (/t'c,)-r
-
(4.63)
vs.
condition.
4.5
Important
Random
of random variables
Characteristics
concepts
the probability
variable,
by a r.v., a c-field
space induced
Questions
Since we can build the whole of probability theory on (S, % P ))why
do we need to introduce the concept of a random variable?
Define the concept of a r.v. and explain the role of the Borel field
generated by the half-closed intervals ( vz, x(l in deciding whether a
function .Xt ): S R is a r.v.
'Although any function Art ); S
Ii can be defined to be a nv. relative
c-field
valuable information if we do not
stand
lose
to
we
to some
define the nv. with care'. Discuss.
Explain the relationship between #( ), #xt ) and Fx( ).
Discuss the relationship between Fx( ) and .J( ) for both discrete and
continllous r.v.'s. What properties do density functions satisfy?
Explain the idea of a probability model * ).J(.x;
0j, 0 (E 6)) and
discuss its relationship with the idea of a random experiment $ as well
as the real phenomenon to be modelled.
Give an example of a real phenomenon for which each of the
following distributions might be appropriate:
Bernoulli;
(i)
binomial',
(ii)
normal.
(iii)
'
-+
'
--+
,.'h
'
'
'
'
'
ttx;
76
distributions
12. Compare the properties of the mean with those of the variance.
do the moments characterise the
13. Under what circumstances
distribution function of a r.v.?
Exercises
b, c, J) and 84/) #(h)
Consider a random experiment with S (tp,
#(C)
P(Is l'.
Derive the c-eld of the power set
(i)
l/J, ?)
say ..W).
Derive the minimal c-field generated by
(ii)
S:
Consider the following function defined as
=t,
=-),
,?/.'
-tfzl
-Y(c) -Y(J)
0,
.(?)
J'(/))
F((.')
1,
7 405 926,t
J'(J)
2.
(iii)
(iv)
(v)
(vi)
(vii)
,t7j
,#(.
.(y)
and
plot them.
Calculate E(F), Var(1'), az(F) and a4(J'l.
(viii)
The distlibution function o the exponential distl-ibution is
F(x)
exp
'
(i)
(ii)
(iii)
'(ei'x).
Note:
'# If the reader is wondering about the significance of thfs number it fs the number of
demons inhabiting the earth as calculated by German physician Weirus in the
sixteenth centtlry (see Jaslrow (1962:.
Characteristics
4.5
of random
variables
represent
proper
density
2(1 -x)2,
x > 1,'
341
x y 1,e
.J(x)
< x < 2,'
1),
+
0
.(x3
(iv)
/'(x)
3,
J(.Y)
iE R.
x
(v)
-lx
Prove that Vart-Yl E(X2) - gE(xY)q2.
lf for the nv. X, E(-Y) 2, F(xY2) 4, find F(3X + 4), Var(4X).
Let Ar N(0. 1). Use the tables to calculate the probabilities
F(2.5);
(i)
F(0. 15),(ii)
1 - F(2.0).
(iii)
and compare them with the boundsfrom Chebyshev's inequality. What
is the percentage of error in the three cases?
,f(x)
(ii)
(iii)
''E).
Additional references
Bickel and Doksum ( 1977)., Chung ( 1974)., Cramer ( 1946)., Dudewicz ( 1976),. Giri
t 1974)) Mood, Graybill and Boes (1974); Pfeiffer ( 1978); Rohatgi ( 1976),
C H AP T E R 5
The probability model formulated in the previous chapter was in the form
of a parametric family of densities associated with a random variable (r.v.)
0), 0 s O). ln practice, however, there are many observable
X: *
phenomena where the outcome comes in the form of several quantitative
attributes. For example, data on personal income might be related to
number of children, social class, type of occupation, age class, etc. ln order
to be able to model such real phenomena we need to extend the above
framework for a single r.v. to one for multidimensional r.v.'s or random
vectors, that is,
=
t/tx',
(A'1 Xz,
X',)'.
5.1
space takes the form S t(ST), CTH), CHH), (TT)). Define the function
Both
and -Y2( ) to be the number of
.lt ) to be the number of
of these functions map S into the real line (!4in the form
=
Stails'.
'heads'
'
'
'
'
(A-1(
'
'
(.Y1(
'
),
-'2(
-
))l(TT)l
(.0,2).
'
..%
X(
'
-+
222
definesa random
.22.say B H (:j,
X- 1(B)
belongsto
'
(s: xYI(s)
tield product
.@
,)
.@
(5.2)
..%
S
Xa
(88)
(HP
( F8)
(rr)
xl
80
Extending
,#
,xj
.@
,?d
X-
((
w xq)
.F for a1l x
/2
A random
Vtor
Ar(
'
)..is
a vector
as follows:
.jnction
cc <
X a(.s) G x 2 s s
,
s) e
..@
,82,
')),
r@1
.#2,
'
5.1
This is achieved
by attributing
i@l
to each B i!
the
(5.6)
This enables us to reduce
joint
#x(
tlistribution
tcunultlrfrt?l
Djlnition
'
./ncrftpn.
) to
F(
):
/2
--.
a random
ptvltpr
'
)). The
g0,1q,
stch that
EE
#?-(X .Gx)
X(
) takes the
alue (1, 1), (2,0), (0,2) with probabilities .l,
ln order to
derive thejoint distribution function (DF) we have to define al1 the events of
the form )s: Xltsl Gxl, -Yc$) Gxc, s c 5') for all (xj, x2) (E (22
.
1.4and .1.4.
respectively.
x:
<0,
0 Gx l
xc <0
<
2, 0 :; x2
.xl 1 2,
Ntlrt, a
The
degree of arbitrariness
F(x1 xa)
,
<
0,
xa <
-1.4.x 1 > 2 0
,
1,
t;
xl
y 2,
xcy
xz< 2
2.
<
2, x 2 y 2
x2 >
rectangles
<2
:1;h 2, oGxc
0 Gx j
<
2.
':t;,
-
xq.
Random vectors
82
of (X1, X2)
jnction
From the definition of the joint DF we can deduce that F(x1 x2) is a
monotone non-decreasing function in each variable separately, and
s
lim F(x1
X
-/
-#
1
X 2 -#
x2)
-*
lim F(x1
X
x2)
(5.8)
=0.,
1.
m.
'X
Tejoint
4' rlrtr
exists a density
(5.10)
.Jtxl.x2)
Pt-lxk
everywhere
except at a
jinite t)r
countablv
wjr
=
x1, Xz
(5.11)
x2).
.''
F(x1, x2)
./'txlf,
A i<
,z
xci).
Dehnition 4
Thejoint DF t?/' A-1 and A-cis called absolutely') continuous if r/-lthl't?
exists a non-negative function .J(x1x2) such that
,
5.2
Some
bivariate distributions
f (x xc)
z
A'
-ej
Xc
1
2
Xl
function of Table 5. 1.
Fig. 5.3. The bivariate density
-va).
if j'
at
) is continuous
(xj,
2)
(.--.&-1
f'lx:xa; 0) aagjo.a-
0- (pj pc,
,
(p,1q.
f (x
xa)
,-
l
l
l
l
l
Xz
xe
Nw
Nx
N
.-..
'-
w.
..-
>
w.
(0 0)
'--
N
>.
X1
normal density.
#1
--
X1
- 2p
#1
c1 -
p2
X2 -
(r 2
X2
- p2
o'z
2
m
ca
0 (k
=
(3)
a l a a)
,
/'(x 1 x c)
.
,
/1
... - ! J?1'P)2,
a -.j
1 .v 2
-Y1
+ xc
p 1 + I.'z
1
.
(5.20)
85
.f
/ (x ; 0) 0 6 O )
'
t.
family of
to that of a parametric
*
,/'(
xa
x,,; 0) 0 6 (9
,
(5.22)
5.3
Marginal distributions
Let X EB (xY1 Xal be a bivariate random vector defined on (,. ...Fi#( ))with a
joint distribution function F(x1, xa). The question which naturally arises is
whether we could separate A-l and Xz and consider them as individual
random variables. The answer to this question leads us to the concept of a
marginal J.srrf/pfklfon. The marginal distribution funetions of A-1and Xz are
defined by
'
F1 (x1)
and
F c(x 2 )
-+
J-
l im F(x : x 2)
,
. 1
''*
ik
2(#
:%
x2
)G
(5.25)
.:''
:$
2.(
x 1 A- s) < vs
,
lj
(5 6)
.2
t
which we know belongs to .K This event, however, can be written as the
intersection of two sets of the form
.,1
(-$)% xl X 2(s)
,
<
'zt-'
ts: A- (s)
I
:;
xl
(5.28)
WK
86
Random vectors
F1(x1)
lim Ftxl
X2
since 1im,,-+
.(e-'')
-+
x2)
e-ttVl,
-
0. Similarly,
F 2 (x2) 1 - e-tz
=
x2
6F R +
(5.30)
and
h (x1)
/t-vl x2).
-
Example
Consider the working
population
follows:
Income:
2000-4000, f 4000-8000, f 8000- 12 000, f 12 000-20 000, f 20 00050 000, over f 50 000.
5.3
distributions
Marginal
87
.?
-..;
-U-
...y.
Xz
---..
.--.
:LL
-l-
.y.
l-
1
2
3
4
5
6
Jc(xa)
0.250
0.075
0.020
0.250
0.040
0.075
0.020
0.010
0.005
0.030
0.015
0.010
0.400
0.400
.y,
ay
-7
.3
.....t.
0.5
0.020
0. 1(y)
0.035
1
I
/.1(.xj )
0.275
0.345
0.215
0.020
0.085
0.045
0.020
0.035
0.200
1.000
../'txlx2)
'-h
(-x,)
(5.32)
(x2),
'jz'
Independence in
that ,Yl and Xz are independent
terms of the distribution functions takes the same form
.X2)
l-.t.'.s.
F1(x1) F2(x2).
'
'..f2(x2),
/'(x1,x2) #.Jllxll
0.250 #
(0.275)(0.4),
and hence, Arl and Xz are not independent r.v.'s, i.e. income and age are
related in some probabilistic sense; it is more probable to be middle-aged
and rich than young and rich!
In the continuous r.v.'s example we can easily verify that
Fl(xl)
Ltnd
'
F2(x2)
(1
e-0xt)( 1
-
e-0X2)
=
F(x
1,
x2),
Random vectors
and
in the context of the probability
Note that two events,
said
P( )) are
(S,
to be independent (seeSection 3.3) if
.42,
,41
space
,.1
'
#(v4: ro
.4cl
#(-4j) #(,4a).
.
Zi
.J2(X2)
1.
(5.36)
(5.37)
../-2
( ) '-y.,-y.
N ( a )o.a
.X2
eX p
1
-
//x
2 -
u :1
(5.39 )
'-
(72
Hence, the marginal density functions ofjointly normal r.v.'s are unvarate
normal.
provides us with ways to
ln eonclusion we observe that marqinalisation
when
model
is
model
such
defined in terms of joint
simplify a probability
variables.
random
In
unwanted
out'
by
density functions
any
be
of
Xk
interest
X2,
the
density
of
nv.'s
marginal
the
can
general,
'taking
-'.'l
distributions
Conditional
5.4
our investigation
we can
-a.
Conditional distributions
5.4
'
zzla
-4....1.
#(
''jz?d
:k'
!'kF'
1
-4
..1
P(
..4
z)
ro
#(d2)
,.4
-.4
a e:./'
(1:*
''''
By choosing X j fts: X 1 (s) < .'7l ) we could use the above formula to derive
an analogous definition in terms of distribution functions. that is
=
where
f .Gxl
( .''A(::!1.
) P$X
.
't'
..
'
..
..'::1.
,,.'.,4c),
.-a
,4
P ( t s : A 1 (s ) .Gx : j...'ro c ( a ))
7
-
F .!k
j .' rr1
lk
2)
f'(c('c))
since ctA'al e although it is not particularly clear what form Fv, ct,ya) will
however, it is immcdiately
X2(s)
take. In the case where
a)
a
xa)
when
is
Xz
since
#(s:
obvious that
a continuous 1'.v., there will
.t
./1
z4a(s)
'(s:
.f
=0
90
Random vectors
c)
.f
Pr(X
Pr(X
x j 'Xz
x 1 X:
,
Prlx,
.f
2)
.f2)
=
j'lx 1
./2(.z)
.2)
.f-
2)
.
The upper tilda is used to emphasise the fact that it refers to just one value
taken by Xz.
Example
Let us consider the income-age example of the previous section in order to
derive the conditional density for the discrete r.v. case. Assume that we want
6 (incomeclass of over
to derive the conditional density of Xz given
f 50 000). This conditional density takes the form:
-',t-1
flxz
.f1)
0.005
0.035
=0.143
for Xz
2 (middle aged)
1 for Xz
3 (senior).
0.0 10
= 0.035 0.286
=
0.020
= 0.035
=0.57
1 (young)
for Xz
density.
variables
xj Xz
Prx,
.kL)
7cl
0
(j
The mathematical
-41
.t
s: X
(.s),I x 1 ) i! .F
(5.46)
5.4
Conditional distributions
kz by
-+
lim
= 0<h-0
tg,
..fa)
xl
(5.49)
dK.
J2(.f2)
xl
Rx:.
Examples
Consider the bivariate logistic distribution
F(x1 x2)
,
(X
FX
l
+e-x2)-
(1
+e-At
l (1 + C - )Xl
1+ e - x e 1+
j
2( 1 +
e-X2)
e-X2
A:2
e-x'
92
Random vectors
Xx c (X 2 )
-
Hence
1-
=
Let j'lx 1 x a)
.
,
xl
>fC1 >0,
C-
X2
'
f?xzl - *1 exp
1 + $x 1 )(1 +
()4
2)t2
;.lk + 1)(r.,1
1 (1
xc >az
distribution
>0.
-1-
l $4/./
2x 1
+ a 1x a -
k- x ( 1 + 0x a) l
21
(1 1 (1
c )-
(2
'
density functions
There are two things to note in relation to conditional
above
examples'.
brought out by the
the conditional density is a proper density function- i.e.
(a)
xlgR.y,
(5.54)
xzeRxa
./'(xj
(x1 x2)
-
e' R2.
'./'2(xa),
=./)
./'(xj
/)
x24-Y1
,'''xz)
(.Y1
=l
(5.57)
),
/'(xl
.
.
.x2)
-), (
x
exp
j
azrr.):
-j
x,(.x'2,
( /'1(x1))(-/x'
c
1 vi -
'
.:1)).
distributions
t'/2
cj
2) -
( 1 - /?
cax,
(2z:)
--
(5.59)
in this case are denoted by
(5.60)
(5.6 1)
(5.62)
C a Se :
(5.64)
Random vectors
F2(x)
'
'
'
(f
F,,(x).
lmportant concepts
Px( ));
random vector, the induced probability space (Rn,
the joint distribution and density functions',
marginal distribution and density functions, marginal normal
density;
independent r.v.'s; identically distributed r.v.'s;
conditional distribution and density functions, conditional normal
density.
.@'',
'
Questions
Why do we need to extend the concept of a random variable to that of a
random vector defined on (S, P( ))?
Explain how we can extend the definition of a r.v. to that of a random
vector stressing the role of half-closed intervals ( vz xq.
Explain the relationships between #( ), #xl ) and Fx x2( ).
!
Compare the concepts of marginalisation and conditloning.
Define the concepts of marginal and conditional density functions and
discuss their properties.
Define the concept of independence among r.v.'s via both marginal as
well as conditional density functions.
Define the concept of identically distributed r.v.'s.
.%
'
'
'
'
6.
Exercises
'tails'l.
eheads'
(iii)
-4-1
,/'(x1
/'l(x1.7l),
3)
.J'1(x1
2),
./(x2
./'clxc
0),
.:-2
X1
Xz
p1
(i)
(ii)
(iii)
t7'il
p1
'w N
PG 1 (r 2
-
cl
pclcc
/:2
Derive
Derive
0, 1, 2.
Derive
Derive f.
1, 2, 10.
and
./-(x1)
v-tx,
'x
,)
fctxz).
for x
x 1 > 0,
1, 2, 10, and
x2
>
0.
for
(xc,/'x1)
/'xz.a.j
x:
XG4itional derences
Bickel and Doksum (1977); Chung (1974)',Clarke (1975);Cramer (.1946/ Dudewicz
1974); Pfeiffer ( 19781.,Rohatgi (1976),
(1976); Giri ( 1974); Mood, Graybill and Boes (
6.1
of one random
Functions
variable
CHAPT ER 6
Functions
of random
variables
Fig. 6. 1. A Borel function of a random
6.1
considered a function from S to R and the above ensures that the composite
function (X): S R is indeed a random variable, i.e. the set
/l(Ar)(s)e:.B,)s,.i/-for any Bkt?d (seeFig. 6. 1). Let us denote the r.v. hX) by
Ft then F induces a probability set function #yt ) such that PyBh)= #x(#x)
JN.,4),in order to preserve the probability structure of the original
(S,
P )).Note that the reason we need ( ) to be a Borel function is to
preserve the event structure of
Having ensured that the function ( ) of the r.v. A' is itself a r.v. F hzrj
we want to derive the distribution of F when the distribution of X is known.
Let us consider the discrete case first. When A' is a discrete r.v. the F hzr)
is again a discrete r.v. and a1l we need to do is to give the set of values of F
and the corresponding probabilities. Consider the coin-tossing example
where X is the r.v. defined by A' (number of H - number of F), then since
.4
-.+
'
ts:
=
'%
'
,..d.
'
variable
P
Let A' be a r.v. on the probability space (,,
Suppose
-.+
function
S.
real
valued
R, i.e. -Yis a
on
S
function with at most a
11 is a continuous
(liscontinuities. More formally we need /1( ) to be
..#7t
'
'
that hl
countable
): R
XITH)
=0,
X(SS)
2, A'IFFI -2 and
=
the probability
function is
A- x
- 2 0 2
.,t J
Rs where
number
of
--+
a B()reI function.
'
Dpnit 1011l
variable.
=0,
#r(F
y)
1.
-t2
'
'$
'
P(
,s
t. 1' st.
1'-(s)
lI
.1
t'
<
t ik) I l /?
#( s :
1
(.)
- ((
(s) es1-1
l 1 t-t--t
11#.
'.'
t 17e
cs. )j
,
)),
tl 11it.l t.l e
I1tlt')ll
v''
'$''
y ba
r ia Illi.s
I 11 t llc ct str w' I'le1't,., is lt c() 11t i11tlt) tl s I'. tle l'i i I1g t 11etl ist l'l 1)11t I i 1 11 ( ) 1'1'
/7( ) is not as si l'npltt as thtl discrcte eastl because. rstly. 1'' is ntlt al ways a
continutus r.v. as well and, secondly, the solution to the problem depends
crucially on the nature of h ). A sufflcient condition for F to be a
continuous nv. as well is given by the following lemma.
.'
'
(.p)
>
1(.p))
h - '(-p)
for a <
dy
.p
<
(6.2)
b,
(aY
1/(2Uy)
(y)-./)(x71')(j-try1
-j
i.e. F
P#(x)
> y)
Pr(X
E; h -
1((
(6.3)
xt x1)).
,
Example 2
Np, c2) and F= Xl (seeFig. 6.2). Since Edtxll/tdx) 2.x we can see
increasing for x > 0 and monotonically
that /l(x) is monotonically
<0
and
Lemma
2 is not satisfied. However, for y>0
decreasing for x
Let X
expt
(aa)
normal distribution.
'v
1+.t( )
Ed-
i(y))/
-/.p)4yt7y1
distributed.
(see Fig. 6.2). In this form we can apply the above lemma with
for x > 0 and x < 0 separately to get
(d.')
=
'w
to the
refer
-jy)
expt
for y>o
-y)
That is, fytl,'lis the so-called gamma density, where F( is the gamma
fnction (F(n) jJ' t??1e-r d$. A gamma r.v.1 denoted by F.v G(r, p) has a
density of the form .J(y) g.p/F(r)q(p#)'- expt p)1, > 0. The above
1.2) and is known
distribution; an
distribution is G(1y,
as the chi-square
important distribution in statistical inference; see Appendix 6.1.
.)
,y
'v
Fy(y)
#r@(x)
= #r(
:%
y)
#r(x
6F
1(
),
xj)
-
As in the case of a single r.v. for a Borel function h ): R'1 R and a random
xY,,), (X) is a random variable. Let us consider
vector X (.11,Xz,
used
functions of random variables concentrating on the
certain commonly
variables
case for convenience of exposition.
two
'
Fx(
Vy)
Uy% A' %Qyl Fx(Uy)
=
6.2*
(l )
.Y1 + Xz.
By definition
the distribution
function of F
..Yj +
.:-.,.,
.et
'j
'v
and define F
A-j + X1.
ln particular,
Iy'(..v)
=
by
X1
symmetry.
-
A' 2
./'1
(.p
-
x.z)./2'(x2)dx2
Using an analogous
argument
then
,1
(xj )./a*
(-'
-vl
-
) dx 1
(2hl
-#/1
:6:
zc
1L
.
be seen it is not
1():!
6.2*
of random variables
Functions
Functions
X3 + Xz where
and
A-I
l03
.::0
and F
>0.
A'c are
F y (.p)
fv (y)
- ccl
0.5
.;''r'
)
:
i
-1
-p
.11
+ Xz + X5 where
.
).
Xi, i
1, 2, 3,
X
are
only continuous but also differentiable everywhere. The shape of the curve
is very much like the normal density. This is a general result which states
uniformly distributed independent r.v.'s,
that for Xi U( 1, 1), i 1, 2,
which
is closer to a normal distribution the
L Z)'-l Xi has a distribution
particular
of
value
case of the central limit theorem (see
greater the
n; a
Chapter 9).
'v
:.
The distrihution
(.
'
this becomes
I., tl'l;I'
t j
(.1,.v2)
'.fL(x2)dxc,
I.vzI./'l
'q
of A-1 A'2
where
.E.
(2)
xa) dxc,
jxajytyxa,
y,j
,y,.,(
1 1- t
.1! t
.$ (t l)e
(6.8)
htllnatical manipulations
lllat
x.
'
.1
'')t-
''
.'
.'
'1
..!
.'
'(
', 1'1sitl
t- 1'
t u-(
-s
',
l'.
'$
.
.'
:1 1,1(.1
.'
(.1
Ct1'1
It'!t
'i
'
'
'
'$
.1
'k
'1
'k
.....'''1
.j?.
--
..:
'$'
Functions of random
l() l
Ix. (Z)
.
Since
.J'(x1,::)
variables
n/ 2
Zn
ag(u,a)jjj-jus)
=/'1(x:)
nz
eXP -
'.gz),
'
.x
.-
--
Jr G j G 2
j)
()
u-) z V
S( G
2
'
'y.
:-
- 2-
y +.
()' 1
j-
V1
($ .- u
d ly
z2(nj)and
'v
()
l-t/fsll-fhuffon.
'
/y(
. Aj
xz2(na)
Xz
be two independent
.-.
xz/nz)
'c
,?,
n
-!.
The distribution
v2
n1
/ 1 -na
,
n2
exp
F
/'y (
.r.)
n 1 + nz
2
--
j-
n j
n2
..2
y,
y))
nl
dxa
nj /2
-n
),'c(,,j
ajyaj
2
j +
1.4,1
m,
j'n
h (x g ) dx .c
.
.v
tj1( la
1+
xa
:=
and define
(X1 /n1) nz Xl
. --
r.v.'S
1+
u 1
'
112
y?
)/21
,, ...?
d xa
W Uere bl =
Example 6
Let Arl
G2
V 1 CT2
.--
xp
'
of F
'
v
%
--
2
2
u!
..y -V ---j
G1
G2
106
Functions
of random
6.2*
uriables
Functions
'rhese assumptions
enable
to deduce that
us
(6. 12)
Example 9
.ct
Xi
N(0, 1), i
'v
1'1
=
1, 2 be two independent
ll1(.. 1 X2)
,
X1 + X2,
nv.'s and
A' 1
Ar2
sillce
X
l''
1 0 1( 1
-
.172)
'1.F2
j+
'
.y,
.'j!
min tXj
X2).
J= det
1
1+ )'c
t lllskilnplies that
+ ),c
I.'''l
/'(y1
.
-p2)
2,:
exp
-j
jy 3....1
(yy-).4)
(y 1 .J.'2 ) + y f
(1 + yc) c
2
...
1 ( 1 -I
1 J.c)2
.J,f
-j
exp
+
( j .y y,a)
(
1
27:
.rc)
(6. 10)
l l1t. Illltin
j
.. E
.I1l
111
t 11cltbove
example
Assume:
(i)
(ii)
(iii)
'
bn exist and
are
(. atlchy density.
'f!
.
,:@
108
Functions
6.3
of random variables
Looking ahead
6.4
variables,
109
a summary
Lgnlrntt 6 l
.
1/' Xi
(Yt'1
Nqyi c/).
x.
z'b'ij
Letnlna 6
1 2,
N(Yt,'-pi, Y'f'-
11 tgr'(?
c/)
normal.
independen r
?-.t?.*.!f
then
.2
x.
indepvndent
n Jtx' rees q/' jeedom.
-.l?.5y
g/-g
tbe
()-)'.1 .Y,2.)
6.2*
f--pnklrlt?
1/' Xi
()-'1
i 1
N(/tf c/), i
x
g2(j7.
'v
1 2,
2/.c2)
i
i
1'-'l'i/i fl
..
'G 2
i
n are
) ..-non-central
'
I-cmma 6.5
'
distributions.
Ll,nltntl 6.-3
1.1%X 1
and related
x.
../
x,
Lt.zrrlnkt/6
.-3
N(p, c2), Xz
1/' X
1(?; J)
.,:-1,,/'(x'/(X2,/:)1
t.??J
p/'c.
ptlratnet
j
'v
'v
'v
I'cl:ttionships
lt'ki
L'1'It
,y$
c2g2(n),
Lemma 6.4
?-.!,:.*s
A' 1 ztln1 )s X1 z2(nc), A-j Xz independent
thetl
(xYl n j )/(A-2/n2) F(n l na) - Fisher's F with n j and na degrees t?/'
lf
'v
'v
'v
jeedom.
Lenma 6.4*
.k.
.t *4
'
II,
lllx
.*,
II1
?
q 1)
.(.)
I
:
,1
,1
'
'lk's
y (.t
ahead
.tltpking
w'e considered
of functions of random
AIt Iltltlgh thtl
are in general rather
l 1t is kt ve 1'),' iInp() rtant facet of probability theo ry for two reasons:
I ( t 'l't t.I1 k'cctl I's i1) prltct icl?t hat the probability model is not defined
I 11 t t. l I 11 s $ ) l. l llt! ( ) l'i
g i11:1l I'. N' s btl t i 11 sorne fu nctions of these.
%l ; t 1I s l I k': 1 I I I 1 t't-I't- I 1k't.- isk c I'tlt. i t lIy tlttpentlell
t o 11 t he d ist ribution of
l 1 1 l 1 t I l , s . l l ; 1! 1k 1( ', I 1) y : t !' 1kl l7 It- s I s ! iI1 1 t t ( I's t 11(.l t t.ls t s t l.t t is t ic s .trtt
.
k'llitlltel'
the distribution
manipulations
l'nathematical
1dtI
( '
.,
Functions
of random
Appendix 6.1
variables
f(B)
f (/; 1)
normal
.- Ar
#lxl
.. .
Gx/
;.- .....-
y,
exp
(27:)
Var(.Y)
1 u- p
- -2
(5'
1 xy
- 2 .--..c
skvwnvss
s)
p'
Chi-square distribution
- F
'v
z2(n)
d r/,'
Reproductit'e
prtpptrrly
2
,
as
0,
kurtosis
a4
distribution
chi-square
()
7.
,
n)
---.;.,-
Vn
Higher moments:
r t?7 el
2-
cc
Zch
tN''2)
exp
z2(n,.(i)
- 15
'v
y!fH.''
.j
2 ), + J)1.
t - -.-41
2) - 1
)J)(,?'/'
is
Some prtppt?rres
F( F)
n+
(),)kuk +. j.)
(2k)! 1- k +
>0
Cumulants
chi-square.
Non-central
c2,
distributions
and related
/. (z ; 6
N(p, c2)
'v
.Y
(y; 6 )
Var(F)
>
12
,
2(n + 2).
E:
'E
'
.)
''m.
:
is that the
rtant difference with the centra j c jaj-square
ftl llct itln is shifted to the right and the variance increases.
deltsl l
11t
-;'''(
?t/
11(
+1
l 1'(:
/)?'f
)'
Functions
of random
variables
f tx)
z
(w; 4 ) Nxw
-..V'''''- f (x)
z-
tzvj
exp
x2
-
nz
nz -2
.E'(LJ)
nz > 2,
>
F-distribution
/'( &; n 1
'v
F-distribution
e-
n 2,'
y ..
1+
.. U
2k) - 1
N1
nz
j-'
f-distribution
>
+2:)
n 1 -i''t
-..n.2
n 1 + n z + 2/
-....
2
yn, +nz
n 1 4.
These moments show that for large n the
'v
ku J(n,+
Gt'
+2k)
nz
n2
......
Fl 1 +
gk
... . ,
>2s
t (t.?)
Non-central
'
w; n
>
N .,
'''
ir
/'(
(jj
,
:;2!
.--z
(.ti
C .....(
iEl
-.-::?2
.-
)1
- .-y
wzliV'
Y /n r(n/2)(n +
--
f (tz; n , n z )
) , a-j
'
)'.
);'.
tit
y'gs(.l-+k -1--ljtjt
k=0
2w2
b-y
.
jk/l
n + u(r,
ws
f (fz; n j n a ; )
,
'.
)il)1/.,
''''
::.
1201-ft,lp-f/n
g
()
:'
'
kr( 1.12)
---
'i;
St l-ff/cri
+rl2
2#lJ(n1
Vart (p)
n1(n2
2) c (n2 -4)
-2)
&
u s ()
Functions
of random variables
Important
Appendix 6.1
tl'tlaccr/.:
Additional
'larke
(
( l t?78);
Questions
Why should
be interested
we
in Borel functions
distributions?
:A Borel function is nothing more than a r.v. relative to the Borel field
on the real line.' Discuss.
Explain intuitively why a Borel function of a r.v. is a r.v. itself.
Explain the relationships between the normal, chi-square, Student's !,
Fisher's F and Cauchy distributions.
What is the difference between central and non-central
chi-square and
F-distributions'?
.td
Exercises
Let zYkbe a nv. with density function
A-1
(.X
*2
X
X
A'
(ii)
(iii)
of
2*
A' 17
e'Yi
l0 + 2%-21
Let the density function of the r.v. X be /'(x) e
distribution of F logc X.
Let the joint density function of X3 and Xz be
=
-X,
),t
j
*)
.'
.e
.x.
'$.'
',
1) (
.'
1'1'
references
( 1976).
and Boes
( 1974)*,Pfeiffer
of expectation
expectation
(7.1)
0j, 0 e:O
./'(x;
single random
as a useful characteristic of density functions of a
model
probability
generalised
the
to
Since then we
/ (x 1
x 2s
variable.
J'
'
..
in the
x , , ; 0 ) 0 iE ().
,
7.1
of a function of random
A'ariables
'
;..d.
--+
.vc)
.(.x1
'
7.1
of a function of random
Expectation
variables
equivalent ways:
:X.
(ii) E(ll('
:,
A-z))
/1(.x1 x2)./'(x1
dxj dxa.
xz)
and it is usually
The choice between (i) and (iij is a matter of convenience
of
of Y.
diffkulty
deriving
by
the
degree
in
the
distribution
determined
Let Xi
Using
'v
N(0, 1), i
X( + X(.
(ii),
E(X 21 + X 2)
2
1
/',(J.')
2,)-.(
.
=
----
(x( +
=(v2j
2aj
2/
exp
jx ( + x 2aj) dx ( dx c
z2(2.j- chi-square
+ -Y2a)
'.w
exp
cc
on
and
1. 2 be two independent
t 1-v)
-
we know that
E(Y)
)')
-.'Jy',(
-v
Properties
of expectation
fkE(lll(A-l Xa)) + ?E(/12(X1, A'2));
avd :( ), b a( ) are Borel
are
122 rtp R. .1n particular
Iinearity,
u/ntl'l ons
u'lltpl-t? a
.//4)m
1,
'
'
11
aiz-i
1* .=
t'onsrlnr.s
/1
E
x
Ar2)1
/nt
Z aikjz-i).
=
(7.5)
-.p.'s,
.ll-
tvyt?ry
Bovel jnction h
(')
R,
-.+
'
f(/'l1(A-1)2(A'c))
Eht
(.,1))
/'l2(..Y2)),
'
(7.6)
(7.7)
'
(7.8)
'
=0,
S(.Y1A'2)
and we say that A-l and Xz are orthogonal, writing X3 -1-Xz. A case between
F(.Y1), that
independence and uncorrelatedness is that in which E(k3/Xz)
is, the conditional and unconditional expectations of .11 coincide. The
analogous case for orthogonality is
=
E(XL,/X1)
when .E'(A-:) 0.
=0,
(7.10)
.',,
.Y'l) 0
- c,
plays an important role. For reasons that will become apparent
sequel we call this property martingale orthogonality'.
1,
Forms of hX
A-cl ofparticular
in the
interest
then
p;k
=
'(.YtA%)
X)
xtxt-fxlxz)
dxl dx2
-&A-1)/(-Y2 -'(A-2))k,
of
Expectation
7.1
E'(-Y1)
EEEE
/t1
Covtxl,
(7.14)
-/-t2)).
-Y2)- .E'((A'1-p1)(A-c
relationship
The covariance
Covt-l
Ar2)
E(A-1) F(A-a).
'(Ar1.Y2)
independent then using E2 wtl deduce tat
=
'
=0.
tlt?rv important
(7.15)
(7.16)
Covlx'j, Ar2)
1t is
the
is not true.
conYerse
(iij Variance l 2, k 0
=
-p1)2,
'Varl-Yl)
For a linear
E(A'1
EEE
functionjf
var jl aiuri
=)
is of the
form
a,? Var(A-i).
i
Corrt-'l, X2)=
Covtx'l A'c)
,
gvartA'jl
'
VartAralq
(7.20)
Properties
(C/)
(CJ)
(4)7.)
of CorrtA'l
-Y2)
-:-1
Example 2
Let
pcgljczj),
(*'x1a)
xllyLljafc,
-
Covtfj, X1)
X
(x1
(-Y1
-/z1)(xa
'X
cjcc
=,
-/t1)
cl
xz (2r)
(2a)
exp
exp
-j
-p2)-.
(1
.%1
(1
dx1
xa -/.ta
-p
-Jt1
cl
o.,
.-ptxlts-/jtjztj
(txa
-0.,a,v.
=0
=/(x1)
Nlta that correlation measures the linear dependence between r.v.'s but if
we consider only the first two moments that is the only dependence we can
analyse.
7.2 Conditional
expectation
Conditional expectation
7.2
Fx: Aatxl
xc)
).
0<h-0
(7.21)
(7.22)
Note that
(7.23)
'fx,
X.
'
xzlx
2)
.f
qc
-
h (x1)
h (x1)
)
(x2?7x1
x, (xz/ x 1 ) d x I
x.
./-,-2
-.
Bayes'
formula,
'
-,
(CDJ)
(CD3)
./'x,,xa(x1/x2)
xgf/ xa,
/xc(.X1 '.Y2) > 0,
xl
For a
fxL
.fnction
of x2.
is a proper density
.
fxj xatxl, xc) dx1 1.
.rfrlcrft?rl
EER
and
J'''-
wl
Properties
of the conditional
exists
expectation
'
-:-2
'
(7&rPf?.
Example 3
reqression
fncrftan
Conditional expectation
f (xlexa)
e'
.-
l
I
.-
I
I
X1
1
1
l
Xz
>
7z
.z.
.f
''-o
;v
2w
7e
eJ
(7.28)
Central conditional moments:
A'EIA-I
Of particular
skedasticity'
-EX,/Xz
=x2))'/-Y2
Vart-Yl/-'rc
x2)
'E(Arj
Exljjxz
(7.29)
x21,
variance,
EIXL/XZ
sometimes
x2))2/Ara
called
x
2.
Example 4
Let
?6
This
is the distribution
distribution
function of Gumbel's
bivariate
E0,1(l
.
exponential
since
'(X:/X2
.X2)
r! ( 1 + pxa +
r+
(1 + pxc)
Var(.Y1,.'A-a xc)
=
and
-2p2
+pxa)2
(l +p
--
r0)
,f.--.--
( 1 + pxc)
curve is non-linear
Example 5
The
'./'(x1xc).
,
ltkcx,
ptp-h 1)(4?1t7c)+
',x.a
-:71Ja)-''+''
xl
The marginal
x2 >az
>0.
density function of X1 is
2
(.X
PJt?
'-tt'-b 11
,
2x 2
Var(.;Y'1/,Y2 xa)
=
t?l (p+ 1) xj
c
az 0 0 + 1)
7.2
Conditional
expectation
Example 6
The
.Jtxj xa)
,
E(X3,,'Xz
x2)
'Xz
xa)
Vart-
2E1 +
zll -
expt
xc)),
- (x1+
1 - logt 1 + e-'2) - non-linear in x2
=.1yzrz
-
e-rt + e-
- homoskedastic
As argued above
value
ECXL,/XI
.fa,
(7.31)
The problem, however, is that we will be very hard pressed to explain its
meaning. What does it mean to say
average value of A-j given the
--+
variable
E'? The only meaning we can attach to such a
-Y2( ): S
random
conditional expectation is
'the
StA- 1 c(-Ya)),
where c(A'c) represents the c-field generated by Xz, since what we condition
on must be an event. Now, however, E'(-Yj c('c)) is no longer a nonstochastic function, being evaluated at the r.v. Xz. lndeed, .E'tArl/ct.Ycll is a
random tlariable with respect to the c-field c(A-a) c: .F which satisfies certain
properties. The discerning reader would have noticed that we have already
used this generalised concept of conditional expectation in stating CE5,
'(/l(A'1)/
where we took the expected value of the conditional expectation
'c xc). What we had in mind there was '(llj(.Y1) c(-2)) since otherwise
the conditional expectation is a constant. The way we defined '(A-j/c(-Y2))
as a direct extension of E(X !yA-a x2), one would hope that the similarity
between the two concepts wlll not end there. lt turns out, not surprisingly,
that f'taj /c(A'2)) satisfies certain properties analogous to CEI-CES:
=
'
of expectation
Example 7
=pl
-p2),
G2
and it is a
Xz
random
(homoskedastic).
we can show
c2j( 1
-p2)
is free of
(x),
i.e.
Var(F(.Y1/c(A'2))) + '(Var(A-1,,'c(.Yc))),
that is, the varianee of X : can be decomposed into the variance of the
conditional expectation of ..Y1 plus the expectation of the conditional
variance. This implies that
Var(-Y1) > Vart'tA-l
c(Ara)).
(7.34)
mean
either
'respect
'smoothing'
tsmoothing'
.%
,%
'
(c-cE4)
lE(x
(c-Cf.5)
(c-()7f3)
If
F(A').
tZ, then EX,
E(X/,F)
X.
flgftA- f/)) F(A-).
i 1, 2), FIJJIA#' V'1 9 2 (h
Ex/g' 1 ).
(c-CE7)
Lo'-CE8)
.f.0)1
z:
E(lA'l,,'f).
,$)
o'.s
.#t))
.%
'.f/)/.f/.
lj
2)
ELEIXI.@LII.LJ
7.3
Looking ahead
Let
be a r.p. relative to 94 and '(J-Yj)< tx;,, '()ArFj) <
SI-YF r2l A-JJIF f/').
If A- is independent of f/ ten fxY f?) S(Ar).
.;t'
(c-C'E9)
:t;
then
(wCE10)
These properties hold when the various expectations used are defined and
are always relative to some equivalence class. For this reason all the above
be qualified
statements related to conditional expectations must (formally)
surely' ('a.s.)(see Chapter 10). For example,
with the statement
C-CEI formally should read:
c' is a constant, i.e. X c a.s. then flxY V')
Ealmost
'lf
a S
.
=c
7.3
Looking ahead
tf
j'l.zj x 2
,
x,:,' p),
0 g (.))
(7.36)
Moreover,
Important
concepts
Questions
3.
4.
5.
6.
7.
.f/
Exercise.
For exercise 3 of Chapter 6 show that f'(A-1.Yc) F(A-1).E'(-Y2)but Ar1
and Xz are not independent.
For exercise 3 of Chapter 6,
derive J)-Y: /'Xz xa). Vartx't-l Xz x2), for x2 1, 2,
(i)
find Covt'j -Y2)-CorrtxYl -Ya).
(ii)
Let Xj and Xz be distributed as bivariate normal r.v.'s such that
=
A' 1
cf
Jzl
N
X2 ,v
pc1 c2
G1a
PtF1tr2
/22
pl
6'2a
=4,
(lalculate
E(X
X2
E(Ar2
-:-1
VartAfj/.Yz
4. Determine the
flxb
X2),
x1),
.x1
1, 2, 6,
=0,
1, 2,
Vart-rc/fj
x2),
x1).
of c for which
value
.X2)
X2
.X1(.X1
A72),
2,
pz
P
=0.2.
4,
7.3 Looking
ahead
Additional references
Bickel and Doksum
Whittle ( 1970).
CHAPTER
8*
Stochastic processes
8.1
(I)
txl xa,
13 1
process
(8.2)
The way we viewed this model so far has been as representing different
characteristics of the phenomenon in question in the form of the jointly
X,,. lf we reinterpret this model as representing the
distributed r.v.'s Arl,
same characteristic but at successive points in time then this can be viewed
as a dynamic probability model. With this as a starting point let us consider
the dynamic probability model in the context of (S, % P )).
.
'
8.1
space S.
Dejnition
Let
(.,
,./
#4
'
)) be a
'
'
'
-+
'
stochastic (random)process.
This definition suggests that for a stochastic process (xY( r), l c T), for
each r c T,
r) represents a random variable on S. On the other hand,for
each s G S, A'(s, ) represents a function of t which we call a realisation of the
process. -Y(s, r) for given s and t is just a real number.
'
x(
'
Example 1
Xs, r)
U(
where y(.) and z(.) are two jointly distributed r.v.'s and tI(')
independent of F( ) and Z( ). For a fixed r, say t 1, -Y(# F(# cos(Z(# +
u(#), being a function of r.v.'s, it is itself a r.v. For a fixed y, F(# y, Z(# z,
n(,s) u are just three numbers and there is nothing stochastic about the
function z(r) y costzr + ?# being a simple cosine function of l (see Fig.
8.1(fz)).
This example shows that for each t G T we have a different nv. and for
each s 6 S we have a different realisation of the process. ln practice we
observe one realisation of the process and we need to postulate a dynamic
probability model for which the observed realisation isconsidered to beone
of a family of possible realisations. The original uncertainty of the outcome
of an experiment is reduced to the uncertainty of the choice of one of these
,v
'
Stochastic
X
processes
(t)
3
2
1
0
10 2030
40
50 60
0. 10
0.05
-0.05
-0, 10
1964
1967
1970
1973
1976
1979
1982
T ime
(b)
'
zj ,
r0,
..(1)-
8.l
process
t0,
tsince
itentative'
Kno'.
-(f,,))
'tti
'tl.jal,
.jn
..
.%
Adescribe'
(8.3)
'(-Y(r))
=p(l),
'E(.X(r)-/.t(r))21
tl(r),
'(-Y(r)r) /t,,(r),
=
(8.4)
(8.5)
Stochastic processes
As we can see, these numerical characteristics
of X(1) are in general
1, given that at each t 6 T, Ar( 1) has a different distribution
F(Ar(r)).
The compatibility condition (ii)enablcs us to extend the distribution
function to any number of elements in T, say r1, lc,
1,,. That is, F(.Y(rj),
A'(rz),
X(r,,))denotes thejoint distribution of the same random variables
A'(r) at different points in T. The question which naturally arises at this stage
is how is this joint distribution different from the joint distribution of the
random vector X !E (#1, Xz,
Xnj' where X3 Xz,
Xt, are different
random variables'?' The answer is not very different. The only real difference
stems from the fact that the index set T is now a cardinal set, the difference
between ti and tj is now crucial, and it is not simply a labelling device as in
the case of Fz'j
Xn). This suggests that the mathematical
Xz,
developed
in
Chapters
5-7 for random vectors can be easily
apparatus
extended to the case of a stochastic process. Forexpositional purposes let us
consider the joint distribution of the stochastic process )A'(l), t c T) for
t t 1 , t :,
The joint distribution is defined by
functions of
'
=z
(8.6)
t11,l2)
is now called the
'rl,
p(l t
(lf
-p(l2))q,
r1,r2GT.
(8.7)
autocovariance
-------2-..5--.,
r2)
FE(.X'(f1)-p(f1))(A'(ra)
rl tz G T
,
1)t'(12))'
Example 2
(or
8.1
process
f (.2(11),
.
A''(l,,))
,,
- (2zr)
,,/2
,,
(t)-
,(t))),
1, 2,
n is an n x n autocovariance matrix and
g(12),
/t(?,,))'
is
a
n x 1 vector of means. ln view of the
#t) (/t(rl),
deduce
condition
the marginal distribution of each
we
can
compatibility
.Y(rf), which is also normal,
(8. 10)
As in the case of a normal random variable, the distribution of a normal
stochastic process is characterised by the first two moments /t(r) and n(l) but
now they are both functions of t.
The concepts introduced above for the stochastic process .Y(1),t e:T) can
be extended directly to a k x 1 vector stochastic process )X(l), t e:-F) where
X(l) = (.Y1(l), .Yc(r),
.Xk(l))'. Each component of X(l) defines a stochastic
tArftrls
(E T), i
1, 2,
k. This introduces a new dimension to the
process
r
concept of a random vector because at each 1, say r1, X(l1) is a k x 1 random
X(l,,)')' defines a random n x k
vector and for tzuz11 r2,
r,,,?' EEE(X(r1)?,
matrix. The joint distribution of .'.Tis defined by
.
cf./ll,
'r)
:$
.E'E(-Y/?)-/.ti(l))(.'.j('r) -Jt/z))(1
(8.13)
Stochastic processes
These concepts
for i
;e tween the stochastl'c processes Arf(r),
linear
dependence
the
measure
)
moment
r g Tl and .Y/(z), z e: Tl Similarly, we define the
Note that 1.41, z)
function by l?1fyl/, z)
)-j(l), .Yj(z)) ntr, z) when i
z)Et.?(r)l.T)()
r(r,
z) p(l)p(z)
' Using the notation introduced in Chapter
1.n(1,
6 (seealso Chapter 15) we can denote the distribution of a normal random
by
matrix
N()te that
cijt, z)
r7(r,z) and
1')
rijt,
r41, r)
=j.
'ross-product
=j.
nk'
X(l1)
X(r2)
X(r,,)
XN
Jl(r1)
p(r2)
C(r2-rl )V(r2)
'
1,,)
(r(?.1
,
llltnl
t7(r,, t I )
V(r,,)
(8. 14)
where V(rj) and C(lj, ly) are l x k matrices of autocorrelations and crosscorrelations, respectively. The formula of the distribution of needs special
notation which is rather complicated to introduce at this stage.
,'
(EX
/, i.e.
for
all '
7J,
(8.15)
we say that the process is q/' order 1. In defining the above concepts we
assumed implicitly that the stochastic processes involved are at least of
order 2.
The definition of a stochastic process given above is much too general to
enable us to obtain a manageable (operational) probability model for
modelling dynamic phenomena. ln order to see this let us consider the
question of constructing a probability model using the normal process. The
natural way to proceed is to define the parametric family of densities
/'(.Y(f) p,) which is now indexed not by 0 alone but r as well- i.e.
EET),
8.2
Restricting
time-heterogeneity
have to deduce the values of p(1) and P'(r, r) with the help of a single
observation! This arises because, as argued above for each r, -Y(s. 1) is a
random variable with its own distribution.
The main purpose of the next three sections is to consider various special
forms of stochastic processes where we can construct probability models
which are manageable in the context of statistical inference. Such
manageability is achieved by imposing certain restrictions which enable us
to reduce the number of unknown parameters involved in order to be able
to deduce their values from a single realisation. These restrictions come in
two forms:
of the process', and
restrictions on the time-heteroleneitq'
restrictions on the memory of the process.
ln Section 8.2 the concept of stationarity inducing considerable timehomogeneity to a stochastic process is considered. Section 8.3 considers
various concepts which restrict the memory of a stochastic process in
different ways. These restrictions will play an important role in Chapters
22 and 23. The purpose of Section 8.4 is to consider briefly a number of
important stochastic processes which are used extensively in Part lV. These
include martingales, martingale differences, innovation processes. Markov
processes, Brownian motion process, white-noise, autoregressive (AR) and
moving average (MA) processes as well as ARMA and ARIMA processes.
8.2
For an
arbitrary
stochastic
on t with
'estimate'
'random'
Djinition
(J
'r,
(8.17)
Stochastic processes
That is, the distribution function of the process remains unchanged when
shifted in time by an arbitrary value z. ln terms of the marginal distributions
t c T stationarity implies that
F(A'(M),
F(-Y(r))
FX(t +z)),
(8.18)
'
'
'
Dehnition 3
t?.JF(Ar(l1 +z),
.:-(11
.
t.Y(12)ldz,
E'EtArtrjl )d1
.
=
ln order to understand
(1)
First-order
(2)
(8.19)
(1981).
stationarity'
<
for al1 t c T and
'(I.Xr)I)
=p,
Second-order
(x;
stationarity'
1, Iz
A-(!',,))l,,(I
for /1
k.f
where/1 + lz +
+ z)), i.e.
0)
'ExY(r)j
.E'EXII
+ z)j
:x;
forall t e T
=pl,
(/1 2, lz 0)
=
constant free of
#.2
Restricting
time-heterogeneity
(/j
1,lz
(iii) '(.tA-(r1)).tA'rtral)j
and
Taking z
1)
z)) .t.-Y(/'a+
= figta'&-trj+
rl we can deduce that
.EE.Lz''(0)).:.'(r2
- r1))I
-
lllfa
z))(I
.
Irz rl 1.
a function of
...-.11),
These suggest that second-order stationarity for -Y(!) implies that its mean
and variance (Var(Ar(l)) p
pI) are constant and free of r and its
l2)
f;g)A-(0)))A'(l2
autocovariance (r(11
,
r1))(J pl) depends on the
stationarity, which is also
and
Second-order
not
interval jl2 rj j;
r2.
rj or
called wuk or wide-sense stationarit-v, is by far the most important form of
stationarity in modelling real phenomena. This is partly due to the fact that
in the case of a normal stationary process second-order stationarity is
given that the first two moments
equivalent to strict stationarity
normal
distribution.
the
characterise
ln order to see how stationarity can help us define operational
probability models for modelling dynamic phenomena let us consider the
implications of assuming stationarity for the normal stochastic process
',A-(r),t c T) and its parameters 0t. Given that '(-Y(r)) p and Var(-Y(l)) c2
for all l G T and tyrj, f2) s(lr1 rcl)for any rl rc c T we can deduce that for
1,,) of T the joint distribution of the process is
the subset (tI l2,
=
(n+
1) x 1 vector.
(8.20)
'memory'
'
Il1-
xueh as:
-?a)2.
(a)ll(lr1 -rc1)-(r1
(b)/,(lr1 /.c1) exp.t - Ir1 1c1).
-
(8.21)
(8.22)
Ic case (a) the dependence between .Y(rj) and xY(lc) increases as the gap
een rl and tz increases and in case (b) the dependence decreases as
-hi?ru
140
Stochastic processes
'memory'
kmemory'
'solve'
'the
'systematically',
8.3
Dehnition 4
A slochflslfc process .fA-(1),t e T) dejlned on the probability space
#( ))is said to be asymptotically independent lf forany subset
(.,
,%
'
14 1
8.3
gOeS
!t? zero
J.$ T
izfw).
-+
,t7-
-+
-+
-(r1)
lcJ
Dejlnition J
(t'(l)r(f+z))''
G P(T),
./r
aIl r 6 V,
sucb tat
0 %p(z) <
tplz),
-.+
Stochastic processes
,@j
.:-(1)
.,4
.@y
a(z) suplf'l-gt
=
c B4 - #(.4)/'4:)1,
(8.25)
Dejlnition 6
,-1stocbastic process
(z) 0 as z --.
kf
( c T)
.Y(M,
is said to be
(strongly)mixing
1)/-
bz'.
-+
of the asymptotic
As we can see, this is a direct generalisation
independence concept which is defined in terms of particular events and B
related to the definition of thejoint distribution function. In the case where
)A'(r), l G T) is an independent process a(z)
for z > 1. Another interesting
special case defined above of a mixing process is the m-dependent process
where a(z) 0 for z > m. In this sense an independent process is a zerodependent process. The usefulness of the concept of an m-dependent
process stems from the fact that commonly in practice any asymptotically
independent (ormixing) process can be approximated by such a process for
ilarge enough' ?'n.
A stronger form of mixing, sometimes called unlfrm mixing, can be
defined in terms of the following measure of dependence:
,4
=0
tp(T)
sup
(8,4/,)-#(-4)t,
PB)
>0.
(8.26)
Restricting
#.3
143
memory
Dehnition
(P(T)
'r
'--.
(f
-+
Looking at the two delinitions of mixing we can see that a(z) and @(z)define
absolute and relative measures of temporal dependence, respectively. The
formeris based on the definition of dependence between two events and B
separated by z periods using the absolute measure
,4
E#(z4ro #) 8-4)
-
'
#(#)) 10
#(X)) > 0.
stochastic
stationary
processes
ln the context of second-order
asymptotic uncorrelatedness can be defined more intuitively in terms of the
temporal covariance as follows:
Xt
CovtAXll,
+z))
t?(z)
-+
Dehnition 8
-4second-order
stationary
1 r
lim -T Zl p(z)
zv-vx
=0.
(8.28)
Stochastic processes
8.4
Some spial
stochastic processes
(1)
Non-parametric
processes
(1)
Martingales
Dehnition 9
(lfned
#( )) and
Let )A'(r), t G T) be a stocastic
on (S,
process
Jt
oj'
oujlelds
G
T
increasing
9$.
l e: T)
sequence
r
(f ,,
) an
tc
the
conditions:
satisfyinq
followinq
,.%
'
.@
,#7t
8.4 Some
special stochastic
Ar(l) is a random
()
(ff)
(fi)
processes
lmriable
(r.t?.)relative rtl
mean is bounde
X(l 1), for aII t 6 T.
'(-Y(l) 9. t - j)
Then (A'(l), t 6 T) is said rtp be a martingale
).f4, t 6 T) and wtl wrflp tAr(l), 4, t e T).
< c/s
.f/?.
for aII r e: T.
wirll
respect
to
E(xY(!+ z) i? t-
1)
Xt
(8.29)
That is, the best predictor of xY(r+ z), given the information 9't-. 1, is A'(l 1)
for any z >0.
Intuitively, a martingale can be viewed as a fair game'. Defining .Y(1)to
be the money held by a gambler after the rth trial in a casino game (say,
black jack) and Vk to be the history' of the game up to time 1, then the
because the gambler
condition (iii)above suggests that the game is
before trial t expects to have the same amount of money at the end of the
trial as the amount held before the bet was placed. It will take a very foolish
gambler to play a game for which
-
%fair'
(iiil'
Eqxltj/.@t
1)
.GX(l 1).
defines what is called a supermartingale
(8.30)
ilargely'
Example 3
Let tZ(r), l c T) be a sequence of independent r.v.'s such that F(Z(r))
0 for
146
Stochastic processes
z(k),
-Y41)
=
k=1
then )X(r),
(8.3 1)
h, r e T) is a
z( 1))
(ii) are
(8.32)
Example 4
only restriction is
Let
gz
A'(r)
EZ(k)
1)1,
Elzkl/.i
(8.33)
k=1
.Y(1:, then
c(Z(k), Zk
1),
Z( 1)) c(A-(k), .Y(k 1),
tA'(r), ?,, t G T) is a martingale. Note that condition (iii)can be verified using
the property c-CE8 (seeSection 7.2).
where 9k
i.e.
F(r)
.E'(Z(l)/Mt
.:-(/)
==
.-.
(8.34)
1),
:)
=0,
for all
'(.E'(F(r)F(/())
= E gF(k)'(F(f)/f/,
That is, (F(l), t e: T) is an orthogonal
rG T
q
-
is known as a
(8.35)
f G T.
what
we can
yj
:)j
=0
sequence as well
(8.36)
(seeChapter 7).
special stochastic
8.4 Some
147
processes
Dehnition 10
j.j
Q.y (uu
F41) is a nn, relative to
(f)
'(lF(r)!) < Jz'; and
(if)
f)F(r) . , - 1) 0, t c T.
(fff)
.
(uu
%;
Dehnition 11
.
(i)
(f)
xY(0),where
'(1
-:-(1)
1'(r)I<
2)
'(y(r)F(z))
Js;
=
.,
0, t >
() F(.j), and
z,
r, z c T.
.X'(1)
=/-t(r)
141),
(8.37)
for a11 t 6 T.
1), fOr
(8.38)
Stochastic
processes
F(0)
(8.39)
A-(0),
which is a martingale difference and A'(!) j () FU).ln the case where the
martingale is also a second-order process then F(f), r e: T) is also an
innovation process. In Chapter 9 it will be shown that an innovation
process behaves asymptotically like an independent sequence of random
variables; the most extreme form of memory restriction.
=
(2)
Markov processes
important
'past'.
'present',
Djlnition
12
#'
<
'j/2(A''(r))I
:yz
A((A'(l)),
.@%
c(A-(r): a
w/-ltlrp,Mb
J
=
.,)
.)
F44-(1)) /'c(A'(r
< t<
where
f-'tr
/?).(Note: ,#L
,g:
1)),
is past' p/lf.s
(8.40)
ipresent'.)
+ z)/tr(A'(!)),
(8.41)
.4
.4cE
LxL:,
#(X
r-h B Jdff)=
#(X
Jdf)
'
#(S .Xj).
8.4 Some
spial
149
stochastic processes
Dejlnition 13
<
Marko:
.,4stocbastic
f(IX(r)l)
('
and
:y-
.2?d--.1,)
.E4..Y41)
.E'(.X'(r)c(.Y(r
1),
(8.43)
Xt - n'l))).
.?f--,1,)
- 2) +
a1-Y(l - 1) + xzzrt
'
'
'
(8.44)
+ amhjt - rn),
'
a.)
0 lie inside the
and the roots of the polynomial km
unit circle (seeAR(m) below). This special stochastic process will play a very
important role in Part lV.
xk;.m -
(3)
'
'
'
,%
-srlrf.s
'
1, 2,
'
n, are independent
.E'(.Y(?,.)
- Arlrf-
)) 0,
=
ti 1 )
1zz41,,
-
czlrj
- ti
1),.
/'(.v 1
t1
-v,,,'
n, are normally
Jnction is
tn)
-.i
---=-
CXn
'-
/(27:)lj
x2
1
2c221
(;
,,
jw
-.
)
(2zr)
ti -
Stochastic processes
(.Xf
exp
1)2
xi
2c2(/.f /.j-
'
j)
ln the case where (7.2 1 the process is called standard Brownian motion. lt
is not very difficult to see that the standard Brownian motion process is
both a martingale as well as a Markov process. That is, (.Y'(1),l G T) is a
.Y(z), z %r. Moreover,
martingale with respect to 4t- since '(xY(l)/,?Fsince E(X(t)/@%
f'(A'(r)/c(A)z)))it is also a Markov process. Note also
that Eg(-Y(l) Xzjllj@z.
(1 z), t % z.
=
.)
(x)
.)
.1
Dejlnition 15
,4 stochastic
(f)
()
process
'(l/(r))
JtM4,r g T ) is said
0,'
J;(l/(r)lg(z))
-
if t
(f t+
=
to be a white-noise process
#'
z
z
r.v.'s).
(uncorrelated
stationary.
(11)
Parametric
stochastic processes
(4)
Autoregressivenjirst
order
(z4#(1))
The AR( 1) process is by far the most widely used stochastic process in
econometric modelling. An adequate understanding of this process will
provide the necessary groundwork for more general parametric models
such as AR(m), MA(m) or ARMA(p,t?) considered next.
Dejlnition 16
.4 stochastic
process
(x(r),!GT)
s said to be autoregressive
of
8.4
Some spial
stochastic
processes
1) + u(1),
l1(f)
(8.45)
is a white-noise
process.
The main
/-1
.:-41) a'Ar(0) +
=
)( xiult
=
f).
(8.46)
F(xY(f))
fE'(-Y(0))
(8.47)
and
+z))
'(A-(t)aY(l
f- 1
a'?r(0) A-
==
.
)
=
liut
--
a'+1?f(0) ?-
j
i 0
xiut A-z
--
i4
f-1
r+ z - l
')a2+'1xY(0)2)+ E
i
)
=
l+r-1
autr
j)
i
j0
aijtr +z
jj
(8.48)
This shows clearly that if no restrictions are placed on .Y(0) or/and a the
AR(1) process represented by (45)is neither stationary nor asymptotically
independent. Let us consider restricting
and a.
.1(0) in (46) represents the initial condition for the solution of the
stochastic difference equation. This seems to provide a
solution for
the difference equation (seeLuenberger (1969)) and plays an important role
in the determination of the dependence structure of stochastic difference
.40)
bunique'
Stochastic processes
equations. If
form
that .Y(0)
we assume
for simplicity,
=0
(8.50)
=0
J)-Y(r).Y(r + z))
clul
(:t
2f
-+
(8.j1)
.).
f(AX0)) 0,
=
'(A-(l).Y(r+ z))
E
i
liult
i)
j Vult + z
j 0
=
=
Hence, for
c2
stationary
j0
-jj
C&
aiaf +1
a2
c2a1
i
(8.54)
?-?(z)
=
(1
(8.55)
a',
)(
-+
stocllastic processes
Some spial
8.4
The main purpose of the above discussion has been to bring out the
importance of seemingly innocuous assumptions such as the initial
conditions (Ar(0) 0, Ar(- F)-+O as T-+ :yz ), the choice of the index set (T or
T*) and the parameter restrictions ()al 1), as well as the role of these
assumptions in determining the properties of the stochastic process. As seen
above, the choice of the index set and the initial conditon play a very
important role in determining the stationarity of the process. In particular,
for the stochastic process as defined by the GM (45)to be stationary it is
necessary to use T* as the index set. This, however, although theoretically
convenient, is an unrealistic presupposition which we will prefer to avoid if
possible. Moreover, the initial condition .Y(0) 0 or .Y( T) 0 as T :yz
is not as innocuous as it seems at first sight. -Y(0) 0 determines the mean of
the process in no uncertain terms by attributing to the origin a very special
status. The condition A'( T) 0 as T :x- ensures that A'Ir)can be
< 1.
expressed in the form (52)even without the condition
modelling
IV)
it is interesting
For the purposes of econometric
(seePart
1,
2,
in relation to
to consider the case where the index set is T )0,
above,
in
this case kfxYtfl,
stationarity and asymptotic independence. As seen
restriction
#40) 0,
under
second-order
T)
stationary even
the
is not
l iE
asymptotically
< 1. Under the same conditions, however, the process is
independent. The non-stationarity stems from the fact the autocorrelation
function t?(f, t +z) depends on t because for z 0,
=
-..
--+
-+
-+
Ial
.)
Ial
/1
+
,t
1')
J2aT
=
- tz
lt
(8.56)
t,tr,t +z)
::x:
c2(g
-#
(8.57)
f)Ar(l)/f/
t- j
fJ(A'(r)/c(X(l
= aA'tr
1/)
(8.58)
(8.59)
1),
Stochastic processes
),
N(0)
-Y40),
(8.60)
(8.61)
'(u(r)l(k))
E'(u(r))
'(.E'(lf(l) f. t
'(l,/(f)2)FtF(I#l)2/.(#()
- 1
f(-Y/)
0-,
(8.62)
j))
c2,
(8.63)
,.
tdesign'.
kgiven
'(r)
tl(T)
EX(t)Xt
= E @.Y(r
z))
1) + !(r.))aY(r z) )
-
az
1),
t(z)
(f/0).
(8.65)
8.4 Some
spial
stochastic
processes
Moreover,
d0)
f'(A-(r)Ar(l)) Exxt
- 1).:-(8) + /)tI(1)-Y(1))
=ar(1) + c2 a2tj0) + c2.
Hence,
G
d0)
(F
and
(1 a)
tz
'
'
(1
-aZ)
(8.67)
af.
tdesigned'
This shows clearly that in the case where the GM is
no need to
change to asymptotic stationarity arises as in the case where (45) is
postulated as a GM. What is more, the role of the various assumptions
becomes much easier to understand when the probabilistic assumptions are
made directly in tenns of the stochastic process tAr(r), t c: T) and not the
white-noise process.
(5)
mth-order
Autoregressive,
(z1#(-))
The above discussion of the AR(1) process generalises directly to the AR(/#
process where mb 1.
Dejlnition 1
of order
-4stochastic process t-Y(l), t G T) is said to be autoregressiYe
lf
satisjles
the
stochastic
(v4R(m))
it
equation
dterence
m
Ar(r) alArtl
=
1) + xzxt
- 2) +
'
'
+ xmzrt
1.n)
+ u(l),
(8.68)
a(f)x(l)
(8.69)
/.k48,
-xzLl
-am1r)
with 1tX(t) Xt
k),k> 1. The
whereatLl 1
T*
of
equation
1,
for
2,
the
difference
+
:I:
can be
solution
(24)
(0,
writtenas
=(
.-ajf-
.)
.71)
g(r) + x -
'(f-)l/(r).
(8.70)
txm
ajx'l -
.
-
aml
(8.71)
cm
Stocbastic processes
.:-40),
-Y(1),
xtn1 1). The
of (70)and takes the form:
-
(8.72)
where
7 t -F- 1 7 ()
(7
7,,,+
a j ),,,,
7,. vz
7,,,+z
'
'
'
'
myfl
'
(8 3)
.7
m7,=
0, 1, 2,
I2f
I<
1,
1, 2,
A-(0)aJ
(8.74)
n1.
That is, the roots of the polynomial (71) are said to lie within the unit circle.
Under the restrictions (74) the general solution goes to zero, i.e.
(/(1) 0
->
.Y4/.)
=
-.j).
)- o?.#l
j 0
=
0,
'(Ar(r)A'(l+ z))
E
j
l ),yl(l
-jj
pfutl +
i
of the
':
-
X'
= Z(,
j
'jb-vz
'2.
tUz'
'w.
-->
%;
-+
)0,1, 2,
.)
is
Some spial
8.4
stochastic processes
idesigned'
.Xtr
n'l)))
= i j 1 afzYtr-).
(8.78)
The first equality stems from the mth-order Markov property. The linearity
of the conditional mean is due to the normality, the time invariance of the
afs is due to the stationarity and asymptotic independence implies that the
roots of the polynomial (71) 1ie inside the unit circle. lf we define the
X(0)), t e T, and the
increasing sequence of c-fields 9L tr(.Y(r), A-t 1),
u(r),
T)
by
t
c
process
=
utrl-xtr)- Ex/'u
k-
(8.79)
1)-
tdesigned'
F(-Y(f)A'(r -z))
>
aj '(Ar(r- 1)Ar(l-z)) +
'
'
'
+ E(l(l)A-(t -z)),
(8.80)
(8.81)
Hence, we can see that the autocovariances satisfy the same difference equation as the process itself. Similarly, the autocorrelation function takes the form

r(τ) = a₁r(τ−1) + a₂r(τ−2) + ... + a_m r(τ−m).  (8.82)

The system of these equations for τ = 1, 2, ..., m is known as the Yule-Walker equations, which play an important role in the estimation of the coefficients a₁, a₂, ..., a_m (see Priestley (1981)). The relationship between the autocorrelations and the roots of (71) takes the form

r(τ) = c₁λ₁^τ + c₂λ₂^τ + ... + c_mλ_m^τ, τ = 0, 1, 2, ...,  (8.83)

and thus, given that |λ_i| < 1, i = 1, 2, ..., m,

r(τ) → 0 as τ → ∞.  (8.84)
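The Yule-Walker equations suggest a simple way to estimate the coefficients a₁, ..., a_m from sample autocovariances. A minimal sketch, simulating an AR(2) with hypothetical coefficients (0.6, 0.25) and solving the two Yule-Walker equations with numpy:

    import numpy as np

    rng = np.random.default_rng(1)
    a = np.array([0.6, 0.25])          # true AR(2) coefficients (hypothetical values)
    n, burn = 5000, 500
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = a[0]*x[t-1] + a[1]*x[t-2] + rng.normal()
    x = x[burn:]

    # sample autocovariances v(0), v(1), v(2)
    xc = x - x.mean()
    v = np.array([(xc[:n-k] * xc[k:]).mean() for k in range(3)])

    # Yule-Walker: [v(1), v(2)]' = R [a1, a2]', R a Toeplitz matrix in v(0), v(1)
    R = np.array([[v[0], v[1]], [v[1], v[0]]])
    a_hat = np.linalg.solve(R, v[1:])
    print("Yule-Walker estimates:", np.round(a_hat, 3))   # close to [0.6, 0.25]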
(6) Moving average (MA(k)) processes

Definition 18
A stochastic process {X(t), t ∈ T} is said to be a moving average process of order k (MA(k)) if

X(t) = u(t) + b₁u(t−1) + b₂u(t−2) + ... + b_k u(t−k),  (8.85)

where b₁, b₂, ..., b_k are constants and {u(t), t ∈ T} is a white-noise process. That is, the white-noise process is used to build the process {X(t), t ∈ T}, being a linear combination of the last k u(t)'s.

Given that {X(t), t ∈ T} is a linear combination of uncorrelated random variables we can deduce that

E(X(t)) = 0,  (8.86)

v(τ) = σ² Σ_{i=0}^{k−τ} b_i b_{i+τ}, 0 ≤ τ ≤ k; v(τ) = 0, τ > k,  (8.87)

r(τ) = [Σ_{i=0}^{k−τ} b_i b_{i+τ}] / [Σ_{i=0}^{k} b_i²], 0 ≤ τ ≤ k,  (8.88)

(b₀ ≡ 1). These results show that, firstly, a MA(k) process is second-order stationary irrespective of the values taken by b₁, b₂, ..., b_k, and, secondly, its autocovariance and autocorrelation functions have a 'cut-off' after k periods. That is, a MA(k) process is both second-order stationary and k-correlated (r(τ) = 0, τ > k).

In the simple case of a MA(1), (85) takes the form

X(t) = u(t) + b₁u(t−1),  (8.89)

with

v(0) = (1 + b₁²)σ², v(1) = b₁σ², v(τ) = 0, τ > 1,  (8.90)

r(1) = b₁/(1 + b₁²), r(τ) = 0, τ > 1.  (8.91)
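The 'cut-off' property of the MA(1) autocorrelation function is easy to verify by simulation. A small sketch (the value b₁ = 0.7 is a hypothetical choice):

    import numpy as np

    rng = np.random.default_rng(2)
    b1, n = 0.7, 100000
    u = rng.normal(size=n + 1)
    x = u[1:] + b1 * u[:-1]            # MA(1) built from the white-noise process

    xc = x - x.mean()
    r = [(xc[:len(xc)-k] * xc[k:]).mean() / xc.var() for k in range(1, 5)]
    print("r(1)..r(4):", np.round(r, 3))
    print("theory: r(1) =", round(b1 / (1 + b1**2), 3), ", r(tau) = 0 for tau > 1")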
As we can see, a MA(k) process is severely restrictive in relation to time-heterogeneity and memory. It turns out, however, that any second-order stationary, asymptotically independent process {X(t), t ∈ T} can be 'expressed' as a MA(∞), i.e.

X(t) = Σ_{j=0}^∞ b_j u(t−j),  (8.92)

where

u(t) = X(t) − E(X(t)/ℱ_{t−1}),  (8.93)

ℱ_t = σ(X(t), X(t−1), ...).  (8.94)

Given that {u(t), t ∈ T} is an innovation process (martingale difference, orthogonal process), it can be viewed as an orthogonal basis for ℱ_t. This enables us to deduce that E(X(t)/ℱ_{t−1}) can be expressed as a linear combination of the u(t−j)'s, i.e.

E(X(t)/ℱ_{t−1}) = Σ_{j=1}^∞ b_j u(t−j),  (8.95)

from which (92) follows directly. In a sense the process {u(t), t ∈ T} provides the 'building blocks' for any second-order stationary process. This can be seen as a direct extension of the result that any element of a linear space can be expressed uniquely in terms of an orthogonal basis, to the case of an infinite dimensional linear space, a Hilbert space (see Kreyszig (1978)).

The MA(k) process can be viewed as a special case of (92) where the assumption of asymptotic uncorrelatedness is restricted to k-correlatedness. In such a case {X(t), t ∈ T} can be expressed as a linear function of the last k orthogonal elements u(t), u(t−1), ..., u(t−k).
(7) Autoregressive moving average (ARMA) processes

Although any second-order stationary, asymptotically uncorrelated process has the MA(∞) representation (92), such a representation involves an infinite number of unknown parameters b₁, b₂, ...; the ARMA(p, q) formulation provides a parsimonious alternative.

Definition 19
A stochastic process {X(t), t ∈ T} is said to be an autoregressive moving average process of order (p, q) (ARMA(p, q)) if it satisfies the stochastic difference equation

X(t) = a₁X(t−1) + ... + a_p X(t−p) + u(t) + b₁u(t−1) + ... + b_q u(t−q),  (8.97)

where a₁, ..., a_p, b₁, ..., b_q are constants and {u(t), t ∈ T} is a white-noise process. In terms of the lag operator L this can be written as

a_p(L)X(t) = b_q(L)u(t),  (8.98)

where

a_p(L) = 1 − a₁L − ... − a_p L^p, b_q(L) = 1 + b₁L + ... + b_q L^q.  (8.99)

The ARMA(p, q) formulation can approximate the MA(∞) representation

X(t) = [a_p(L)]⁻¹ b_q(L) u(t)  (8.100)

to any degree of approximation, provided that the roots of the polynomial

λ^p − a₁λ^{p−1} − ... − a_p = 0  (8.101)

lie inside the unit circle. No restrictions are needed on the coefficients or the roots of b_q(L). Such restrictions are needed in the case where an AR(∞) formulation of (97) is required. Assuming that the roots of b_q(L) lie inside the unit circle enables us to express the ARMA(p, q) in the form

[b_q(L)]⁻¹ a_p(L) X(t) = u(t).  (8.102)

This form, however, can be operational only when it can be approximated by an AR(m) representation for 'large enough' m. The conditions on a_p(L) are commonly known as stability conditions and those on b_q(L) as invertibility conditions (see Box and Jenkins (1976)).

The popularity of ARMA(p, q) formulations in time-series modelling stems partly from the fact that the formulation can be extended to a particular type of non-stationary stochastic processes; the so-called homogeneous non-stationarity. This is the case where only the mean is time dependent (the variance and covariance are time invariant) and the time change is local. In such a case the stochastic process {Z(t), t ∈ T} exhibiting such behaviour can be transformed into a stationary process by differencing, i.e. define

X(t) = (1 − L)^d Z(t),  (8.103)

so that for d = 1, X(t) = Z(t) − Z(t−1), and for d = 2, X(t) = Z(t) − 2Z(t−1) + Z(t−2).  (8.104)

If {X(t), t ∈ T} is an ARMA(p, q) process, {Z(t), t ∈ T} is said to be an ARIMA(p, d, q) process:

a_p(L)(1 − L)^d Z(t) = b_q(L)u(t).  (8.105)
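A short illustration of (103): a process whose mean is time dependent (a random walk with drift, a hypothetical example) becomes stationary after first differencing.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1000
    u = rng.normal(size=n)
    z = np.cumsum(0.5 + u)            # Z(t) = Z(t-1) + 0.5 + u(t): mean grows with t
    x = np.diff(z)                    # X(t) = (1 - L) Z(t): first difference
    print("means of the two halves of X:",
          round(x[:500].mean(), 2), round(x[500:].mean(), 2))
    print("variances of the two halves:",
          round(x[:500].var(), 2), round(x[500:].var(), 2))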
8.5 Summary

The purpose of this chapter has been to extend the concept of a random variable (r.v.) in order to enable us to model dynamic processes. The extension came in the form of a stochastic process {X(t), t ∈ T}, where X(t) is defined on S × T, not just on S as in the case of a r.v.; the index set T provides the time dimension needed. The concept of a stochastic process enables us to extend the notion of the probability model Φ = {f(x; θ), θ ∈ Θ} discussed so far to one with a distinct time dimension:

Φ = {f(x₁, ..., x_T; θ_t), θ_t ∈ Θ, t ∈ T}.  (8.106)

This, however, presents us with an obvious problem. The fact that the unknown parameter vector θ_t, indexing the parametric family of densities, depends on t will make our task of 'estimating' their values from (commonly) a single sample realisation impossible.

In order to make the theory built upon the concept of a stochastic process manageable we need to impose certain restrictions on the process itself. The notions of asymptotic independence and stationarity are employed with this purpose in mind. Asymptotic independence, by restricting the memory of the stochastic process, enables us to approximate such processes with parametric ones, which reduces the number of unknown parameters to a finite set θ. Similarly, stationarity, by imposing time-homogeneity on the stochastic process, enables us to use time-independent parameters to model a dynamic process in a 'statistical equilibrium'. The effect of both sets of restrictions is to reduce the probability model (106) to

Φ = {f(x₁, ..., x_T; θ), θ ∈ Θ}.  (8.107)
Important concepts

Stochastic process, stationarity, ergodicity, asymptotic independence, nth-order Markov process, martingale, martingale difference, innovation process, Markov property, Brownian motion process.
Questions
1. What is the reason for extending the concept of a random variable to that of a stochastic process?
2. Define the concept of a stochastic process and explain its main components.
3. 'X(s, t) can be interpreted as a random variable, a non-stochastic function (realisation), as well as a single number.' Discuss.
4. 'Wild fluctuations of a realisation of a process have nothing to do with its randomness.' Discuss.
5. How do we specify the structure of a stochastic process?
6. Compare the joint distribution of a set of n normally distributed independent r.v.'s with that of a stochastic process {X(t), t ∈ T} for (t₁, t₂, ..., t_n) in terms of the unknown parameters involved.
7. Let {X(t), t ∈ T} be a stationary normal process. Define its joint distribution for t₁ < t₂ < ... < t_n and explain the effect on the unknown parameters involved by assuming (i) m-dependence or (ii) mth-order Markovness.
8. 'If {X(t), t ∈ T} is a normal stationary process then: (i) asymptotic independence and asymptotic uncorrelatedness, as well as (ii) strict and second-order stationarity, coincide.' Explain.
9. Discuss and compare the notions of an m-dependent and an mth-order Markov process.
10. Explain how restrictions on the time-heterogeneity and memory of a stochastic process can help us construct operational probability models for dynamic phenomena.
11. Compare the memory restriction notions of asymptotic independence, asymptotic uncorrelatedness, mth-order Markovness, mixing and ergodicity.
12. Explain the notion of homogeneous non-stationarity and its relation to ARIMA(p, d, q) formulations.
13. Explain the difference between a parametric AR(1) stochastic process and a 'designed' non-parametric AR(1) model.
14. Define the notion of a martingale and explain its attractiveness for modelling purposes.
19. 'Any second-order stationary and asymptotically uncorrelated stochastic process can be expressed in MA(∞) form.' Explain.
20. Explain the role of the initial conditions in the context of an AR(1) process.
21. 'The ARMA(p, q) formulation provides a parsimonious representation for second-order stationary stochastic processes.' Explain.
22. Discuss the likely usefulness of ARIMA(p, d, q) formulations in econometric modelling.
Additional references

Anderson (1971); Chung (1974); Doob (1953); Feller (1970); Fuller (1976); Gnedenko (1969); Granger and Newbold (1977); Granger and Watson (1984); Hannan (1970); Lamperti (1977); Nerlove et al. (1979); Rosenblatt (1974); Whittle (1970); Yaglom (1962).
CHAPTER 9

Limit theorems

9.1 The early limit theorems

The term 'limit theorems' refers to the two groups of results which form the backbone of asymptotic theory: the 'law of large numbers' and the 'central limit theorem'. The first of these results was proved by Bernoulli.

Bernoulli's theorem
Let S_n be the number of occurrences of an event A in n independent trials of a random experiment ℰ and p = P(A) the probability of occurrence of A in each of the trials. Then for any ε > 0,

lim_{n→∞} Pr( |S_n/n − p| < ε ) = 1,  (9.1)

i.e. the limit of the probability of the event |(S_n/n) − p| < ε approaches one as the number of trials goes to infinity.

Shortly after the publication of Bernoulli's result, De Moivre and Laplace, in their attempt to provide an easier way to calculate binomial probabilities, proved that when [(S_n/n) − p] is multiplied by a factor equal to the inverse of its standard error the resulting quantity has a distribution which approaches the normal as n → ∞, i.e.
lim_{n→∞} Pr( [S_n − np]/√[np(1 − p)] ≤ z ) = ∫_{−∞}^z (1/√(2π)) exp(−½u²) du.  (9.2)

These two theorems involve different modes of convergence, which we now make explicit.

Definition 1
A sequence of r.v.'s {Y_n, n ≥ 1} is said to converge in probability to a r.v. Y if for every ε > 0

lim_{n→∞} Pr( |Y_n − Y| < ε ) = 1.  (9.3)

Definition 2
A sequence of r.v.'s {Y_n, n ≥ 1} with distribution functions {F_n(y), n ≥ 1} is said to converge in distribution to a r.v. Y with distribution function F(y) if

lim_{n→∞} F_n(y) = F(y)  (9.4)

at all continuity points y of F(·); that is, for every ε > 0 there exists an N ≡ N(ε, y) such that

|F_n(y) − F(y)| < ε for n > N.  (9.5)

Definition 3
A sequence of r.v.'s {Y_n, n ≥ 1} converges to Y almost surely (or with probability one) if

Pr( lim_{n→∞} Y_n = Y ) = 1,  (9.6)

or, equivalently,

lim_{n→∞} Pr( |Y_m − Y| < ε for all m ≥ n ) = 1 for any ε > 0.  (9.7)
9.2 The law of large numbers

(1) The weak law of large numbers (WLLN)

It was soon realised that the conditions of Bernoulli's theorem could be weakened. Poisson showed that the identically distributed assumption is not essential.

Poisson's theorem
Let {X_n, n ≥ 1} be a sequence of independent Bernoulli r.v.'s with Pr(X_i = 1) = p_i and Pr(X_i = 0) = 1 − p_i, i = 1, 2, ...; then for any ε > 0,

lim_{n→∞} Pr( |(1/n) Σ_{i=1}^n X_i − (1/n) Σ_{i=1}^n p_i| < ε ) = 1.  (9.8)
The proof relies on Chebyshev's inequality: since

lim_{n→∞} (1/n²) Σ_{i=1}^n p_i(1 − p_i) = 0,  (9.9)

it follows that

lim_{n→∞} Pr( |(1/n) Σ_{i=1}^n [X_i − p_i]| < ε ) = 1.

For more general sequences, note that

Var(S_n) = Σ_{i=1}^n Var(X_i) + Σ Σ_{i≠j} Cov(X_i, X_j),  (9.10)

so what we need is a condition on the variance of the partial sums; this is the Markov condition.

Markov's theorem
Let {X_n, n ≥ 1} be a sequence of r.v.'s such that (1/n²) Var(S_n) → 0 as n → ∞; then

lim_{n→∞} Pr( |(1/n) Σ_{i=1}^n X_i − (1/n) Σ_{i=1}^n E(X_i)| < ε ) = 1.  (9.11)
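Bernoulli's theorem can be illustrated by estimating Pr(|S_n/n − p| < ε) by simulation for increasing n. A minimal sketch (the values p = 0.3 and ε = 0.05 are hypothetical choices):

    import numpy as np

    rng = np.random.default_rng(4)
    p, eps, reps = 0.3, 0.05, 2000
    for n in (10, 100, 1000, 10000):
        s = rng.binomial(n, p, size=reps) / n        # S_n / n over many repetitions
        freq = np.mean(np.abs(s - p) < eps)          # estimate of Pr(|S_n/n - p| < eps)
        print(n, round(freq, 3))                      # tends to 1 as n grows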
(2) The strong law of large numbers (SLLN)

Borel's theorem
Let {X_n, n ≥ 1} be a sequence of IID Bernoulli r.v.'s with Pr(X_i = 1) = p and Pr(X_i = 0) = 1 − p for all i; then

Pr( lim_{n→∞} S_n/n = p ) = 1.  (9.14)

The almost sure convergence in (14) requires control over the whole tail of the sequence simultaneously:

lim_{n→∞} Pr( max_{n≤m≤N} |S_m/m − p| ≥ ε ) = 0.  (9.17)

This may be compared with the condition

lim_{n→∞} (1/n²) Σ_{i=1}^n Var(X_i) = 0  (9.18)

for the WLLN in the case of independent r.v.'s, with the stronger condition

Σ_{i=1}^∞ Var(X_i)/i² < ∞  (9.19)

being needed for the SLLN in the case of independent r.v.'s.

Kolmogorov's theorem
Let {X_n, n ≥ 1} be a sequence of IID r.v.'s with E(|X_i|) < ∞; then

Pr( lim_{n→∞} (1/n) Σ_{i=1}^n [X_i − E(X_i)] = 0 ) = 1.  (9.20)
One can prove this using the Kolmogorov inequality

Pr( max_{1≤k≤n} |S_k − E(S_k)| ≥ ε ) ≤ Var(S_n)/ε²,  (9.21)

together with the Kolmogorov condition

Σ_{k=1}^∞ Var(X_k)/k² < ∞,  (9.22)

which holds for a sequence of IID r.v.'s with ∫ x² f(x) dx < ∞.

Moreover, the Markov condition can be written as Var(S_n) = o(n²), where 'o' reads 'of smaller order than'; Var(S_n) = O(n) achieves the same effect since Var(S_n) = O(n) implies Var(S_n) = o(n²) (see Chapter 10). The Kolmogorov condition is a more restrictive form of the Markov condition, requiring the variance of the partial sums to be uniformly of at most order n. This being the case, it becomes obvious that the conditions LT3 and LT4, assuming independence and identically distributed r.v.'s, are not fundamental ingredients. Indeed, if we drop the identically distributed condition altogether and weaken independence to martingale orthogonality the above limit theorems go through with minor modifications. We say that a sequence of r.v.'s {X_n, n ≥ 1} is martingale orthogonal if E(X_n/σ(X_{n−1}, ..., X₁)) = 0; that is, the 'independent' r.v.'s assumption is replaced by a condition on the first conditional moment, which is what the limit theorems are really interested in.
(3) Limit theorems for martingales

WLLN for martingales
Let {S_n, ℱ_n, n ≥ 1} be a martingale with differences Y_i = X_i − E(X_i/ℱ_{i−1}); if

(1/n²) Σ_{i=1}^n E(Y_i²) → 0,  (9.23)

then the WLLN holds. An equivalent way to state the WLLN is

(1/n) Σ_{i=1}^n [X_i − E(X_i/ℱ_{i−1})] →P 0.  (9.24)

Adding stationarity of {X_n, n ≥ 1} (see Chapter 8) can strengthen the WLLN result to that of the SLLN.

The above discussion suggests that the most important ingredient of the Bernoulli theorem is that we consider the probabilistic behaviour of centred r.v.'s of the form S_n − E(S_n).
9.3 The central limit theorem

As with the WLLN and SLLN, it was realised that LT2 was not contributing in any essential way to the De Moivre-Laplace theorem, and the literature considered sequences of r.v.'s with restrictions on the first few moments. Let {X_n, n ≥ 1} be a sequence of r.v.'s and S_n = Σ_{i=1}^n X_i; the CLT considers the limiting behaviour of

Y_n = [S_n − E(S_n)] / √Var(S_n),

which is a normalised version of S_n, as compared with the WLLN and SLLN.

Lindeberg-Levy theorem
Let {X_n, n ≥ 1} be a sequence of IID r.v.'s with E(X_i) = μ and Var(X_i) = σ² < ∞; then

lim_{n→∞} F_n(y) = lim_{n→∞} Pr( Y_n ≤ y ) = ∫_{−∞}^y (1/√(2π)) exp(−½u²) du.  (9.28)
Liapunov's theorem
Let {X_n, n ≥ 1} be a sequence of independent r.v.'s with E(X_i) = μ_i, Var(X_i) = σ_i² < ∞ and E(|X_i − μ_i|^{2+δ}) < ∞ for some δ > 0. Then, if

lim_{n→∞} (1/c_n^{2+δ}) Σ_{i=1}^n E(|X_i − μ_i|^{2+δ}) = 0, where c_n² = Σ_{i=1}^n σ_i²,  (9.30)

the CLT result (28) holds.

Liapunov's theorem is rather restrictive because it requires the existence of moments higher than the second. A more satisfactory result providing both necessary and sufficient conditions is the next theorem; Lindeberg in 1923 established the 'if' part and Feller in 1935 the 'only if' part.

Lindeberg-Feller theorem
Let {X_n, n ≥ 1} be a sequence of independent r.v.'s with distribution functions {F_i(x), i ≥ 1} such that

(i) E(X_i) = μ_i; (ii) Var(X_i) = σ_i² < ∞.  (9.31)

Then the CLT (28) and the condition max_{1≤i≤n} (σ_i²/c_n²) → 0, where

c_n² = Σ_{i=1}^n σ_i²,  (9.34)

hold if and only if

(1/c_n²) Σ_{i=1}^n ∫_{|x−μ_i|>εc_n} (x − μ_i)² dF_i(x) → 0 for all ε > 0.  (9.35)
Given that

(1/c_n²) max_{1≤i≤n} σ_i² ≤ ε² + (1/c_n²) Σ_{i=1}^n ∫_{|x−μ_i|>εc_n} (x − μ_i)² dF_i(x) for every ε > 0,

this shows that the heart of the CLT is the condition that no one r.v. dominates the sequence of sums; that is, each (X_i − μ_i)/c_n is small relative to the sum [S_n − E(S_n)]/c_n as n increases. The Liapunov condition can be deduced from the Lindeberg condition and thus it achieves the same effect. Hence the CLT refers to the distributional behaviour of the summation of an increasing number of r.v.'s which individually do not exert any significant effect on the behaviour of the sum. An analogy can be drawn from economic theory where under the assumptions of perfect competition (no individual agent dominates the aggregate) we can prove the existence of a general equilibrium. A more pertinent analogy can be drawn between the CLT and the theory of gas in physics. A particular viewpoint in physics considers a gas as consisting of an enormous number of individual particles in continuous but chaotic motion. One can say nothing about the behaviour of individual particles in relation to their position or velocity but we can determine (at least probabilistically) the behaviour of a large group of them.
Fig. 9.1 illustrates the CLT in the case where the X_i's are IID uniformly distributed r.v.'s, i.e. X_i ~ U(−1, 1), i = 1, 2, ..., n, and f_n(y) represents the density function of Y_n = X₁ + X₂ + ... + X_n; the density rapidly approaches the normal shape as n increases.

(Fig. 9.1 - the densities f₁(y), f₂(y), ... of successive standardised sums of uniform r.v.'s.)
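The uniform case of Fig. 9.1 can also be checked numerically: the distribution function of the standardised sum of IID U(−1, 1) r.v.'s is already close to Φ for moderate n. A minimal sketch (n = 12 is an arbitrary choice; Var(U(−1, 1)) = 1/3):

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(5)

    def Phi(z):                       # standard normal cdf via the error function
        return 0.5 * (1 + erf(z / sqrt(2)))

    n, reps = 12, 100000
    x = rng.uniform(-1, 1, size=(reps, n))
    y = x.sum(axis=1) / np.sqrt(n / 3)     # standardised sum: Var(sum) = n/3
    for z in (-1.0, 0.0, 1.0):
        print(z, round(np.mean(y <= z), 4), round(Phi(z), 4))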
Applying these limit theorems to the Bernoulli case of Section 9.1, where S_n denotes the number of successes in n trials, the SLLN yields

Pr( lim_{n→∞} S_n/n = p ) = 1,

and the De Moivre-Laplace theorem yields

lim_{n→∞} Pr( (S_n − np)/√[np(1 − p)] ≤ z ) = ∫_{−∞}^z (1/√(2π)) exp(−½u²) du.  (9.38)

The latter enables us to approximate binomial probabilities, e.g.

Pr(6 < S_n ≤ 8) ≈ Φ( (8 − np)/√[np(1−p)] ) − Φ( (6 − np)/√[np(1−p)] ),

as well as probabilities of deviations of the relative frequency from p, such as

Pr( |S₂₀/20 − p| < c ) ≈ 0.944, Pr( |S₅₀₀/500 − p| < c ) ≈ 0.965,

for appropriate values of c and p. More loosely, we often write S_n ≈ N(np, np(1 − p)) for large n; strictly speaking, such a statement should be interpreted only in the sense of the limit in (9.2).
CLT for random vectors
Let {X_n, n ≥ 1} be a sequence of IID random k × 1 vectors with E(X_i) = μ and Cov(X_i) = Σ, Σ non-singular; then

(1/√n) Σ_{i=1}^n (X_i − μ) ~ N(0, Σ)  (9.42)

asymptotically. In practice this result is established via the Cramer-Wold device: for each c ∈ ℝ^k, c ≠ 0, the scalars c'X_i are IID r.v.'s, so that

(1/√n) Σ_{i=1}^n c'(X_i − μ) ~ N(0, c'Σc) for all c ≠ 0.  (9.44)

Then, using the Cramer-Wold theorem (see Chapter 10), we can deduce the CLT result stated above.

As in the case of the other limit theorems (WLLN, SLLN), the CLT can be easily extended to the martingale case.
CLT for martingales
Let {S_n, ℱ_n, n ≥ 1} be a martingale with differences X_n = S_n − S_{n−1}, E(X_n/ℱ_{n−1}) = 0, such that

(i) (1/c_n²) Σ_{i=1}^n E( X_i² 1(|X_i| > εc_n) / ℱ_{i−1} ) →P 0 for all ε > 0,  (9.45)

(ii) (1/c_n²) Σ_{i=1}^n E( X_i² / ℱ_{i−1} ) →P 1,  (9.46)

where c_n² = Σ_{i=1}^n E(X_i²); then

(1/c_n) Σ_{i=1}^n X_i ~ N(0, 1)

asymptotically. This theorem is a direct extension of the Lindeberg-Feller theorem. It is important to note that (i) and (ii) ensure that the summations involved are of smaller order of magnitude than c_n².
9.4* Limit theorems for stochastic processes

The limit theorems considered so far assume independence. For the stochastic processes of Chapter 8 they can be extended by restricting the memory (asymptotic independence, mixing, ergodicity) and the time-heterogeneity (stationarity) of the process.

CLT for stationary, m-dependent processes
Let {X_n, n ≥ 1} be a stationary, m-dependent process with E(X_n) = 0 and E(X_n²) < ∞; then

(1/√n) Σ_{i=1}^n X_i →D Z ~ N(0, σ²), where σ² = lim_{n→∞} (1/n) Var(S_n) < ∞,  (9.48)

and

σ² = Var(X₁) + 2 Σ_{j=1}^{m−1} Cov(X₁, X_{1+j}).

SLLN for mixing processes
Let {X_n, n ≥ 1} be a stationary mixing process whose mixing coefficients satisfy

α(m) = O(m^{−1−ε}) for some ε > 0;  (9.51)

then

(1/n) Σ_{i=1}^n [X_i − E(X_i)] →a.s. 0.  (9.52)

CLT for mixing processes
Let {X_n, n ≥ 1} be a mixing process satisfying the restrictions imposed for the SLLN to hold. In addition let us assume that:

(i) E(X_n) = 0;  (9.53)
(ii) E(|X_n|^{2r}) < ∞ for some r > 1;  (9.54)
(iii) for S_n(τ) = Σ_{i=τ+1}^{τ+n} X_i, (1/n) E(S_n(τ)²) → σ² < ∞, uniformly in τ;

then

(nσ²)^{−1/2} S_n(0) →D N(0, 1).

SLLN for stationary, ergodic processes
Let {X_n, n ≥ 1} be a stationary and ergodic process such that E(|X_n|) < ∞, n ≥ 1; then

(1/n) Σ_{i=1}^n X_i →a.s. E(X_n).  (9.56)

CLT for stationary martingale differences
Let {X_n, n ≥ 1} be a stationary, ergodic martingale difference process with E(X_n²) = σ² < ∞; then

(1/(σ√n)) Σ_{i=1}^n X_i →D N(0, 1).  (9.58)

(See Hall and Heyde (1980).)
9.5 Summary

The WLLN and SLLN state that, under appropriate restrictions, S_n/n − E(S_n/n) → 0, the convergence being 'in probability' for the WLLN and 'almost surely' for the SLLN. The CLT, on the other hand, provides us with information relating to the rate of convergence. This information comes in the form of the 'appropriate' factor by which to premultiply S_n − E(S_n) so that it converges to a non-degenerate r.v. (the convergence in the WLLN and SLLN is to a degenerate r.v.; a constant). This factor comes in the form of the standard deviation of S_n, i.e.

[Var(S_n)]^{−1/2} (S_n − E(S_n)) ~ N(0, 1).  (9.62)

Important concepts

Convergence in probability, almost sure convergence, convergence in distribution, weak and strong laws of large numbers, central limit theorem.
Questions
1. Explain the statement lim_{n→∞} Pr(|S_n/n − p| < ε) = 1 and contrast it with Pr(lim_{n→∞} S_n/n = p) = 1.
2. Discuss the underlying assumptions of the Bernoulli WLLN in relation to their contribution to the result.
3. Explain the difference between Chebyshev's and Markov's WLLN.
4. Whose behaviour do the WLLN and SLLN refer to?
5. Explain intuitively why a sequence of martingale differences with finite variances obeys the WLLN and SLLN.
6. Explain the Lindeberg-Feller CLT in relation to the assumptions and conclusions.
7*. In the CLT why is the limit distribution a normal and not some other distribution? Explain.
8. 'All limit theorems impose conditions on the individual r.v.'s of a sequence in order to ensure that no one dominates the behaviour of the aggregate and this leads to their conclusions.' Discuss.
Exercises
1. Let {X_n, n ≥ 1} be a sequence of independent r.v.'s with (i) Pr(X_n = ±2ⁿ) = ½; (ii) Pr(X_n = ±n) = ½; (iii) Pr(X_n = ±1/n) = ½. Determine in each case whether the WLLN applies. (Note: compare Var(S_n) with n².)

Additional references

Chung (1974); Cramér (1946); Feller (1968); Giri (1974); Gnedenko (1969); Loève (1963); Pfeiffer (1978); Rao (1973); Rényi (1970); Rohatgi (1976).
CHAPTER 10*

Introduction to asymptotic theory

10.1 Introduction

The limit theorems of Chapter 9 were stated in terms of the behaviour of

F_n(y) = Pr( h_n(X₁, ..., X_n) ≤ y )

for Borel functions h_n(·) of the sample, the main examples being

(1/n) Σ_{i=1}^n X_i − (1/n) Σ_{i=1}^n E(X_i) →P 0 (WLLN);

(1/n) Σ_{i=1}^n X_i − (1/n) Σ_{i=1}^n E(X_i) →a.s. 0 (SLLN);

[S_n − E(S_n)]/√Var(S_n) →D N(0, 1) (CLT);

under conditions such as

lim_{n→∞} (1/n²) Σ_{i=1}^n Var(X_i) = 0, i.e. Var(S_n) = o(n²).  (10.6)

The purpose of this chapter is to define the various modes of convergence involved more formally, to introduce the 'big O' and 'little o' notation used in manipulating orders of magnitude, and to consider how 'rough' asymptotic approximations can be 'improved' using asymptotic expansions.
10.2 Modes of convergence

The notions of 'convergence' and 'limit' play a very important role in probability theory, not only because of the limit theorems discussed in Chapter 9 but also because they underlie some of the most fundamental concepts such as probability and distribution functions, density functions, mean, variance, as well as higher moments. This was not made explicit in Chapters 3-7 because of the mathematical subtleties involved.

In order to understand the various modes of convergence in probability theory let us begin by reminding ourselves of the notion of convergence in mathematical analysis. A sequence {a_n, n ∈ ℕ} is defined to be a function from the natural numbers ℕ = {1, 2, 3, ...} to the real line ℝ.

Definition 1
A sequence {a_n, n ∈ ℕ} is said to converge to a limit a if for every arbitrarily small number ε > 0 there corresponds a number N(ε) such that the inequality |a_n − a| < ε holds for all terms a_n of the sequence with n > N(ε); we denote this by lim_{n→∞} a_n = a.

Example 1
lim_{n→∞} x^{1/n} = 1 for any x > 0; lim_{n→∞} (log_b n)/n = 0, b > 0, b ≠ 1; lim_{n→∞} (1 + 1/n)^n = e ≃ 2.71828; lim_{n→∞} [(n² + n + 6)/(3n² − 2n + 2)] = 1/3; lim_{n→∞} [√(n + 1) − √n] = 0.
This notion of convergence can be extended directly to any function h(·) whose domain is not necessarily ℕ but any subset of ℝ, i.e. h(x): D → ℝ. The way this is done is to allow the variable x to define a sequence of numbers {x_n, n ∈ ℕ} converging to some limit x₀ and to consider the sequence {h(x_n), n ∈ ℕ} as x_n → x₀; we denote the limit by lim_{x→x₀} h(x) = l.

Definition 2
A function h(x): D → ℝ is said to have a limit l at x₀ if for every ε > 0 there exists a number δ(ε) > 0 such that |h(x) − l| < ε for every x satisfying the restriction |x − x₀| < δ(ε). We denote this by lim_{x→x₀} h(x) = l.

Example 2
For the polynomial function h(x) = a₀xⁿ + a₁xⁿ⁻¹ + ... + a_{n−1}x + a_n, lim_{x→0} h(x) = a_n. For h(x) = aˣ, a > 0, lim_{x→0} h(x) = 1; for h(x) = (1 + x)^{1/x}, lim_{x→0} h(x) = e; for h(x) = [log_e(1 + x)]/x, lim_{x→0} h(x) = 1.

Definition 3
A function h(x): D → ℝ, x₀ ∈ D(h), is said to be continuous at x₀ if for every ε > 0 there exists a δ(ε) > 0 such that |h(x) − h(x₀)| < ε for every x satisfying the restriction |x − x₀| < δ(ε). We denote this by lim_{x→x₀} h(x) = h(x₀). A function h(x) is said to be continuous if it is continuous at every point of its domain, D.

Example 3
The functions considered in example 2 are continuous at every point of their domains (verify!).
In the case of a general function h(x) we can define the sequence {h_n(x), n ∈ ℕ} and consider its behaviour for each x ∈ A, where A is a subset of D(h).

Definition 4
A sequence of functions {h_n(x), n ∈ ℕ} is said to converge to a function h(x) on A if for every ε > 0 there exists an N(ε, x) such that

|h_n(x) − h(x)| < ε for n > N(ε, x)

holds for all x ∈ A.

Example 4
For h_n(x) = Σ_{k=0}^n x^k/k!, lim_{n→∞} h_n(x) = e^x.

In the case where N(ε, x) does not depend on x (only on ε), then {h_n(x), n ∈ ℕ} is said to converge uniformly on A. The importance of uniform convergence stems from the fact that if each h_n(x) in the sequence is continuous and converges uniformly to h(x) on D, then the limit h(x) is also continuous. That is, if

h_n(x) is continuous and h_n(x) → h(x) uniformly for all x ∈ D,  (10.8)

then

lim_{x→x₀} h(x) = h(x₀) for x₀ ∈ D.  (10.9)

Since a sequence of r.v.'s {X_n(s), n ∈ ℕ} is a sequence of functions of the outcome s ∈ S, these notions apply directly:

|X_n(s) − X(s)| < ε for each s ∈ S,  (10.10)

and

|X_n(s) − X(s)| < ε uniformly for all s ∈ S.  (10.11)
Because the outcomes s enter a random variable through its probabilistic structure, they play a crucial role in its behaviour. If we take this probabilistic structure into consideration, both of the above forms of convergence are much too strong because they imply that for n > N

|X_n(s) − X(s)| < ε whatever the outcome s ∈ S.  (10.12)

The form of probabilistic convergence closest to this is almost sure convergence, which allows for convergence of X_n(s) to X(s) for all s except for some s-set A ⊂ S which is of probability zero, P(A) = 0. The term almost sure is used to emphasise the convergence on S − A, not the whole of S.
Definition 5
A sequence of r.v.'s {X_n(s), n ∈ ℕ} is said to converge almost surely (a.s.) to a r.v. X(s) if

Pr( s: lim_{n→∞} X_n(s) = X(s) ) = 1,  (10.13)

denoted by X_n →a.s. X. An equivalent form is

lim_{n→∞} Pr( s: |X_m(s) − X(s)| < ε, for all m ≥ n ) = 1.  (10.14)

Almost sure convergence is the mode of convergence associated with the strong law of large numbers (SLLN). Another mode of convergence not considered so far in relation to the limit theorems is convergence in rth mean.
Definition 6
A sequence of r.v.'s {X_n(s), n ∈ ℕ} with E(|X_n|^r) < ∞ is said to converge to X in rth mean, denoted by X_n →r X, if

lim_{n→∞} E(|X_n − X|^r) = 0.  (10.15)

For r = 2 this is called convergence in mean square.

Definition 7
A sequence of r.v.'s {X_n(s), n ∈ ℕ} is said to converge to X in probability, denoted by X_n →P X, if for every ε > 0

lim_{n→∞} Pr( s: |X_n(s) − X(s)| < ε ) = 1.  (10.16)

The relationship between convergence almost surely and convergence in probability can be seen by comparing (16) with (14).

Definition 8
A sequence of r.v.'s {X_n(s), n ∈ ℕ} with distribution functions {F_n(x), n ∈ ℕ} is said to converge in distribution to X(s) with distribution function F(x), denoted by X_n →D X, if

lim_{n→∞} F_n(x) = F(x)  (10.17)

at all continuity points x of F(·).
Fig. 10.1 summarises the implications between the various modes of convergence:

X_n →a.s. X ⇒ X_n →P X ⇒ X_n →D X; X_n →r X ⇒ X_n →P X.

(Fig. 10.1 - implication diagram for the modes of convergence.)

The implication 'a.s. ⇒ P' follows because (14) is a stronger form of convergence than (16), holding for all m ≥ n rather than for a given n. The implication 'r ⇒ P' is based on the inequality

Pr( |X_n − X| ≥ ε ) ≤ E(|X_n − X|^r)/ε^r.  (10.18)
The implication 'P ⇒ D' is rather obvious in the case where F(x) is a proper distribution function, because for every ε > 0, δ > 0, there exists N so that for all n > N, Pr(|X_n − X| > ε) < δ, and thus

F(x − ε) − δ ≤ F_n(x) ≤ F(x + ε) + δ,  (10.19)

implying that as ε → 0, δ → 0, F_n(x) → F(x), i.e. X_n →D X. Note also that when the limit is a constant c, X_n →D c implies X_n →P c.

A sufficient condition for almost sure convergence is

Σ_{n=1}^∞ Pr( |X_n − X| > ε ) < ∞ ⇒ X_n →a.s. X.  (10.20)

An important property of rth mean convergence (see Serfling (1980)) is that if X_n →r X for some r ≥ 1, then X_n →r' X for 0 < r' < r.  (10.21)

For example, mean square convergence implies mean convergence. This is related to the result that if

E(|X_n|^r) < ∞ then E(|X_n|^{r'}) < ∞ for r' < r.  (10.22)
That is, if the rth moment exists (is bounded) then all the moments of order less than r also exist. This is the reason why, when we assume that Var(X_n) < ∞, the existence of E(X_n) is implied.

In applying asymptotic theory we often need to extend the above convergence results to transformed sequences of random vectors {g(X_n), n ∈ ℕ}. The above results are said to hold for a random vector sequence {X_n, n ∈ ℕ} if they hold for each component X_{in}, i = 1, 2, ..., k, of X_n.

Lemma 10.1
Let {X_n, n ∈ ℕ} be a sequence of random k × 1 vectors and g(·): ℝ^k → ℝ^m a continuous function; then:

(i) X_n →a.s. X ⇒ g(X_n) →a.s. g(X);  (10.23)
(ii) X_n →P X ⇒ g(X_n) →P g(X);  (10.24)
(iii) X_n →D X ⇒ g(X_n) →D g(X).  (10.25)

These results also hold in the case where g(·) is a Borel function (see Chapter 6) when certain conditions are placed on the set of discontinuities of g(·); see Mann and Wald (1943). Borel functions have a distinct advantage over continuous functions in the present context because the limits of such functions are commonly Borel functions themselves without requiring uniform convergence. Continuous functions are Borel functions but not vice versa. In order to get some idea about the generality of Borel functions note that if h and g are Borel functions then the following are also Borel functions: (i) ah + bg, a, b ∈ ℝ; (ii) hg; (iii) max(h, g); (iv) min(h, g); (v) |h|.  (10.26)

Of particular interest in asymptotic theory are the following results.

Lemma 10.3
Let {(X_n, Y_n), n ∈ ℕ} be a sequence of pairs of random k × 1 vectors.
(i) If X_n →D X and (X_n − Y_n) →P 0, then Y_n →D X;
(ii) If X_n →D X and Y_n →P 0, then Y_n'X_n →P 0;
(iii) If X_n →D X and Y_n →P C (a constant), then

(X_n + Y_n) →D X + C and Y_n'X_n →D C'X.
10.3 Convergence of moments

Consider the sequence of r.v.'s {X_n, n ∈ ℕ} such that X_n →D X, where F_n(x) and F(x) refer to the cumulative distribution functions of X_n and X respectively. We define the moments of X_n (when they exist) by

E(X_n^r) = ∫ x^r dF_n(x),  (10.28)

the asymptotic moments by

lim_{n→∞} E(X_n^r),  (10.29)

and the moments of the limit r.v. X by

E(X^r) = ∫ x^r dF(x).  (10.30)
The moments of X_n are defined in terms of F_n(x), not of its asymptotic distribution, so the question arises whether the limits of the moments equal the moments of the limit. The following results are relevant.

Lemma 10.4
If X_n →D X then

lim inf_{n→∞} E(|X_n|) ≥ E(|X|).

A sequence {X_n, n ≥ 1} is said to be uniformly integrable if

lim_{c→∞} sup_{n≥1} ∫_{|X_n|≥c} |X_n| dP = 0.

Lemma 10.7
If X_n →P X and {X_n^r, n ≥ 1} is uniformly integrable, then

lim_{n→∞} E(X_n^r) = E(X^r).

Lemma 10.8
If X_n →a.s. X, then

lim inf_{n→∞} E(|X_n|^r) ≥ E(|X|^r).

(For these lemmas see Serfling (1980).) Looking at these results we can see that the important condition for the equality of the limit of the rth moment and the rth asymptotic moment is the uniform integrability of {X_n^r, n ≥ 1}, which allows us to interchange limits with expectations.
Expanding g(m_r) in a Taylor series around μ_r' we can also define the approximate moments:

E(g(m_r)) ≅ g(μ_r') + ½ g⁽²⁾(μ_r') Var(m_r),  (10.32)

Var(g(m_r)) ≅ [g⁽¹⁾(μ_r')]² Var(m_r),  (10.33)

with analogous expressions for the higher moments,  (10.34)

where ≅ reads 'approximately equal'. These moments are viewed as moments of a statistic purporting to approximate g(m_r) and under certain conditions can be treated as approximations to the moments of g(m_r) (see Sargan (1974)). Such approximations must be distinguished from E(X_n^r) as well as lim E(X_n^r). The approximate moments derived above can be very useful in choosing the functions g(·) so as to make the asymptotic results more accurate in the context of variance stabilising transformations and asymptotic expansions (see Rothenberg (1984)). In deriving the asymptotic distributions of g(m_r) only the first two moments are utilised, and one can improve upon the normal approximation by utilising the above approximate higher moments in the context of asymptotic expansions. A brief introduction to asymptotic expansions is given in Section 10.6.
10.4 The 'big O' and 'little o' notation

Let {a_n, n ∈ ℕ} and {b_n, n ∈ ℕ} be two sequences of real numbers.

Definition 9
The sequence {a_n, n ∈ ℕ} is said to be at most of order b_n, denoted by a_n = O(b_n), if

|a_n/b_n| ≤ K for some K > 0.  (10.35)

Definition 10
The sequence {a_n, n ∈ ℕ} is said to be of smaller order than b_n, denoted by a_n = o(b_n), if

lim_{n→∞} (a_n/b_n) = 0.  (10.36)

Example 5
(2n² − 1)/(n + 1) = O(n); (5n² + n³)/(n + n²) = O(n); log_e n = o(n^a), a > 0; exp(−n) = o(n^{−a}), a > 0; (6n² + 3n) = O(n²) = o(n³).
The O, o notation satisfies the following properties:

(i) If a_n = O(n^a) then a_n = o(n^{a+δ}) for any δ > 0.
(ii) If a_n = O(c_n) and b_n = O(d_n), then a_n b_n = O(c_n d_n); |a_n|^r = O(c_n^r), r > 0; and a_n + b_n = O(max(c_n, d_n)).
(iii) If a_n = o(c_n) and b_n = o(d_n), the analogous properties hold with 'big O' replaced by 'little o' above.

The notation extends to functions of a real variable. We say

h(x) = O(g(x)) as x → x₀ if, for some K > 0, |h(x)/g(x)| ≤ K, x ∈ (D − x₀).

Moreover, we say that h(x) = o(g(x)) if

lim_{x→x₀} [h(x)/g(x)] = 0, x ∈ (D − x₀).
Example 6
For h(x) = e^x, h(x) − 1 − x = o(x) as x → 0, and for h(x) = cos x, h(x) = 1 + o(x) as x → 0. For h(x) − g(x) = O(l(x)) we write h(x) = g(x) + O(l(x)); then

e^x = Σ_{k=0}^n (x^k/k!) + o(x^n) as x → 0.

The O, o notation considered above can be extended to the case of stochastic convergence, convergence almost surely and in probability.
Definition 11
Let {X_n, n ∈ ℕ} be a sequence of r.v.'s and {a_n, n ∈ ℕ} a sequence of real numbers.

(i) X_n = o_p(a_n) if (X_n/a_n) →P 0;
(ii) X_n = O_p(a_n) if for every ε > 0 there exists a K(ε) > 0 such that Pr(|X_n/a_n| > K(ε)) ≤ ε for all n; that is, {X_n/a_n} is bounded in probability.

Analogous definitions, o_{a.s.}(a_n) and O_{a.s.}(a_n), hold for almost sure convergence.

These definitions extend directly to sequences {X_n, n ∈ ℕ} of k-dimensional random vectors: X_n = O_p(a_n) if X_{jn} = O_p(a_{jn}), j = 1, 2, ..., k; and similarly g_n(X_n) = O_p(b_n) for a non-stochastic sequence {b_n, n ∈ ℕ} of k-dimensional vectors.
Useful results relating to orders of magnitude:

(O1) If Var(X_n) = O(a_n) then X_n − E(X_n) = O_p(a_n^{1/2}); the order of magnitude in probability is that of the standard deviation.
(O2) If X_n = o_p(1) then X_n = O_p(1).
(O3) If X_n →D X then X_n = O_p(1).
(O4) If X_n = O_p(a_n) and Y_n = O_p(b_n), then X_nY_n = O_p(a_nb_n) and X_n + Y_n = O_p(max(a_n, b_n)).

10.5 Extending the limit theorems

Let {X_n, n ≥ 1} be a sequence of IID r.v.'s and define the sample raw moments

m_r' = (1/n) Σ_{i=1}^n X_i^r, r = 1, 2, ....  (10.38)

Given that the X_i^r's are themselves IID r.v.'s, the limit theorems of Chapter 9 apply directly:

E(m_r') = (1/n) Σ_{i=1}^n E(X_i^r) = μ_r';  (10.39)

(i) m_r' →P μ_r' (WLLN);  (10.40)
(ii) m_r' →a.s. μ_r' (SLLN);  (10.41)
(iii) if v_r ≡ Var(X_i^r) = μ_{2r}' − (μ_r')² < ∞, then

√n (m_r' − μ_r') →D N(0, v_r) (CLT).  (10.42)

Similarly, for two sample raw moments,

Cov(m_r', m_k') = (1/n) E(X^{r+k}) − (1/n) μ_r'μ_k'  (10.43)

= (1/n) (μ_{r+k}' − μ_r'μ_k').  (10.44)

Collecting the first r sample moments into the vector m = (m₁', m₂', ..., m_r')', with μ = (μ₁', μ₂', ..., μ_r')', we can deduce that

(iv) √n (m − μ) →D N(0, Σ), where Σ has elements σ_{ij} = μ_{i+j}' − μ_i'μ_j'.  (10.45)
Example 7
If g(·) = log_e(·), then for μ_r' > 0:

(i) log_e(m_r') →P log_e(μ_r');
(ii) log_e(m_r') →a.s. log_e(μ_r');
(iii) by Lemma 10.1(iii), √n(m_r' − μ_r') →D Z ~ N(0, μ_{2r}' − (μ_r')²).

The last result, however, is not particularly useful because we usually need to derive the asymptotic distribution of g(m_r') itself and not that of m_r'. Let us derive the asymptotic distribution of g(m_r'), taking the opportunity to use some of the concepts and results propounded above.
From O1 above we know that m_r' = μ_r' + O_p(1/√n), and hence from the Mann-Wald theorem we can deduce that if g(·) has continuous derivatives of order k then

g(m_r') = g(μ_r') + (m_r' − μ_r') g⁽¹⁾(μ_r') + O_p(n^{−1}),  (10.46)

and thus, by O2,

√n [g(m_r') − g(μ_r')] = √n (m_r' − μ_r') g⁽¹⁾(μ_r') + o_p(1).  (10.47)

Let U_n = √n (m_r' − μ_r') and V_n = √n [g(m_r') − g(μ_r')]; then

V_n = aU_n + o_p(1), where a = g⁽¹⁾(μ_r').  (10.48)

From Lemma 10.3 we may conclude that if U_n →D U then V_n →D aU, and thus

(v) √n [g(m_r') − g(μ_r')] →D N( 0, (μ_{2r}' − (μ_r')²)[g⁽¹⁾(μ_r')]² ).  (10.49)

For example, if g(m_r') = log_e m_r', then

√n (log_e m_r' − log_e μ_r') →D N( 0, (μ_{2r}' − (μ_r')²)/(μ_r')² ).  (10.50)
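Result (10.50) can be checked by Monte Carlo. The sketch below takes r = 1 and X ~ Exponential(1) (a hypothetical choice), for which μ₁' = 1 and μ₂' = 2, so that the asymptotic variance of √n(log m₁' − log μ₁') is (μ₂' − μ₁'²)/μ₁'² = 1:

    import numpy as np

    rng = np.random.default_rng(6)
    n, reps = 2000, 5000

    x = rng.exponential(1.0, size=(reps, n))
    m1 = x.mean(axis=1)                               # sample first raw moment
    stat = np.sqrt(n) * (np.log(m1) - np.log(1.0))    # sqrt(n)(log m1' - log mu1')
    print("sample variance of the statistic:", round(stat.var(), 3), "(theory: 1.0)")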
The result extends directly to the vector case. Let

H(m) = ( g₁(m), g₂(m), ..., g_k(m) )'  (10.51)

be a vector of functions with continuous first derivatives, and let D denote the k × r matrix of derivatives

D = [ ∂g_i/∂m_j ], i = 1, 2, ..., k, j = 1, 2, ..., r, evaluated at μ.  (10.52)

Then, given (10.45),

√n [H(m) − H(μ)] →D N(0, DVD'),  (10.54)

where V = Cov(Z), Z being the limit r.v. in (10.45) (see Bhattacharya (1977)).

10.6 Asymptotic expansions
The CLT tells us that under certain conditions the distribution of the standardised sum Y_n = [S_n − E(S_n)]/√Var(S_n) tends to a standard normal r.v., i.e.

lim_{n→∞} F_n(y) = Φ(y) = ∫_{−∞}^y (1/√(2π)) exp(−½u²) du.  (10.55)

An early bound on the error of this approximation, for IID r.v.'s with E(|X_i − μ|³) = ρ₃ < ∞, took the form

sup_{y∈ℝ} |F_n(y) − Φ(y)| ≤ C (ρ₃/σ³)(log n/√n).  (10.56)

Berry-Esseen theorem
Under the same conditions as the Liapunov result we can deduce that

sup_y |F_n(y) − Φ(y)| ≤ C (ρ₃/σ³)(1/√n).  (10.57)

That is, the factor log n was unnecessary. As we can see, this is a sharpening of the Lindeberg-Levy CLT (see Chapter 9) in so far as the latter states that sup_y |F_n(y) − Φ(y)| → 0 as n → ∞ without specifying the rate of convergence. The Berry-Esseen theorem, using higher moments, provides us with the additional information that the rate of convergence is O(n^{−1/2}). Various authors improved the upper bound of the error by reducing the constant C. The best bound for C so far was provided by Beeck (1972): 0.4097 ≤ C ≤ 0.7975.
In the general independent (non-identical) case the corresponding quantity is

ρ₃ = Σ_{i=1}^n E(|X_i − μ_i|³), with c_n² = Σ_{i=1}^n σ_i².  (10.58)

Although these bounds are given in absolute magnitude terms (and are thus uninformative about the tails as n → ∞), they can be refined in order to reflect the decline of the error as |y| grows:

|F_n(y) − Φ(y)| ≤ C (ρ₃/c_n³) [1/(1 + |y|³)].  (10.59)

A different way to improve upon the normal approximation is to develop an expansion of the form

F_n(y) − Φ(y) = Σ_{i=1}^r [A_i(y)/n^{i/2}] + R_{rn}(y),  (10.60)

where the A_i(y) are polynomial correction terms and R_{rn}(y) is the remainder. Such expansions can be based on the Hermite polynomials, which apply to any density f_n(x) satisfying

∫ [|f_n(x)|²/φ(x)] dx < ∞.  (10.61)
The Hermite polynomials {H_k(x), k ≥ 0} are defined by

(−1)^k [d^k φ(x)/dx^k] = H_k(x) φ(x), k = 0, 1, 2, ...,  (10.62)

and satisfy the orthogonality conditions

∫ H_k(x) H_m(x) φ(x) dx = k! if k = m, and 0 otherwise.  (10.63)

The first five of these polynomials are

H₀ = 1, H₁ = x, H₂ = x² − 1, H₃ = x³ − 3x, H₄ = x⁴ − 6x² + 3,  (10.64)

and the density f_n(x) can be expanded as

f_n(x) ≃ φ(x) Σ_{k=0}^∞ b_k H_k(x).  (10.65)

Although in principle we can approximate f_n(x) sufficiently accurately by choosing a finite value for the summation, in practice we prefer to approximate the ratio [f_n(x)/φ(x)], in view of the fact that it is usually smoother than f_n(x) and thus easier to approximate sufficiently accurately by a low degree polynomial. This ratio can be approximated by

f_n(x)/φ(x) ≃ Σ_{k=0}^r b_k H_k(x),  (10.66)

where

b_k = (1/k!) ∫ f_n(x) H_k(x) dx.  (10.67)
These coefficients are chosen so as to minimise the error

∫ { [f_n(x)/φ(x)] − Σ_{k=0}^r b_k H_k(x) }² φ(x) dx,  (10.68)

so that f_n(x) can be approximated by

f_n*(x) = φ(x) Σ_{k=0}^r b_k H_k(x),  (10.69)

where the leading coefficients are determined by the moments of f_n(x), e.g. b₀ = 1, b₁ = μ₁', b₃ = (μ₃ − 3μ₁')/6, etc. The resulting series

f_n*(x) = φ(x) Σ_{k=0}^r b_k H_k(x)  (10.70)

is known as a Gram-Charlier expansion. In terms of the cumulative distribution function the expansion can be written using the derivatives

Φ⁽ᵏ⁾(x) = d^kΦ(x)/dx^k, k = 1, 2, ...,  (10.71)

as

F_n*(x) = Σ_{k=0}^r c_k Φ⁽ᵏ⁾(x) for appropriate coefficients c_k.  (10.72)
If we reassemble the terms of this expansion by collecting powers of n^{−1/2}, we obtain

f_n*(x) = φ(x) [ 1 + Σ_{k=1}^r A_k(x)/n^{k/2} ],  (10.73)

where A_k(x) is a polynomial in x. In order to be able to choose the order of the approximation error (remainder) R_{rn}(x), defined by R_{rn}(x) = f_n(x) − f_n*(x), we need to ensure that the above series constitutes a proper asymptotic expansion. An asymptotic expansion is defined to be a series which has the property that when truncated at some finite number r the remainder has the same order of magnitude as the first neglected term. In the present case the remainder must be of order O(n^{−(r+1)/2}).

More formally, a series

f(x) = Σ_{k=0}^r a_k φ_k(x) + R_r(x)  (10.74)

is an asymptotic expansion if {φ_k(x), k ≥ 0} is an asymptotic sequence, i.e.

φ_{k+1}(x) = o(φ_k(x)) as x → x₀,  (10.75)

and

R_r(x) = O(φ_{r+1}(x)).  (10.76)

On this basis (see Feller (1970)) the expansion

f_n*(x) = φ(x) [ 1 + Σ_{k=1}^m A_k(x)/n^{k/2} ] + R_m(x)  (10.77)

constitutes a proper asymptotic expansion, with R_m(x) = o(n^{−m/2}).
This expansion is known as the Edgeworth expansion and has been widely applied in the econometric literature (see Sargan (1976), Phillips (1977), Rothenberg (1984), inter alia).

In order to illustrate some of the concepts introduced above let us consider the standardised Borel function

Z_n = [1/(σ√n)] Σ_{i=1}^n (X_i − μ),  (10.78)

where the X_i's are IID with E(X_i) = μ and Var(X_i) = σ². Denoting the skewness and kurtosis coefficients of the X_i's by α₃ = μ₃/σ³ and α₄ = μ₄/σ⁴, the expansion of the density of Z_n takes the form

f_n*(x) = φ(x) [ 1 + (α₃/(3!√n)) H₃(x) + (1/n)( ((α₄ − 3)/4!) H₄(x) + (10α₃²/6!) H₆(x) ) ].  (10.79)
Collecting the terms with the same order of magnitude we can construct the Edgeworth expansion

F_n(x) = Φ(x) − φ(x) [ (α₃/(3!√n)) H₂(x) + (1/n)( ((α₄ − 3)/4!) H₃(x) + (10α₃²/6!) H₅(x) ) ] + R₂ₙ,  (10.80)

where R₂ₙ = O(n^{−3/2}). In terms of the Hermite polynomials and moments the density takes the form

f_n(x) = φ(x) [ 1 + (α₃/(3!√n)) H₃(x) + (1/n)( ((α₄ − 3)/4!) H₄(x) + (10α₃²/6!) H₆(x) ) ] + R₃ₙ.  (10.81)

From these expansions we can see that in the case where the distribution of the X_i's is symmetric (μ₃ = 0, hence α₃ = 0) the CLT approximation is of order 1/n, and when in addition the kurtosis α₄ = μ₄/σ⁴ equals 3 the approximation is even better, of order O(n^{−3/2}), and not of order 1/√n as the CLT suggests. In a sense we can interpret the CLT as providing us with the first term in the above Edgeworth expansion, and if we want to improve it we should include higher-order terms.

The Gram-Charlier and Edgeworth expansions for arbitrary Borel functions of X₁, ..., X_n can be derived similarly when the moments needed to determine the required coefficients are available (see Bhattacharya (1977)). When these moments are not easily available the approximate moments considered in Section 10.5 can be used instead. It must be emphasised that Edgeworth expansions are not restricted to Borel functions h(X) with asymptotic normal distributions.
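The improvement offered by the first Edgeworth correction term can be seen numerically. The sketch below uses the standardised mean of n = 10 exponential r.v.'s (a deliberately skewed hypothetical case with α₃ = 2) and compares the simulated distribution function with the plain normal approximation and with the one-term Edgeworth correction Φ(x) − φ(x)(α₃/(6√n))(x² − 1):

    import numpy as np
    from math import erf, sqrt, exp, pi

    rng = np.random.default_rng(7)

    def phi(x):  return exp(-0.5 * x * x) / sqrt(2 * pi)   # N(0,1) density
    def Phi(x):  return 0.5 * (1 + erf(x / sqrt(2)))        # N(0,1) cdf

    n, reps, a3 = 10, 400000, 2.0
    # standardised mean of n Exponential(1) r.v.'s (mu = sigma = 1)
    z = (rng.exponential(1.0, size=(reps, n)).mean(axis=1) - 1.0) * sqrt(n)

    for x in (-1.5, 0.0, 1.5):
        exact = np.mean(z <= x)
        clt = Phi(x)
        edge = Phi(x) - phi(x) * (a3 / (6 * sqrt(n))) * (x * x - 1)  # one-term Edgeworth
        print(x, round(exact, 4), round(clt, 4), round(edge, 4))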
Certain Borel functions with asymptotic chi-square (gamma-type) distributions, for example, can be handled using expansions based on the Laguerre polynomials

L_k(x) = (e^x/k!) [d^k(x^k e^{−x})/dx^k], k = 1, 2, ....  (10.82)

Error bounds and asymptotic expansions of the type discussed above are of considerable interest in econometrics where we often have to resort to asymptotic theory (see Part IV).
Important concepts

Convergence almost surely, convergence in probability, convergence in rth mean, convergence in distribution, uniform integrability, big O and little o notation, asymptotic expansions.

Questions
1. Why do we need to bother with asymptotic theory and run the risk of using inaccurate approximations?
2. Compare and contrast the two modes of convergence, convergence in probability and almost sure convergence. Explain intuitively why almost sure convergence implies convergence in probability.
3. Explain how the results of the limit theorems for Σᵢ Xᵢ can be extended to the sample raw moments (1/n) Σᵢ Xᵢʳ, r ≥ 1. How can the latter be extended to arbitrary continuous functions of them?
4. Discuss the δ-method for deriving the asymptotic distribution of g(m_r') (g(·) continuous) from that of m_r'.
5. Compare and contrast the following concepts:
(i) rth-order moments;
(ii) limits of rth-order moments;
(iii) asymptotic rth-order moments; and
(iv) approximate moments.
6. Explain the role of asymptotic expansions.
7. What order of approximation does the CLT provide in the case where the skewness and kurtosis coefficients of Σᵢ Xᵢ are zero and three respectively? Explain.
8. Discuss the question of how Edgeworth approximations can help us discriminate between asymptotically equivalent Borel functions of a sequence of r.v.'s {X_n, n ≥ 1}.
Exercises
1. Determine the order of magnitude (big O and small o) of sequences such as (n³ + 6n² + 2)/(6n³ + n).
2. For the IID sequence {X_n, n ≥ 1} where X_n ~ N(0, 1) we know that y_n = Σ_{i=1}^n X_i² ~ χ²(n). From the CLT we know that h_n(X) = (y_n − n)/√(2n) ~ N(0, 1) asymptotically. Determine the order of the error of this normal approximation. (Note: for y_n, α₃ = (2√2)/√n and α₄ − 3 = 12/n.)

Additional references

Billingsley (1968); Rao (1984).

PART III

Statistical inference
CHAPTER 11

The nature of statistical inference

11.1 Introduction

In descriptive statistics the frequency curve f*(x) provides a convenient summary of the observed data (see Chapter 2). But it is no more than a convenient descriptor of the data in
hand. For example, we cannot make any statements about the distribution of personal income in the UK on the basis of the frequency curve f*(x). In order to do that we need to consider the problem in the context of statistical inference proper. By postulating Φ above as a probability model for the distribution of income in the UK and interpreting the observed data as a sample from the population under study we could go on to consider questions about the unknown parameter θ as well as further observations from the probability model; see Section 11.4 below.
In Section 11.2 the important concept of a sampling model is introduced as a way to link the probability model postulated, say Φ = {f(x; θ), θ ∈ Θ}, to the observed data x ≡ (x₁, ..., x_n)' available. The sampling model provides the second important ingredient needed to define a statistical model; the starting point of any statistical inference.
ln Section 11.3, armed with the concept of a statistical model, we go on to
discuss a particular approach to statistical inference, known as the
frequency approach. The frequency approach is briefly contrasted with
another important approach to statistical inference, the Bayesian.
A brief overview of statistical inference is considered in Section 11.4 as a
prelude to the discussion of the next three chapters. The most important
concept in statistical inference is the concept of a statistic which is discussed
in Section 11.5. This concept and its distribution provide the cornerstone
for estimation, testing and prediction.
11.2 The nature of statistical inference

Definition 1
A sample is defined to be a set of random variables (X₁, X₂, ..., X_n) whose density functions coincide with the 'true' density f(x; θ₀) as postulated by the probability model.
Note that the term sample has a very precise meaning in this context and it
is not the meaning attributed in everyday language. In particular the term
does not refer to any observed data as the everyday use of the term might
suggest.
The significance of the sample stems from the distribution it induces.

Definition 2
Let (X₁, X₂, ..., X_n) be a sample from f(x; θ). The joint density function

f(x₁, x₂, ..., x_n; θ)

is called the distribution of the sample, denoted by f(x; θ).

Definition 3
A set of random variables (X₁, X₂, ..., X_n) is called a random sample from f(x; θ) if X₁, X₂, ..., X_n are independent and identically distributed (IID). In this case the distribution of the sample takes the form

f(x₁, ..., x_n; θ) = Π_{i=1}^n f(x_i; θ),  (11.3)

the first equality being due to independence and the second to the r.v.'s being identically distributed.

Definition 4
A set of random variables (X₁, ..., X_n) is said to be an independent sample from f(x_i; θ_i), i = 1, 2, ..., n, respectively, if the r.v.'s are independent. In this case the distribution of the sample takes the form

f(x₁, ..., x_n; θ) = Π_{i=1}^n f(x_i; θ_i),  (11.4)

where the density functions f(x_i; θ_i), i = 1, ..., n, belong to the same family but their numerical characteristics (moments, etc.) may differ.

If we relax the independence assumption as well we have what we can call a non-random sample. Usually in this case the distribution of the sample is expressed in terms of the conditional distributions:

f(x₁, ..., x_n; θ) = Π_{i=1}^n f(x_i/x_{i−1}, ..., x₁; θ_i),  (11.5)

where f(x_i/x_{i−1}, ..., x₁; θ_i), i = 1, 2, ..., n, represent the conditional distribution of X_i given X₁, X₂, ..., X_{i−1}.
The probability model and the sampling model together define a statistical model; their relationship is summarised in Fig. 11.1.

(Fig. 11.1 - from the probability model Φ = {f(x; θ), θ ∈ Θ} and the sampling model X ≡ (X₁, X₂, ..., X_n) to the distribution of the sample f(x₁, x₂, ..., x_n; θ) and the observed data x ≡ (x₁, x₂, ..., x_n).)

In the Bayesian approach, by contrast, the unknown parameter θ is itself treated as a random variable and inference is based on its posterior distribution

f(θ/x) = f(x/θ)f(θ)/f(x) ∝ f(x/θ)f(θ),

f(x) being constant for given x.
11.4 An overview of statistical inference

Given a statistical model, the main forms of statistical inference are estimation, hypothesis testing and prediction; a separate question is whether the postulated statistical model itself is valid (misspecification).

Point estimation (or just estimation) refers to our attempt to give a numerical value to θ. This entails constructing a mapping h(·): 𝒳 → Θ (see Fig. 11.2). We call the function h(X) an estimator of θ and its value h(x) an estimate of θ.
estimation.
'
-.+
'valid'
region,
where Ca t..p Cj
-??''
region,
(seeFig. 11.4).
11.5 Statistics
223
'
Co
(%
f'l
testing.
...,t'
-+
11.5 Statistics

Definition 5
A statistic is defined to be a (Borel) function

q(·): 𝒳 → ℝᵐ  (11.13)

which does not involve the unknown parameters θ. Estimators, test statistics and predictors are all statistics.

The most commonly used statistics are the sample mean and the sample variance,

X̄_n = (1/n) Σ_{i=1}^n X_i, s² = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)²,  (11.10)

the sample equivalents of the mean and variance of the probability model,

μ = ∫ x f(x) dx and σ² = ∫ (x − μ)² f(x) dx.

Since a statistic Y = q(X) is a function of the sample X it is itself a random variable, with its own sampling distribution defined by

F(y) = Pr( q(X) ≤ y ).  (11.14)

The derivation of such distributions was considered in Chapter 6
when the distribution of X is known, and several results have been derived. The reader can now appreciate the reason he or she had to put up with some rather involved examples. All the results derived in that chapter will form the backbone of the discussion that follows. The discerning reader must have noted that most of these results are related to simple functions q(X) of normally distributed r.v.'s, X₁, X₂, ..., X_n. It turns out that most of the results in this area are related to this simple case. Because of the importance of the normal distribution, however, these results can take us a long way down the 'statistical inference avenue'. Let us restate some of these results in terms of the statistics X̄_n and s² for reference purposes.
Example 1
Let X be a random sample from the normal density

f(x; θ) = [1/(σ√(2π))] exp{ −½[(x − μ)/σ]² }, θ ≡ (μ, σ²) ∈ ℝ × ℝ₊,  (11.15)

with X̄_n = (1/n) Σ_{i=1}^n x_i and s² = (1/(n − 1)) Σ_{i=1}^n (x_i − X̄_n)². Then:

(i) X̄_n ~ N(μ, σ²/n);  (11.16)

(ii) √n (X̄_n − μ)/σ ~ N(0, 1);  (11.17)

(iii) Σ_{i=1}^n [(X_i − μ)/σ]² ~ χ²(n);  (11.18)

(iv) (n − 1)s²/σ² ~ χ²(n − 1). Note that

Σ_{i=1}^n [(X_i − μ)/σ]² = (n − 1)s²/σ² + n[(X̄_n − μ)/σ]² ~ χ²(n);  (11.19)

(v) Cov(X̄_n, s²) = 0, so that

√n (X̄_n − μ)/s ~ t(n − 1),  (11.20)

and ratios of independent sample variances follow the F distribution, e.g. F(n − 1, m − 1) for independent samples of sizes n and m.  (11.21)
Even when the 'true' distribution is not normal, so that these exact results no longer hold, asymptotic results are available. As n → ∞:

(i) X̄_n →a.s. μ;
(ii) X̄_n →P μ; and
(iii) √n (X̄_n − μ)/σ →D N(0, 1).  (11.22)

Similar results hold for the sample raw moments

m_r' = (1/n) Σ_{i=1}^n X_i^r:  (11.23)

(i) m_r' →a.s. μ_r';
(ii) m_r' →P μ_r';  (11.24)
(iii) √n (m_r' − μ_r') →D N(0, v_r), with v_r = μ_{2r}' − (μ_r')²,  (11.25)

assuming that μ_{2r}' < ∞.
It turns out that in practice the statistics q(X) of interest are often functions of these sample moments. Examples of such continuous functions of the sample raw moments are the sample central moments, defined by

m_r = (1/n) Σ_{i=1}^n (X_i − X̄_n)^r, r ≥ 1.  (11.26)

These provide us with a direct extension of the sample variance and they represent the sample equivalents to the central moments

μ_r = ∫ (x − μ)^r f(x) dx.  (11.27)

For the sample central moments:

(i) m_r →a.s. μ_r;
(ii) m_r →P μ_r;
(iii) √n (m_r − μ_r) →D N(0, v_r),  (11.28)

where v_r is a function of the central moments up to order 2r; see exercise 1.  (11.29)
For an arbitrary statistic Y_n = q(X) we usually proceed in two steps.

Step 1
Under certain conditions Y_n = q(X) can be shown to converge in probability to some function h(θ) of θ, i.e.

Y_n →P h(θ).  (11.30)

Step 2
Under further conditions we can find sequences {c_n(θ), n ≥ 1} and {h_n(θ), n ≥ 1} such that

Y_n* = c_n(θ)[Y_n − h_n(θ)] →D N(0, 1),  (11.31)

and F*(y), the asymptotic distribution, can be used as the basis of any inference relating to Y_n = q(X). A question which naturally comes to mind is how large n should be to justify the use of these results. Commonly no answer is available because the answer would involve the derivation of F_n(y), whose unavailability was the very reason we had to resort to asymptotic theory. In certain cases higher-order approximations based on asymptotic expansions can throw some light on this question (see Chapter 10). In general, caution should be exercised when asymptotic results are used for relatively small values of n, say n < 100.
One statistic of considerable interest is the empirical cumulative distribution function

F_n*(x) = (1/n)(number of x_i's ≤ x).

Defining

Z_i = 1 if x_i ∈ (−∞, x], and 0 otherwise,

then F_n*(x) = (1/n) Σ_{i=1}^n Z_i. If the original distribution postulated in Φ is F(x), a reasonable thing to do is to compare it with F_n*(x); for example,

D_n = max_x |F_n*(x) − F(x)|.

D_n as defined is a mapping of the form D_n(·): 𝒳 → [0, 1], where 𝒳 is the observation space. Given that Z_i has a Bernoulli distribution, F_n*(x), being the sum of Z₁, Z₂, ..., Z_n, is binomially distributed, i.e.

Pr( F_n*(x) = k/n ) = (n choose k) [F(x)]^k [1 − F(x)]^{n−k}, k = 0, 1, ..., n,

and asymptotically

√n [F_n*(x) − F(x)] / √(F(x)[1 − F(x)]) ~ N(0, 1).

Moreover, Kolmogorov derived the asymptotic distribution of D_n itself, the result being

lim_{n→∞} Pr( √n D_n ≤ y ) = 1 − 2 Σ_{k=1}^∞ (−1)^{k−1} exp(−2k²y²),

which can be used to assess the appropriateness of Φ.
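A minimal sketch of the statistics just described: computing the Kolmogorov distance D_n between the empirical distribution function of a simulated sample and the postulated N(0, 1) distribution (the sample size n = 500 is an arbitrary choice):

    import numpy as np
    from math import erf, sqrt

    rng = np.random.default_rng(8)
    n = 500
    x = np.sort(rng.normal(size=n))

    # postulated cdf F(x) for N(0,1), evaluated at the order statistics
    F = 0.5 * (1 + np.vectorize(erf)(x / sqrt(2)))
    ecdf_hi = np.arange(1, n + 1) / n
    ecdf_lo = np.arange(0, n) / n
    Dn = max(np.max(ecdf_hi - F), np.max(F - ecdf_lo))   # Kolmogorov distance
    print("sqrt(n) * D_n =", round(sqrt(n) * Dn, 3))      # typically below ~1.36 under H0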
Important concepts

Sample, distribution of the sample, random sample, statistical model, statistic, sampling distribution.

Questions
1. Discuss the difference between descriptive statistics and statistical inference.

Exercises
1.* Using the results (22)-(29), show that for a random sample X from a distribution whose first four moments exist,

√n [ (X̄_n − μ), (s² − σ²) ]' →D N( 0, V ), where V = [ σ², μ₃ ; μ₃, μ₄ − σ⁴ ].

Additional references

Barnett (1973); Cramér (1946); Dudewicz (1976).
CHAPTER 12

Estimation I - properties of estimators

Consider the statistical model specified by:

(i) the probability model Φ = {f(x; θ), θ ∈ Θ}, Θ ⊆ ℝ, where f(x; θ) is normal with mean θ and unit variance; and
(ii) the sampling model: X ≡ (X₁, X₂, ..., X_n)' is a random sample from f(x; θ).

An estimator of θ is a statistic h(·): 𝒳 → Θ intended to approximate θ; for example:

θ̂₁ = (1/n) Σ_{i=1}^n X_i;
θ̂₂ = ½ (X₁ + X_n);
θ̂₃ = [2/(n(n + 1))] Σ_{i=1}^n i X_i;
θ̂₄ = (1/n²) Σ_{i=1}^n i X_i;
θ̂₅ = (1/n) Σ_{i=1}^n X_i²;
θ̂₆ = [1/(n + 1)] Σ_{i=1}^n X_i, n ≥ 1.
12.1 Finite sample properties

Which of the above estimators is a 'good' estimator of θ? Answering 'how good?' requires criteria for judging estimators; the 'most important' of these are considered below.

Definition 1
An estimator θ̂ of θ is said to be an unbiased estimator of θ if

E(θ̂) = ∫ θ̂ f(x; θ) dx = θ,  (12.2)

where f(x; θ) ≡ f(x₁, x₂, ..., x_n; θ) is the distribution of the sample, X.
Sometimes we can derive E(θ̂) without having to derive either of the above distributions, by just using the properties of E(·) (see Chapter 4). For example, in the case of the estimators suggested above, using independence and the properties of the normal distribution we can deduce that θ̂₁ ~ N(θ, 1/n); this is because θ̂₁ is a linear function of normally distributed r.v.'s (see Chapter 6.3), and

E(θ̂₁) = E( (1/n) Σ_{i=1}^n X_i ) = (1/n) Σ_{i=1}^n E(X_i) = (1/n)(nθ) = θ

(see Fig. 12.1). The second equality is due to independence and the property E(cX) = cE(X) if c is a constant, and the third equality follows from the identically distributed assumption. Similarly for the variance of θ̂₁:

E(θ̂₁ − θ)² = (1/n²) E[ Σ_{i=1}^n (X_i − θ) ]² = (1/n²) Σ_{i=1}^n E(X_i − θ)² = 1/n.  (12.4)
Similar calculations for the other estimators yield

θ̂₂ ~ N(θ, ½), θ̂₃ ~ N( θ, 2(2n + 1)/(3n(n + 1)) ),

θ̂₄ ~ N( ((n + 1)/(2n))θ, (n + 1)(2n + 1)/(6n³) ), θ̂₆ ~ N( (n/(n + 1))θ, n/(n + 1)² ).
Hence, the estimators θ̂₁, θ̂₂, θ̂₃ are indeed unbiased but θ̂₄, θ̂₅ and θ̂₆ are biased. We define the bias to be B(θ̂) = E(θ̂) − θ; for example, B(θ̂₆) = −θ/(n + 1). As can be seen from the above discussion, it is often possible to derive the mean of an estimator θ̂ without having to derive its distribution. It must be remembered, however, that unbiasedness is a property based on the distribution of θ̂. This distribution is often called the sampling distribution of θ̂ in order to distinguish it from any other distribution of functions of r.v.'s.

Although unbiasedness seems at first sight to be a highly desirable property, it turns out to be a rather severe restriction in some cases, and in most situations there are too many unbiased estimators for this property to be used as the sole criterion for judging estimators. The question which naturally arises is, 'how can we choose among unbiased estimators?'. Returning to the above example, we can see that the unbiased estimators θ̂₁, θ̂₂, θ̂₃ have the same mean but they do not have the same variances. Given that the variance is a measure of dispersion, intuition suggests that the estimator with the smallest variance is in a sense better because its distribution is more 'concentrated' around θ. This argument leads to the second property, that of relative efficiency.
Definition 2
An unbiased estimator θ̂₁ of θ is said to be relatively more efficient than another unbiased estimator θ̂₂ if Var(θ̂₁) < Var(θ̂₂).

For the above example,

Var(θ̂₁) = 1/n < Var(θ̂₃) < Var(θ̂₂) = ½ for n > 2,

i.e. eff(θ̂₁/θ̂₂) = Var(θ̂₁)/Var(θ̂₂) < 1, so θ̂₁ is relatively more efficient than both θ̂₂ and θ̂₃. For comparing biased with unbiased estimators the relevant measure is the mean square error (MSE):

MSE(θ̂) = E(θ̂ − θ)² = Var(θ̂) + [B(θ̂)]².  (12.6)
(Fig. 12.2 - sampling distributions of a biased and an unbiased estimator.)

An estimator θ̂ is said to be more efficient than θ̃ if

MSE(θ̂) ≤ MSE(θ̃).
As can be seen, this definition includes the definition in the case of unbiased estimators as a special case. Moreover, the definition in terms of the MSE enables us to compare an unbiased with a biased estimator in terms of efficiency. For example, MSE(θ̂₆) < MSE(θ̂₁) for certain values of θ.

Let us consider the concept of MSE in some detail. As defined above, the MSE of θ̂ depends not only on θ̂ but on the value of θ in Θ chosen as well. That is, for some θ₀ ∈ Θ,

MSE(θ̂, θ₀) = E(θ̂ − θ₀)² = E[θ̂ − E(θ̂)]² + [E(θ̂) − θ₀]² = Var(θ̂) + [B(θ̂, θ₀)]²,  (12.7)

where B(θ̂, θ₀) ≡ E(θ̂) − θ₀ is the bias of θ̂ at θ = θ₀.
(Fig. 12.3 - comparing θ̂₁ and θ̂₆ in MSE terms.)

For the estimators of example 1,

MSE(θ̂₁, θ) < MSE(θ̂₂, θ) for all θ ∈ Θ if n > 1.

In view of this we can see that θ̂₂ and θ̂₃ are inadmissible, because in MSE terms they are dominated by θ̂₁. The question which naturally arises is: 'can we find an estimator which dominates every other in MSE terms?' A moment's reflection suggests that this is impossible, because the MSE criterion depends on the value of θ chosen. In order to see this let us choose a particular value of θ in Θ, say θ₀, and define the estimator

θ* = θ₀ for all x ∈ 𝒳.

At θ = θ₀ no estimator can dominate θ*, whose MSE there is zero, even though θ* ignores the data entirely and is useless as an estimator of the 'true' θ in general.
A more operational criterion for judging efficiency is provided by the Cramér-Rao lower bound.

Cramér-Rao theorem
Let Φ = {f(x; θ), θ ∈ Θ} satisfy the regularity conditions:

(CR1) the set A = {x: f(x; θ) > 0} does not depend on θ;
(CR2) the derivatives ∂^k log f(x; θ)/∂θ^k, k = 1, 2, 3, exist for all x ∈ A and all θ ∈ Θ;
(CR3) differentiation and integration are interchangeable.

Then for any estimator θ* of θ with bias B(θ) = E(θ*) − θ,

Var(θ*) ≥ [1 + dB(θ)/dθ]² / E[ (∂ log f(x; θ)/∂θ)² ];

in particular, for an unbiased estimator,

Var(θ*) ≥ { E[ (∂ log f(x; θ)/∂θ)² ] }⁻¹ ≡ [I_n(θ)]⁻¹.

Definition 3
An unbiased estimator θ̂ of θ is said to be (fully) efficient if its variance equals the Cramér-Rao lower bound, i.e. Var(θ̂) = [I_n(θ)]⁻¹.
1
.
tariance
equals tbe
p1
11
.f(x;p) il-1
p)
-,'/.2
= (2a)
exp t
17 -.--),.'-ja)
(
exp
--
1k
--l-(x
v
11
a fj
=
(xj- ?)2
'
p)2)
of estimators
Properties
and
d loy. .J(x;0)
d0
An alternative
equality
ojtd
F (x2
=n
..sjd...2
.J(x; )42(j
dp
by independence.
-p)
'--'j
log tx;0)
dp2
(j
,
./'(.xr'
-n
'Solving' the problem of minimum variance within the class of linear unbiased estimators requires no appeal to the Cramér-Rao bound. Consider the class of linear estimators

θ̃ = c + Σ_{i=1}^n a_i X_i,  (12.10)

which includes the above linear estimators as special cases, and determine the values for c and a_i, i = 1, 2, ..., n, which ensure that θ̃ is best linear unbiased (BLUE) for θ. Firstly, for θ̃ to be unbiased we must have E(θ̃) = c + θ Σ_{i=1}^n a_i = θ, which implies that c = 0 and Σ_{i=1}^n a_i = 1. Secondly, since Var(θ̃) = σ² Σ_{i=1}^n a_i², we must choose the a_i's so as to minimise Σ_{i=1}^n a_i² subject to Σ_{i=1}^n a_i = 1 (for unbiasedness). Setting up the Lagrangian,

min_{a} L(a, λ) = σ² Σ_{i=1}^n a_i² − λ( Σ_{i=1}^n a_i − 1 ),

the first-order conditions give a_i = λ/(2σ²), the same for all i; summing over i and using Σ_{i=1}^n a_i = 1 yields a_i = 1/n, i = 1, 2, ..., n. That is, θ̃ = X̄_n = θ̂₁ is the BLUE of θ.
The Cramér-Rao bound extends to the case where θ is a k × 1 vector of parameters. For any unbiased estimator θ̂ of θ (E(θ̂) = θ, i.e. E(θ̂_i) = θ_i, i = 1, 2, ..., k),

Cov(θ̂) ≥ [I_n(θ)]⁻¹,

in the sense that Cov(θ̂) − [I_n(θ)]⁻¹ is non-negative definite, where

I_n(θ) = E[ (∂ log f(X; θ)/∂θ)(∂ log f(X; θ)/∂θ)' ]  (12.12)

= −E[ ∂² log f(X; θ)/∂θ∂θ' ]  (12.13)

is called the sample information matrix; the second equality holds under the restrictions CR1-CR3. In order to illustrate these, consider the following example:
In example 1, with both μ and σ² now unknown,

μ̂ = (1/n) Σ_{i=1}^n x_i  (12.14)

is a 'good' estimator of μ, and by analogy one might suggest the sample second central moment

σ̂² = (1/n) Σ_{i=1}^n (x_i − μ̂)²  (12.15)

as an estimator of σ², and examine whether it is a 'good' one.  (12.16)

Since E(X_i − μ)² = σ² and E(X̄_n − μ)² = σ²/n,

E[ Σ_{i=1}^n (X_i − X̄_n)² ] = nσ² − σ² = (n − 1)σ²,  (12.17)

so σ̂² is biased; the unbiased alternative is

s² = (1/(n − 1)) Σ_{i=1}^n (x_i − X̄_n)².  (12.18)

Moreover,

(n − 1)s²/σ² ~ χ²(n − 1),  (12.19)

and thus

Var(s²) = [σ⁴/(n − 1)²] · 2(n − 1) = 2σ⁴/(n − 1),  (12.20)

since the variance of a chi-square r.v. equals twice its degrees of freedom. Let us consider the question whether X̄_n and s² are efficient estimators.
For θ ≡ (μ, σ²),

log f(x; θ) = −(n/2) log 2π − (n/2) log σ² − [1/(2σ²)] Σ_{i=1}^n (x_i − μ)²,  (12.21)

∂ log f(x; θ)/∂μ = (1/σ²) Σ_{i=1}^n (x_i − μ),  (12.22)

∂ log f(x; θ)/∂σ² = −n/(2σ²) + [1/(2σ⁴)] Σ_{i=1}^n (x_i − μ)²,  (12.23)

∂² log f/∂μ² = −n/σ², ∂² log f/∂μ∂σ² = −(1/σ⁴) Σ_{i=1}^n (x_i − μ),

∂² log f/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σ_{i=1}^n (x_i − μ)².  (12.24)

Hence

I_n(θ) = [ n/σ², 0 ; 0, n/(2σ⁴) ] and [I_n(θ)]⁻¹ = [ σ²/n, 0 ; 0, 2σ⁴/n ].  (12.25)
This clearly shows that although X̄_n achieves the Cramér-Rao lower bound, s² does not. It turns out, however, that no other unbiased estimator exists which is relatively more efficient than s²; although there are more efficient biased estimators, such as

σ̃² = [1/(n + 1)] Σ_{i=1}^n (X_i − X̄_n)².  (12.26)
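The efficiency comparison can be checked by simulation. A minimal sketch (hypothetical values μ = 0, σ² = 1, n = 20) estimates Var(X̄_n) and Var(s²) and compares them with the Cramér-Rao bounds σ²/n and 2σ⁴/n:

    import numpy as np

    rng = np.random.default_rng(9)
    mu, sigma2, n, reps = 0.0, 1.0, 20, 200000
    x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)                       # unbiased sample variance
    print("Var(xbar):", round(xbar.var(), 5), " CR bound sigma^2/n:", sigma2 / n)
    print("Var(s2):  ", round(s2.var(), 5),
          " 2 sigma^4/(n-1):", round(2 * sigma2**2 / (n - 1), 5),
          " CR bound 2 sigma^4/n:", round(2 * sigma2**2 / n, 5))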
Definition 4
A statistic τ(·): 𝒳 → ℝᵐ, n > m, is called sufficient for θ if the conditional distribution f(x/τ(X) = τ) is independent of θ, i.e. θ does not appear in f(x/τ(x) = τ) and the domain of f(·) does not involve θ.

In example 1 above intuition suggests that τ(X) = Σ_{i=1}^n X_i must be a sufficient statistic for θ, since in constructing a 'very good' estimator of θ, θ̂₁, we only needed to know the sum of the sample and not the sample itself. That is, as far as inference about θ is concerned, knowing all the numbers (X₁, X₂, ..., X_n) or just Σ_{i=1}^n X_i makes no difference. Verifying this directly by deriving f(x/τ(x) = τ) and showing that it is independent of θ can be a very difficult exercise. One indirect way of verifying sufficiency is provided by the following lemma.

Fisher-Neyman factorisation lemma
The statistic τ(X) is sufficient for θ if and only if there exists a factorisation of the form

f(x; θ) = f(τ(x); θ) · h(x),  (12.27)

where f(τ(x); θ) is the density function of τ(X) and depends on θ, and h(X) is some function of X independent of θ.

Even this result, however, is of no great help because we have to have the statistic τ(X), as well as its distribution, to begin with. The following method, suggested by Lehmann and Scheffé (1950), provides us with a very convenient way to derive minimal sufficient statistics. A sufficient statistic τ(X) is said to be minimal sufficient if it is a function of every other sufficient statistic. Consider the ratio

f(x; θ)/f(x₀; θ) = g(x, x₀; θ), x ∈ 𝒳, θ ∈ Θ;  (12.28)

τ(X) is minimal sufficient if g(x, x₀; θ) is independent of θ exactly when τ(x) = τ(x₀).  (12.29)
For the normal example this clearly shows that τ(X) = (Σ_{i=1}^n X_i, Σ_{i=1}^n X_i²) is a minimal sufficient statistic, since for these values of x₀, g(x, x₀; θ) = 1. Hence, we can conclude that (X̄_n, s²), being simple functions of τ(X), are sufficient statistics. It is important to note that we cannot take Σ X_i or Σ X_i² separately as minimal sufficient statistics; they are jointly sufficient for θ ≡ (μ, σ²).

In contrast to unbiasedness and efficiency, sufficiency is a property of statistics in general, not just estimators, and it is inextricably bound up with the nature of Φ. For some parametric families of density functions, such as the exponential family of distributions, sufficient statistics exist; for other families they might not. Intuition suggests that, since efficiency is related to full utilisation of the information in the statistical model, and sufficiency can be seen as a maximal reduction of such information without losing any relevant information as far as inference about θ is concerned, there must be a direct relationship between the two properties. A relationship along the lines that when we look for an efficient estimator we should look no further than the sufficient statistics is provided by the following lemma.

Rao-Blackwell lemma
Let τ(X) be a sufficient statistic for θ and θ̂ any estimator of θ; then for θ̃ = h(X) ≡ E(θ̂/τ(X) = τ), the conditional expectation of θ̂ given τ(X) = τ,

MSE(θ̃) ≤ MSE(θ̂).
From the above discussion of the properties of unbiasedness, relative and
full efficiency and sufficiency we can see that these properties are directly
Properties
of estimators
-+
.jlnite
12.2 Asymptotic properties

Definition 5
An estimator θ̂ₙ of θ is said to be (weakly) consistent if, for every ε > 0,

    lim Pr(|θ̂ₙ − θ| < ε) = 1 as n → ∞,   (12.31)

which we denote by θ̂ₙ →ᴾ θ.
This is in effect an extension of the WLLN for the sample mean X̄ₙ to some Borel function θ̂ₙ(X). It is important to note that consistency does not refer to θ̂ₙ approaching θ in the sense of mathematical convergence. The convergence refers to the probability associated with the event |θ̂ₙ − θ| < ε, derived from the distribution of θ̂ₙ, as n → ∞. Moreover, consistency is a very minimal property (although a very important one) since if θ̂ₙ is a consistent estimator of θ then so is θ̂ₙ* = θ̂ₙ + 7 405 926/n, n ≥ 1, which implies that for a small n the difference |θ̂ₙ − θ̂ₙ*| = 7 405 926/n might be enormous, but the probability of this occurring decreases to zero as n → ∞. The estimator θ̂ₙ
has a well-behaved symmetric distribution for n₁ < n₂ < n₃ < n₄ < n₅. This diagram seems to suggest that if the sampling distribution f(θ̂ₙ) becomes less and less dispersed as n → ∞ and eventually collapses at the point θ (i.e. becomes degenerate), then θ̂ₙ is a consistent estimator of θ. The following lemma formalises this argument.

Lemma
If lim E(θ̂ₙ) = θ and lim Var(θ̂ₙ) = 0 as n → ∞, then θ̂ₙ →ᴾ θ.

It is important, however, to note that these are only sufficient conditions for consistency (not necessary); that is, consistency is not equivalent to the above conditions, since for consistency Var(θ̂ₙ) need not even exist. The above lemma, however, enables us to prove consistency in many cases of interest in practice. If we return to example 1 above we can see that

    X̄ₙ →ᴾ μ,  σ̂² →ᴾ σ² and s² →ᴾ σ²

('↛' reads 'does not converge in probability to'); on the other hand, an estimator such as X₁, whose distribution does not collapse as n → ∞, satisfies X₁ ↛ μ.
Definition 6
An estimator θ̂ₙ is said to be a strongly consistent estimator of θ if

    Pr( lim θ̂ₙ = θ as n → ∞ ) = 1,

and is denoted by θ̂ₙ →ᵃ·ˢ· θ.
Definition 7
An estimator θ̂ₙ is said to be asymptotically normal if there exist sequences {Vₙ(θ), n ≥ 1} and {θₙ, n ≥ 1} such that

    [Vₙ(θ)]^(−½)(θ̂ₙ − θₙ) ~ᵃ N(0, 1),   (12.32)

written equivalently as

    √n(θ̂ₙ − θₙ) ~ᵃ N(0, V(θ)),   (12.33)

where '~ᵃ' reads 'asymptotically distributed as' and V(θ) > 0 represents the
asymptotic variance of θ̂ₙ. For asymptotic normality we require that

    Var(θ̂ₙ) → V(θ) and E(θ̂ₙ) → θₙ as n → ∞.   (12.34)

This is automatically satisfied in the case of an asymptotically normal estimator, and thus asymptotic normality can be written in the form

    √n(θ̂ₙ − θ) ~ᵃ N(0, V(θ)).   (12.35)

Definition 8
An asymptotically normal estimator θ̂ₙ of θ is said to be asymptotically efficient if its asymptotic variance V(θ) equals the inverse of

    I∞(θ) ≡ lim (1/n)Iₙ(θ) as n → ∞,   (12.36)

the limit of the (normalised) sample information, i.e. the asymptotic form of the Cramér–Rao lower bound. For example, in the case of a random sample from

    f(x; θ) = (1/√(2π)) exp{−½(x − θ)²}, θ ∈ R,

we have ∂ log f(xᵢ; θ)/∂θ = (xᵢ − θ),
the distribution of the sample being

    f(x₁, x₂, ..., xₙ; θ) = ∏ᵢ₌₁ⁿ f(xᵢ; θ),   (12.37)

so that Iₙ(θ) = n and I∞(θ) = 1; since √n(X̄ₙ − θ) ~ N(0, 1), X̄ₙ is an asymptotically efficient estimator of θ.

12.3 Predictors

Estimation refers to the unknown parameters θ of the statistical model. A closely related problem is that of prediction: using the sample X ≡ (X₁, X₂, ..., Xₙ)′ to make probabilistic statements about a further observation Xₙ₊₁. A predictor is a Borel function

    h(·): 𝒳 → R,   (12.38)

    X̂ₙ₊₁ = h(X),   (12.39)

chosen so that X̂ₙ₊₁ is a 'good' predictor of Xₙ₊₁.   (12.40)

The random variable X̂ₙ₊₁ = h(Xₙ) is called the predictor of Xₙ₊₁ and its value the prediction.
Note that the main difference between prediction and estimation is that in the former case what we are 'estimating' (Xₙ₊₁) is a random variable itself, not a constant parameter θ. In order to consider the optimal properties of a predictor X̂ₙ₊₁ of Xₙ₊₁ we define the prediction error to be

    eₙ₊₁ ≡ Xₙ₊₁ − X̂ₙ₊₁.   (12.41)

Given that both Xₙ₊₁ and X̂ₙ₊₁ are random variables, eₙ₊₁ is also a random variable and has its own distribution. Using the expectation operator with respect to the distribution of eₙ₊₁ we can define the following properties:

(1) Unbiasedness. The predictor X̂ₙ₊₁ of Xₙ₊₁ is said to be unbiased if

    E(eₙ₊₁) = 0.   (12.42)
(2) Minimum mean square error (MSE). The predictor X̂ₙ₊₁ ≡ h(X) of Xₙ₊₁ is said to be minimum MSE if

    E(Xₙ₊₁ − X̂ₙ₊₁)² ≤ E(Xₙ₊₁ − X̃ₙ₊₁)²   (12.43)

for any other predictor X̃ₙ₊₁ of Xₙ₊₁. In example 1, taking X̂ₙ₊₁ = X̄ₙ, the prediction error satisfies

    eₙ₊₁ = Xₙ₊₁ − X̄ₙ ~ N(0, 1 + 1/n),

from which we can deduce prediction intervals for Xₙ₊₁.   (12.44)
Important concepts

Estimator, estimate, sampling distribution of an estimator; unbiasedness, relative and full efficiency; the Cramér–Rao lower bound, the sample information matrix; sufficiency, minimal sufficient statistics; weak and strong consistency, asymptotic normality, asymptotic efficiency; predictor, prediction error.

Questions
1. Define the concept of an estimator as a mapping and contrast it with the concept of an estimate.
2. Define the finite sample properties of unbiasedness, relative and full efficiency, sufficiency and explain their meaning.
3. 'Underlying every expectation operator E(·) there is an implicit distribution.' Explain.
4. Explain the Cramér–Rao lower bound and the concept of the information matrix.
5. Explain the Lehmann–Scheffé method of constructing minimal sufficient statistics.
6. Contrast unbiasedness and efficiency with sufficiency.
7. Explain the difference between small sample and asymptotic properties.
8. Define the concepts of weak and strong consistency.
9. Define the concept of asymptotic normality of an estimator.
10. Explain the concept of asymptotic efficiency in relation to the limiting behaviour of f(θ̂ₙ) as n → ∞.
Exercises
1. Let X ≡ (X₁, X₂, ..., Xₙ)′ be a random sample from N(θ, 1) and consider the following estimators of θ:

    θ̂₁ = X₁,  θ̂₂ = ½(X₁ + Xₙ),  θ̂₃ = (1/(n − 1)) Σᵢ₌₁ⁿ⁻¹ Xᵢ + (1/n)Xₙ,  θ̂₄ = X̄ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ,

together with the 'odd' estimator θ̃ₙ which takes the value X̄ₙ with probability n/(n + 1) and the value n with probability 1/(n + 1).
(i) Examine which of these estimators are unbiased, fully efficient, sufficient.
(ii) Examine which of them are consistent.
2. Consider

    σ̂² = (1/n) Σᵢ₌₁ⁿ Xᵢ²

as an estimator of σ² for a random sample X from N(0, σ²).
(i) Derive the sampling distribution of σ̂² and show that it is an unbiased, consistent and fully efficient estimator of σ².

Additional references
Bickel and Doksum (1977); Cox and Hinkley (1974); Kendall and Stuart (1973); Lloyd (1984); Rao (1973); Rohatgi (1976); Silvey (1975); Zacks (1971).
CHAPTER 13

Estimation II – methods

In this chapter we consider three general methods of estimation: least-squares, the method of moments and the maximum likelihood method, whose importance is emphasised in Part IV.

13.1 The method of least-squares

The method of least-squares was first proposed by Legendre as an interpolation method: given observations y₁, y₂, ..., yₙ and approximating functions gᵢ(θ), i = 1, 2, ..., n, choose the value of θ which minimises

    Σᵢ₌₁ⁿ [yᵢ − gᵢ(θ)]².

In this form the least-squares method has nothing to do with the statistical model framework developed above; it is merely an interpolation method in approximation theory.
Gauss, on the other hand, proposed a probabilistic set-up by reversing the Legendre argument about the mean. Crudely, his argument was that if X ≡ (X₁, X₂, ..., Xₙ)′ is a random sample from some density function f(x) and the mean is the most representative value for all such Xs, then the density function must be normal, i.e.

    f(x) = (1/σ√(2π)) exp{−x²/(2σ²)}.

He went on to postulate the model

    yᵢ = gᵢ(θ) + εᵢ,  εᵢ ~ NI(0, σ²),  i = 1, 2, ..., n,
transferring the probabilistic assumption from εᵢ to yᵢ, i.e.

    f(yᵢ; θ) ∝ exp{−[yᵢ − gᵢ(θ)]²/(2σ²)},  i = 1, 2, ..., n,  θ ∈ Θ

('∝' reads 'is proportional to'). For the sample y ≡ (y₁, y₂, ..., yₙ)′ this gives

    f(y; θ) = (2πσ²)^(−n/2) exp{−(1/2σ²) Σᵢ₌₁ⁿ [yᵢ − gᵢ(θ)]²},

so that maximising f(y; θ) with respect to θ is equivalent to minimising Σᵢ₌₁ⁿ [yᵢ − gᵢ(θ)]².
In the linear case the model takes the form

    yᵢ = Σₖ₌₁ᴷ bₖxₖᵢ + εᵢ,  i = 1, 2, ..., n,

with the assumption

    E(εᵢ) = 0, Var(εᵢ) = σ², E(εᵢεⱼ) = 0, i ≠ j, i, j = 1, 2, ..., n.   (13.10)

In some of the present-day literature this model is considered as an extension of the Gauss formulation by weakening the normality assumption. For further discussion of the Gauss linear model see Chapter 18.
Consider the simplest case

    yᵢ = θ₁x₁ᵢ + εᵢ,  x₁ᵢ = 1,  i = 1, 2, ..., n.   (13.12)

The least-squares estimator of θ₁ is

    θ̂₁ = (1/n) Σᵢ₌₁ⁿ yᵢ,   (13.14)

i.e. the sample mean. Given that the x₁ᵢs are in general assumed to be known constants, θ̂₁ is a linear function of the r.v.'s y₁, ..., yₙ of the form

    θ̂₁ = Σᵢ₌₁ⁿ cᵢyᵢ,   (13.15)

where cᵢ = x₁ᵢ/Σᵢ₌₁ⁿ x₁ᵢ². Hence,

    E(θ̂₁) = Σᵢ₌₁ⁿ cᵢE(yᵢ) = Σᵢ₌₁ⁿ cᵢx₁ᵢθ₁ = θ₁,   (13.16)

i.e. θ̂₁ is an unbiased estimator of θ₁. Moreover, since (θ̂₁ − θ₁) = Σᵢ₌₁ⁿ cᵢεᵢ, it can be shown, using only the assumptions relating to εᵢ in (13.10), that within the class of linear unbiased estimators the least-squares estimator θ̂₁ of θ₁ has the smallest variance; this is the celebrated Gauss–Markov theorem.
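A small simulation makes the Gauss–Markov claim concrete. The sketch below is our illustration, not from the text: it compares the variance of the least-squares estimator θ̂₁ = ȳ with that of another linear unbiased estimator, the average of the first and last observations.

    import numpy as np

    rng = np.random.default_rng(1)
    theta1, sigma, n, reps = 2.5, 1.0, 30, 200_000

    y = theta1 + rng.normal(0.0, sigma, size=(reps, n))

    ls = y.mean(axis=1)                # least-squares estimator (sample mean)
    alt = 0.5 * (y[:, 0] + y[:, -1])   # another linear unbiased estimator

    print("means:", ls.mean(), alt.mean())     # both approximately theta1
    print("variances:", ls.var(), alt.var())   # sigma^2/n  <  sigma^2/2

Both estimators are unbiased, but the least-squares estimator has variance σ²/n against σ²/2 for the alternative, in line with the theorem.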
13.2 The method of moments
From the discussion in the previous section it is clear that the least-squares method is not a general method of estimation because it presupposes the existence of approximating functions gᵢ(θ), i = 1, 2, ..., n, which play the role of the mean in the context of a probability model. In the context of a probability model Φ, however, unknown parameters of interest are not only associated with the mean but also with the higher moments. This prompted Pearson in 1894 to suggest the method of moments as a general estimation method. The idea underlying the method can be summarised as follows:
Let us assume that X ≡ (X₁, X₂, ..., Xₙ)′ is a random sample from f(x; θ), θ ∈ Θ ⊂ Rᵏ. The moments of f(x; θ),

    μ′ᵣ = E(Xʳ) = ∫ xʳ f(x; θ)dx, r ≥ 1,   (13.18)

are by definition functions of the unknown parameters, μ′ᵣ(θ). Suppose that the parameters can be expressed as

    θᵢ = gᵢ(μ′₁, μ′₂, ..., μ′ₖ), i = 1, 2, ..., k,   (13.19)

where the gᵢs are continuous functions. The method of moments, based on the substitution idea, proposes estimating θᵢ using

    θ̂ᵢ = gᵢ(m₁, m₂, ..., mₖ), i = 1, 2, ..., k,   (13.20)

where mᵣ = (1/n) Σᵢ₌₁ⁿ Xᵢʳ, r ≥ 1, represent estimators of μ′ᵣ. The attractiveness of the method stems from the fact that since

    mᵣ →ᵃ·ˢ· μ′ᵣ, r ≥ 1,   (13.21)

it follows that

    θ̂ᵢ →ᵃ·ˢ· θᵢ, i = 1, 2, ..., k   (13.22)

(see Chapter 10).
Example
Let Xᵢ ~ N(μ, σ²), i = 1, 2, ..., n; then μ′₁ = μ and μ′₂ = σ² + μ², and the method of moments estimators are

    μ̂ = m₁ = X̄ₙ,  σ̂² = m₂ − m₁² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)².
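The substitution idea translates directly into code. A minimal sketch (ours, using only numpy) for the normal example:

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=1.5, scale=2.0, size=5000)   # mu = 1.5, sigma^2 = 4

    m1 = x.mean()            # first sample raw moment
    m2 = (x ** 2).mean()     # second sample raw moment

    mu_hat = m1              # from mu'_1 = mu
    sigma2_hat = m2 - m1**2  # from mu'_2 = sigma^2 + mu^2

    print(mu_hat, sigma2_hat)  # close to 1.5 and 4.0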
Formally, the above result can be stated as follows: Let the functions hᵢ(θ), i = 1, 2, ..., k, have continuous partial derivatives up to order l on Θ and the Jacobian of the transformation

    J(θ) = det(∂h(θ)/∂θ′) ≠ 0 for θ ∈ Θ,   (13.23)

where h(θ) ≡ (h₁(θ), ..., hₖ(θ))′. If the equations μ′ᵢ = hᵢ(θ₁, ..., θₖ), i = 1, 2, ..., k, can be solved uniquely for θ, then the method of moments estimators θ̂ᵢ = gᵢ(m₁, ..., mₖ) are strongly consistent estimators of the θᵢ.
13.3 The maximum likelihood method

The method of maximum likelihood was developed by Fisher in a series of papers in the 1920s and 30s and extended by various authors such as Cramér, Rao and Wald. In the current statistical literature the method of maximum likelihood is by far the most widely used method of estimation and plays a very important role in hypothesis testing.

(1) The likelihood function

Consider the statistical model defined by:
(i) probability model Φ = {f(x; θ), θ ∈ Θ};
(ii) sampling model: X ≡ (X₁, X₂, ..., Xₙ)′ is a sample from f(x; θ),
where X takes values in 𝒳 = Rⁿ, the observation space. The distribution of the sample D(x₁, x₂, ..., xₙ; θ) describes how the density changes as X takes different values in 𝒳 for a given θ ∈ Θ. In deriving the likelihood function we reason as follows:
Example 1
Let X be a Bernoulli r.v. with density function

    f(x; θ) = θˣ(1 − θ)^(1−x), x = 0, 1.   (13.24)

Suppose that θ can take only the two values 0.2 and 0.8, and that the average of the observed sample realisation x was 0.7. What can we say about the two possible values of θ? Intuition suggests that since the average of the observed realisation is 0.7 it is more reasonable to assume that x is a sample realisation from f(x; 0.8) rather than f(x; 0.2). That is, the value of θ under which x would have had the highest 'likelihood' of arising must be intuitively our best choice of θ. Using this intuitive argument the likelihood function is defined by

    L(θ; x) = k(x)D(x; θ), θ ∈ Θ,   (13.25)

where k(x) > 0 does not depend on θ; viewed as a mapping,

    L(·; x): Θ → [0, ∞).   (13.26)
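The intuition of example 1 can be checked numerically. A sketch (ours; the particular sample below is hypothetical, chosen to have average 0.7):

    import numpy as np

    x = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])  # n = 10, mean 0.7

    def lik(theta):
        # L(theta; x) = prod theta^x (1-theta)^(1-x), with k(x) = 1
        return np.prod(theta ** x * (1 - theta) ** (1 - x))

    print(lik(0.2))   # ~ 6.6e-06
    print(lik(0.8))   # ~ 1.7e-03  -- far better 'supported'

The realisation is over two hundred times more 'likely' under θ = 0.8 than under θ = 0.2, matching the intuitive argument.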
Example 2
Let X ≡ (X₁, X₂, ..., Xₙ)′ be a random sample from N(0, θ). Given the randomness of the sample, the distribution of the sample is

    D(x₁, ..., xₙ; θ) = ∏ᵢ₌₁ⁿ (2πθ)^(−½) exp{−xᵢ²/(2θ)} = (2πθ)^(−n/2) exp{−(1/2θ) Σᵢ₌₁ⁿ xᵢ²},

and hence the likelihood function takes the form

    L(θ; x) = k(x)(2πθ)^(−n/2) exp{−(1/2θ) Σᵢ₌₁ⁿ xᵢ²}.

Fig. 13.1. Deriving the likelihood function from the distribution of the sample.
The score function is defined by

    s(θ; x) ≡ ∂ log L(θ; x)/∂θ.   (13.27)

The functions log L(θ; x) and s(θ; x) incorporate the same information as L(θ; x) itself; if we have any one of the functions L(θ; x), log L(θ; x), s(θ; x) we can derive the others. In example 2 above

    log L(θ; x) = log k(x) − (n/2) log(2πθ) − (1/2θ) Σᵢ₌₁ⁿ xᵢ²,

and hence

    d log L(θ; x)/dθ = −(n/2θ) + (1/2θ²) Σᵢ₌₁ⁿ xᵢ².   (13.28)
(2) The maximum likelihood estimator (MLE)

Given that the likelihood function represents the support given to the various θ ∈ Θ given X = x, it is natural to define the maximum likelihood estimator of θ to be a Borel function θ̂: 𝒳 → Θ such that

    L(θ̂; x) = max L(θ; x) over θ ∈ Θ.   (13.29)

Fig. 13.2. A likelihood function.
Note that θ̂ also maximises log L,   (13.30)

so that when L is differentiable the MLE solves the likelihood equation

    ∂ log L(θ; x)/∂θ ≡ s(θ; x) = 0.   (13.31)

In example 2,

    −(n/2θ) + (1/2θ²) Σᵢ₌₁ⁿ xᵢ² = 0 gives θ̂ = (1/n) Σᵢ₌₁ⁿ xᵢ²,

with d² log L/dθ² < 0 at θ̂ (for a maximum).
Example 3
Let X ≡ (X₁, ..., Xₙ)′ be a random sample from f(x; θ) = θx^(θ−1), 0 < x < 1, θ > 0. The likelihood function is

    L(θ; x) = k(x) ∏ᵢ₌₁ⁿ f(xᵢ; θ) = k(x) θⁿ (∏ᵢ₌₁ⁿ xᵢ)^(θ−1),

and

    d log L(θ; x)/dθ = n/θ + Σᵢ₌₁ⁿ log xᵢ = 0  gives  θ̂ = −n/(Σᵢ₌₁ⁿ log xᵢ).
Before the reader jumps to the erroneous conclusion that deriving the MLE is a matter of a simple differentiation, let us consider some examples where the derivation is not as straightforward.
Example 4
Let Z ≡ (Z₁, Z₂, ..., Zₙ)′, where Zᵢ ≡ (Xᵢ, Yᵢ)′ is a random sample from a bivariate normal distribution with zero means, unit variances and correlation ρ:

    f(zᵢ; ρ) = [2π(1 − ρ²)^½]⁻¹ exp{−(xᵢ² − 2ρxᵢyᵢ + yᵢ²)/(2(1 − ρ²))}.

The log likelihood takes the form

    log L(ρ; x, y) = const − (n/2) log(1 − ρ²) − [1/(2(1 − ρ²))] Σᵢ₌₁ⁿ (xᵢ² − 2ρxᵢyᵢ + yᵢ²),

and setting

    d log L/dρ = nρ/(1 − ρ²) + [(1 + ρ²)/(1 − ρ²)²] Σᵢ₌₁ⁿ xᵢyᵢ − [ρ/(1 − ρ²)²] Σᵢ₌₁ⁿ (xᵢ² + yᵢ²) = 0

and multiplying through by (1 − ρ²)² yields

    nρ(1 − ρ²) + (1 + ρ²) Σᵢ₌₁ⁿ xᵢyᵢ − ρ Σᵢ₌₁ⁿ (xᵢ² + yᵢ²) = 0.

This is a cubic equation in ρ and hence there are three possible values for the MLE, and additional search is needed to locate the maximum value using numerical methods. The use of numerical methods in deriving MLE's was a major drawback for the method when it was first suggested by Fisher. Nowadays, however, this presents no difficulties. For a discussion of several numerical methods as used in econometrics see Harvey (1981).
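Example 4's cubic score equation is easy to handle numerically today. A sketch (ours; numpy only) that maximises the log likelihood over a grid rather than solving the cubic analytically:

    import numpy as np

    rng = np.random.default_rng(3)
    n, rho_true = 500, 0.6

    # simulate the bivariate normal of example 4 (zero means, unit variances)
    cov = np.array([[1.0, rho_true], [rho_true, 1.0]])
    z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    x, y = z[:, 0], z[:, 1]

    def loglik(r):
        q = (x**2 - 2 * r * x * y + y**2).sum()
        return -0.5 * n * np.log(1 - r**2) - q / (2 * (1 - r**2))

    grid = np.linspace(-0.99, 0.99, 1999)
    rho_hat = grid[np.argmax([loglik(r) for r in grid])]
    print(rho_hat)   # close to 0.6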
Example 5
Let X ≡ (X₁, ..., Xₙ)′ be a random sample from the uniform distribution

    f(x; θ) = 1/θ, 0 ≤ x ≤ θ, where 0 < θ < ∞.

The likelihood function is L(θ; x) = k(x)θ⁻ⁿ for θ ≥ max(x₁, ..., xₙ) and zero otherwise; it is monotonically decreasing in θ over its support and not differentiable at its maximum, so the likelihood equation is of no help: the MLE is θ̂ = max(x₁, ..., xₙ).

Fig. 13.3. The likelihood function of example 5.
Example 6
Let X ≡ (X₁, ..., Xₙ)′ be a random sample from f(x; θ) = e^(−(x−θ)), x ≥ θ. The likelihood function is

    L(θ; x) = k(x) exp{−Σᵢ₌₁ⁿ (xᵢ − θ)} for θ ≤ min(x₁, ..., xₙ),

which is increasing in θ; d log L/dθ = n ≠ 0, and again the MLE is a boundary value, θ̂ = min(x₁, ..., xₙ).

In the k-parameter case, θ ≡ (θ₁, ..., θₖ)′, the first-order conditions are

    ∂ log L/∂θᵢ = 0, i = 1, 2, ..., k,

and for a maximum the Hessian matrix

    H(θ) ≡ [∂² log L/∂θᵢ∂θⱼ], i, j = 1, 2, ..., k,

evaluated at θ̂, must be negative definite, i.e. z′H(θ̂)z < 0 for all z ≠ 0.
Example 7
Let X ≡ (X₁, X₂, ..., Xₙ)′ be a random sample from

    f(x; θ) = (1/σ√(2π)) exp{−(x − μ)²/(2σ²)}, θ ≡ (μ, σ²), x ∈ R, μ ∈ R, σ² > 0,

with log likelihood

    log L(θ; x) = const − (n/2) log σ² − (1/2σ²) Σᵢ₌₁ⁿ (xᵢ − μ)².

The first-order conditions are

    ∂ log L/∂μ = (1/σ²) Σᵢ₌₁ⁿ (xᵢ − μ) = 0, so the MLE of μ is μ̂ = X̄ₙ;

    ∂ log L/∂σ² = −(n/2σ²) + (1/2σ⁴) Σᵢ₌₁ⁿ (xᵢ − μ)² = 0, so σ̂² = (1/n) Σᵢ₌₁ⁿ (xᵢ − X̄ₙ)².   (13.33)

Since the Hessian evaluated at θ̂,

    H(θ̂) = diag(−n/σ̂², −n/(2σ̂⁴)),

satisfies z′H(θ̂)z < 0 for all z ≠ 0, (μ̂, σ̂²) is indeed the maximum.
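A quick numerical check of example 7 (a sketch, ours): the closed-form MLE's make the score vanish, as (13.31) requires.

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(10.0, 3.0, size=1000)
    n = x.size

    mu_hat = x.mean()                      # MLE of mu
    sig2_hat = ((x - mu_hat) ** 2).mean()  # MLE of sigma^2 (the 1/n version)

    # the score components vanish at the MLE:
    score_mu = (x - mu_hat).sum() / sig2_hat
    score_s2 = -n / (2 * sig2_hat) + ((x - mu_hat) ** 2).sum() / (2 * sig2_hat**2)
    print(mu_hat, sig2_hat, score_mu, score_s2)  # scores ~ 0 up to rounding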
(3) Properties of MLE's

The small sample properties of MLE's depend, of course, on the particular f(x; θ). One of the most attractive properties of MLE's is invariance.

Invariance
Let θ̂ be a MLE of θ; then for any Borel function g(·) of θ, g(θ̂) is a MLE of g(θ). In example 3 the MLE of θ is θ̂ = −n/(Σᵢ₌₁ⁿ log Xᵢ), and hence the MLE of ψ = 1/θ is

    ψ̂ = 1/θ̂ = −(1/n) Σᵢ₌₁ⁿ log Xᵢ.   (13.34)
Unbiasedness, full efficiency
MLE's are not in general unbiased or fully efficient. In example 7, X̄ₙ ~ N(μ, σ²/n) and, given that

    Iₙ(θ) ≡ E[−∂² log L(θ; x)/∂θ∂θ′] = diag(n/σ², n/(2σ⁴)),

we can see that Var(X̄ₙ) achieves the Cramér–Rao lower bound. On the other hand, the MLE of σ², σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)², as discussed above, is not an unbiased estimator.

The property mostly emphasised by Fisher in support of the method of maximum likelihood was the property of sufficiency.
Sufficiency
When a sufficient statistic τ(X) exists for θ, the factorisation (12.27) implies that the likelihood function depends on x only through τ(x), and hence the MLE θ̂ is always a function of the sufficient statistic. In example 7,

    μ̂ = X̄ₙ = (1/n) Σᵢ₌₁ⁿ Xᵢ and σ̂² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)²

are functions of the minimal sufficient statistic τ(X) = (Σᵢ₌₁ⁿ Xᵢ, Σᵢ₌₁ⁿ Xᵢ²), and the score ∂ log L/∂θ can itself be expressed in terms of τ(X).   (13.35)
(4) Asymptotic properties of MLE's (IID case)

As seen above, the small sample properties of MLE's can be unattractive; their asymptotic properties provide the main justification for the almost universal appeal of the method of maximum likelihood. As argued below, under certain regularity conditions, MLE's can be shown to be consistent, asymptotically normal and asymptotically efficient.

Let us begin the discussion of asymptotic properties enjoyed by MLE's by considering the simplest possible case where the statistical model is as follows:
(i) probability model: Φ = {f(x; θ), θ ∈ Θ}, θ being a scalar;
(ii) sampling model: X ≡ (X₁, ..., Xₙ)′ is a random sample from f(x; θ).

Although this case is of little interest in Part IV, a brief discussion of it will help us understand the non-random sample case considered in the sequel. The regularity conditions needed to prove the above-mentioned asymptotic properties for MLE's can take various forms (see Cramér (1946), Wald (1949), Norden (1972–73), Weiss and Wolfowitz (1974), Serfling (1980), inter alia). For our purposes it suffices to supplement the regularity conditions of Chapter 12, CR1–CR3, with the following condition:
(CR4) ∂³ log f(x; θ)/∂θ³ exists and

    |∂³ log f(x; θ)/∂θ³| ≤ h(x) for all θ ∈ Θ,

where the h(x) are integrable functions over (−∞, ∞), i.e.

    ∫ h(x)f(x; θ)dx < ∞.

Most density functions of interest satisfy them. An exception worth noting is the normal density with θ = σ², for which

    ∂³ log f/∂(σ²)³ = −1/σ⁶ + 3(x − μ)²/σ⁸ → ∞ as σ² → 0,

i.e. the third derivative is not bounded in the open interval 0 < σ² < ∞; condition CR4 is not satisfied (see Norden (1973)).

Under the regularity conditions CR1–CR4 we can prove (see Serfling (1980)) that the likelihood equation ∂ log L(θ; x)/∂θ = 0 admits a sequence of solutions {θ̂ₙ, n ≥ 1} such that:

(i) Consistency:

    θ̂ₙ →ᵃ·ˢ· θ₀ (strong consistency), and hence θ̂ₙ →ᴾ θ₀ (weak consistency);

(ii) Asymptotic normality:

    √n(θ̂ₙ − θ₀) ~ᵃ N(0, I(θ₀)⁻¹).
(iii) Asymptotic efficiency: the asymptotic variance of θ̂ₙ equals the inverse of

    I(θ) = lim (1/n)Iₙ(θ) as n → ∞,   (13.36)

i.e. the MLE achieves the asymptotic Cramér–Rao bound.

A heuristic sketch of these results runs as follows. Expanding the score in a Taylor series around θ₀,

    0 = s(θ̂ₙ; x) = s(θ₀; x) + (θ̂ₙ − θ₀)[∂s(θ*; x)/∂θ] + Oₚ(n),

where Oₚ(n) refers to all terms of order n (see Chapter 10). The above expansion is based on CR2 and CR4. In view of the fact that log L(θ) = Σᵢ₌₁ⁿ log f(xᵢ; θ) and the f(xᵢ; θ), i = 1, 2, ..., n, can be interpreted as functions of IID (independent and identically distributed) r.v.'s, the SLLN gives

    (1/n) Σᵢ₌₁ⁿ ∂ log f(xᵢ; θ₀)/∂θ →ᵃ·ˢ· E[∂ log f(x; θ₀)/∂θ] = 0 (see Section 9.2),   (13.38)

which, combined with the expansion, implies that

    (θ̂ₙ − θ₀) →ᵃ·ˢ· 0,   (13.40)

i.e. θ̂ₙ is a strongly consistent estimator of θ₀. To show asymptotic normality we multiply the expansion by √n to get

    √n(θ̂ₙ − θ₀) = Bₙ⁻¹ (1/√n) Σᵢ₌₁ⁿ ∂ log f(xᵢ; θ₀)/∂θ + oₚ(1),   (13.41)

where Bₙ ≡ −(1/n) Σᵢ₌₁ⁿ ∂² log f(xᵢ; θ*)/∂θ².
Using the central limit theorem (CLT) for IID r.v.'s (see Section 9.3) we can show that

    (1/√n) Σᵢ₌₁ⁿ ∂ log f(xᵢ; θ₀)/∂θ ~ᵃ N(0, I(θ₀)).   (13.42)

Given that Bₙ →ᵃ·ˢ· I(θ₀), this yields

    √n(θ̂ₙ − θ₀) ~ᵃ N(0, I(θ₀)⁻¹),   (13.43)

where

    I(θ₀) ≡ E[(∂ log f(x; θ₀)/∂θ)²] = E[−∂² log f(x; θ₀)/∂θ²]   (13.44)

is the information for a single observation. We know, however, that in the IID case the sample information matrix Iₙ(θ) ≡ E[(∂ log L(θ₀)/∂θ)²] equals n times I(θ₀), given that each observation contributes equally to the sample, i.e. Iₙ(θ) = nI(θ). This implies that the limit of the Cramér–Rao lower bound is I(θ)⁻¹ because

    lim (1/n)Iₙ(θ) = I(θ) as n → ∞.

It must be stressed that this is only true for the IID case. In more general cases care should be exercised in assessing the order of magnitude of Iₙ(θ).

In example 7 above, I(σ²) = 1/(2σ⁴) and

    √n(σ̂ₙ² − σ₀²) ~ᵃ N(0, 2σ₀⁴).

In example 3, Iₙ(θ) = n/θ₀² and

    √n(θ̂ₙ − θ₀) ~ᵃ N(0, θ₀²).

In example 4, although ρ̂ₙ cannot be derived explicitly, this does not stop us from deriving its asymptotic distribution. This takes the form

    √n(ρ̂ₙ − ρ₀) ~ᵃ N(0, (1 − ρ₀²)²/(1 + ρ₀²)).
The above asymptotic results for the IID case can be collected together as follows (the zero subscript denoting the 'true' value of θ, added for convenience):

(i) θ̂ₙ →ᵃ·ˢ· θ₀;

(ii) √n(θ̂ₙ − θ₀) ~ᵃ N(0, I(θ₀)⁻¹),   (13.46)

where

    I(θ₀) ≡ E[(∂ log f(x; θ₀)/∂θ)(∂ log f(x; θ₀)/∂θ)′] = E[−∂² log f(x; θ₀)/∂θ∂θ′].   (13.47)

That is, for θ ≡ (θ₁, ..., θₖ)′,

    √n(θ̂ᵢₙ − θ₀ᵢ) ~ᵃ N(0, [I(θ₀)⁻¹]ᵢᵢ), i = 1, 2, ..., k,   (13.48)

[I(θ₀)⁻¹]ᵢᵢ being the ith diagonal element of I(θ₀)⁻¹;

(iii) asymptotic efficiency: for any other consistent, asymptotically normal estimator θ̃ₙ, the matrix difference

    Cov(θ̃ₙ) − I(θ₀)⁻¹ ≥ 0.   (13.49)

In example 7 these results give

    √n(μ̂ − μ) ~ᵃ N(0, σ²) and √n(σ̂² − σ²) ~ᵃ N(0, 2σ⁴).

This shows that asymptotically σ̂ₙ² = (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)² achieves the lower bound even though for a fixed n it does not (see Chapter 12).
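The asymptotic efficiency claim for σ̂ₙ² can be visualised by simulation. A sketch (ours): the standardised estimator is approximately N(0, 2σ⁴).

    import numpy as np

    rng = np.random.default_rng(5)
    n, sigma2, reps = 200, 1.0, 50_000

    x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    xbar = x.mean(axis=1, keepdims=True)
    sig2_hat = ((x - xbar) ** 2).mean(axis=1)

    z = np.sqrt(n) * (sig2_hat - sigma2)
    print(z.mean(), z.var())   # mean ~ 0, variance ~ 2*sigma2**2 = 2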
(5)* Asymptotic properties of MLE's (non-IID case)

When the sampling model is an independent but non-identically distributed sample, the above results can be applied with one modification: the case presents no particular difficulties apart from the fact that the result Iₙ(θ) = nI(θ) no longer holds. This in turn implies that the asymptotic variance of θ̂ₙ can no longer be I(θ)⁻¹. This, however, is not such a difficult problem because it can always be replaced by the assumption that Iₙ(θ) is of order n. That is,

    lim (1/n)Iₙ(θ) < ∞ as n → ∞,   (13.50)

where now

    Iₙ(θ) = Σᵢ₌₁ⁿ E[−∂² log f(xᵢ; θ)/∂θ²].

In the case of a non-random (dependent) sample, the key step is to transform it into a martingale orthogonal sample. Define

    X₁* = X₁ − E(X₁),  Xᵢ* = Xᵢ − E(Xᵢ/σ(X₁, ..., Xᵢ₋₁)), i = 2, 3, ..., n,

where σ(X₁, X₂, ..., Xᵢ₋₁), i = 2, 3, ..., n, denotes the σ-field generated by the r.v.'s X₁, ..., Xᵢ₋₁. By construction Xᵢ* includes only the new information in Xᵢ and satisfies the following properties:

(i) E(Xᵢ*/σ(X₁, ..., Xᵢ₋₁)) = 0, i = 2, 3, ..., n;   (13.52)
(ii) E(Xᵢ*Xⱼ*) = 0, i ≠ j, i, j = 2, 3, ..., n.   (13.53)

That is, the Xᵢ*'s define a martingale difference (see Section 8.4). Hence, by conditioning on past information at each i we can reduce the non-random
sample Xₙ ≡ (X₁, ..., Xₙ)′ to a martingale orthogonal sample Xₙ* ≡ (X₁*, ..., Xₙ*)′ which asymptotically can be treated in a similar way as a random sample. For this we need to impose certain time-homogeneity and memory restrictions on {Xₙ, n ≥ 1} so as to ensure, for example, that

    (1/n) Var(Sₙ − E(Sₙ)) → σ² < ∞ as n → ∞,   (13.54)

where Sₙ = Σᵢ₌₁ⁿ Xᵢ. In such a case {Sₙ, n ≥ 1} behaves asymptotically as the sum of n martingale differences, each with variance σ². This in turn enables us to use the various limit theorems (see Chapter 9) to derive a number of asymptotic results needed. Heuristically, this enables us to treat the parameters θᵢ in the decomposition

    D(X; θ) = f(x₁; θ) ∏ᵢ₌₂ⁿ f(xᵢ/x₁, ..., xᵢ₋₁; θ)   (13.56)

as in the IID case.   (13.57)

The score,

    d log Lₙ(θ)/dθ = Σᵢ₌₁ⁿ (d/dθ) log f(xᵢ/x₁, ..., xᵢ₋₁; θ),   (13.58)
is the random quantity we are particularly interested in. Observe that the terms in the summation can be written as

    (d/dθ) log f(xᵢ/x₁, ..., xᵢ₋₁; θ) = (d/dθ)[log Lᵢ(θ) − log Lᵢ₋₁(θ)] ≡ zᵢ(θ),   (13.59)

which implies that

    (d/dθ) log Lₙ(θ) = Σᵢ₌₁ⁿ zᵢ(θ),

and, assuming the expected values exist,

    E[(d/dθ) log f(xᵢ/x₁, ..., xᵢ₋₁; θ)/𝓕ᵢ₋₁] = E[zᵢ(θ)/𝓕ᵢ₋₁] = 0.   (13.61)

These imply that E[zᵢ(θ)/𝓕ᵢ₋₁] = 0, i = 1, 2, ..., and hence {(d/dθ) log Lₙ(θ), 𝓕ₙ, n ≥ 1} is a zero mean martingale and the zᵢ(θ)'s are martingale differences (see Section 8.4). Defining the random variable (r.v.)

    Iₙ(θ) ≡ Σᵢ₌₁ⁿ E[zᵢ(θ)²/𝓕ᵢ₋₁],   (13.62)

we observe that it corresponds to the Fisher information matrix defined above. Moreover, under conditions similar to CR1–CR3 above,
    E[zᵢ(θ)²/𝓕ᵢ₋₁] = E[−dzᵢ(θ)/dθ /𝓕ᵢ₋₁],   (13.63)

and Iₙ(θ) can be defined alternatively as

    Iₙ(θ) = −Σᵢ₌₁ⁿ E[dzᵢ(θ)/dθ].   (13.64)

By a law of large numbers for martingales,

    [Iₙ(θ)]⁻¹ Σᵢ₌₁ⁿ zᵢ(θ) →ᴾ 0,   (13.65)

provided Iₙ(θ) → ∞ as n → ∞, and this yields the consistency property for MLE's.
The likelihood equation

    Σᵢ₌₁ⁿ zᵢ(θ) = 0

has a root θ̂ₙ such that

    lim Pr(|θ̂ₙ − θ₀| < ε) = 1 as n → ∞,   (13.66)

i.e. θ̂ₙ →ᴾ θ₀. Moreover, under further restrictions,   (13.67)

θ̂ₙ is also strongly consistent, i.e. θ̂ₙ →ᵃ·ˢ· θ₀ as n → ∞, provided Iₙ(θ) →ᵃ·ˢ· ∞. A sufficient condition for
the consistency of MLE's is the condition that Iₙ(θ) diverges.   (13.68)

For asymptotic normality a central limit theorem for martingales is needed: there exist normalising sequences {cₙ, n ≥ 1} such that

    cₙ(θ̂ₙ − θ) ~ᵃ N(0, 1),   (13.69)

the natural choice being cₙ = [Iₙ(θ)]^½, which ensures that

    [Iₙ(θ)]^½ (θ̂ₙ − θ) ~ᵃ N(0, 1).

In the IID case

    Iₙ(θ) = Σᵢ₌₁ⁿ E[zᵢ(θ)²] = nI(θ), where I(θ) = E[((d/dθ) log f(xᵢ; θ))²],

and in the random sample case the asymptotic normality result

    √n(θ̂ₙ − θ) ~ᵃ N(0, I(θ)⁻¹)

follows as a special case.
that
13.3
The maximum
likelihood method
( 13.76)
of n. lt is obvious that in this
uo
-+
s automatically satisfed.
ln the general non-random
asymptotic normality result
like an analogous
N(0, Iz'(p)),
c,,(t - t?)
-
(13.78)
For
g1,,(?)1l.
Case,
l (0)p
=
11
n1(@
0(n)
and
'
c',,
/''n
-bh--
this consider
!1
P)
1?11
lim
-. z
11
ln cases where the sampling model is either an independent or a nonran dom sample the order of magnitude of 1,,(?) can be any power of U.For
most cases of interest, however, it suftices to concentrate on cases where
1,,(:) O P (n)
=
''t8)
-
-+
11
I .(#),
( 13.80)
1.
with V(p)
X. b'tlptotic
Under
certain
llt?rFrl/lf
!).'
regularity conditions
any consistent
),, is asymptotically normal, i.e.
solution
of
likelihood equation
(.f((?)/(t- l1 .
p)
x(0,1).
(13.81)
In the k-parameter case, define the diagonal normalising matrix

    Dₙ(θ) ≡ diag{[E(−∂² log L/∂θ₁²)]^½, ..., [E(−∂² log L/∂θₖ²)]^½}.   (13.82)

Under certain regularity conditions,

    Dₙ(θ)(θ̂ₙ − θ) ~ᵃ N(0, C(θ)⁻¹),   (13.83)

where

    C(θ) = lim Dₙ⁻¹(θ)Aₙ(θ)Dₙ⁻¹(θ) as n → ∞,
    Aₙ(θ) ≡ [E(−∂² log L/∂θᵢ∂θⱼ)], i, j = 1, 2, ..., k.   (13.84)
(6) Summary of asymptotic properties

Collecting the above results together, under the stated regularity conditions the MLE θ̂ₙ of θ₀ satisfies:

(1) Consistency: θ̂ₙ →ᵃ·ˢ· θ₀, and hence lim Pr(|θ̂ₙ − θ₀| < ε) = 1 as n → ∞;

(2) Asymptotic normality: √n(θ̂ₙ − θ₀) ~ᵃ N(0, V(θ));

(3) Asymptotic efficiency: V(θ) = I∞(θ)⁻¹.

Note that asymptotic normality of the MLE implies asymptotic unbiasedness, i.e. E(θ̂ₙ) → θ₀ as n → ∞.
Example 8
Let Xₙ ≡ (X₁, X₂, ..., Xₙ)′ be a non-random sample from the time-series model:

    Xᵢ = αXᵢ₋₁ + uᵢ, |α| < 1,

where uᵢ ~ NI(0, σ²); a normal white-noise process (see Chapter 8). The distribution of the sample D(Xₙ; θ) is multivariate normal of the form:

    Xₙ ~ N(0, σ²Vₙ(α)),

where the (i, j) element of Vₙ(α) is α^|i−j|/(1 − α²). This is because

    E(XᵢXᵢ₋ₖ) = σ²αᵏ/(1 − α²), k = 0, 1, 2, ..., |α| < 1.

Using the decomposition

    D(Xₙ; θ) = f(x₁; θ) ∏ᵢ₌₂ⁿ f(xᵢ/xᵢ₋₁; θ),

(ii) assuming that X₁ ~ N(0, σ²/(1 − α²)), or (iii) treating x₁ as fixed, the log likelihood (conditional on x₁) is

    log L(α, σ²; x) = const − (n/2) log σ² − (1/2σ²) Σᵢ₌₂ⁿ (xᵢ − αxᵢ₋₁)²,

with first-order conditions

    ∂ log L/∂α = (1/σ²) Σᵢ₌₂ⁿ xᵢ₋₁(xᵢ − αxᵢ₋₁) = 0,
    ∂ log L/∂σ² = −(n/2σ²) + (1/2σ⁴) Σᵢ₌₂ⁿ (xᵢ − αxᵢ₋₁)² = 0.

Hence, the MLE's of α and σ² are

    α̂ₙ = (Σᵢ₌₂ⁿ xᵢxᵢ₋₁)/(Σᵢ₌₂ⁿ xᵢ₋₁²),  σ̂ₙ² = (1/n) Σᵢ₌₂ⁿ (xᵢ − α̂ₙxᵢ₋₁)².

Taking expectations of the second derivatives,

    E(−∂² log L/∂α²) = (1/σ²) E(Σᵢ₌₂ⁿ Xᵢ₋₁²) = (n − 1)/(1 − α²),
    E(−∂² log L/∂(σ²)²) = n/(2σ⁴),  E(−∂² log L/∂α∂σ²) = 0.

The score {(d/dα) log Lₙ(α), 𝓕ₙ, n ≥ 1} defines a zero-mean martingale. Using the WLLN for martingales and Markov processes it follows that

    (1/n) Σᵢ₌₂ⁿ Xᵢ₋₁² →ᴾ σ²/(1 − α²) and (1/n) Σᵢ₌₂ⁿ Xᵢ₋₁uᵢ →ᴾ 0,

respectively. Hence α̂ₙ →ᴾ α. Using a similar argument we can show that σ̂ₙ² →ᴾ σ². For asymptotic normality we need to express α̂ₙ in the form

    √n(α̂ₙ − α) = [(1/n) Σᵢ₌₂ⁿ Xᵢ₋₁²]⁻¹ (1/√n) Σᵢ₌₂ⁿ Xᵢ₋₁uᵢ,

deducing that

    √n(α̂ₙ − α) ~ᵃ N(0, 1 − α²) and √n(σ̂ₙ² − σ²) ~ᵃ N(0, 2σ⁴)

(see Anderson (1971)). It is interesting to note that in the case where |α| > 1 the order of magnitude of Iₙ(α) is no longer n, in which case we need a different normalising sequence for asymptotic normality. In particular, for the sequence {cₙ, n ≥ 1} where cₙ = (Σᵢ₌₂ⁿ Xᵢ₋₁²)^½ it follows that

    cₙ(α̂ₙ − α) ~ᵃ N(0, σ²), |α| > 1.
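Example 8 in code (a sketch, ours; numpy only). The conditional MLE α̂ₙ is the least-squares coefficient of Xᵢ on Xᵢ₋₁, and √n(α̂ₙ − α) should have variance close to 1 − α²:

    import numpy as np

    rng = np.random.default_rng(6)
    alpha, sigma, n, reps = 0.5, 1.0, 500, 20_000

    u = rng.normal(0.0, sigma, size=(reps, n))
    x = np.empty((reps, n))
    x[:, 0] = rng.normal(0.0, sigma / np.sqrt(1 - alpha**2), size=reps)  # stationary start
    for i in range(1, n):
        x[:, i] = alpha * x[:, i - 1] + u[:, i]

    # conditional MLE of alpha: least-squares of x_i on x_{i-1}
    a_hat = (x[:, 1:] * x[:, :-1]).sum(axis=1) / (x[:, :-1] ** 2).sum(axis=1)

    z = np.sqrt(n) * (a_hat - alpha)
    print(z.mean(), z.var(), "theory:", 1 - alpha**2)   # variance close to 0.75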
Important concepts

The least-squares, method of moments and maximum likelihood methods of estimation; likelihood, log likelihood and score functions; the sample information; martingale orthogonal samples.

Questions
1. Explain the least-squares estimation method.
2. Explain the logic underlying the method of moments.
3. Why is it that the method of moments usually leads to inefficient estimators?
4. Define the likelihood function and explain its relationship with the distribution of the sample.
5. Discuss the relationship between the likelihood, log likelihood and score functions.
6. Define the concept of a MLE and explain the common-sense logic underlying the definition.
7. State and explain the small sample properties of MLE's.
8. Explain how a non-random sample X can be transformed into a martingale orthogonal sample X*.
9. Explain why {(d/dθ) log Lₙ(θ), 𝓕ₙ, n ≥ 1} defines a zero mean martingale and the zᵢ(θ), i = 1, 2, ..., n, define a martingale difference.
10. State and explain the asymptotic properties of MLE's.
11. Discuss the relationship between the order of magnitude of Iₙ(θ) and asymptotic normality of MLE's.
12. If we were to interpret the likelihood function as a density function what does the MLE correspond to?
Exercises
1. Let yᵢ = θxᵢ + εᵢ, εᵢ ~ NI(0, σ²), i = 1, 2, ..., n. Derive the least-squares estimator of θ and its sampling distribution.
2. Let X be a Bernoulli r.v. with f(x; θ) = θˣ(1 − θ)^(1−x), x = 0, 1. Derive the MLE of θ and verify that E[d log L(θ; x)/dθ] = 0 and E[−d² log L(θ; x)/dθ²] = Iₙ(θ).
3. Let X ≡ (X₁, ..., Xₙ)′ be a random sample from the normal distribution with density function

    f(x; m, σ²) = (1/σ√(2π)) exp{−(x − m)²/(2σ²)}.

Derive the MLE's of m and σ² and their asymptotic distribution.
4. Let X ≡ (X₁, ..., Xₙ)′ be a random sample from the exponential distribution with density function f(x; θ) = θe^(−θx), x > 0. Derive the MLE of θ and compare it with the method of moments estimator based on E(X) = 1/θ.
5. Let Xᵢ ≡ (X₁ᵢ, X₂ᵢ)′, i = 1, 2, ..., n, be a random sample from the bivariate normal distribution

    Xᵢ ~ N(0, Σ), Σ = [σ₁², ρσ₁σ₂; ρσ₁σ₂, σ₂²].

Derive the MLE's of σ₁², σ₂² and ρ.

Additional references
Bickel and Doksum (1977); Cox and Hinkley (1974); Kendall and Stuart (1973); Lloyd (1984); Rao (1973); Rohatgi (1976); Silvey (1975); Zacks (1971).
CHAPTER 14

Hypothesis testing and confidence intervals

The problem of constructing 'good' estimators was considered in the previous chapters; we now turn to the construction of 'good' tests.

14.1 Testing: definitions and concepts

Consider the statistical model defined by:
(i) probability model Φ = {f(x; θ), θ ∈ Θ};
(ii) sampling model: X ≡ (X₁, X₂, ..., Xₙ)′ is a random sample from f(x; θ).

The problem of hypothesis testing is one of deciding whether or not some conjecture about θ, in the form of a hypothesis, is supported by the observed data. As an example, consider the distribution of marks with

    f(x; θ) = (1/8√(2π)) exp{−(x − θ)²/128}, θ ∈ Θ ≡ [0, 100],

where n = 40 is the size of a random sample from f(x; θ), and the hypotheses

    H₀: θ = 60 (i.e. X ~ N(60, 64)), Θ₀ = {60},

against

    H₁: θ ≠ 60 (i.e. X ~ N(θ, 64), θ ≠ 60), Θ₁ = [0, 100] − {60}.
A test of H₀ is defined in terms of regions constructed 'around' θ = 60. The acceptance region is

    C₀ = {x: |X̄ₙ − 60| ≤ ε},

and

    C₁ = {x: |X̄ₙ − 60| > ε}

is the rejection region. The next question is, 'how do we choose ε?' If ε is too small we run the risk of rejecting H₀ when H₀ is true; we call this type I error. On the other hand, if ε is too large we run the risk of accepting H₀ when H₀ is false; we call this type II error. Formally, if x ∈ C₁ (reject H₀) and θ ∈ Θ₀ (H₀ is true) – type I error; if x ∈ C₀ (accept H₀) and θ ∈ Θ₁ (H₀ is false) – type II error (see Table 14.1).

Table 14.1
                H₀ true        H₀ false
H₀ accepted     correct        type II error
H₀ rejected     type I error   correct
In general, for the null hypothesis

    H₀: θ ∈ Θ₀,   (14.1)

we postulate the alternative H₁, which takes the form

    H₁: θ ∉ Θ₀,   (14.2)

or, equivalently,

    H₁: θ ∈ Θ₁ ≡ Θ − Θ₀.   (14.3)

That is, under H₀ the distribution of the sample is D(x; θ), θ ∈ Θ₀, and under H₁ it is D(x; θ), θ ∈ Θ₁. The probabilities of the two types of error are

    Pr(x ∈ C₁; θ ∈ Θ₀) = α,  Pr(x ∈ C₀; θ ∈ Θ₁) = β.   (14.6)

Ideally we would like α = β = 0 for all θ ∈ Θ, which is not possible for a fixed n. Moreover, we cannot control both simultaneously because of the trade-off between them. 'How do we proceed, then?' In order to help us decide let us consider the close analogy between this problem and the dilemma facing the jury in a trial of a criminal offence.
The jury have to decide between

    H₀: the accused is innocent, against H₁: the accused is guilty,

with their decision based on the evidence presented in the court. This evidence in hypothesis testing comes in the form of Φ and X. The jury are instructed to accept H₀ unless they have been convinced beyond any reasonable doubt otherwise. This requirement is designed to protect an innocent person from being convicted and it corresponds to choosing a small value for α, the probability of convicting the accused when innocent. By adopting such a strategy, however, they are running the risk of letting a number of 'crooks off the hook'. This corresponds to being prepared to accept a relatively high value of β, the probability of not convicting the accused when guilty, in order to protect an innocent person from conviction. This is based on the moral argument that it is preferable to let off a number of guilty people rather than to sentence an innocent person. However, we can never be sure that an innocent person has not been sent to prison and the strategy is designed to keep the probability of this happening very low. A similar strategy is also adopted in hypothesis testing where a small value of α is chosen and, for a given α, β is minimised. Formally, this amounts to choosing α* such that

    α(θ) ≤ α* for θ ∈ Θ₀.   (14.7)

In the marks example, if we choose α, say α* = 0.05, then

    Pr(|X̄ₙ − 60| > ε; θ = 60) = 0.05.   (14.8)

To determine ε   (14.9)

we know that

    X̄ₙ ~ N(θ, σ²/n), σ²/n = 64/40 = 1.6,   (14.10)

so that under H₀ (i.e. when θ = 60)

    τ(X) = (X̄ₙ − 60)/1.265 ~ N(0, 1),   (14.11)

since √1.6 ≈ 1.265.
Hence

    Pr( |X̄ₙ − 60|/1.265 > cα; θ = 60 ) = 0.05, where cα = ε/1.265.   (14.12)
Given that the distribution of the statistic τ(X) is symmetric and we want to determine cα such that Pr(|τ(X)| > cα) = 0.05, we should choose the value of cα from the tables of N(0, 1) which leaves α*/2 = 0.025 probability on either side of the distribution, as shown in Fig. 14.1. The value of cα given from the N(0, 1) tables is cα = 1.96. This in turn implies that the rejection region for the test is

    C₁ = {x: |τ(x)| > 1.96},   (14.13)

or, equivalently,

    C₁ = {x: |X̄ₙ − 60| > 2.48}.   (14.14)

The question which naturally arises is how 'good' or 'bad' this test is; to 'solve' this problem we need criteria for comparing tests.

14.2 Optimal tests
Definition 1
The probability of rejecting H₀ when false at some point θ₁ ∈ Θ₁, i.e. Pr(x ∈ C₁; θ = θ₁), is called the power of the test at θ = θ₁.

Note that

    Pr(x ∈ C₁; θ = θ₁) = 1 − Pr(x ∈ C₀; θ = θ₁) = 1 − β(θ₁).   (14.15)

In the case of the example above we can define the power of the test at some θ₁ ∈ Θ₁, say θ₁ = 54, to be Pr[(|X̄ₙ − 60|/1.265) > 1.96; θ = 54]. 'How do we calculate this probability?' The temptation is to suggest using the same distribution as above, i.e. τ(X) = (X̄ₙ − 60)/1.265 ~ N(0, 1). This is, however, wrong because θ is no longer equal to 60; we assumed that θ = 54 and thus (X̄ₙ − 54)/1.265 ~ N(0, 1). This implies that

    τ(X) ~ N( (54 − 60)/1.265, 1 ) for θ = 54.

Hence

    Pr(|τ(X)| > 1.96; θ = 54)
      = Pr( (X̄ₙ − 54)/1.265 ≤ −1.96 − (54 − 60)/1.265 ) + Pr( (X̄ₙ − 54)/1.265 > 1.96 − (54 − 60)/1.265 )
      = 0.9973.

Hence, the power of the test defined by C₁ above is indeed very high for θ = 54. In order to be able to decide on how good such a test is, however, we need the power at all θ ∈ Θ₁; calculating it in the same way at several points gives

    P(54) = 0.9973, P(56) = 0.8849, P(58) = 0.3520, P(60) = 0.05,
    P(62) = 0.3520, P(64) = 0.8849, P(66) = 0.9973.
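These power values are straightforward to reproduce. A sketch (ours, using scipy's normal CDF):

    from scipy.stats import norm

    se, c = 1.265, 1.96   # standard error of the mean and the 5% critical value

    def power(theta, theta0=60.0):
        d = (theta - theta0) / se
        # Pr(|tau(X)| > c) when tau(X) ~ N(d, 1)
        return norm.cdf(-c - d) + 1 - norm.cdf(c - d)

    for th in [54, 56, 58, 60, 62, 64, 66]:
        print(th, round(power(th), 4))
    # 0.9973, 0.8849, 0.3520, 0.05, 0.3520, 0.8849, 0.9973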
As we can see, the power of the test increases as we go further away from θ = 60 (H₀) and the power at θ = 60 equals the probability of type I error. This prompts us to define the power function as follows:

Definition 2
The power function of a test defined by the rejection region C₁ is

    𝒫(θ) = Pr(x ∈ C₁; θ), θ ∈ Θ.

Definition 3
α = max 𝒫(θ) over θ ∈ Θ₀ is defined to be the size (or the significance level) of the test.

Definition 4
A test of H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ defined by some rejection region C₁ is said to be uniformly most powerful (UMP) of size α if
(i) max 𝒫(θ) over θ ∈ Θ₀ equals α;
(ii) 𝒫(θ) ≥ 𝒫*(θ) for all θ ∈ Θ₁, where 𝒫*(θ) is the power function of any other test of size α.
.,
(x:z(X) > 1
(14.16)
.645)
1.645
f (z)
C(
4-
=(x:z(X) G 1
'Z
-1
(14.17)
.645
.645)
/ (z)
C(
-0.03
0 0.03
regions
tx
=(x:1
region
Iz(X)1> 1.96).
( 14.19)
To that end we shall compare the power of this test with the power of the size 0.05 tests (Fig. 14.2) defined by the above rejection regions. All the rejection regions define size 0.05 tests for H₀: θ = 60 against H₁: θ ≠ 60. In order to discriminate between 'good' and 'bad' tests we have to calculate their power functions and compare them. The diagram of the power functions 𝒫(θ), 𝒫*(θ), 𝒫⁺(θ), 𝒫⁺⁺(θ) is illustrated in Fig. 14.3.

Looking at the diagram we can see that only one thing is clear 'cut': C₁⁺⁺ defines a very bad test, its power function being dominated by the other tests. Comparing the other three tests we can see that C₁⁺ is more powerful than the other two for θ > 60 but 𝒫⁺(θ) < α for θ < 60; C₁⁻ is more powerful than the other two for θ < 60 but 𝒫⁻(θ) < α for θ > 60; and none of the tests is more powerful over the whole range. That is, there is no UMP test of size 0.05 for H₀: θ = 60 against H₁: θ ≠ 60. As will be seen in the sequel, no UMP tests exist in most situations of interest in practice. The procedure adopted in such cases is to reduce the class of all tests to some subclass by imposing some more criteria and consider the question of UMP tests within
[Fig. 14.3. The power functions 𝒫(θ), 𝒫*(θ), 𝒫⁺(θ), 𝒫⁺⁺(θ).]
the subclass. One of the most important restrictions used in this context is
the criterion of unbiasedness.
Definition 5
A test of H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁ is said to be unbiased if

    max 𝒫(θ) over θ ∈ Θ₀ ≤ 𝒫(θ₁) for all θ₁ ∈ Θ₁.   (14.20)

For the one-sided hypotheses

    H₀: θ = 60 against H₁: θ > 60, and H₁′: θ < 60,

the tests defined by C₁⁺ and C₁⁻ are UMP, respectively. That is, for the one-sided alternatives there exist UMP tests given by C₁⁺ and C₁⁻. It is important to note that in the case of H₁ above and H₁′ the parameter space implicitly assumed is different. In the case of H₁ the parameter space implicitly assumed is Θ = [60, 100] and in the case of H₁′, Θ = [0, 60]. This is needed in order to ensure that Θ₀ and Θ₁ constitute a partition of Θ.

Collecting all the above concepts together we say that a test has been defined when the following components have been specified:
(T1) a test statistic τ(X);
(T2) the size of the test α;
(T3) the distribution of τ(X) under H₀ and H₁;
(T4) the rejection region C₁ (or, equivalently, C₀).
Let us illustrate this using the marks example above. The test statistic is
    τ(X) = (X̄ₙ − 60)/1.265,   (14.21)

which under H₀ is distributed as N(0, 1); the size is α = 0.05, with cα defined by

    ∫ from −cα to cα of φ(z)dz = 1 − α.   (14.22)

In order to derive the power function we need the distribution of τ(X) under H₁. Under H₁ we know that

    τ*(X) = √n(X̄ₙ − θ₁)/σ ~ N(0, 1)   (14.23)

for any θ₁ ∈ Θ₁, and hence we can relate τ(X) with τ*(X) by

    τ(X) = τ*(X) + √n(θ₁ − θ₀)/σ   (14.24)

to deduce that

    τ(X) ~ N( √n(θ₁ − θ₀)/σ, 1 ).   (14.25)

The power function is then

    𝒫(θ₁) = Pr( τ*(X) ≤ −cα − √n(θ₁ − θ₀)/σ ) + Pr( τ*(X) > cα − √n(θ₁ − θ₀)/σ ), θ₁ ∈ Θ₁.   (14.26)

Using the power function this test was shown to be UMP unbiased.
The most important component in defining a test is the test statistic, for which we need pivots such as

    √n(X̄ₙ − μ)/σ ~ N(0, 1),  √n(X̄ₙ − μ)/s ~ t(n − 1),  (n − 1)s²/σ² ~ χ²(n − 1),   (14.27)

but in general these pivots are very hard to come by.
The first pivot was used above to construct tests for μ when σ² is known (both one-sided and two-sided tests). The second pivot can be used to set up similar tests for μ when σ² is unknown. For example, testing H₀: μ = μ₀ against H₁: μ ≠ μ₀, the rejection region can be defined by

    C₁ = {x: |τ₁(x)| ≥ cα}, where τ₁(X) = √n(X̄ₙ − μ₀)/s,   (14.28)

with cα determined from the t(n − 1) tables. For the one-sided alternative H₁: μ > μ₀,

    C₁ = {x: τ₁(x) ≥ cα}, with α = ∫ from cα to ∞ of dt(n − 1)   (14.29)

determining cα. The pivot

    τ₂(X) = (n − 1)s²/σ₀² ~ χ²(n − 1)   (14.30)

can be used to test hypotheses about σ²; for H₁: σ² < σ₀²,

    C₁ = {x: τ₂(x) ≤ cα}, where cα is determined via ∫ from 0 to cα of dχ²(n − 1) = α.   (14.31)
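The pivots in (14.27)–(14.31) map directly onto code. A sketch (ours) of the two-sided t-test for H₀: μ = μ₀ when σ² is unknown:

    import numpy as np
    from scipy.stats import t

    def t_test(x, mu0, alpha=0.05):
        n = x.size
        s = x.std(ddof=1)                       # the unbiased s
        tau1 = np.sqrt(n) * (x.mean() - mu0) / s
        c = t.ppf(1 - alpha / 2, df=n - 1)      # two-sided critical value
        return tau1, c, abs(tau1) >= c          # reject H0 if |tau1| >= c

    rng = np.random.default_rng(7)
    print(t_test(rng.normal(61, 8, size=40), mu0=60))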
14.3 Constructing optimal tests

(1) Simple hypotheses: the Neyman–Pearson theorem

Consider the simple hypotheses

    H₀: θ = θ₀ against H₁: θ = θ₁,   (14.32)

where Θ ≡ {θ₀, θ₁}. The Neyman–Pearson theorem states that the most powerful test of size α is defined by the rejection region

    C₁ = {x: f(x; θ₀)/f(x; θ₁) ≤ k},   (14.33)

where k is chosen so that Pr(x ∈ C₁; θ = θ₀) = α; its power function is

    𝒫(θ) = α for θ = θ₀ and 𝒫(θ) = 1 − β for θ = θ₁.   (14.34)

The Neyman–Pearson theorem suggests that it is intuitively sensible to base the acceptance or rejection of H₀ on the relative values of the distributions of the sample evaluated at θ = θ₀ and θ = θ₁, i.e. reject H₀ if the ratio f(x; θ₀)/f(x; θ₁) is relatively small. This amounts to rejecting H₀ when the evidence in the form of x favours H₁ (giving it a higher 'support'). It is very important to note that the Neyman–Pearson theorem does not solve the problem completely, because the problem of relating the ratio f(x; θ₀)/f(x; θ₁) to a pivotal quantity (test statistic) remains.
Consider the case where X ~ N(θ, σ²), σ² known, and we want to test H₀: θ = θ₀ against H₁: θ = θ₁ (θ₀ < θ₁). From the Neyman–Pearson theorem we know that the rejection region defined in terms of the ratio

    f(x; θ₀)/f(x; θ₁) = exp{ −(n/σ²)(θ₁ − θ₀)X̄ₙ + (n/2σ²)(θ₁² − θ₀²) }   (14.35)

can provide us with a UMP test if Pr(x ∈ C₁; θ = θ₀) = α exists for some α. The ratio as it stands is not a proper pivot as we defined it. We know, however, that any monotonic transformation of the ratio generates the same family of rejection regions. Thus we can define

    τ(X) = √n(X̄ₙ − θ₀)/σ,   (14.36)

in terms of which

    C₁ = {x: τ(x) ≥ cα},   (14.37)

where cα is chosen such that

    Pr(x ∈ C₁; θ = θ₀) = α   (14.38)

exists; since under H₀, τ(X) ~ N(0, 1), cα can be read from the N(0, 1) tables independently of the particular θ₁ values.
For example if a
,t?qpl)
=
zl
=0.05,
cl
-N
(x)> c)
/ n(p1 p2 )
-
(14.39)
p,
where
under Hj
(14.40)
1 j=
-
#r(z1(x) ,Ac)*)
c'l + c)*
sample
size since
Vn(() - o ).
G
(14.41)
()
<
0v the test
yz'
(# s p)
z(X) v----s-..n
=
which gives
C1
(2)
rise
=
kf :
x
to the
rejection
region
z(X) % ca*)
Composite
( 14.42)
one parameter
cas
against
H j : 0 < 0v
being the other extreme of two simple hypotheses, no such results as the
Neyman-pearson theorem exist and it comes as no surprise that no UMP
tests exist in general. The only result of some interest in this case is that if we
restrict the probability model to require the density functions to have
monotone Iikelihood ratio in the test statistic z(X) then UMP tests do exist.
This result is of limited value, however, since it does not provide us with a
method to derive z(X).
One-sided alternatives

The tests defined by

    C₁ = {x: τ(x) ≥ cα}   (14.43)

and

    C₁ = {x: τ(x) ≤ cα*}   (14.44)

are also UMP for the hypotheses H₀: θ = θ₀ against H₁: θ > θ₀ and H₀: θ = θ₀ against H₁: θ < θ₀, respectively. This is indeed confirmed by the diagram of the power function derived for the 'marks' example above. Another result in the simple class of hypotheses is available in the case where sampling is from a one-parameter exponential family of densities (normal, binomial, Poisson, etc.). In such cases UMP tests do exist for one-sided alternatives.

Two-sided alternatives

For testing H₀: θ = θ₀ against H₁: θ ≠ θ₀ no UMP tests exist in general. This is rather unfortunate since most tests in practice are of this type. One interesting result in this case is that if we restrict the probability model to the one-parameter exponential family and narrow down the class of tests by imposing unbiasedness, then we know that UMP unbiased tests do exist. The test defined by the rejection region

    C₁ = {x: |τ(x)| ≥ cα}   (14.45)

is of this kind.

14.4 The likelihood ratio test procedure

The discussion so far suggests that no UMP tests exist for a wide variety of cases which are important in practice. However, the likelihood ratio test procedure yields very satisfactory tests for a great number of cases where none of the above methods is applicable. It is particularly valuable in the case where both hypotheses are composite and θ is a vector of parameters. This procedure not only has a lot of intuitive appeal but also frequently leads to UMP tests or UMP unbiased tests (when such exist).
Consider

    H₀: θ ∈ Θ₀ against H₁: θ ∈ Θ₁.

Let the likelihood function be L(θ; x); then the likelihood ratio is defined by

    λ(x) = [max L(θ; x) over θ ∈ Θ₀]/[max L(θ; x) over θ ∈ Θ].   (14.46)

The numerator represents the maximum 'support' given to H₀ by the data, and the denominator the maximum support over the whole of Θ; H₀ is rejected when it is relatively poorly 'supported', i.e.

    C₁ = {x: λ(x) ≤ k},   (14.47)

where k is determined so that max Pr(x ∈ C₁; θ) over θ ∈ Θ₀ equals α.

[Fig. 14.4. The likelihood ratio test.]
When the statistical model is the normal one with θ ≡ (μ, σ²) and the hypotheses are H₀: μ = μ₀ against H₁: μ ≠ μ₀,

    L(θ; x) = (2πσ²)^(−n/2) exp{−(1/2σ²) Σᵢ₌₁ⁿ (xᵢ − μ)²}.

Maximising over Θ₀ gives μ = μ₀, σ̂₀² = (1/n) Σᵢ₌₁ⁿ (xᵢ − μ₀)², while over Θ, μ̂ = X̄ₙ, σ̂² = (1/n) Σᵢ₌₁ⁿ (xᵢ − X̄ₙ)². Hence

    λ(x) = [ Σᵢ₌₁ⁿ (xᵢ − X̄ₙ)² / Σᵢ₌₁ⁿ (xᵢ − μ₀)² ]^(n/2)
         = [ 1 + n(X̄ₙ − μ₀)²/Σᵢ₌₁ⁿ (xᵢ − X̄ₙ)² ]^(−n/2)
         = [ 1 + W²/(n − 1) ]^(−n/2),

where W = √n(X̄ₙ − μ₀)/s ~ t(n − 1) under H₀ and W ~ t(n − 1; δ), δ = √n(μ₁ − μ₀)/σ, μ₁ ∈ Θ₁, under H₁. Since λ(x) ≤ k if and only if |W| ≥ cα, the likelihood ratio test takes the form

    C₁ = {x: |W| ≥ cα}.
In the context of the same statistical model as that of example 1, consider

    H₀: σ² = σ₀² against H₁: σ² ≠ σ₀², Θ₀ = {(μ, σ₀²): μ ∈ R}, Θ = R × R₊.

The likelihood ratio leads to the pivot

    τ₂(X) = ns̃²/σ₀² ~ χ²(n − 1) under H₀, and ~ χ²(n − 1; δ) under H₁,

where s̃² = (1/n) Σᵢ₌₁ⁿ (xᵢ − X̄ₙ)², and λ(x) ≤ k corresponds to a rejection region of the form

    C₁ = {x: τ₂(x) ≤ k₁ or τ₂(x) ≥ k₂}, with ∫ from k₁ to k₂ of dχ²(n − 1) = 1 − α,

e.g. if α = 0.1, n − 1 = 30: k₁ = 18.5, k₂ = 29.3. Using the analogy between this and the various tests of μ we encountered so far, we can postulate that in the case of the one-sided hypotheses:

    H₀: σ² ≤ σ₀² against H₁: σ² > σ₀², C₁ = {x: τ₂(x) ≥ k₂};
    H₀: σ² ≥ σ₀² against H₁: σ² < σ₀², C₁ = {x: τ₂(x) ≤ k₁}.
The question arising at this stage is: 'What use is the likelihood ratio test procedure if the distribution of λ(X) is only known when a well-known pivot exists already?' The answer is that it is reassuring to know that the procedure in these cases leads to certain well-known pivots, because the likelihood ratio test procedure is of considerable importance when no such pivots exist. Under certain conditions we can derive the asymptotic distribution of λ(X). We can show that under certain conditions

    −2 log λ(X) ~ᵃ χ²(r) under H₀,   (14.48)

where '~ᵃ' reads 'asymptotically distributed as' and r is the number of restrictions imposed by H₀.
14.5 Confidence intervals and regions

Interval estimation amounts to constructing statistics L(X) and U(X), L(X) ≤ U(X),   (14.49)–(14.50)

such that the random interval (L(X), U(X)) covers θ with high probability:

    Pr(L(X) ≤ θ ≤ U(X)) = 1 − α.   (14.51)

Definition 6
The interval (L(X), U(X)) is called a (1 − α) confidence interval for θ if

    Pr(L(X) ≤ θ ≤ U(X)) ≥ 1 − α for all θ ∈ Θ.   (14.52)
This suggests that in the long-run (in repeated experiments) the random interval (L(X), U(X)) will include the 'true' but unknown θ. For any particular realisation x, however, we do not know 'for sure' whether (L(x), U(x)) includes the 'true' θ or not; we are only (1 − α) confident that it does. The duality between hypothesis testing and confidence intervals can be seen in the 'marks' example discussed above. For the null hypothesis
    H₀: θ = θ₀, θ₀ ∈ Θ, against H₁: θ ≠ θ₀,

we constructed a size α test based on the acceptance region

    C₀(θ₀) = { x: θ₀ − cα(σ/√n) ≤ X̄ₙ ≤ θ₀ + cα(σ/√n) },   (14.53)

with cα defined by

    ∫ from −cα to cα of φ(z)dz = 1 − α,   (14.54)

φ(z) being the density of N(0, 1). This implies that Pr(x ∈ C₀(θ₀); θ = θ₀) = 1 − α, which can be rewritten as

    Pr( X̄ₙ − cα(σ/√n) ≤ θ₀ ≤ X̄ₙ + cα(σ/√n) ) = 1 − α,   (14.55)

i.e.

    C(X) = ( X̄ₙ − cα(σ/√n), X̄ₙ + cα(σ/√n) )   (14.56)

is a (1 − α) confidence interval for θ₀. In general, any acceptance region for a size α test can be transformed into a (1 − α) confidence interval for θ by changing C₀, a function of x ∈ 𝒳, to C, a function of θ₀ ∈ Θ.
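The duality is mechanical in code. A sketch (ours): the same cα that defines the acceptance region yields the confidence interval.

    import numpy as np
    from scipy.stats import norm

    def ci_mean(x, sigma, alpha=0.05):
        n, c = x.size, norm.ppf(1 - alpha / 2)
        half = c * sigma / np.sqrt(n)
        return x.mean() - half, x.mean() + half   # (14.55)-(14.56)

    rng = np.random.default_rng(8)
    x = rng.normal(60, 8, size=40)
    lo, hi = ci_mean(x, sigma=8)
    print(lo, hi, lo <= 60 <= hi)   # covers theta = 60 about 95% of the time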
One-sided tests correspond to one-sided confidence intervals of the form

    Pr(L(X) ≤ θ) ≥ 1 − α   (14.57)

and

    Pr(θ ≤ U(X)) ≥ 1 − α,   (14.58)

with associated regions C(X) = {θ: L(X) ≤ θ ≤ U(X)} and C(X) = {θ: L(X) ≤ θ}. More generally, a (1 − α) confidence region C(X) for θ is a random region in Θ satisfying

    Pr(x: θ ∈ C(X)/θ) ≥ 1 − α.   (14.59)–(14.60)

For a vector of parameters, confidence regions of the form

    C(X) = {θ: Lᵢ(X) ≤ θᵢ ≤ Uᵢ(X), i = 1, 2, ..., m}   (14.61)

can be built up from individual intervals; when the intervals are independent, each with confidence level (1 − αᵢ), the overall level is

    (1 − α) = ∏ᵢ₌₁ᵐ (1 − αᵢ).   (14.62)

The duality between tests and confidence regions holds for all θ ∈ Θ   (14.63)

because

    θ₀ ∈ C(x) if and only if x ∈ C₀(θ₀),   (14.64)

C₀(θ₀) being the acceptance region of the corresponding size α test,   (14.65)

and hence

    C₀(θ₀) = {x: θ₀ ∈ C(x)}, Pr(x ∈ C₀(θ₀); θ = θ₀) = 1 − α.   (14.66)–(14.67)

This duality between C₀(θ₀) and C(X) is illustrated below for the above example assuming that n = 1 to enable us to draw the graph given in Fig. 14.5.

Continuing with this duality it comes as no surprise to learn that unbiased tests give rise to unbiased confidence regions and vice versa.
Fig. 14.5. The duality between hypothesis testing and interval estimation.

A confidence region C(X) is unbiased if

    Pr(x: θ₁ ∈ C(x)/θ) ≤ 1 − α for any θ₁ ≠ θ, θ₁ ∈ Θ.   (14.68)

In general, a 'good' test will give rise to a good confidence region and vice versa (see Lehmann (1959)).
14.6 Prediction

As in Section 12.3, a predictor of Xₙ₊₁ is a Borel function

    h(·): 𝒳 → R,   (14.69)

and a natural criterion for choosing h(·) is to minimise the mean square error

    E(Xₙ₊₁ − h(Xₙ))².   (14.70)

Decomposing this around the conditional expectation E(Xₙ₊₁/σ(Xₙ)),

    E(Xₙ₊₁ − h(Xₙ))² = E( Xₙ₊₁ − E(Xₙ₊₁/σ(Xₙ)) )² + E( E(Xₙ₊₁/σ(Xₙ)) − h(Xₙ) )²
                      + 2E[ (Xₙ₊₁ − E(Xₙ₊₁/σ(Xₙ)))(E(Xₙ₊₁/σ(Xₙ)) − h(Xₙ)) ].   (14.71)

Using the properties CE5 and SCE5 of Section 7.2 we can show that the last term is equal to zero. Hence, (71) is minimised when

    h(Xₙ) = E(Xₙ₊₁/σ(Xₙ)).   (14.72)

When this is the case the second term is also equal to zero. That is, the form of the predictor h(Xₙ) which minimises (70) is

    X̂ₙ₊₁ = E(Xₙ₊₁/σ(Xₙ)),   (14.73)

which requires the conditional distribution

    D(Xₙ₊₁/Xₙ; θ)   (14.74)

(see Chapter 15). In practice, when D(Xₙ₊₁/Xₙ; θ) is not known, linear predictors are commonly used as approximations to the particular
functional form of E(Xₙ₊₁/σ(Xₙ)) = g(Xₙ). The prediction takes the value

    X̂ₙ₊₁ = E(Xₙ₊₁/Xₙ = xₙ),   (14.76)

where xₙ refers to the observed realisation of the sample Xₙ. The intuition underlying (76) is that the best 'guess' for the value Xₙ₊₁ must be the average of all its possible values, in view of the past realisations. If Xₙ₊₁ is independent of Xₙ, then

    E(Xₙ₊₁/Xₙ = xₙ) = E(Xₙ₊₁)   (14.77)

(see Fig. 14.6). That is, the conditional expectation coincides with the marginal expectation of Xₙ₊₁ (see Chapters 6 and 7). This is the reason why in the case of the random sample Xₙ where Xᵢ ~ N(θ, 1), i = 1, 2, ..., n, if Xₙ₊₁ is also assumed to have the same distribution, its best predictor (in MSE sense) is its mean, i.e. X̂ₙ₊₁ = (1/n) Σᵢ₌₁ⁿ Xᵢ (see Section 12.3). It goes without saying that for prediction purposes we prefer to have non-random sampling models because the past history of the stochastic process {Xₙ, n ≥ 1} will be of considerable value in such a case.

Prediction regions for Xₙ₊₁ take the same form as confidence regions for

[Fig. 14.6: prediction via the conditional distribution of Xₙ₊₁ given Xₙ = xₙ.]

θ, and the same analysis as in Section 14.5 goes through with minor interpretation changes.
Important concepts

Null and alternative hypotheses; acceptance and rejection regions; type I and type II errors; test statistic; power and power function; size; UMP and unbiased tests; pivots; the Neyman–Pearson theorem; the likelihood ratio test procedure; confidence intervals and regions; prediction.

Questions
1. Explain the relationship between H₀ and H₁ and the distribution of the sample.
2. Describe the relationship between the acceptance and rejection regions and Θ₀ and Θ₁.
3. Define the concepts of a test statistic, type I and type II errors and probabilities of type I and II errors.
4. Explain intuitively why we cannot control both probabilities of type I and type II errors. How do we 'solve' this problem in hypothesis testing?
5. Define and explain the concepts of the power of a test and the power function of a test.
6. Explain the concept of the size of a test.
7. Define and explain the concept of a UMP test.
8. State the components needed to define a test.
9. Explain why we need to know the distribution of the test statistic under both the null and the alternative hypotheses.
10. Define the concept of a pivot and explain its role in hypothesis testing.
11. Explain the concepts of one-sided and two-sided tests.
12. Explain the circumstances under which UMP tests exist.
13. Explain the Neyman–Pearson theorem and the likelihood ratio test procedure as ways of constructing optimal tests.
14. Explain intuitively the meaning of the statement Pr(L(X) ≤ θ ≤ U(X)) = 1 − α.
15. Define the concept of a (1 − α) uniformly most accurate confidence region.
Exercises
1. For the 'marks' example of Section 14.2 construct a size 0.05 test for H₀: θ = 60 against H₁: θ < 60. Is it unbiased? Using this, construct a 0.95 confidence interval for θ.
2. Let X ~ N(μ, σ²) and consider the following hypotheses:
(i) H₀: μ ≤ μ₀ against H₁: μ > μ₀, σ² known;
(ii) H₀: σ² ≥ σ₀² against H₁: σ² < σ₀², μ known;
(iii) H₀: μ = μ₀ against H₁: μ ≠ μ₀, σ² unknown;
(iv) H₀: μ ≥ μ₀ against H₁: μ < μ₀, σ² unknown.
State whether the above null and alternative hypotheses are simple or composite and explain your answer.
3. Let X ≡ (X₁, ..., Xₙ)′ be a random sample from N(θ, 1) where θ ∈ Θ ≡ {θ₁, θ₂}. Construct a size α test for H₀: θ = θ₁ against H₁: θ = θ₂ and a (1 − α) significance level confidence interval for θ.
4. Let X ≡ (X₁, ..., Xₙ)′ be a random sample from the Bernoulli density

    f(x; θ) = θˣ(1 − θ)^(1−x), x = 0, 1.

Construct a size α test based on a rejection region of the form C₁ = {x: Σᵢ₌₁ⁿ Xᵢ ≥ k}, using the statistics X̄ₙ and s² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)² where appropriate.
5. Let X ≡ (X₁, ..., Xₙ)′ and Y ≡ (Y₁, ..., Yₘ)′ be independent random samples from N(μ₁, σ₁²) and N(μ₂, σ₂²) respectively, and define

    τ(x, y) = [ Σᵢ₌₁ⁿ (Xᵢ − X̄ₙ)²/n ] / [ Σᵢ₌₁ᵐ (Yᵢ − Ȳₘ)²/m ].

Construct tests for
(i) H₀: σ₁² ≤ σ₂² against H₁: σ₁² > σ₂²;
(ii) H₀: σ₁² ≥ σ₂² against H₁: σ₁² < σ₂²;
(iii) H₀: σ₁² = σ₂² against H₁: σ₁² ≠ σ₂²,
using rejection regions of the form C₁ = {x, y: τ(x, y) ≥ k₁}, C₁ = {x, y: τ(x, y) ≤ k₂} and the complement of {x, y: k₃ ≤ τ(x, y) ≤ k₄}, respectively.

Additional references
Bickel and Doksum (1977); Cox and Hinkley (1974); Kendall and Stuart (1973); Lehmann (1959); Rao (1973); Rohatgi (1976); Silvey (1975).
CHAPTER 15*

The multivariate normal distribution

15.1 Multivariate distributions

The multivariate normal distribution is by far the most important distribution in statistical inference for a variety of reasons, including the fact that some of the statistics based on sampling from such a distribution have tractable distributions themselves. It forms the backbone of Part IV on statistical models in econometrics and thus a closer study of this distribution will greatly simplify the discussion that follows. Before we consider the multivariate normal distribution, however, let us introduce some notation and various simple results related to random vectors and their distributions in general.
Let X ≡ (X₁, X₂, ..., Xₙ)′ be an n × 1 random vector defined on the probability space (S, 𝓕, P(·)). The mean vector E(X) is defined by

    E(X) = (E(X₁), E(X₂), ..., E(Xₙ))′ ≡ μ, an n × 1 vector,   (15.1)

and the covariance matrix Cov(X) by

    Cov(X) = E[(X − μ)(X − μ)′] ≡ Σ,   (15.2)

an n × n symmetric (Σ = Σ′) non-negative definite (a′Σa ≥ 0 for all a ∈ Rⁿ) matrix whose (i, j) element is

    σᵢⱼ = E[(Xᵢ − μᵢ)(Xⱼ − μⱼ)], i, j = 1, 2, ..., n.   (15.3)
In relation to a′Σa ≥ 0 we can show that if there exists an a ∈ Rⁿ, a ≠ 0, such that Var(a′X) = a′Σa = 0, then Pr(a′X = c) = 1 where c is a constant (only constants have zero variance), i.e. there is a linear relationship holding among the r.v.'s X₁, ..., Xₙ with probability one.

Lemma 15.1
For Z = AX + b, where A is an m × n matrix of constants and b an m × 1 constant vector,
(i) E(Z) = AE(X) + b = Aμ + b;
(ii) Cov(Z) = A Cov(X) A′ = AΣA′.

Lemma 15.2
For two random vectors X and Y with mean vectors μx and μy,

    Cov(X, Y) = E[(X − μx)(Y − μy)′].
Correlation

So far correlation has been defined for random variables (r.v.'s) only and the question arises whether it can be generalised to random vectors. Let E(X) = 0 and partition X into X₁: 1 × 1 and X₂: (n − 1) × 1, with

    Σ = [ σ₁₁  σ₁₂ ; σ₂₁  Σ₂₂ ]

partitioned conformably. For the linear combination Z = a′X₂,

    Corr(Z, X₁) = a′σ₂₁/(σ₁₁ a′Σ₂₂a)^½,

which is maximised by a = Σ₂₂⁻¹σ₂₁, and we define the multiple correlation coefficient to be

    R = max Corr(Z, X₁) over a = (σ₁₂Σ₂₂⁻¹σ₂₁/σ₁₁)^½.

More generally, for X₁: k × 1, X₂: (n − k) × 1 and the linear combinations Z₁ = α₁′X₁, Z₂ = α₂′X₂,

    Corr(Z₁, Z₂) = α₁′Σ₁₂α₂/[(α₁′Σ₁₁α₁)(α₂′Σ₂₂α₂)]^½,   (15.10)

whose maximisation with respect to α₁ and α₂ leads to the canonical correlations.
Another form of correlation between r.v.'s is the partial correlation. Partition X into the scalars X₁, X₂ and X₃: (n − 2) × 1, with

    Σ = [ σ₁₁  σ₁₂  σ₁₃′ ; σ₂₁  σ₂₂  σ₂₃′ ; σ₃₁  σ₃₂  Σ₃₃ ].

Form the 'residuals' Y₁ = X₁ − b₁′X₃ and Y₂ = X₂ − b₂′X₃, with b₁, b₂ the corresponding least-squares coefficients; the partial correlation between X₁ and X₂ given X₃ is then

    ρ₁₂·₃ = Corr(Y₁, Y₂)
          = (σ₁₂ − σ₁₃′Σ₃₃⁻¹σ₃₂)/{ [σ₁₁ − σ₁₃′Σ₃₃⁻¹σ₃₁]^½ [σ₂₂ − σ₂₃′Σ₃₃⁻¹σ₃₂]^½ }.
15.2 The multivariate normal distribution

The univariate normal density is

    f(x; μ, σ²) = (1/σ√(2π)) exp{−(x − μ)²/(2σ²)}.   (15.12)

The density function of X ≡ (X₁, X₂, ..., Xₙ)′ when the Xᵢ's are IID normally distributed r.v.'s was shown to be of the form

    f(x; θ) = (2π)^(−n/2)(σ²)^(−n/2) exp{−(1/2σ²) Σᵢ₌₁ⁿ (xᵢ − μ)²}.   (15.13)

Similarly, the density function of X when the Xᵢ's are only independent, i.e. Xᵢ ~ N(μᵢ, σᵢ²), i = 1, 2, ..., n, takes the form

    f(x; θ) = (2π)^(−n/2)(σ₁²σ₂²···σₙ²)^(−½) exp{ −½ Σᵢ₌₁ⁿ [(xᵢ − μᵢ)/σᵢ]² }.   (15.14)

In the general case,

    f(x; μ, Σ) = (2π)^(−n/2)(det Σ)^(−½) exp{ −½(x − μ)′Σ⁻¹(x − μ) },   (15.15)

and we write X ~ N(μ, Σ). If the Xᵢ's are IID r.v.'s, Σ = σ²Iₙ and (det Σ) = (σ²)ⁿ. On the other hand, if the Xᵢ's are independent but not identically distributed, Σ = diag(σ₁², σ₂², ..., σₙ²) and (det Σ) = ∏ᵢ₌₁ⁿ σᵢ².
In the case of n = 2,

    μ = (μ₁, μ₂)′,  Σ = [ σ₁²  ρσ₁σ₂ ; ρσ₁σ₂  σ₂² ],

and

    (det Σ) = σ₁²σ₂²(1 − ρ²) > 0 for |ρ| < 1, where ρ = σ₁₂/(σ₁σ₂)

(see Chapter 6). The standard bivariate density function can be obtained by defining the new r.v.'s Z₁ = (X₁ − μ₁)/σ₁ and Z₂ = (X₂ − μ₂)/σ₂.
(1) Properties

(N1) If Xₜ ~ NI(μₜ, Σₜ), t = 1, 2, ..., T, then for any fixed conformable matrices Aₜ,

    Σₜ₌₁ᵀ AₜXₜ ~ N( Σₜ₌₁ᵀ Aₜμₜ, Σₜ₌₁ᵀ AₜΣₜAₜ′ ),

i.e. linear functions of independent normal vectors are themselves normal.
(N2) Let X ~ N(μ, Σ); then the Xᵢ's are independent if and only if σᵢⱼ = 0, i ≠ j, i, j = 1, 2, ..., n, i.e. Σ = diag(σ₁₁, ..., σₙₙ). In general, zero covariance does not imply independence, but in the case of normality the two are equivalent.

(N3) If X ~ N(μ, Σ), then any k × 1 subset X₁ of X is itself normal. For the partition

    X ≡ (X₁′, X₂′)′, μ = (μ₁′, μ₂′)′, Σ = [ Σ₁₁  Σ₁₂ ; Σ₂₁  Σ₂₂ ],

X₁ = (Iₖ : 0)X ~ N(μ₁, Σ₁₁), i.e. the marginal distributions of a multivariate normal are normal.

(N4) The marginal densities are obtained by integrating out the other components:

    f(x₁; θ₁) = ∫ f(x; θ)dx₂ and f(x₂; θ₂) = ∫ f(x; θ)dx₁.
(N5) For the partition of X considered in N4, the conditional distribution of X₁ given X₂ takes the form

    (X₁/X₂ = x₂) ~ N( μ₁ + Σ₁₂Σ₂₂⁻¹(x₂ − μ₂), Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁ ).   (15.18)

This can be verified by taking

    A = [ Iₖ  −Σ₁₂Σ₂₂⁻¹ ; 0  Iₙ₋ₖ ], b = 0,   (15.19)

so that AX = (X₁ − Σ₁₂Σ₂₂⁻¹X₂, X₂)′ with

    Cov(AX) = [ Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁  0 ; 0  Σ₂₂ ].   (15.20)

From this we can deduce that if Σ₁₂ = 0 then X₁ and X₂ are independent, since (X₁/X₂) ~ N(μ₁, Σ₁₁). Moreover, for any Σ₁₂, (X₁ − Σ₁₂Σ₂₂⁻¹X₂) and X₂ are independent given that their covariance is zero. Similarly, (X₂/X₁) ~ N( μ₂ + Σ₂₁Σ₁₁⁻¹(X₁ − μ₁), Σ₂₂ − Σ₂₁Σ₁₁⁻¹Σ₁₂ ). In the case n = 2,

    (X₁/X₂ = x₂) ~ N( μ₁ + (σ₁₂/σ₂₂)(x₂ − μ₂), σ₁₁ − σ₁₂²/σ₂₂ ),   (15.21)

and the joint density factorises as

    f(x₁/x₂; θ)·f(x₂; θ₂) = f(x; θ).   (15.22)

Two features of the conditional distribution are worth noting:

    E(X₁/X₂ = x₂) = μ₁ + Σ₁₂Σ₂₂⁻¹(x₂ − μ₂) is linear in x₂;   (15.23)

    Var(X₁/X₂ = x₂) = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁ is free of x₂.   (15.24)
Without
partition
Multle
X1
X2
let X
c11
,.w
c12
,
..
=22
J21
(3)
normal distribution
correlation
Xv
x2.
vartx
,'x
Vart.-
multiple
a 1 cta-clcc,
( 15.25)
c:
correlation
(3) Partial correlation

Let X₂ be partitioned further into

    X₂ ≡ (X₂, X₃′)′, X₂: 1 × 1, X₃: (n − 2) × 1,

with

    Σ = [ σ₁₁  σ₁₂  σ₁₃′ ; σ₂₁  σ₂₂  σ₂₃′ ; σ₃₁  σ₃₂  Σ₃₃ ].

Then

    Cov(X₁, X₂/X₃) = σ₁₂ − σ₁₃′Σ₃₃⁻¹σ₃₂,   (15.26)

and the partial correlation coefficient is

    ρ₁₂·₃ = (σ₁₂ − σ₁₃′Σ₃₃⁻¹σ₃₂)/{ [σ₁₁ − σ₁₃′Σ₃₃⁻¹σ₃₁]^½ [σ₂₂ − σ₂₃′Σ₃₃⁻¹σ₃₂]^½ },   (15.27)

i.e. the correlation between X₁ − σ₁₃′Σ₃₃⁻¹X₃ and X₂ − σ₂₃′Σ₃₃⁻¹X₃, the correlation between X₁ and X₂ after the effect of X₃ has been 'removed'.
15.3 Quadratic forms related to the normal distribution

(Q1) Let X ~ N(μ, Σ), Σ > 0; then
(i) (X − μ)′Σ⁻¹(X − μ) ~ χ²(n) – chi-square;
(ii) X′Σ⁻¹X ~ χ²(n; δ), δ = μ′Σ⁻¹μ – non-central chi-square.

(Q2) Let X ~ N(μ, (1/T)Σ); then

    T(X − μ)′Σ⁻¹(X − μ) ~ χ²(n).

(Q3) Let X ~ N(μ, Iₙ) and A a symmetric n × n matrix; then
(i) (X − μ)′A(X − μ) ~ χ²(tr A) if and only if A is idempotent (i.e. A² = A). Note tr A refers to the trace of A, tr A = Σᵢ₌₁ⁿ aᵢᵢ;
(ii) X′AX ~ χ²(tr A; δ), δ = μ′Aμ.

(Q4) Let X ~ N(μ, Σ), Σ > 0, and A a symmetric n × n matrix; then
(i) (X − μ)′A(X − μ) ~ χ²(tr AΣ) if and only if AΣ is idempotent;
(ii) X′AX ~ χ²(tr AΣ; δ), δ = μ′Aμ.

For the partition X ≡ (X₁′, X₂′)′, μ = (μ₁′, μ₂′)′, Σ = [Σ₁₁, Σ₁₂; Σ₂₁, Σ₂₂] used above, these results imply

    (X₁ − μ₁)′Σ₁₁⁻¹(X₁ − μ₁) ~ χ²(k)

and

    (X − μ)′Σ⁻¹(X − μ) − (X₁ − μ₁)′Σ₁₁⁻¹(X₁ − μ₁) ~ χ²(n − k).

(Q5) Let X ~ N(0, Iₙ) and A, B symmetric idempotent matrices; then the quadratic forms X′AX and X′BX are independent if and only if AB = 0.

(Q6) Let X ~ N(0, Iₙ), A a symmetric idempotent matrix and B a matrix such that BA = 0; then X′AX and BX are independent.

(Q7) For independent chi-square quadratic forms X′AX ~ χ²(tr A) and Z′BZ ~ χ²(tr B),

    [X′AX/tr A]/[Z′BZ/tr B] ~ F(tr A, tr B),

i.e. the ratio is F-distributed with (tr A, tr B) degrees of freedom.
15.4 Estimation

Let X ≡ (X₁, X₂, ..., X_T)′ be a random sample from N(μ, Σ). The likelihood function takes the form

    L(θ; X) = k(X)(2π)^(−nT/2)(det Σ)^(−T/2) exp{ −½ Σₜ₌₁ᵀ (Xₜ − μ)′Σ⁻¹(Xₜ − μ) },   (15.28)

    log L(θ; X) = c − (nT/2) log 2π − (T/2) log(det Σ) − ½ Σₜ₌₁ᵀ (Xₜ − μ)′Σ⁻¹(Xₜ − μ).   (15.29)

Since

    Σₜ₌₁ᵀ (Xₜ − μ)′Σ⁻¹(Xₜ − μ) = Σₜ₌₁ᵀ (Xₜ − X̄_T)′Σ⁻¹(Xₜ − X̄_T) + T(X̄_T − μ)′Σ⁻¹(X̄_T − μ),

where X̄_T = (1/T) Σₜ₌₁ᵀ Xₜ, the log likelihood can be written as

    log L(θ; X) = c* − (T/2) log(det Σ) − (T/2) tr Σ⁻¹A − (T/2)(X̄_T − μ)′Σ⁻¹(X̄_T − μ),   (15.30)

where

    A = (1/T) Σₜ₌₁ᵀ (Xₜ − X̄_T)(Xₜ − X̄_T)′.   (15.31)

The first-order conditions

    ∂ log L(θ; X)/∂μ = TΣ⁻¹(X̄_T − μ) = 0,   (15.32)

    ∂ log L(θ; X)/∂Σ⁻¹ = (T/2)Σ − (T/2)A − (T/2)(X̄_T − μ)(X̄_T − μ)′ = 0   (15.33)–(15.34)

yield

    μ̂ = X̄_T and Σ̂ = (1/T) Σₜ₌₁ᵀ (Xₜ − X̄_T)(Xₜ − X̄_T)′,

the MLE's of μ and Σ respectively.

Properties
Looking at X̄_T and Σ̂ we can see that they correspond directly to the MLE's in the univariate case. It turns out that the analogy between the univariate and multivariate cases extends to the properties of X̄_T and Σ̂.

In order to discuss the small sample properties of X̄_T and Σ̂ we need their distributions. Since X̄_T is a linear function of normally distributed random vectors, it is itself normally distributed:

    X̄_T ~ N(μ, (1/T)Σ), and (T − 1)S = TΣ̂ ~ W(Σ, T − 1),   (15.36)

the Wishart distribution with T − 1 degrees of freedom, where S = (1/(T − 1)) Σₜ₌₁ᵀ (Xₜ − X̄_T)(Xₜ − X̄_T)′.
Useful distributions
Using the distribution (T- 1)S I4'(E, T- 1) the following results relating
to the sample correlations can be derived (see Muirhead ( 1982)):
'v
Simple correlation
ri.i
=
sij S Lsij?i-j,i,)
siisjj
,
1, 2,
.,
n.
The multivariate
normal distribution
If
For M
Lrij?ijwhen
diagtcl
1,
c.),
2 logtdettsl/
,.v.
.;
zztntn-
1:.
(15.38)
Multiple correlation
Rmw
ls
sl zS 2-z a j 'i
( 15.39)
S11
Under
R
k;
w-nj
=0,
h;-
.stu
.j,
w.u).
1-.!'i/
( 15.40)
In particular,
1
n'(.l'il.)=
r
2(T-n)(n
Vart y #)=
(F c
1)
(15.41)
1)4F- 1)
v'..'-
1)(w -.R2)
x(0,4R2(1 -R2)2)
,v,
0..::Rl
and instead we
<
1.
(15.42)
=0,
1).)
('r-
'w
z2(n
-
1).
to R2 is the quantity
#2F
'
X 1 X 2 (XIX 2
)--1X'2 x 1 l
(x1x1)
t
(15.43)
(15.44)
x 1 X c: T x k
The sampling distribution of 112was derived by Fisher (1928)
but it is far
variance
complicated
of
and
1ts
be
direct
interest.
too
to
mean, however,
are
of some interest
x1 : T
Hypothesis
15.5
4R2(1
Var(#2)
R2)2
+ O(T-
323
regions
2)
(15.46)
=0
'(#2)
T- 1
+ O(T
).
(15.47)
Partial correlation

    r₁₂·₃ = (s₁₂ − s₁₃′S₃₃⁻¹s₃₂)/{ [(s₁₁ − s₁₃′S₃₃⁻¹s₃₁)(s₂₂ − s₂₃′S₃₃⁻¹s₃₂)]^½ }.   (15.48)

Under ρ₁₂·₃ = 0, results analogous to those for the simple correlation apply.   (15.49)
15.5 Hypothesis testing and confidence regions

Consider testing H₀: μ = μ₀ against H₁: μ ≠ μ₀ in the context of the above statistical model. For the likelihood ratio λ(x) we need

    max L(θ; x) over θ ∈ Θ₀ = c*(det(Σ̂ + d̄d̄′))^(−T/2) exp(−½Tn), d̄ ≡ X̄_T − μ₀,   (15.50)

    max L(θ; x) over θ ∈ Θ = c*(det Σ̂)^(−T/2) exp(−½Tn).   (15.51)

Taking the ratio we get

    λ(x)^(2/T) = det(Σ̂)/det(Σ̂ + d̄d̄′) = 1/(1 + d̄′Σ̂⁻¹d̄) = 1/[1 + H²/(T − 1)],   (15.52)

where

    H² = T(X̄_T − μ₀)′S⁻¹(X̄_T − μ₀), S = [T/(T − 1)]Σ̂,   (15.53)

is the so-called Hotelling's statistic which can form the basis of the test, being a monotone function of λ(X). Indeed,

    [(T − n)/(n(T − 1))]H² ~ F(n, T − n) under H₀,   (15.54)

and

    [(T − n)/(n(T − 1))]H² ~ F(n, T − n; δ), δ = T(μ − μ₀)′Σ⁻¹(μ − μ₀), under H₁   (15.55)

(see Muirhead (1982)). Using the Hotelling statistic, the rejection region takes the form

    C₁ = { x: [(T − n)/(n(T − 1))]H² ≥ cα }, where α = ∫ from cα to ∞ of dF(n, T − n).   (15.56)

Similarly, we can define a (1 − α) confidence region for μ:

    C(X) = { μ: T(X̄_T − μ)′S⁻¹(X̄_T − μ) ≤ [(T − 1)n/(T − n)]cα }.   (15.57)
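Hotelling's statistic and the F-based rejection rule of (15.54)–(15.56) in code (a sketch, ours):

    import numpy as np
    from scipy.stats import f

    def hotelling_test(X, mu0, alpha=0.05):
        T, n = X.shape
        xbar = X.mean(axis=0)
        S = np.cov(X, rowvar=False)              # the (T-1)-divisor estimator
        d = xbar - mu0
        H2 = T * d @ np.linalg.solve(S, d)       # Hotelling's H^2
        stat = (T - n) / (n * (T - 1)) * H2      # ~ F(n, T-n) under H0
        c = f.ppf(1 - alpha, n, T - n)
        return stat, c, stat >= c

    rng = np.random.default_rng(9)
    X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=50)
    print(hotelling_test(X, mu0=np.array([0.0, 0.0])))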
Important concepts
Questions
1. Explain the various correlation measures in the context of random vectors and compare the general formulae with the ones associated with the multivariate normal distribution. Comment on the similarities.
2. Discuss the relationship between simple, multiple and partial correlation.
3. Under what conditions is the quadratic form (X − μ)′A(X − μ) chi-square distributed?
4. State the conditions under which the quadratic forms X′AX and X′BX will be independent.
5. State the conditions under which the ratio of the quadratic forms X′AX and Z′BZ will be F-distributed.
6. Under which circumstances will the quadratic form X′AX and BX be independent?
7. Discuss the properties of the MLE's X̄ₙ and Σ̂ of μ and Σ respectively in the context of the statistical model defined by the random sample X ≡ (X₁, X₂, ..., Xₙ)′ from N(μ, Σ).
Additional references
Anderson
Mardia
(1984),.
16.1
Asymptotic properties
> c,, )
tx lz,,(X)I
:
G),
0 iE (9.
16.1 Asymptotic
properties
zr(p) #r(x
G C)''),
0 6 0.
(16.3)
max
,#,(p)
0 c 6o
( 16.4)
Dehnition 1
The sequence of tests for Ho.' 0 6 (i)o against 1'f1: 0 01 dejlned by
c
(C'1 n > 1) is said to be consistent of size f/
,
c(#)
max
e
e
and
(z
etl
7:(p) 1, 0 6 01
=
(16.6)
z:4:)
e (.:0
'l0)
tx<
<
1,
0 6 O1.
( 16.8)
Desnition 3
Asymptotic
test procedures
power (UM#) of
size a
7(0)
max
('a
(/
( 16.9)
ee
and
z0) >
for any
zr*(p),
0 (E (.l,
( 16.10)
ln asymptotic
b#0
( 16.11)
in order to assess the power of the test around the null. When
I
11
v'',,(4-po)
x(b,I.(p)-
1)
for l the MLE of 0. ln this case we consider only local power and a test with
greatest local power is called locally' uniformly most powerful.
The Iikelihood ratio and related test procedures
16.2
(1)
Simple
null
hypothesis
0+ 0v.
2(x)
=
x)
1-(p();
L(0., x)
max
e e til
f/pe', x)
L(1,. x)
( 16.13)
16.2
Likelihood
ratio
329
p Iog z-(p;x)
l-t; x) +(4-p)
xl-log
+.:1- 0)'
whereJp* p) < J- p! and
- 10). Since
(seeChapter
p
f-(p; x)
'ik log
#--J
00
p2log 1-(p*; x)
pp 0
,
p-j
(seeChapter
:-j
p) +
0,(1),
( 16.14)
og 1) refers to asymptotically
negligible terms
( 16.15)
=0,
conditions
log.L(;
x) XI(p),
(16.16)
(seeSerfling (1980))to:
p)'l(p)(I p) + o/ 1).
-
(16.17)
x/nt
0o)
tz
x(0,I(p)-
-2
log
1).
(16.20)
(x):>,n(I-pp'I(p)(4
Ho
-p(,)
15) that
z2(,,l),
(16.21)
(r.v.'s).
330
test procedures
Asymptotic
Ho
Wz'=n(l-po)I()(J-po)
given that
(iii)
I(J)
xz2(m),
(16.22)
-+
log
f7tr
V n -u
q(#)
=---s
L(0; x)
=-
N(0, I(p))
1q(po)
q(#()'l(#() -
Hv
( 16.23)
multiplier)
(or Lagrange
(instead
test statistic
zztrj,
form in asymptotically
normally
distributed
I-.V. S.
)x : !(x)> cz)
(71
region
(16.25)
where /(x) stands forall three test statistics and the critical value ca is defined
by dz2(r# a, x being the size of the test. Under local alternatives with a
Pittman type drift of the folnn:
Jc
(t
/.fl : 0,, 0o +
=
(16.26)
l(x4
z2(m;J), J
'v
distributed as:
b'Itp()b,
( 16.27)
since
x'',;(J
-
0v)
vt
#,,) + b
xtb,1(:0) - 1)
(16.28)
and
qtpo)
v7n
-x/ntl-ppltp()l
+0p(1)
xtbI(po), I(po)).
(16.29)
16.2 Likelihood
Hence, the power function for all three test statistics takes the form
X
itol
dzztm.. J),
(16.30)
C?
and thus, LR, W' and LM are asymptotically equivalent in the sense that
they have the same asymptotic properties.
Fig. 16.1, due to Pagan (1982),shows
the relationship between LR, ##rand
LM in the case of a scalar 0.
z,M=2
area
Lo
q(po),
p
(- p())2
g,,=2
=q(po)2
area Atl
q),
LR= 2 area
A C
qo) dp.
=2
( 16.3 1)
(16,32)
( 16.33)
po
Note that al1 three test statistics can be interpreted as functions of the score
function.
332
Asymptotic
test procedures
(2)
Composite
null
hypothesis
H o : 0 i! 0. o
(p:R(#)
0. ()
Rr
0. i Rm
It is both convenient
(1)0
H 1 : 0 (F0. j
0, 0 c 0)
(16.34)
where R(#)
.
-2pc,
x'V(l
0)
.N(0, 14:) -
'w
1),
(16.35)
and
1
t?
log L(0; x)
n'&
N(0, I(p)),
'v
xV(R(
R),I(p) - 1R:),
-R(p)) xN(,
R(4) +Rp(p
where
by
) +0/1),
R(p)
Rp=
(i)
at 0=
( 16.37)
(0
lf the null hypothesis Ho is true we expect the MLE 1,without imposing the
restrictions, to be close to satisfying the restrictions, i.e.if Hv is true, R(l) 0.
This implies that a natural measure for any departure from Hv should be
p1
R(
-011
(16.39)
If this is
different from zero it will be a good indication that Hv
different from
is false. The problem is to formalise the concept
zero'. The obvious way to proceed is to construct a pivot based on 1jR(l)1jin
order to enable us to turn this statement into a precise probabilistic
statement.
Ssignificantly'
tsignificantly
16.2
333
ln constructing such a vot there are two basic problems to overdepends on the units of measurement and
come. The first is that jjR(#)Jj
the second is that absolute values are not easy to manipulate. A quantity
both problems is the quadratic form
which
'solves'
- 'R(;),
R('Ev(R()1
(16.40)
hencewe
'
(t6.41)
R;I(p) - R:,
IRPI iR()
nR('gRI(#)-
Ho
Wald's suggestion
estimator, i.e.
z2(r).
(16.42)
'X
to replacing
amounts
1'V'=nR()'gRI(l)-
'w
Ho
'RI;II-1R()
'v
V(R()
with
a consistent
z2(r).
w''n0
#)
'.-
N (0 E : )
,
W'= nR(#*)'gRipRj1
with any
(16 44)
Ho
- 'R(#*)
'w
:'
(ii)
( 16.43)
z2(r).
(16.45)
R(p)p,
(16.46)
log1z(:
x)- Rp=0,
(16.47)
Asymptotic
test procedures
R(#) 0.
(16.48 )
In the case of the Wald procedure we began our search for an asymptotic
pivot using R(l) which should be close to zero when Hv is true. ln the
bydelinition and thus it cannot be used. But.
present case, however, R(#)
=0
is
(16.49)
= 0,
this is not the case for (t7log L(; xlj/'Pp and we can use it to construct an
asymptotic pivot. Equivalently, the Lagrange multipliers pfe can be used
instead. The intuition underlying the use of g(#) is that these multipliers can
be interpreted as shadow prices for the constraints and should register a11
d epar tures from Ho; if #is closed to'd #'#)is small and vice versa. Hence, a
reasonable thing to do is to consider the quantity
Using the same
argument as in the Wald procedure for jR(J)
we set up the quadratic
form
1-)
-0(
-01.
1#(#).
/z(l1'EV(/#l))I
-
(16.50)
x)
N(0,
'v
l(p)),
(16.5 1)
(p()
p(04)
'v
''''''''''''''tiklih',------,---'
-
-)--
( 16.52)
).
Hence,
1
g(#)'(R;l(p)- 'RJIZI-I
-11
Ho
-
z2(r).
( 16.53)
1
=
Hv
#(#)'gR;-l(#)- 1RJg(#)
-
z2(r),
(16.54)
or, equivalently,
LM
'
log L(. x)
lo -
p
'0
log L(. x)
(16.55)
16.2 Likelihood
The likelihood
Ho
LR
2(1ogLk,' x)
IVO:
LM
:>:
n(l
'v
(:
#-)'1(#)(1
-
z2(r).
( 16.56)
LR :>:
335
#).
Thus, although a1l three test statistics are based on three different
asymptotic pivots, as n a:i the test statistics become equivalent. Al1three
tests share the same asymptotic properties; they are a11consistent as well
as asymptotically locally' UMP against local alternatives of the form
considered above. In the absence of any information relating to higherorder approximations of the distribution of these test statistics under both
Ho and SI the choice between them is based on computational
convenience. The Wald test statistic is constructed in terms of J the
unrestricted MLE of 0, the Lagrange multiplier in tcrms of # the restricted
MLE of 0 and the likelihood ratio in terms of both.
ln order to be able to discriminate between the above three tests we need
to derive higher-order approximations such as Edgeworth approximations
(see Chapter 10). Rothenberg (1984)gives an excellent discussion of various
ways to derive such higher-order approximations.
Of particular interest in practice is the case where 0- (p1 0z) and f'fo:
0L
0 against ffj : 03# o, pl : r x 1 with pc: (rn- r) x 1left unrestricted. In
this case the three test statistics take the form
-+
.f.-(4,-
LR
(16.58)
x)),
x) -log
L,
-2(log
LM
1g(#) EI11(-)
,
--
-I1c(#)I22
ljyj-),
(yj21 tj.)g
-
where
0L
1(*)
I(p)
11c(*)
11
121(#) 1224#)
R;
pb
(16.59)
(16.60)
x)
t7log z-(p,'
(o, .-4.
(1,:0)
and hence
R;I(#)(#) - 1R:
1(:)
E11
I 1 a(p)I2-21(p)Ia1(#)q-
test procedures
(16.6 1)
see the
Asymptotic
test procedures
Important
concepts
statistic.
Questions
Why do we need asymptotic theory in hypothesis testing'?
Explain the concept of an asymptotic power function and use it to
define consistency, asymptotic unbiasedness and UMP in testing.
What do we mean by a size x test in this context?
Compare the LR, 1#' and LM tests in the case of a simple null
hypothesis (draw diagrams if it helps).
Explain the common-sense logic underlying the LR, 14' and LM test
procedures in the case of a composite null hypothesis.
6. Discuss the similarities and differences between the LR, Hzrand LM
test procedures.
Explain the circumstances
under which you would use these
asymptotic test procedures in preference to the test procedures
discussed in Chapter 14.
8. Explain the derivation of the 145and LM test statistics in the case of
HvL p1
03 against HL : 0L 0. p! being a subset of parameter vector
0= (p1,0z), considered above.
9. Verify the form of the Wald and Lagrange multiplier test statistics for
0z) using the partitioned matrix
Ho #1 0 against Sl : pl # 0, pHtpj
inversion rule
:#
- 1
A 11
M12
A2l
A2c
1
.
Additional references
Aitchison
Silvey (1959).
Moran
(1970);Rao (1973);
PART
IV
models
17
CHAPTER
17.1
The main purpose of Parts 11and IIl has been to formulate and discuss the
concept of a statistical model which will form the backbone of the
discussion in Part 1V. A statistical model has been defined as made up of
two related components:
(i)
a probability model, *= .tD(y; 04, 0 6: (.))t specifying a parametric
family of densities indexed by 0', and
)'w)' defining a sample from
a sampling model, y (y':,
D()?; p()),for some true' 0v in (4.
The probability model provides the framework in the context of which the
stochastic environment of the real phenomenon being studied can be
defined and the sampling model describes the relationship between the
probability model and the observable data. By postulating a statistical
model we transform the uncertainty relating to the mechanism giving rise to
the observed data to uncertainty relating to some unknown parameterts) 0
whose estimation determines the stochastic mechanism D(y', 0).
An example of such a statistical model in econometrics is provided by the
modelling of the distribution of personal income. ln studying the
distribution of personal income higher than a lower limit y() the following
statistical model is often postulated:
.y,2,
D()/ yo ; 0)
y2,
y EEF(-9?1,
P '* 1
--
Vo
,
J.7
yw)?is a random
!' The notation in Part IV will be somewhat different from the one used in Parts 11and
111.This change in notation has been made to conform with the established
econometric notation.
339
Note:
if 0 >2.
For y a random
L0; y)
>'0
function is
p+ l
V0
wp
0w),()
(y 1
).'f
.J,,,2
,
)?w)-(p
+ 1)
,
log L4t?;y)
d log L T
dp =-0
=>
+ T
log )'()-
log
T
t
log yr
l=1
log yf
0,
,
-
.J.'()
./T((4- 0)
x(0-p2).
pw-lww-l
=
.j.
F( F
1).J,,
exp
y.p
-
--
i.e.
awp
--)--
'v
z2(2F)
(see Appendix 6.1). This distribution of can be used to consider the finite
sample properties of as well as test hypotheses or set up condence
intervals for the unknown parameter 0. For instance, in view of the fact that
'l
E)
T
=
vart
w2p2
(T- 2)a (F- 3)
34 1
models
(see Johnson and Kotz ( 1970:. Using the data on income distribution
5000 (reproducedbelow) to estimate 0,
Chapter 2), for
(see
.vy
6000
5000
we get
#1
log
T'
l
..
8000
7000
12000
15 000
20 ()(X)
P()
as the ML estimate.
Using the invariance property of MLE's (seeSecton 13.3) we can deduce
that
'tarl)
'()
2. 13,
0.9 1.
As we can see, for a small sample (T= 8) the estimate of the mean and the
variance are considerably larger than the ones given by the asymptotic
distribution:
'll=
a
1.6,
42
vartl=F
=0.32.
'()
1.63,
Varl) 0.028,
=
with
as compared
4
-
1.6,
kart
=0.026.
These results exemplify the danger of using asymptotic results for small
samples and should be viewed as a warning against uncritical use of
asymptotie theory. For a more general discussion of asymptotic theory and
how to improve upon the asymptotic results see Chapter 10.
The statistical inference results derived above in relation to the income
of the
distribution example depend crucially on the appropriateness
should
represent a
statistical model postulated. That is, the statistical model
good approximation of the real phenomenon to be explained in a way
which takes account the nature of the available data. For example, if the
data were collected using stratified sampling then the random sample
assumption is inappropriate
Statistical models in
onometrics
343
(ii)
random.
For time-series data the sampling models of a random or an independent
sample seem rather unrealistic on a priori grounds, leaving the non-random
sample as the most likely sampling model to postulate at the outset. For the
time-series data plotted against time in Fig. 17. 1@)-(J)the assumption that
they represent realisations of stochastic processes (seeChapter 8) seems
more realistic than their being realisations of llD r.v.'s. The plotted series
exhibit considerable time dependence. This is confirmed in Chapter 23
where these series are used to estimate a money adjustment equation. ln
Chapters 19-22 the sampling model of an independent sample is
intentionally maintained for the example which involves these data series
and several misleading conclusions are noted throughout.
In order to be able to take cxplicitly into consideration the nature of the
observed data chosen in the context of onometric modelling, the statistical
models of particular interest in econometrics will be specified in terms of the
observable r.v.'s giving rise to the data rather than the error term, the usual
35000
25000
f
X
15000
5000
1963
1966
1969
1972
Time
1975
1978
1982
1966
1969
1972
Time
1975
1978
1982
consumerf
expenditure.
(a)
18o
'!' 160
J.
14000
12000
1963
(/?)Real
17.2
240
200
160
@
Q.
120
80
40
1963
1966
1969
1972
Time
1975
1966
1969
1972
1975
1978
1982
1978
1982
15
12
9
w*
()
1963
'
T i me
account.
'an adequate' approximation to the actual DGP giving rise to the observed
will be considered
data (see Chapter 1). This additional eomponent
section
17.4
ln
next
the nature of the
the
below.
extensively in Section
modelling
will
econometric
required
models
be discussed in
in
probability
model.
sampling
of
the
view of the above discussion
346
*= (D(y'2; 0t),
ot6 0,
t CETl.,
(17.1)
.)
where T ) 1, 2,
is an index set.
A non-random sample y raises questions not only of time-heterogeneity
but of time-dependenceas well. In this case we need thejoint distribution of y
in order to define an appropriate probability model of the general form
=
/t.D().'1
.p2,
A'w;#r),
ove' 0,
T1
(1, 2,
F)
T)
ln both of the above cases the observed data can be viewed as realisations
of the stochastic process (yf,t e: T) and for modelling purposes we need to
restrict its generality using assumptions such as normality, stationarity and
asymptotic independence or/and supplement the sample and theoretical
information available. In order to illustrate these let us consider the
simplest case of an independent sample and one incidental parameter:
(i)
D(y2,. 0 t )
exp
(2zr)
1
-- 2
-pt
2
,
y EEE()71,ya,
.
347
model
17.3
0t), t
1, 2,
(c2)-F/2(2a)-F/2
exp
1
2c a f j
-jtf)2
(.yf
gw),
As we can see, there are T+ 1 unknown parameters, #= (c2,jtj, yz,
sufficient
provide
with
which
observations
only
estimated
and
T
us
to be
warning that there will be problems. This is indeed confirmed by the
maximum likelihood (ML) method. The log likelihood is
.
log L(p', y)
log L
t'q/z, =
,log L
f,c2 =
const
--j-
1
(
2c a
T
-
2c z
log c2
2)(.J4
2c
2c a
pt)
1
-
-/tl)2,
0,
(17.4)
(y:
,-
1, 2,
(17.5)
-6
.
-/zf)2=0.
Z(.pt
( 17.6)
:2 1og L
aA
1= 1, 2,
'C and 82
=0.
it is important to look at
1
T
;
= 2c 4.
Z(A'-/4)
'
-;
c=
which are unbounded and hence lt and :2 are not MLE's; see Section 13.3.
This suggests that there is not enough information in the statistical model
(i)-(ii) above to estimate the statistical parameters 0= (Jz1,pz,
pw', c2).
An obvious way to supplement this information is in the form of panel
T: In the case where N
N, t= 1, 2,
data for yt, say y, i 1, 2,
realisations of y, are available at each r, 0 could be estimated by
.
1.
pt N
=
--
xv
Z A',
-
r= 1,2,
and
:2
1
T
j jy (yy ptll.
-
1 i
of 0.
lt can be verified that these are indeed the MLE'S
An alternative way to supplement the information of the statistical model
(ijvii) is to reduce the dimensionality of the statistical parameter space 0.
This can be achieved by imposing restrictions on 0 or modelling p by
relating it to other observable random variables (r.v.'s)via conditioning (see
Chapter 7). Note that non-stochastic variables are viewed as degenerate
r.v.'s. The latter procedure enables us to accommodate
theoretical
information within a probability model by relating such information to the
statistical parameters 0. ln particular, such information is related to the
meap (marginalor conditional) of r.v.'s involved and sometimes to the
variance. Theoretical information is rarely related to higher-order
moments (seeChapter 4).
The modelling of statistical parameters via conditioning leads naturally
to an additional component to supplement the probability and sampling
models. This additional component we call a statistical generating
mechanism (GM) for reasons which will become apparent in the discussion
which follows. At this stage it suffices to say that the statistical GM is
postulated as a crude approximation to the actual DGP which gave rise to
the observed data in question, taking account of the nature 4 such data as
well as theoretical a priori information.
In the case of the statistical model (i)-(ii) above we could solve' the
inadequate information problem by relating pt to a vector of observable
1, 2,
variables xlj, xcf,
'C say, linearly, to postulate
xkf l
.
pt b'xf,
( 17.9)
'.r
yt
=b'x
+ ut,
Eytut)
.y,
0 and
Eutj
0,
Elutl)
c2,
Eutus)
0,
f #s.
17.4
The statistical
generating
mechanism
349
Gauss linear model (see Chapter 18). The above statistical GM will be
extended in the next section in order to deline some of the most widely used
statistical models in econometrics.
17.4
The statistical
generating mhanism
and
.))
3'f pt +
=
where
ut
(17. 11)
#f f(#t/V),
(17. 12)
L being some c-field. This defines the statistical process generating y, with
pf being the postulated systematic mechanism giving rise to the observed
and ut the non-systematic
data on
part of yr defined by ut y'f
Defining ut this way ensures that it is orthogonal
to the systematic
component /t,; denoted by Jtf-l-l/? (see Chapter 7). The orthogonality
condition is needed for the logical consistency of the statistical GM in view
of the fact that ut represents the part of yf left unexplained by the choice of pt.
The terms systematic, non-systematic and orthogonality are formalised in
terms of the underlying probability and sampling models defining the
statistical model.
lt must be emphasised at the outset that the terms systematic and nonsystematic are relative to the information set as defined by the underlying
probability and sampling models as well as to any a priori information
related to the statistical parameters of interest 0. This information is
incorporated in the definition of the systematic component and the
remaining part of yt we call non-systematic or error. Hence, the nature of ut
depends crucially on how pf is defined and incorporates the unmodelled part
of #,1.This definition of the error term differs significantly from the usual use
of the term in econometrics as either errors-in-equation
or errors of
The
of
the
in
the
book
concept
present
measurement.
use
comes much
tnoise'
used in engineering and control literatures (see
closer to the term
.vf
-pt.
350
Klman
(1982:.Our
tzt,
Zt
'
.l
Xt
For a conditioning
information set
E (#l/V't )
(17 13)
l,
.p,
-
component
.E(.h/f4),
ut
(17.14)
(17.16)
( 17. 17)
using the properties of conditional expectation (see Chapter 7). It is
important to note at this stage that the expectation operator E( ) in (16)and
(17) is defined relative to the probability distribution of the underlying
probability model. By changing Lh (andthe related probability model) we
can define some of the most important statistical models of interest in
econometrics. Let us consider some of these special cases.
Assuming that .tZf, l q T) is a normal 11D stochastic process and
(a)
choosing % (Xt xf ),a degenerate c-field, (15)takes the special
form
'
yf
p'xt+ ut,
t 6 T,
(17.18)
choosing
.t
.f-
t
=
c(Xf), (15)becomes
#'X +
l&,
(17.19)
generating mechanism
The statistical
17.4
.),
.),
-f,
+
A', #k)x,
=
(d)
Z (.af1', + /ixf
-i
-f)
(17.20)
+ uf,
(17.21)
yl B'xf + u2,
=
z,
(xX,)
)
(t'..;
((0:)
s&'a2a))
( 17.22)
(17.23)
H($,
which detine ( uniquely This situation for example arises in the case of the
yrrlulrtlnptlusequations model where the statistical parameters of interest are
the parameters defining (21) but the theoretical parameters are different (see
Chapter 25). In such a case the statistical GM is reparametrised/restricted
in an attempt to define it in tenns of the theoretical parameters of interest.
restricted statistical GM is said to be an econometric
The reparametrised
model.
.
1.2).
17.5
Looking ahead
(:)x
(:)x
rt
1I! Y
'
w-
*=
II
''7
'
cp>>-
.-.
-M
'
.-.
*'
Z'-u
A
Il Q
x:
41
ga
ea
=6
>'
...
w.
d
:
;=
-* G
o ew
.=. (7
' + :
m
.0
(:)x
+ u.
.-k o
+ <
V
A
AN
a-
#=
>/
lD
tp
r
..
hw
t
'-d
Ux1!1
7:-
1*
+ rm..-x
8..
cw W
Il Il
-.
,.>
rq
k)
<
o
,.
R
C1
<
<
<
11 I1
11
>'
;wx
:>.
>h=
.
.
'-
1I1
=
'U
o
-6
r
o
.a
m
CJ
V-
x
d
=
=
:: o
1)
.a
=
-
*'
tv
1)
=
.
.
.q;!
ce
.=
2
l
=
x
o =c:
w.
x
1)
Q
=
ct
r w.
>.
lz
D
C)
;;
*
.
D
=
.-.
$7
*v:
'i;
Y
O
; o
.
'-
=u
'
a
m
v.
xo
w.
iz
'z
.z
z
: O
354
reason for the extensive discussion of the linear regression model is that this
statistical model forms the backbone of Part lV. ln Chapter 19 the
estimation, specication testing and prediction in the context of the linear
regression model are discussed. Departures from thc assumptions
(misspecification) underlying the linear regression model are discussed in
Chapters 20-22. Chapter 23 considers the dynamic linear regression model
which is by far the most widely used statistical model in econometric
modelling. This statistical model is viewed as a natural extension of the
linear regression model in the case where the non-random sample is the
appropriate sampling model. In Chapter 24 the multivariate linear
regression model is discussed as a direct cxtension of the linear regression
model. The simultaneous equation model viewed as a reparametrisation of
the multivariate linear regression model is discussed in Chapter 25. In
Chapter 26 the methodological discussion sketched in Chapter 1 is
considered more extensively.
Important concepts
Time-selies,
Questions
Explain why for most forms of economic data the notion of a random
4.
sample is inappropriate.
Explain the concept of a statistical GM and its role in the statistical
model specication.
Explain the concepts
the systematic and non-systematic
components.
Discuss the type of information relevant for the specification of a
statistical GM.
Appendix 17.1
Appendix 17.1
adjusted data on rnt?rll),' stock M1 (M),
real consumers' expenditure (F), its implicit price dejlator P) and interest
rate on 7 days' deposit account (1)for the period 19631-19821v. source:
Economic Trends, ptrlnurl/ Supplemenb, J9#J, CSO)
Table 17.2.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
Quarterlyseasonally
6740.0
6870.0
6990.0
7210.3
12 086.0
12 446.0
12 575.0
12 618.0
12 691.0
0.402 53
0.403 26
0.405 8 1
0.408 15
0.412 58
0.416 36
0.42 1 27
0.426 98
0.432 98
0.437 8 1
0.442 38
0.446 37
0,449 94
0.454 97
0.459 72
0.465 36
0.465 33
0.467 36
0.470 42
0.474 35
0.477 82
0.489 52
0.497 14
0.50 1 10
0.51 103
0.516 94
0.520 83
0.527 46
0.534 99
0.544 82
0.552 99
0.565 33
0.576 53
0.592 99
0.603 48
0.610 49
0.6 15 75
0,624 45
0.641 8 1
0.657 58
0.202 OOE-OI
0.200 E--0I
0.200 (XIE--OI
0.200 E-0I
0.237 OOE-OI
0.300 OOE-OI
0.300 E-0I
0,390 E-0I
0.500 E-0 1
0.470 E-0 1
0,400 OOE-OI
0.400 E-OI
0.400 E-0 1
0.400 E-0 1
0.486 E-0 1
0.500 E-0 l
0.45500E-01
0.368 E-0I
0.350 E-0I
0.489 OOE-OI
1
0.594 OOE-.O
0.550 E-.0 1
0.544 OOE-OI
0.500 E-.01
0.535 E--0I
0.600 E-0 1
0.600 E--0I
0.600 OOE-OI
0.585 E-0I
0.508 E-OI
0.500 YE--PI
1
0.500 OOE-.O
0.500 E-0 1
0.400 E-0I
0,367 XE--OI
0.325 E-0I
0.250 E-.01
0.250 E-0I
0.470 E-0 1
0.544 XE-OI
0.7 18 OOE-OI
0.703 OOE-O1
0.827 E-0I
7280.0
7330.0
7440.0
7450.0
7490.0
7570.0
7620.0
7610.0
7910.0
7830.0
7740.0
7600.0
7780.0
7880.0
8160.0
8250.0
82 10.0
8340.0
8530.0
8640.0
8490.0
83 10.0
8380.0
8660.0
8640.0
8920.0
9020.0
9420.0
9820.0
9900.0
10210.0
10 310.0
1 13.0
11 740.0
12 050.0
12 370.0
12 440.0
13 200.0
12 960.0
12787.0
12 847.0
12 949.0
12 959.0
12960,0
13 095.0
13 117.0
13 304.0
13 458.0
13 258.0
13 164.0
13 311.0
13 527.0
13 726.0
13 82 1.0
14 290.0
13 69 1.0
13 962.0
14 083.0
13 960.0
13988.0
14 089.0
14 276.0
14 2 17.0
14 359.0
14 597.0
14 64 1.0
14 603.0
14 867.0
15071.0
15 183.0
15 503,0
15 766.0
15 930.0
16 07 1.0
16 724.0
16 525.0
16 566.0
0.665 15
0.677 76
0.695 34
continued
356
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
13 020.0
12 850.0
13 230.0
13 550.0
14 460.0
14 850.0
15 250.0
16 770.0
17 150.0
17 880.0
18 430.0
19 050.0
19 0.0
19 440.0
20 430.0
2 1 970.0
23 170.0
24 280.0
24 950.0
25 920.0
26 920.0
27 520.0
28 030.0
28 840.0
29 360.0
29 260.0
29 880.0
29 660.0
30 550.0
3 1 8 10.0
32 870.0
33 210.0
33 760.0
36 720.0
37 590.0
38 140.0
40 220.0
16 5 17.0
16 2 11.0
16 169.0
16 288.0
16 38 1.O
16 342.0
16 358.0
16 0 15.0
15 937.0
16 105.0
16 163.0
16 199.0
16 240.0
15 980.0
16 020.0
16 153.0
16 364.0
16 840.0
16 884.0
17 249.0
17 254.0
17 396.0
18 3 15.0
17 8 16.0
18 072.0
18 120.0
17 729.0
17 83 1.0
l 7 870.0
18 040.0
17 926.0
17 934.0
17 97 1.0
17 927.0
17 998.0
18 242.0
18 543.0
-
0.72 1 44
0.752 76
0.790 53
0.950
0.824 84
0.867 22
0.9 19 47
0.982 88
1.0297
1.0703
1 1027
1.1322
1.1679
1.2237
1.2770
1.32 16
1.3523
1.3766
1
1.4363
1.46 19
1.49 10
1.5357
1.5796
1
1.7419
l.8 109
1.8823
1.9246
1 17
2.0 154
0.642 OOE-OI
E--0I
E-.0 1
0.592 OOE-.O1
0.693 OOE-OI
0.700
0.588
0.107 30
0.565
E-0 1
0.428 OOE-.O1
0.377 E-.0 1
0.332 E-.O1
0.306 OOE-0 1
0.5 l 8 OOE-.O1
0.675
E-0I
0.857 OOE.-01
0. 103 70
0.993
E-0 1
0. 1 15 00
0.13 1 40
0. l50
0. 150 ()0
0. 140 50
0. 13 1 (X)
0.109 40
0.9E-0
1
0.943 O0E-O1
0.133
0.114 20
0. 100 10
0.829 E-0I
0.624
E-.0 1
,4059
.6808
.97
2.0867
2. 1343
2. 1838
2.2 177
2.2673
2.29 19
2.3076
-U
Additional references
E-0 1
0.950 OOE--OI
0.950 E-0I
0.950 E--0I
0.950
E-0 1
0.846 E-.0 1
0.625 E-.0 1
'
CHAPTER
18
18.1
Specifkation
In the context of the Gauss linear model the only random variable involved
is the variable whose behaviour is of interest. Denoting this random
variable by yt we assume that the stochastic process ftyt, t G T) is a normal,
independent process with F(y,) pf and a time-homogeneous variance c2
for t (E T (-T being some index set, not necessarily time),
=
37f Nlpt,
'v
tr2),
'
.3)
.@
pt
Eyt/fqz'zu E(#,).
(tB.2)
ut,
J'tt=
) ix i =
b'x t
357
J't
.E(#t).
component
by
(18.4)
358
the statistical GM
.J'!
,
b'x t +
(2)takes the
particular
form
(18.5)
u t,
*=
j--r-jV( )
exp
1
-
(y? b'x,)2
-
IG
0 e' Ild x
R+ tGT
,
(18.6)
tyf,
(-p1,
ymb
.p2,
yvl'
(i)
(ii)
.E'(l,)
Eptutj
Elutus)
=/t,'(l,)
=0-,
G' 2,
=
0,
l + s,
t, s c T.
18.2 Estimation
359
18.2
Estimation
For expositional purposes 1et us consider the simplest case where there are
only two non-stochastic variables (k
and the statistical GM of the
Gauss linear model takes the simple form
=2)
A't d?1+
=
d72.X1t
(18.7)
14,
The reason for choosing this simple case is to utilise the similarity of the
mathematical manipulations
between the Gauss linear and linear
regression models in order to enhance the reader's understanding of the
matrix notation used in the context of the latter (seeChapter 19). The first
variable in (7)takes the value one for all t, commonly called the constant (or
intercept).
ln view of the probability model (6)and the sampling model assumption
of independence we can deduce that the distribution of the sample (see
Chapter 11) takes the form
F
D(1'1, #2,
J'w; 04
17D(A',;04.
(18.8)
360
log 2r--j-
--j-
t?log L
1og c a
(
p/?l = 2c
Z (A',
-2)
-'1
(18.11)
72-Y1) =0,
(18.12)
Solving (1 1H13)simultaneously
JI
/;2
MLE'S
J'c.f,
)- -
we get the
Z (.J.,,-
( 18.14)
.9
.:(.x',
( 18. 15)
x32
Z(x1(
where
42
and
Jj J2x, represents the
where gfEE:
of
the
error term uf. Taking
testimator'
,,
( 18.16)
estimated residuals',
second derivatives
.p,
=-
the natural
( 18 17)
.
( log L
p/?lpha = p21ogL
t?c2
J)2
1
1
p2log L
jgx,, kb
1 pcz
,
1
#.
$ xtut.
t
1
-
''-'tri
&''
(18.18)
18.2 Estimation
For p-(/?:, hc, c2), the sample information matrix Iw(p) and its inverse are
Zxt
c2
xf
Z xl
lwtpl-
c2 )
x2t
''
Z(x t
r
-' 1
Il w($1
(18.19)
0.2
-F
-t-Y3
- c
-y
-c2 jj-0 x
T
j-2.x
)((x,-'.)
) (xf
---------L?iq---L--)3
--')i)7----t)
.92
---------
if , (x,-.f)2#0,i.e. theremust
is positivedefinitetlwtpl>t)
be at least two distinct values for xt. This condition also ensures the
existence of J'cas defined by ( 15).
xorcthat/rtp)
of 0
Properties
1,
2,
Asymptotic properties
/'T4
E,(4)
p)
estimator of 0 (if
1
N(0, gI.(p)1 -
'
0, i.e. asymptotically
i.e.
tr:
.xt
-.
I is asymptotically
unbiased
()I,x,(p)q-
1,
i.e.
as T-+ w);
normal;
is 0);
varal
,z-
is asymptotically
efficient.
362
lim
v- x
--
1
T
)((x/
-
x3 c
( 18.2 1)
qxx > 0.
(2)
f=l
t=1
(18.22)
(vi)
-->
Jj
Jc
'v N
bj
b,
( 18.23)
where
vartra)
Z(x:-
.92
-x')2j,
T.l
2
.vz2(F-2),
Zf
.f),
(18.24)
18.3
Hypothesis
363
Tl
F-2
/.2
(T-2)
S442)=-T'
::::>
(; 1,
(x)
'
.$2
but
the estimator
,vz2(T'-2),
) are independent
c2.
(72c/:c2
of
by considering
(18.25)
.s2
(or 42).
the covariance
between them.
Comparing
Var
o
(23) with
(T- 2).s2
az
o.
Var(y2)=
2c*
T- 2
2(T-2)
2c*
the Cramer-Rao
T -
bound.
18.3
deviation
)1
l-fj against
H3 : hl #
-1,
being a constant.
364
basis for a
igood'
( 18.26)
( 18.27)
IJ
1 1(r-2).
--,-$---1-.-N,'g'%ar(/71)(I
-b-
( 18
.28)
cj
$J -l; j
(' (k'
.----t---s--.y
=
y..
ca
Jgvart/?llj
where 1
dr(F-2).
-a
=
-
(18.29)
L'a
Using the duality between hypothesis testing and confidence intervals (see
level confidence interval for bk
Section 14.5) we can construct an (1
based on the acceptance region,
-a)
C()('1)
y:
1;1
'1
(?1
-
% cu
#r
Evartrll.'.l
cz %
'1)
%ca
( 18.30)
T'Zt.Y'-O2
l
.v/
jj (xj-p2
t
(18 1)
.3
x,
18.3 Hypothesis
Similarly, for H(): Iz
Clts-cl
y:
y: a
1$-2
the rejection
z#
1r2- -l > cz
'
x' gkartrclq
:%
(T'-2)s2
:$
#r(Co)
is
( 18.32)
lh against
Consider Hv c2
useddirectly to set
tr'tl
365
(18.34)
1 a,
(25)can be
such that
( 18.35)
A ( 1 -J)
c(y)
c2:
(F-2)s2
ca
ts
:6:
(T-2)s2
( 18.36)
pt= :1
bzxt.
A natural estimator
Vart/)
of pt is
Vart;j +
41+ lzxt,
with Et)
pt, and
;2xt)
( 18.37)
366
(Jf
-
(i)
pt)
gvartrn
'v
Jf is
normal and
x(() 1).,
(js.a8)
(18.39)
construct
jsl
(x,
-
.y
Z(x,-.k)2
l
( 18.40)
This confidence interval can be extended to t> T in order to provide us with
a prediction conlidence interval for yvyt, ly 1
( 18.4 1)
18.4
Experimental
design
MLE'S
J1and
h of
bL
and bz
Looking ahead
18.5
'accurate'
.k
;2) 0
Covlrl,
(18.42)
and and ;2 are now independent. This implies that if we were to make a
changeof origin in x, we could ensure that J1 and z are independent.
Secondly, the variances of J1and /2 are minimised when f xtl (given
isas large as possible. This can be easily achieved by choosing the value of xl
and as large as possible. For
to be on either side of zero (to achieve
example,we could choose the xfs so that
';1
=0)
.'.1=0)
X2
X T+ l
X F
XF
K>
( 18.43)
11
18.5
Looking ahead
From the econometric viewpoint the linear control knob model can be seen
to have two questionable features. Firstly, the fact that the xfls are assumed
to be non-stochastic reduces the applicability of the model. Secondly, the
independent sample assumption can be called into question for most
economic data series. In other disciplines where experimentation is possible the Gauss linear model is a very important statistical model. The
purpose of the next chapter is to develop a similar statistical model where
th first questionable feature is substituted by a more realistic formulation
of the systematic component. The variables involved are all assumed to be
random variables at the outset.
Important concepts
Non-stochastic or controlled
repeated observations.
design,
368
Questions
Explain the statistical GM of the Gauss linear model.
Derive the MLE'S
of b and c2 in the case of the general Gauss linear
model where
b'xr + ut, t= 1, 2,
7; x, being a k x 1 vector of nonstochastic variables, and state their asymptotic properties.
Explain under what circumstances the MLE'S f 1 and 42of bj and bz
respectively are independent. Can we design the values of the non.p,
4.
against
HL :
bf #U,
(197 1);
CHAPTER
19
19.1
Introduction
The linear regression model forms the backbone of most other statistical
of the
models of particular interest in econometrics. A sound understanding
regression
prediction
and
estimation,
the
linear
in
testing
specification,
model holds the key to a better understanding of the other statistical models
In relation to the Gauss linear model discussed in Chapter 18, apart from
and the mathematical
similarity in the notation
some apparent
statistical
involved
analysis,
the linear regression
in the
manipulations
situation
model
from the one envisaged
model pumorts to
a very different
particular
model
could be considered to
the Gauss linear
by the former. In
models of the
estimable
analysing
statistical
model
for
be the appropriate
form
Mt
Jli,
=aa+
(19.2)
Akrxbpxzlx
(19.3)
Spaification
19.2
j (( j( ) jj
(x-'',
.''' yx
t;,1
sl1,2,
yt
Eyt/xt
xf)
y, E(y,,/Xf
14,,.,u,
=
is the
x,)
systematic
the
pt
p0
and
E (A',/'X,
my
/:
X,)=
component,
non-systematic
(19.5)
where
and
Jo + fxt (linearin
2E2-21m
x
,
component
x,),
(19.6)
az 1
# Ec-21
=
Var(u,/X,=x,) =Var(y'/X,=x,)=c2
(homoskedastic), (19.7)
19.2 Specification
where c2 cj 1 - o.j atc-al ,yj (seeChapter 15). The time invariance of the
parameters jo, ( and tz = stems from the identically distributed (lD)
assumption related to .tZr, t G T). It is important, however, to note that the
ID assumption
provides only a sufficient condition for the time invariance
of the statistical parameters.
ln order to simplify the notation let us assume the m 0 without any loss
of generality given that we can easily transform the original variables in
and (Xt
This implies that ?v, the
mean derivation form (y,
coefficient of the constant, is zero and the systematic component becomes
=
-n1y)
-mx).
#'Xl.
(19.8)
l-tt E (-Fr?''XtN)
ln practice, however, unless the observed data are in mean deviation form
the constant should never be dropped because the estimates derived
otherwise are not estimates of the regression coeflicients #= Ec-21J2, but of
'(XlX;)- ''(Xf'.Ff); SCe Appendix 19.1 on the role of the constant.
#*
The statistical GM of the linear regression model takes the particular
form
y', p'xt+ Ikt,
(19.9)
=
with 0- (#,c2) being the statistical parameters t#' interest; the parameters in
terms of which the statistical GM is defined. By construction the systematic
and non-systematic components of (9)satisfy the following properties:
E(ut%t
Xf)
(ii)
f;tutls/x,
xf)
(iii)
Eptut/xt
Xt)
E(1'r
-
ptkutjxt
X?)
=0,
tuf,
'
'
(i)'
'(u,)
'tutlks)
=
F(f'(?.k,/X,= xl))
'hftftagus/''xt
=0.,
xt))
G 2,
=
0,
and
(iiil'
JJ(/t,uf)
Elkytut
'Xf
xj))
=0,
r, s iE T
expectation).
The conditional distribution D()',/X,', 04is related to thejoint distribution
Dn, X; #) via the decomposition
D(#,, Xf',
) D(1',//X?; l ) D(X,;
( 19. 10)
a)
'
(see Chapter 5). Given that in defining the probability model of the linear
regression model as based on 1)(),,,/'Xf;p) we choose to ignore D(X,; #2)for
the estimation of the statistical parameters of interest p. For this to be
possible we need to ensure that X, is wt?kk/), exogenous with respect to 0 for
the sample period r= 1, 2,
r (see Section 19.3. below).
For the statistical parameters of interest 0 EEE(#, c2) to be well defined we
need to ensure that Ec2 is non-singular, in view of the formulae #= E2-21o.c1,
c2 cj j - o.j aE2-a1,a1 at least for the sample period r 1, 2,
T This
requires that the sample equivalent of 122,( 1/F)(X'X) where X EEE(xj
i.e.
xwl' is indeed non-singular,
.
,x2,
rank(X'X)
ranktx)
k,
Xf being a k x 1 vector.
As argued in Chapter 17, the statistical parameters of interest do not
necessarily coincide with the theoretical parameters of interest t. We need,
however, to ensure that ( is uniquely defined in terms of 0 for ( to be
identnable. ln constructing empirical econometric models we proceed from
a well-defined estimated statistical GM (seeChapter 22) to reparametrise it
in terms of the theoretical parameters of interest. Any restrictions induced
by the repa'rametrisation, however, should be tested for their validity. For
this reason no a priori restrictions are imposed on p at the outset to make
such restrictions testable at a later stage.
As argued above, tbe probabilitv tnodel underlying (9)is defined in terms
of D(y'r?'Xr;p) and takes the form
(1)
=
D(yr/Xt; 0)
exp
(21)
1
-
2c
j.
(yf p'xtll
-
0 s IJd x IJ@
+ t e: T
,
(19.12)
,y'w)',
19.2 Specification
together and specify the statistical
Statistical GM, yf
specilication
model:
#'xf+
model
u,, l G T
g3j
g41
g5q
'(yr,/Xt=x,)
S(y,,/'Xt xf) - the systematic component; ut yr
non-systematic component.
the
- N(#, J2)e
0
p Na-clcaj c2 o.j j - cj ara-al ,aj are the statistical
Cov(X,, y,),
parameters of interest. Note: Eac Cov(X,), ,2I
Var(yf).)
cl j
T
Xf iS weakly exogenous with respect to 0. r 1, 2,
No a priori information on 0.
xwl/,' T x I data matrix, (T > /).
Ranklx)
k, X (x1, xa,
(11)
Probability
E1(1
17(1
p,
model
Dl.,'r,''Xf;0$
exp
'-QVx'
( )
.
(-J.'f-
2c 1
#'x,)2
D(#f Xl ; #) is normal;
ti)
Ft-r X, xf) p'xt linear in xr;
(ii)
- homoskedastic
xr) (7.2
Vart))r/x,
(free of xt),'
(iii)
p is time invariant.
.
(111)
Sampling model
g8j
y EB (-:1
,
Standard
' X#
=
textbook
model
+u.
X(0, c21w);
no a priori information on (#,c2);
rank (X) k.
Assumption (1) implies the orthogonality F(Xf'?-/f,,At x,) 0, t,z,z1, 2,
'T;and assumptions (6j to g8j the probability and the sampling models
respectively. This is because (y,?'X)
is a linear functionof u and thus normally
distributed (see Chapter 15), i.e.
(1)
(2)
(3)
(u,/X)
'v
(19.13)
19.3
latter the probabilistic and sampling model assumptions are made in terms
of the error term not in terms of the observable random variables involved
as in g11-r81.This difference has important implications in the context of
misspecification testing (testingthe underlying assumptions) and action
thereof. The error term in the context of a statistical model as specified in
the present book is by construction white-noise relative to a given
information set % 7 +.
19.3
r1q
The systematic
and non-systematic
components
As argued
D(Z1 Z2,
,
(19. 14)
)-
D(z;
17D(Z,; #,)
=
( 19. 15)
D(Zf;
#)
(19. 16)
D(#t,/X,;
'
.1-
.1,
pt
E ()4,/Xt
Xt),
lh
y?
l?f-
EtAt
q'xt.
(19.17)
Xr).
forms:
(19.18)
Spification,
estimation
and testing
pt*p1
(seeChapters
(21
and
EplulAt
( 19. 19)
xt) #z
2 1-22).
Te parameters
of
interest
(31
Exogeneity
D(.Fl, Xr/#)
#) and then
we
(19.20)
19.3
g4(l
N0 a priori information
on 0 >
(#, c2)
This assumption
restrictions
IS
ranktx)
(x1
,
x2,
vank
k, k < 'J:
The need for this assumption is not at al1 obvious at this stage except
perhaps as a sample equivalent to the assumption
rankttzz)
k,
1
k . jg xrxt'
1
=
y.
(X'X)
L61
linearity
Normality
homoskedasticity
Spification,
Parameter
estimation
and testing
time-invariance
g8)
Independent sample
19.4
Estimation
(1)
1'
=k(y)
lgJ cv
-
exp
(2zr)
-#'x,)2
2c .z(yt
( 19.23)
(3)
( 19.25)
19.4 Estimation
F
1
ucz- 'r t ) 1 (y,- j'x' )2ec2
T'i j(1
tl,
in an obvious notation,
( 19.26)
c2, respectively.
lf
are the maximum likelihood estimators (MLE's) of # and
staiisticl
2,
T; in the
GM, p, #'x,+ u, r 1,
we were to write the
matrix notation form
=
(19.27)
y= Xj+u,
where y >
F x 1, the
(y1
,
xsl', F x k, and u H
T x 1, X EEE(x!
suggestive
form
take the more
(uI
.vv4',
MLE'S
/ (x'x) - t x'y
14z.)',
and for
BE
The information
y -XJf,
42
( 19.28)
'
Iw(p)-s((-Pl0BL)(P1Og)')-s(-t?21OBL),
(-)0
Pp (0'
t?p
where the last equality holds under the assumption that D()'t X,; 0)
probability model. ln the above case
represents the
'true'
:2 log L
. P/ 0/
1
=
c2
p2log L T
t?c4 = 2c4
y xfx;
ca
,-1
JX
:2 log L
(x'x), t7'j
cl
= -
tr
4.
f
)
=
xf ut
1
-
2Es
ut1
( 19.29)
Hence
X'X
c2(x'X)-
Ir($
and
gIw(p)q-
0
2c4
( 19.30)
lt is very important to remember that the expectation operator above is
defined relative to the probability model Dty'f,/Xr;0).
ln order to get some idea as to what the above matrix notation formulae
380
look like let us consider these formulae for the simple model:
''f-/?cxf +',/t,
l,f ib
t- 1, 2,
( 19.3 1)
X1
X2
X'X
x, Z
x,
xl
),
tpyyl
2
.vf
X'
'
EB
xf)',
t
- p-2
.p
).;
(19.32)
y p + u,
p .u
between the estimated systematic and non-systematic
=
components
in the
form
y=
/+,
-L
( 19.33)
19.4 Estimation
b NXj-
-X/
!h
and
= Pxy
(1-
( 19.34)
Pxly,
E(;')
(Px2 Px)
=
'(Pxyy'(l - Px))
= f7tpxyu'll
Px)),
Px)c2,
= Px(I -
since
(1 Pxly
-
since S(yu')
(1 Pxlu
c2Iw
y Pxy
=
(1 Pxly.
(19.35)
=0
/
J
2>
N 2-ai /2
tr 11
'
l-
X'X
=
X' y
.--.
( 19.36)
12
E 22
- a 21,
)-(y,.'X)(Xw'X)-'(X'y).
oz-l;'
(19.37)
moments:
Na a :
(X'X),
c'zl
X'y,
cj
1:
1
-.
y'y.
( 19.38)
and
we could
Specification,
estimation
and testing
-?
?
yr=##+=#X
m
-?
X#+.
( 19.39)
to the
''-'
#2
'
X(X X)
/
Xy
,
.'
1-
'
p;
(19.40)
y
yy
This represents the ratio of the variation
by / over the total
variation and can be used as a measure t#' goodness C!JJlt for the linear
regression model. A similar measure of fit can be constructed using the
decomposition of y around its mean jF, that is
'
'explained'
(y'y
T)*)
(/'/
Tgl) +
/,
(19.41)
denoted as
TSS
ltotal)
ESS
RSS
(19.42)
( residual)
(explained)
R- 2
pa,#
'
(y y
yy
F
=-
'zk
Rss
-
T SS
( 19.43)
.k2
Ecorrected'
( 19.44)
19.4 Estimation
383
The
(2)
correction
An empirical
by their
example
In order to illustrate some of the concepts and results introduced so far let
us consider estimating a transactions demand for money. Using the
simplest form of a demand function we can postulate the theoretical model:
MD=
(19.45)
(X P, 1),
MD
A,jnw Ayapxzix,
or
ln
MD
( 19.46)
( 19.47)
atj + aj ln F + aa ln P + as ln 1,
(xo
ln
.4.
is
( 19.48)
where mt ln Mt, yr In F), pf ln Pt, it ln It and t1( N/(0, c2). Choosing
some observed data series corresponding to the theoretical variables, M, F,
=
.v
P and 1, say:
Vf -
M 1 money stock;
real consumers' expenditure;
J Pt - implicit price deflator of #;,'
f, interest rate on 7 days' deposit account (seeChapter 17 and its
appendix for these data series),
respectively, the above equation can be transformed into the linear
regression statistical GM:
D-t, =
(19.49)
&f.
1963-1982f:
Spilication,
estimation
and testing
/-
0.865
-
.$2
=0.00
TSS
0.055
155
24.954,
kl
'
ESS
42
0,9953,
24.836,
equation
=0.995
RSS
0.1 18.
+0.865/:$
-0.0554 +
t.
( 19.50)
The danger at this point is to get carried away and start discussing the
plausibility of the sign and size of the estimated
(?).For example,
telasticities'
have both a
we might be tempted to argue that the estimated
tcorrect' sign and the size assumed on a priori grounds. Moreover, the
tgoodness of fit' measures show that we explain 99.5t% of the variation.
Taken together these results
that (50)is a good empirical model
for the transactions demand for money. This, however. will be rather
premature in view of the fact that before any discussion of a priori economic
estimated statistical
theory information we need to have a well-dhned
model which at least summarises the sample information adequately. Well
defined in the present context refers to ensuring that the assumptions
underlying the statistical model adopted are valid. This is because any
formal testing of a priori restrictions could only be based on the underlying
assumptions which when invalid render the testing procedures incorrect.
Looking at the above estimated equation in view of the discussion of
econometric modelling in Chapter l several objections might be raised:
The observed data chosen do not correspond one-to-one to the
(i)
theoretical variables and thus the estimable model might be
different from the theoretical model (see Chapter 23).
The sampling model of an independent sample seems questionable
in view of the time paths of the observed data (see Fig. 17.1).
The
high /2 (and #2) is due to the fact that the data series for Mt and
(iii)
Pt have a very similar time trend (seeFig. 17.1(t# and (L')). If we Iook
at the time path of the actual ( #'f) and fitted (.f,,)values we notice
(explains) largely the trend and very little else (see
that fr
Fig. 19.1). An obvious way to get some idea of the trend's
contribution in .*2 is to subtract pt from both sides of the money
equation in an attempt to
the dependent variable.
'elasticities'
kindicate'
ttracks'
bdetrend-
19.4
Estimation
actual
z fitted
10,6
'**
0.4
A
-'
10.2
10.0
z'e
'sz 9.8
z.
gs'
'-
--
9. 4
,-
9.2
.e
>'
.A
9.0
8.8
1963
1972
Time
1969
1966
1982
1978
1975
9.9
9. 8
/
k/ N
N.
actual
.../
9.7
x N.
xN
/
w
d
'-
<.
z v z'h-
z.s
h
sv'''.'X
/ !
.x
/
l /
l l
'hu
9.6
1963
.h.
1966
1972
Time
1969
.pz
1975
lj
1
N/ u
/
-..v
/
.e-'
1978
f itted
1982
detrended dependent
ln Fig. 19.2 the actual and fitted values of the
the
The new regression
emphasise
point.
variable (r?1, are shown to
equation yielded
elargely'
-pt)
tmf-p1)
2.896 +0.690.:,
.k2 0.468
=
-0.135pt
42 0.447
=
sl
-0.055t
+
=0.00155.
Iif,
(19.51)
386
Spification,
estimation
and testing
Looking at this estimated equation we can see that the coefficients of the
constant, yf and it, are identical in value to the previous estimated equation.
The estimated coefficient of pt is, as expected, one' minus the original
estimate and the is identical for both estimated equations. These suggest
that the two estimated equations are identical as far as the estimated
coefficients are concerned. This is a special case of a more general result
related to arbitrary linear combinations of the xzs subtracted from both
sides of the statistical GM. In order to see this let us subtract y'xf from both
sides of the statistical GM:
.s2
or
-T'Xf
(#'-T')Xf
),t
(19.52)
+ ut
j*zx + ut,
(19.53)
where
*
*
=y -X#
-+
'-'*
j =(X X)
'
X y#
'
=#-y,
-'
*'*
F- k
'
=
-
kl is not
invariant
'
t' +,y+-
(19.54)
'
2.
,z'y-+2
(19.55)
.*2
of the
As we can see, the
dependent variable equation is less
than half of the original. This confirms the suggestion that the trend in pt
contributes significantly to the high value of the original kl. It is important
to note at this stage that trending data series can be a problem when the
asymptotic properties of the MLE'S are used uncritically (seesub-section (4)
below).
'detrended'
19.4 Estimation
(2)
387
T(y)
of it.
Using the Lehmann-scheffe
theorem
the values of yo for which the ratio
exists, then
(seeChapter
oty,'x; (2,,:,.c)-w,c
(y-x#)'(y-x#))
expt-aota
p)
Dyo X; 0' =
l must be a function
.y
.rg1
w,,a
taagal-
exp
.xj),(y()
ty()
(19.56)
xpjq
is independent of 0, are ygyll y'y and X'y: X'y. Hence, the minimal
lz
T2(y))
sufficient statistic is z(y) H (z1(y),
(y'y,X'y) and J (X'X) - 2 (y),
1za(y))
T(y).
42 (1/F)(z1(y) -zk(y)(X'X)are indeed functions of
ln order to discuss any other properties of the MLE of 0 we need to
derivethe sampling distribution of 4.Given that / and :2 are independent
we can consider them separately.
=
The distribution of
/=(x'x)-
lx'y
/
(19.57)
Ly,
''*
!X) - 1
where L H(X
X is a k x T matrix of known constants. That j s,
linear functionof the nonnally distributed random vector y. Hence
,
p- NILX/, c2LL')
'v
or
p-'wNp c2(X'X)-
j is a
')
(19.58)
The distribution of 42
42
where Mx
=-
F
I
Tdl
c2
(y-Xj)'(y
-Xj)
Px. From
(Q2)of Chapter
'wzzttr Mx),
=-
'
=-
u'Mxu,
.
(19.59)
(19.60)
Spaitkation,
estimation
and testing
)J'- 1 aii, A: n x
= F- tr(X'X) - 1(X'X)
(since tr(AB)
n),
tr A + tr B)
tr(BA))
'-k.
=
Hence, we can deduce that
T.l
g2(w- k).
( 19.6 1)
T- k
cz
and
Var
F#2
c
2(T- k)
f)ti2)=
-T
c2 # c2
'
That is:
(3(ii))
(4(ii))
sl
(T- k)
'
( 19.62)
S2
(Ty z
'v
/()
(19.63)
.s2
.:2
.$2
.s2
389
19.4 Estimation
Cov(/)=c2(x'x)-
(19.64)
way
to proceed in
( 19.65)
1
.
(19.66)
-0.055,
5.2
72
=0.9953,
f,
(0.013) (0.039)
F= 80.
1og L= 147.412,
:=0.0393,
=0.9951,
Note that having made the distinction between theoretical variables and
observed data the upper tildas denoting observed data have been dropped
for notational convenience and R2 is used instead of 112in order to comply
with the traditional econometric notation.
(4)
(/
EEE
'2)
asymptotic
0v
Consistency
-.+
0)
(i)
42 is a consistent estimator of
lim
#K(l'2
c2j <
:)
c2, i.e.
1,
T-+ x
since MSE(#2)
-+
lim(X'X) F-+
EEE
limw- #r(l/
y
lim
F-,
-j1
< )) =
1, i.e.
xfx;
1
=
0,
J is a consistent
estimator
of
b.
390
Covt#)
(2)
ln order
-+
(x''z'(I,.
Asymptotic normality
I to be
of
(seeChapter
as T-+ ct.l
asymptotically
12).
-1))
-x(O,I.(p)
-p)
-+
1im 1 Iw(p)
1.(p) F-+
-gx.
=
(i)
x'/wtdz
-c2)
x(0,2c4).
.v''r(/
and non-singular
then
1).
,v
(19.68)
Qx is bounded
x(0,c2Q
-#)
(19.69)
(3)*
(&
Strong consistency
a .S
0)
-,+
a S.
.
-2 iS a Strongly
consistent
lim
Pr
T-+ a.
+2
estimator
(J2
of c2
-+
2)
t)-
ie
.
( 19.70)
1.
c2)
u'
P
u
F k
c2
where w H(w1,
that
wc,
ww-k), w
T- k
F (w/
k tr 1
c2),
(19.71)
matrix, such
F-k
1-1'41 Px)l1
-
diagt 1, 1,
1, 0,
0).
=0
-c2)2
(see
19.4 Estimation
391
a.s.
-F E(w/
,
Using the fact that
-c2)
('2
c2)
-+
jg1 (w/
c2) +-
s2.
....+
w-k
=-
a,s.
.s2
0,
( 19.72)
(7.2
T
a. S
X Y'
'
m.
p) if
-+
< C, i
Ixjgl
x'x
T
m
(#
f)
1, 2,
Z
t
z'
k, t
(X'X)- 1X'u
jg
x f x/t T
1 2,
for all T
=0
- 1
is non-singular
and F(xftlr)2
Since f2txttltl
SLLN to deduce that
XIX/ r
-T' Z sut
xJc2<
C-constant; and
(19.73)
1
) xfjkf.
(19.74)
w,
a.s.
->
0.
( 19.75)
Ixftxpl
1, 2,
k, r, s
1, 2,
1
xrx'r =Qx < :z) and non-singular,
lim
F
t
r-, x
needed for the asymptotic normality of / is a rather restrictive assumption
because it excludes regressors such as xf, t, t 1, 2,
T) since
-
x,?,
=*6
(19.76)
Zl
:1
Zf
Spification,
estimation
and testing
big as its
is that every random variable with bounded variance is
c/ < tz, then Zi Op(cf)
standard deviation-, i.e. if Varlzf)
(seeChapter
10).Using this result we can weaken the above asymptotic normality result
(69) to the following:
tas
Lemma
Ft??-the linear regression
Aw
model as specied
Z xjx;
(F)
t,li i
Qw
l=1
D ?- AwD /.
as y'..+ w
...+
information increases
2
--+0,
i
(li(r)
Xfw..h 1
12
,
wl
T);
F-+ z
Q, Q <
=
vz
and non-singular,
tben
D T (/-#)
Anderson
txstde
19.5
Spification
x(0,c2Q-
')
(1971:.
testing
Specihcation
393
emphasised, however, that the results of these tests should not be taken
seriously in view of the fact that various misspecifications are suspected
(indeed, confirmed in Chapters 20-.22). ln practice, misspecification tests
are used first to ensure that the estimated equation represents a well-defined
estimated statistical GM and then we go on to apply specification tests.
This is because specification tests are based on the assumption of correct
specification'.
hypothesis-testing
Within the Neyman-pearson
framework a test is
defined when the following components (see Chapter 14) are specified:
(i)
the test statistic z(y);
(ii)
the size (x of the test;
(iii)
the distribution of z(y) under Ho;
the rejection (or acceptance) region;
(iv)
(v)
the distribution of z(y) under HL.
(1)
Tests velating to c2
'good'
(T-/)
S2
'v
(19.77)
/().
(Ty z
Let us consider the null hypothesis Hv: c2 /2() (cg-known) against the
alternative hypothesis Sj : c2 > c()2. Common-sense suggests that if the
estimated c2 is much bigger than c we will be inclined to reject Hv in
favour of H3, i.e. for sl > c where c is some constant considered to be
enough', we reject Hv. ln order to determine c we have to relate this to a
probabilistic statement which involves the above pivot and decide on the
size of the test a. That is, define the rejection region to be
=
kbig
C1
s
y: (T- k) z > cz
( 19.78)
z(y)=(Fi.t).
k)
under Ho,
( 19.79)
394
Spification,
estimation
and testing
X
dz2(F-k)
a.
C2
=0.00
-:0
( 19.80)
(
'v
reads
:distributed
zty)
=
c/
zoty)
'''z
H,
'w
o.
..a' z
(T-k),
(19.81)
.#4c2) Jar
z(y) > cz
2
ti'o
2
; c > coc
Gz
dz 2 (F- k).
=
(?a(o.j,/'o.2)
(19.82)
The above test can be shown to be uniformly most powerful (UMP); see
Chapter 14. Using the same procedure we could construct tests for:
Hv.. c2
c2()against
=
or
Ho'. c2
cj against
c/*
x
0
: c2 <
C*j
(ii)
ffj
Sl
c2()(one-sided)
with
a
c2 # cj
(y:z(y) ga
cl
dz2(F-
>
)),
GO
dz2(F-k)
b
( 19.83)
(two-sided)With
or z(y)
dz2(F-k)=
k)
-X
(19.84)
395
The test defined by C.'tis also UMP but the two-sided test defined by C1* is
UMP unbiased. Al1 these tests can be derived via the likelihood ratio test
procedure.
and
t e: T 1
(1 2
.t
t s T2
T) + 1,
T1)
F),
yt pkxt+
1&,
and
#r2Xl
#t
+ ut,
Var(J',/X,
x,)
Vartyr/x,
x,)
allowed to
cf
(19.85)
( 19.86)
J2
against
H 1:
2
(7'1
a
(7'2
>
co
Tk
:2
s2=
1 T1 k l : 1 f
-
and
(-6
c?
-
st?
'vZ
c (T;
-/f),
sl/o-l/lj
k)s#/(cl/(Tc
-/4)
( 19.87)
1, 2,
yl
(due to the
sampling
model
k))1
(19.88)
396
Spification,
estimation
and testing
Hence,
T4$
s 21
c
c 0 sz
Ho
'v
'
Jt*',
-/)
(2)
Tests relating
to
pi
pi# 0 for
against
j
# is
is
0
some i
1, 2,
k.
)q x/'gcztx'xlJ
x'tvartp-
1q,
1
where (X/XIJ refers to the fth diagonal element of (X'X) - 1 Hence we can
deduce that a likely pivot for the above hypotheses might be
.
?'-/'i
(c2(X'X)J1q
xo,
1).
(1q.89)
The problem with this suggestion, however, is that this is not a pivot given
that c2 is unknown. The natural way to
this problem is to substitute
its estimator sl in such a way so as to end up with a quantity for which we
know the distribution. This is achieved by dividing the above quantity with
the square root of
ksolve'
((w-,)s2
,,''(wc-2/()),
'
.
'
19.5
Specification
testing
( 19.90)
which is a
very convenient
T(y)
jf
(F- /f).
'v
s E()()()f; 1
?
('1
region
> ca
),
(z(y)j
.ity:
a. That
is,
-a.
The decision on
optimal' the above test is can only be considered using
its power function. For this we need the distribution of zly) under H3, say
/f /$9,X # 0. Given that
%how
- jy
Ib
l(T-k),
1
Esa (X X)J l
I?
+
1y)
'r(y)c
i
E,s(x,xl.y
':()(y)
='
( 19.9 1)
'ro(r)
a non-central
with non-centrality
j=
'tT'- k;
( 19.92)
),
parameter
1q
tz g(X'X)J
/1
g(X'X)1
I
:.
s ()( X):'3 (1
=
--
= 2.8,
42 9
.s
.
g(X'X)2-1
/'?.
6.5,
.-4.1.
E(X X).t1
,
.-
398
(the coefficients are zero) are rejected. That is, the coefficients are indeed
significantly different from zero. lt must be emphasised that the above ttests on each coefficient are separate tests and should not be confused with
the joint test: HvL Fj /72 p5 p.k 0, which will be developed next.
The null hypothesis considered above provides an example of linear
hypotheses, i.e. hypotheses specified in the form of linear functions of #.
lnstead of considering the various forms such linear hypotheses can take we
will consider constructing a test for a general formulation.
Restrictions among the parameters j, such as:
=
(i)
+ /5
135
/72+
/73+ /74+ ks
can be accommodated
R#= r,
1',
1,'
ranktR)
(19.93)
0 0
1 0
r
-
()
'j
( 19.94)
Hj
# can be
:
considered
as
R## r.
Rb- r
$$
-
$r
(19.95)
thow
399
statistic related to (95).We could not use this as a test statistic for two
reasons:
it depends crucially on the units of measurement used for yt and Xf;
(i)
and
(ii)
the absolute value feature of (95) makes it very awkward to
manipulate.
The units of measurement problem in such a context is usually solved by
dividing the quantity by its standard deviation as we did in (89)above.
The absolute value difficulty is commonly avoided by squaring the
quantity in question. If we apply these in the case of (95)the end result will
be the quadratic form
(RJ
-
r)' EVar(R/
r)q - 'IR#-
r)
(19.96)
(Rj-r)
,v
N(Rj
c R(X X)
-r,
(R #
r)
'
(1
(96)becomes
IR'q - iIR#-
Ec2R(X'X)-
(19.98)
r).
-.
(R/
'-'
-r)
2
1
Ec R(X ) lR, 1- (R # -r)
,x
,v
z (1,n,),
(19.99)
chi-square
with m
i.e. (16) is distrlbuted as a non-central
(R(X'X') - 1R')) degrees of freedom and non-centrality parameter
=
(rank
( 19.100)
(see Appendix 6.1). Looking at (99)we can see that it is not a test statistic as
yet because it involves the unknown parameter a1. Intuition suggests that if
we were to substitute sl in the place at c2 we might get a test statistic. The
problem with this is that we end up with
(R/
r)'(R(X'X) S
r)
IR/IIIRJ
-
( 19. 101)
400
Specilication,
estimation
and testing
(T- k)
c
'w
(T'- k),
(19.102)
if we could show that this is independent of (99)we could take their ratio
(divided by the respective degrees of freedom) to end up with an Fdistributed test statistic', see 95 and ()7 of Chapter 15. ln order to prove
independence we need to express both quantities in quadratic forms which
involve the same normally distributed random vector. From ( 102) we know
that
sl
(F- k)
u'(I
P 'Y lu
u'Q u +
P .Y
'
X(X'X) - 1X'.
we can express
( 19.103)
(Rj-
( 19. 104)
where
In view of Qx(I
Px)
(R#
-r)
T(y)
u (1-Px)u
,
'v
'---
F(m, T- k;
(T'-k)c2
1 (RJ r)'gR(X'X) -
(19.105)
is
z(y)
z(y)
).
IR'II- IIRJ r)
'
( 19.106)
which apart from the factor ( l m) is identical to ( 101), the quantity derived
by our intuitive argument. lt is important to note that under Hv, R#= r and
J
i.e.
=0,
Ho
z(y)
F(m, T-/().
( 19.107)
t)7j
=
where a
region to be
dF(m, F- k).
t' St
(19.108)
parameter
19.5 Specilication
40 1
testing
E(T- k)(m+ )
as given in (100).ln view of the fact that
n1(T- k 2)q (seeAppendix 6. 1) we can see that the larger is the greater the
power of the test (ensure that you understand why). The non-centrality
Rj
(a very desirable
parameter is larger the greater the distance 2/
i
variance
smaller
conditional
The power also
and
the
feature)
the
c
1'n,
p2
of
order
Fln
k.
depends on the degrees freedom vl
to show this
well-known
relationship
the
between the F and beta
explicitly 1et us use
enables
which
distributions
us to deduce that the statistic
'(z(y))=
.-.r
z*(y) E''1'r(y)1
is distributed as non-central
terms of z*(y) is
beta
E)J.'1'r(y)
k#(#) #r(z*(y)
=
- (j
'2)
I
,i
1.t'
''-'/
)-( /!
=
J?al
(19.109)
power function in
cz*)
>
..2)
.v.
E(v1
c:
1+
U( 1*
-.
vb
lt-tyvl + I
l pz
2 )- 1
jjg +
,4.zva.l
( 19.110)
(see Johnson and Kotz (1970/. From (l 10) we can see that the power of the
test, ceteris paribus, increases as F-k increases and m decreases. lt can be
shown that the F test of size a is UMP unbiased and invariant to
transformations of the form:
*
(ii)
y*
=cy
where
y + pv,
pv e: (40.
1h'/ b',01
-
C0(/0)-
y:
<cu
k---WAF7XYL-X/E
( ) 1
.
(19.112)
402
) confidence interval
t#:
1h(1),
<b'p< h'/+casv7gI1'(x'x) -
lhjca
(19.113)
Note that the above result is based on the fact that if
then
V''v F( 1, T'- k)
Qvxr(F-
k).
(19.114)
#(1) + 0,
where /91)represents a1l the coefficients apart from the coefficient of the
constant. ln this case R
1 and r=0. Applying this test to the money
equation estimated in Section 19.4 we get
=1k-
z(y)
=
tjjo.(ml
248.362
546 a
5353.904,
0.05.
19.6
Prediction
p'xt+ I/f,
(19.115)
.'.'.'.=
12
,
Prediction
19.6
403
From Section 12.3 we know that the best predictor for yw+i,1= 1,2,
is its
conditional expectation given the relevant information set. ln the present
case this infonnation set comes in the fonu of w./ )Xw+/= xw.;). This
suggests that in order to be able to predict beyond the sample period we
need to ensure that 9. w+/, I >0, is available. Assuming that Xws, xr..; is
available for some lb 1 and knowing that Eluvytlxv-qt= xw-r/) 0 and
.
.9.
El-vvst Xw-,
pv-vl
a natural
x,.+,)
(19.116)
#'x,.+,,
lkl t #-xw.,.
(19. 117)
In order to assess how good this predictor is we need to compare it with the
value of y, y,vsk. The prediction error is defined as
actual
-y,.-,-w-,-uw+?+(,-#-w)'x,
and
(19.118)
-.,
r +l
evs:
(19.119)
+/
c gl +
ev-vl
x')...,(X'X) -
lxw-j-/ll
( 19. 120)
x x((),
1)
tr2.
We could
((Fklc 1,
1
( 19. 121)
see Section 6.3. (Using
....
Prt#xwvg-casv
....
xw.h/qGyw+/<
+x'w,.,(x'x)-1x,.../q)+c ,.u,''E1
,'h
# xw..,
,
-a,
(19.122)
404
where
Specification,
cx
estimation
and testing
via
mt 3.029 +0.678)?,
( 1.116) (0.113)
+0.863p,
R2
RSS
72
=0.993,
-0.049f,
(0.024)
=0.993,
0. 10667,
li,,
(0,014)
(19.123)
(0.040)
=0.0402,
1og L
127.703.
Using this estimated equation to predict for the period l980ffollowing prediction errors resulted:
-0.0317,
t?j
-0.0279,
ez
14,
-0.03
t?s
0.0276,
t?a =
-0.0193,
e
p1 ()
ey
-0.02 17,
=0.0457,
1982/1,,the
ezb
-0.0243,
f?g
0.0408,
=0.0497.
As can be seen, the estimated equation untlerpredicts for the first six periods
and overpredicts for the rest. This clearly indicates that the estimated
equation leaves a lot to be desired on prediction grounds and re-enforces
the initial claim that some misspecification is indeed present.
Several measures
of predictive ability have been suggested in the
econometric literature. The most commonly used statistics are:
j.
MSE=-
m t
T' + m
.;
el
l
(meansquare
errorl
(19. 124)
19.7
The residuals
MAE
N1
y. +. l
405
If4I(mean absolute
1 ;.
-m t jj
+.
j
-D1
sm
jg
r
)t
et1
w.
U=
,,2
( 19. 125)
errorl;
.)
--
-if
7.+..
&
2
t.*t
m f= w-j-I
(19.126)
(see Pindyck and Rubinfeld (198 1) for a more extensive discussion). For the
above example MSE= 0.001 12, MAE 0.03201, U 0.835. The relatively
high value of U indicates a weakness in the predictive ability of the
estimated equation.
The above form of prediction is sometimes called ex-post prediction
because the actual observations for ),f and X are available for the prediction
period. Ex-ante prediction, on the other hand, refers to prediction where
this is not the case and the values of X, for the post sample period are
kguessed' (in some way). ln ex-ante prediction ensuring that the underlying
assumptions of the statistical GM in question are valid is of paramount
importance. As with specification testing, ex-ante prediction should be
preceded by misspecification testing which is discussed in Chapters 20-22.
Having accepted the assumptions underlying the linear regression model as
valid we can proceed. to
xw t by iw-, and use
=
kguessestimate'
JF
p-T k T + /
12
'
'
,/
?.-s,
one additional to
19.7
.,
(?vyt
(see (118)).
The residuals
l,i, / y!
EB
#'xt
y -X#
1, 2,
in matrix
notation.
(19 129)
( 19.130)
From the definition of the residuals we can deduce that they should play a
role in the misspecification
analysis (testing the
very important
assumptions of the linear regression model) because any test related to
406
ty,,/Xr= xt,l6T) can only be tested via f. For example, if we were to test any
one of the assumptions relating to the probability model
(.VI,/X,
Xf)
X(#'xf,
'v
t7'2)
(19.131)
#'x,)
(y, -
x.
N(0, c2)
(notthe
(19. 132)
tlife
bautonomous'
c2(1
(19. 133)
- Px)),
'v .N(0,
where Px is the idempotent matrix discussed in Section 19.4 above. Given,
however, that rank (1 Px) trtl Px) T-k we can deduce that the
distribution of is a singular multivariate nonnal. Hence, the distributions
of and u can coincide only asymptotically if
=
v(X(X'X) - 1X')
-+
(19.134)
v(X'X) -
'
-+0
(19.135)
as
H'(I
Px)H
A,
01,
(19.136)
The residuals
19.7
407
being the matrix of the eigenvalues of I Px. This enables us to detine the
transformed residuals, known as BLUS residuals, to be
-
' H'
=
'v
(19.137)
y,l
..
bt- 1
recursive
t-
1, 2,
( 19. 138)
,.
where
is the
for t
j,t -
r k+ 1
1x t ,
(X, lX,
-
1)
X,
Ieast-squares
E&
(x j
xz
x,
1 yt
'
!'
( 19.139)
1,
estimator
'
j0-
of
# with
1 EB
(y j y z
,
yt
'
-.
j.
#, N(0,c2( 1 + x;(X?0-'jXr0-j)
'v
1xj)),
and for
P=
'
g1 + x;(X,0-
tl*
;* EEE(t?*
k+
k+ 1
$
2,
'
#+ x((),c2I.,-k).
x
'
1x,0-
tl*)/
F
)-
(19. 140)
'
x f (j
>
(19.141)
recursive
Speciation,
408
estimation
and testing
0.10
0.06
-0.05
-0. 10
1964
1967
1970
1973
T ime
1976
1979
1982
19.8
misleading.
Appendix 19.1
409
systems
A measurement system refers to the range of the variables involved and the
associated mathematical structure. This system is normally selected so that
the mathematical structure of the range retlects the structural properties of
systems such as
the phenomena being measured. Different measurement
nominal, ordinal, interval and ratio are used in praetice. Let us consider
them one by one very briefly.
possible are
Nominal: ln a nominal system the only relationships
(i)
whether the quantities involved belong to a group or not without
any ordering among the groups.
Onlinal: In this system we add to the nominal system the ordering
of the various groups (e.g. social class, ordinal utility).
lnerval: ln this system we add to the ordinal system the
(iii)
interpoint distances (e.g. measures of
possibility of comparing
of the values
that
Note
temperature).
any linear transformation
Fahrenheit
and
scales).
is
legitimate
(Celsius
taken
Ratio: ln the ratio system we add to the interval system a natural
origin for the values. ln such a systen the ratio of two values is a
meaningful
value.
In the case of the linear regression model caution should be exercised when
using different measurement systems for the variables involved. In general,
most economic variables are of the ratio type and in such a case the
410
concepts
2,
.#2 #2
recursive
Questions
Compare the linear Gauss linear and the linear regression models
(statistical GM, probability and sampling models).
Explain the concept of exogeneity in the context of the linear
regression model.
Discuss the differences and similarities between
'(.?f
pt BE
X!
x?), p)
EEE
'(.rr,'''c(Xt)).
4.
.l)'.
'
,)
7-
Exercises
Derive the
MLE'S
of 0 EE (j, c2)'.
Additional references
41 1
Derive the information matrix Iw(p) and discuss the question of full
>
efficiency of j and c
Show that
az
(i)
Z'(#)
(ii)
Px2 Px
(iii)
Px'
(iv)
(1 Px)X 0.
0,'
(Px
X(X'X) - 1X');
Px;
t satisfy the
conditions:
lim
v-.x
-.
1
T
(X'X)
lim(X'X)-
Qx<
LJo
non-singular;
=0.
F-y x
Construct
(iii)
/71+
/72+ /33
jl
fz /3
1,'
=0.
Show that
(R/
r)'(R(X'X)- 1R')-
IIRJ- r)
(R#
=u'Qu +
-r),
'
'v
Additional references
C H AP T E R 20
Snear
4 12
20.1
model
X).
(20.1)
pt G f(-Pr/X/
An alternative but related form of conditioning is the one with respect to the
c-field generated by Xf, which defines the systematic component to be
=
pl
'(A 6'(Xf)).
'
c(X,)
U c(A-.f) zzn
=
conditional
expectation
,)#)
',
(seeChapter
J', =
15). Using
,'X,
+ u,,
J1
212-21
(20.5)
xr,
cl 2E2-2'X?
(20.6)
statistical
GM:
(20.7)
414
from assumptions
Departures
statistical
GM
f()?,/oX,)),
ut
satisfiesthe following properties:
M
(20.8)
.)4
'lF(&,/c(Xf)))
f'tF(/t?,/,/c(Xt)))
.E'(ul)
Eutplq
Elutus)
=0,
(20.9)
=0,
.E')S(u,l</c(Xt)))
(20. 10)
c2
0,
s,
(20. 11)
tzp s,
#)
Dq' 'Xt;
#:)
'
D(X,;
a),
(20.12)
(20.13)
where
(20.14)
D(Xr;
(det E 2 2 )kyc
(2zr)
#c)
#c are
vectors
X1,
expt --l.x,Eca
,
xt )
(20. 15)
fromD(Zt,.#), t
1, 2,
T;
20.1
respectiyely,
where as usual
(xA'r,,)
z,
EEE
GM: yf
j'Xr + ut,
l (E T
E1q
r21
the
we can specify
'tl
g31
E4q
E51
the
'TJ
values
.?)
The probability
model
1
2c
cv ( )
(detxccl-feXp --l(X;E2-2'Xr))
k/z
t
x (2zr)
E6(1 (i)
0 c(Rk x R +)
D(y/X,; p) is normal;
(ii)
(iii)
;t),,, c(x,))
c(
-
(Z1, Zc,
j!
model
Ihe sampling
g8(I
respectively.
k), t
1, 2,
T)
.,
distributions.
Departures
The probability
-.v4'
'k'
1 .J'a.
,
from assumptions
statistical
GM
f-(0 ; y / )
D( #'j Xf; 0)
tg-I
'
'
D (X f ;
:=
# a)
.
differentiation
with
-.
)(x'),
j x x't
E,
(.?'',,z.)-
.,t''y
''
(20.18)
and
1
(i*2
Ta
V(
p*xtj a EEI -T
.:
-
(20.19)
'''b
/*
'*2
/*
1
1
( 1.'.1''4- k''lb:t'p + uj j-h (,/''./') - #'''u
=
(20.20)
20.1
# + 'I).4'';'')- ',f.''E'(u
= #,
'(#*)
if
'g(;??''t?)
E(u/c(J'))
1,'71
< x?,
That is,
=0.
COvll
**
c(t?'))(l
'E.E'(#*/c(.(F))1
)O
Ep
(20.21)
GM
A*
A*
>*
-#)
-#)(#
ELEC'
/
=
(7),
-#)(/
/) /c(?)1
'
1,'?''A'(uu' c(,'.F))t'(;?''t?') -
= FI)-''.??'')
o'lE.J-'.'jt -
11
(20.22)
E(I'-%
I1(p)
(20.23)
2c
estimator of (.
j* is an
Using the same properties for the conditional expectation
of c2
can show that for the MLE
mcient
operator
we
'*2
1
= T Fgfu/Mxu
1
= -,jr Sgftr
1
=
''
Mx)
c2,
defined by
is unbiased.
1
=
'.
czsttr
*/c(kY-))(j
where Mx
1
c(.?)(1
Mxuu'/c(.?))1
cuftr
T-k
1 Fgf't*
w
=-
'rtr
Iw - ''tl'.Il
.
.-
.,
Ma.E'(uu'/c(.'?/))1
1
Iw tr(tf'.'') - ,')F/,jF))
-
values
of ,?J
c2
(20.24)
the estimator
418
Departures
,+)
'*2
lim
F-+
E(.I'Ij
'
Qxx<
cfz
and non-singular,
(20.26)
we deduce that
x'''z'(/*
.N0,c2Qx4),
(20.27)
,v
N(0, 2c*).
(20.28)
-#)
UT(+2
c2)
20.2
The statistical
parameters
of interest
The statistical parameters which define the statistical GM are said to be the
statistical parameters of interest. In the case of the linear regression model
these are #= E2-)lc1 and c2 cl 1 cj aE2-zllaj. Estimation of these
statistical parameters provides us with an estimated data generating
mechanism assumed to have given rise to the observed data in question.
The notion of the statistical parameters of interest is of paramount
importance because the whole statistical analysis
around these
assumptions
defining the linear
parameters.
cursory look at
E1j-E8q
regression model reveals that al1 the assumptions are directly or indirectly
related to the statistical parameters of interest 0. Assumption E1jdefines the
systematic and non-systematic component in terms of 0. The assumption of
weak exogeneity (3q is dened relative to 0. Any a priori information is
introduced into the statistical model via 0. Assumption g5qreferring to the
rank of X is indirectly related to 0 because the condition
=
trevolves'
rank(X'X)
(20.29)
20.2 The
statistical
parameters
of interest
419
k,
(20.30)
required to ensure that E2c is invertible and thus the statistical parameters
of interest 0 can be defined. Note that for T > k, ranklx)
rank(X'X).
Assumptions E6qto (8(Iare directly related io 0 in vi
ofthe
fact that they
w
defined
in
of
a1l
D(y,,/Xt;
0).
terms
are
The statistical parameters of interest 0 do not necessarily coincide with
the theoretical parameters of interest, say (. The two sets of parameters,
however, should be related in such a way as to ensure that ( is uniquely
defined in terms of 0. Only then the theoretical parameters of interest can be
given statistical meaning. In such a case
t is said to be identnable (see
Chapter 25). Empirical econometlic models represent reparametrised
statistical GM's in terms of (. Their statistical meaning is derived from 0
and their theoretical meaning through (. As it stands, the statistical GM,
=
y'f
=
#'xl+
lzf,
t e: T,
(20.31)
depending on the
(20.32)
=0,
:,
E,V
N(0, c,Z1z),
'
(20.33)
420
Departures
but instead
y X#
=
+ u,
N(0, c2Jw)
'w
(20.34)
Xy
(20.35)
'
'
'r - k
y -X#.
(20.36)
between
WT +E,
(20.37)
Wy 10
(20.38)
and thus
A'(/)
'x'wy#o;
-#=(x'x)-
(20.39)
and
(20.40)
1
'-'
l .-XIX ?X) - X That is, # and ti 2 suffer from omitted variables bias
and y= 0, respectively; see Maddala (1977), Johnston ( 1984),
unless W'X
Schmidt ( 1976), inter alia.
Mx
=0
viewpoint,
From the textbook specification approach
where the
statistical model is derived by attaching an error tenn to thc theoretical
model, it is impossible to question the validity of the above argument. On
the other hand, looking at it from the specification viewpoint proposed in
Ec- lEc
3EL' ,3
1,
where Ec.a (Ycc - EaaYL1Es2), Eza Cov(Wr), Eaa Cov(Xf, Wf), JaI
CovtWt, yf) (seeChapter 15). Moreover, the probability models underlying
(33) and (34)are Dt.pf Xf; 0) and 1)(y,,,'Xr,Wr,' 0o) respectively. Once this is
realised we can see that the omitted variables bias (39)should be written as
=
20.3 Weak
E
)'/-1f
exogeneity
=(X X)
-bo
1J'
t#)
(20.41)
X Wy#0,
since
E
(u) WT #0,
(20.42)
j',..''A'U/
.
where E)..
'
y =X#() + Wy+ c
(20.43)
-#-
(20.44)
since
E
(u) 0
(20.45)
vt'.kr
.t./-)
treduction'
20.3
Weak exogeneity
422
statistical
GM
o-p, c2) is concerned. That is, although at the outset we postulate Dt-pt, Xt;
#) as far as the parameters of interest are concerned, D(yj/Xt; #1) suffices;
note that
D(#t, Xf; #) D(.F/Xl;
=
(20.46)
'
is true for any joint distribution (seeChapter 5). If we want to test the
exogeneity assumption we need to specify 1)(X,; #2) and consider it in
relation to D(yt/X2; #1) (see Wu (1973),Engle (1984),.
inter alia). These
exogeneity tests usually test certain implications of the exogeneity
assumption and this can present various problems. The implications of
exogeneity tested depend crucially on the other assumptions of the model as
well as the appropriate specification of the statistical GM giving rise to xr,
T; see Engle et al. (1983).
t= 1, 2,
Exogeneity in this context will be treated as a non-directly testable
assumption and no exogeneity tests will be considered. It will be argued in
Chapter 2 1 that exogeneity assumptions can be tested indirectly by testing
the assumptions g6q-E8j.The argument in a nutshell is that when
inappropriate marginalisation and conditioning are used in defining the
parameters of interest the assumptions g6J-E8(I
are unlikely to be valid (see
Engle et al. (1983),Richard (1982:.For example
the weak
a way to
exogeneity assumption indirectly is to test for departures from the
nonuality of Dtyf, Xf; #) using the implied normality of D(yl/Xf; 0) and
homoskedasticity of Vartyt/xr
xr). For instance, in the case where D(y,, Xf;
#) is multivariate Student's 1, the parameters /1 and #c above are no longer
variation free (seeSection 21.4). Testing for departures from normality in
the directions implied by D(y,, X,; #) being multivariate t can be viewed as
an indirect test for the variation free assumption underlying weak
exogeneity.
.
ttest'
20.4
Restrictions
on the statistial
parameters
of interest 0
(1)
on
(20.47)
20.4 Restrictions
of interest
on parameters
l0, p; y, X)
const
T
--j.
1og c a
2c z ty -
.,.
ns'g
.A#l
v
n.
y - .,h.P? # (R/
,
r),
(20.48)
where p represents an m x 1 vector of Laqranqe multipliers. Optimisation of
(48) with respect to #, c2 and p gives rise to the first-order conditions'.
01 1
(X'y -X?X#) - R'p
p = cl
pl
t?cz
2c a
(31
= -(R#
1 .(y
2c
=
(49)with
(20.49)
-X#)'(y -X/)=0,
(20.50)
(20.5 1)
0.
-r)
ep
Premultiplying
=0,
R(X'X)-
(20.52)
...- j-(X
#=
and
From
R(
- r
(50)we
X)
R ERIX X)
1
=- F
(Rj - r)
(20.54)
0.
:2 Y
-
MLE of
-x#
-x#
(y
'
MLE of c2 is
(20.55)
(y
y -Xj.
424
Departures
Properties
of
1-0(J #
5
2)
MLE'S
(#
r)) Cj 1
Cj z
C.z, Caa
'
(20.56)
c 11
c2E(x'X) - 1 -(x'X)
1
1
- R'gR(X'X) - 1R'j - R(X'X) -
Using
(i)
lj
Covtj),
#) C'a1,
=
Cov(#).
EEE
(56)we
Jand
x'x
Iw(#, p, c2)
R'
()
(20.57)
2c
;y
:2
42 +
1
F
Given that
T'42
--s
(R# -r)?gR(X'X) -
li
= -X(X'X)-
z2(w-/()
MLE
of the
or not;
square
RgR(X'X) -
(20.58)
IR'II- 1(R/
(20.59)
r).
20.4 Restrictions
of interest
on parameters
425
and
(Rj-
r)'
IR'I--1 (Rj-
ERIX'X) .7
r)
J),
z2(m;
'w
(20.60)
where
1
1
(R# -..j'gR(X'X) - R') - (R#
=
we ean deduce that
7-:2
- a
c
k,.
(20.61)
z2(T +
'v
r)
(20.62)
),
using the reproductive property of the chi-square (seeAppendix 6.1) and the
independence of the two components in (59).This implies
F(d2) + c2.
But for
(J().63)
.92
gl (F+
f'(#2)
-/()j?g,
c2 wjl.w
R#= r, since J
0.
ln Section 19.5 above we derived the F-test based on the test statistic
'r(y)
(R/
i
1
(R/
- r)' (R(X'X) - R'j --
/l1 l
z--'
r)
(20.64)
--
against
Hj
R## r,
must be close to
using the intuitive argument that when Hv is valid jlR#zero. We can derive the same test using various other intuitive arguments
similar to this in relation to quantities like j1J-/)tand
being close to
of the Fvalid
when
question
A
formal
derivation
is
5).
Ho
(see
more
zero
test can be based on the likelihood ratio test procedure (see Chapter 14).
The abovc null and alternative hypotheses in the language of Chapter 14
can be written as
.-rij
tj/j!
Ho1 pgeo,
H,
where
0G
(j,
(F2),
(.)
pc(6)1
c2):
.t(#,
jG
-60,
(F/, c2
.tf
R o ).s
r,
(7.2
(j
R+
jl
.
#y)
max L(0; y)
ps eo -----.
max L(0, y)
pct.k
L (#.
,
.-v--
w(,;.ya)- r?y,.-w.a
# (a j --jwn--.
sj.-- o
J.
ttp,. y) (2a)
.
(tz )
t?
(?c
-.
r,,a
ti a
(20.65)
Departures
2(y)
/'
Looking
(R/
1 ?-
==
'R'II-1(R/-r)
-r)'gR(X'x)-
-r/2
(20.66)
(T- k)sa
--
at
distribution we know. Hence, 2(y) can be transformed into the F-test using
the monotonic transformation
z(y)
-27z
(2(y) . -
T- k
1)
(20.67)
(R#
r) r (RIX X)
?,
(1
-
''''
(R# - r)
??
/'
-
(20.68)
(see exercise 4). This implies that z(y) can be written in the form
'r(y)
z(y)
uu-uu
T-k
(20.69)
'
UU
RRSS -URSS
URSS
T-k
.
(20.70)
where RRSS and URSS stand for restricted and unrestricted residual sums
of squares, respectively. This is a more convenient form because most
computer packages report the RSS and instead of going through the
calculation needed for (64)we estimate the regression equation with and
without the restrictions and use the RSS in the two cases to calculate z(y) as
in (70).
Example
Let us return to the money equation
mt 2.896 + 0.690.:,
( 1.034) (0.105)
=
R2
log L
=0.995,
42
estimated
0.865p,
-0.055,
in Chapter 19:
l'ir,
147.4, RSS
=0.117
.:=0.0393,
52, T= 80.
(20.7 1)
20.4 Restrictions
of interest
on parameters
427
ja # 1.
against
and
-p1
R2
yt)
-0.529
0.629,
(20.72)
-0.2 19f1+
(0.055) (0.019) (0.087)
,
#2
=0.624,
=0.0866,
T= 80,
we can
deduce that
(20.73)
(20.74)
Hence we can conclude that Hv is strongly rejected. lt must be stressed,
however, that this is a specification test and is based on the presupposition
that all the assumptions underlying the linear regression model are valid.
From the limited analysis of this estimated equation in Chapter 19 there are
clear signs such as its predictive ability and the residual's time pattern that
some of the underlying assumptions might be invalid. In such a case the
above conclusion based on the F-test might be very misleading.
The above form of the F-test will play a very important role in the context
of misspecification testing to be considered in Chapters 2 1-23.
(2)
on
ji
Having considered the estimation and testing of the linear regression model
when a priori information in the form of exact linear restrictions on j we
turn to exact non-linear restrictions.
428
Departures
Consider the case where a priori information comes in the form of m nonlinear restrictions (e.g.pj #c/ja, jl
-#c2):
=
f(#) 0,
1, 2,
m,
=0.
(20.75)
rank
(p
(20.76)
rl.
As in the case of the linear restrictions, 1etus consider first the question of
constructing a test for the null hypothesis
Ho1 H(#)
0 against
Hj
H(#) + 0.
(20.77)
Using the same intuitive argument as the one which served us so well in
constructing the F-test (seeSection 19.5) we expect that when Ho is valid,
H(j) :%0 The problem then becomes one of constructing
a test statistic
0 !)
.
H(#).
(20.79)
f(J)
H(/)
H(#) +
pH(#)
p#
(/-#)
tasymptotically
+op(1),
(20.80)
Restrictions
20.4
of interest
on parameters
429
v''r(/ - ,)
c2Qx-')
N(0,
(20.81)
-H(#))
N 0, c
H(,)
z op -
Qx-
pH(#)
'
(20.82)
op
- 1H(/)
where
j),
z2(m,.
'v
(20.83)
(:
c2j(''JjC>)ox-
covt/lQ
1(t''Jj>)'j
- H(#)'Ecov(/)q
-1H(,).
(20.84)
As it stands
.'b'
where
P#
(?H(j)
t?j
X'X - 1 ?H(/)
F
Jj
'
P
--+
covtj)
(20.85)
,,
t?j
#-j
H(/)'js2(0Hgj(j(x,x)-
u,(y)-
ltt?l-lgjjj-
'u(j)
,...,
zztm;),
(20.86)
are
c2 and
?H(#)
=
C1
c2
1y:1'Pr) >
Gl,
rejection
(20.87)
430
Departures
from assumptions
statistical
GM
where
dzztn1l
=
,x.
This is because
H
BStyl
z2(r#l).
(20.88)
This test is known as the WW/J test whose general form was discussed in
Chapter 16. ln the same chapter two other asymptotic test procedures
which give rise to asymptotically equivalent tests were also discussed. These
are the Lagrange multiplier and the likelihood ratio asymptotic test
procedures.
The Lagrange multiplier test procedure is based on the constrained MLE
of
J # instead of the unconstrained MLE J. That is, #-is derived from the
optimisation of the Lagrangian function
1(0,#;
=log
y)
?/
t? log
pt?/
t?H(#)
--
/.t(#)
lp
lH(#)
P#
= -H4#)
(p
=>
L0; y) -#H(#)
()
(20.90)
(#; y)
t7#
plp
0,
log L
(20.89)
H(#
0,
(20.91)
'
constrained
r)/z(#
-0)(
can be used as a
intuitiveargument
1
T
(20.92)
measure of whether Ho is true or not. Using the same
as for !hH(/)
j above we can construct the quantity
-0
#(#)'ECOv(#(#))q
p(#)
(20.93)
-v,'v
pt)
-,,(#))
--
x((,, c2g(f'Hp,t#')Qz'(-'----/?.'4p#
)'j-')
(ae.94)
Restrictions
20.4
(see Chapter
used
C1
431
f-Mtyl
can be
of interest
on parameters
#/1/
pH(J)(X X)
(72
p#
)y : Lsiy)
> ca
p(
(j)
c
z (n1,
.
'v
(20.95)
to construct an
=
H(J)
'
-.
region
(20.96)
dzztnl,'
> ral =
Cz
).
(20.97)
ln Chapter 16 it was shown that the Lagrange multiplier test can take an
alternative formulation based on the score jnction gr log L)?((0. ln view
of the relationship between the score function and the Lagrange multipliers
given in (91) we can deduce that we can construct a test for Hv against H3
based on the distance
logL)
- 0
( log Lj'
(0
Cov
1o2 L(#)
t? log
Ltpl
t?p
7loz Lj
'-'
(20.99)
'
t?p
of lVtyl and
reasonable
x N(0, Iw.(p))
(20.100)
(see Chapters 13 and 16) we can deduce the following test statistic:
Esy)
--
1
'r
p log L)'
t0
Iw)
log L(#)
Hu
(0
z (m).
(20. 10 1)
This test statistic can be simplified further by noting that Ho involves only a
subset of 0 t'ustj) and 1.(p) is block diagonal. Using the results of Chapter
'y
lrjt
g:2(x,x)-
(20. 102)
Departures
The test statistic Esy) constitutes what is sometimes called the ejjlcient
score form of the Lagrange multiplier test.
The Iikelihood ratio test is based on the test statistic (seeChapter 16):
J-aR(y)
z2(m; ),
(20.103)
.fy
f-R(y)
>
ca ).
(20.104)
T'(#-
cx
This approximation
4)'I,c0)(Jis very
we can approximate
LRy) as
(20.105)
suggestive
because
it shows
clearly
an
important feature shared by a1l four test statistics WJtyl, LMy), ESy)
and .LR(y). A11four are based on the intuitive argument that some distance
l H(/)1) g(/311 E7log L($-,0)l/?,)r and 14
#-- respectively, is
to
zero' when Ho is valid.
Another important feature shared by a11 four test statistics is that
their asymptotic distribution (z2(m)under Hv) depends crucially on the
asymptotic nqrmality of a ertain quantity inyolved in defining these
/
dijtancesg x?' F(# - #), ( 1,/x.' Tqlp) - #(#)), ( 1 x7 F) EPlog .L(#, 0)j/J$ and
v''F(#- 0) respectively. All thre tests, the Wald ( 14,'),Lagrange multiplier
LLM) and likelihood ratio (faR) are asymptotically equivalent, in the sense
that they have the same asymptotic power characteristics. On practical
grounds the only difference between the three test statistics is purely
computational, l4'tyl is based only on the unconstrained MLE p-LM(y) is
based on the constrained MLE J and LR(y) on both. For a given example
size T; however, the three test statistics can lead to difference decisions as far
as rejection of Hv is concerned. For example, if we were to apply the above
procedures to the case where H(#) R# r we could show that l4z'(y)
11
/)1
41
F'
'
tclose
20.5
f.R(y) y LMy)
Collinearity
E2-21o'al c2
,
k,
related
=
cj
ranktx)
j
=
g5qstating that
(20.106)
H(#,
c2)
(#
(20.107)
20.5 Collinearity
and (106) represents the sample equivalent to
ranktrzc)
k.
(20.108)
(72
/=(x'x)-
'x'y
42
and
1 y'(I -X(X'X)
=-
ly
(20.109)
of # and c2 can be defined. ln the case where X'X is singular j and 42 cannot
be derived.
The problem we face is that the singularity of (X'X) does not necessarily
imply the singularity of Ec2. This is because the singularity of (X'X) might be
a problem with the observed data in hand and not a population problem.
For example, in the case where T< k rank(X?X) < k, irrespective of E22
because of the inadequacy of the observed data information. The only clear
conclusion to be drawn from the failure of the condition ( 106) is that the
of the statistical
the estimation
sample injrmation in X is inadequate
t#' interest
c2.
The source of the problem is rather more
j and
parameters
difficult to establish (sometimesimpossible).
In econometric modelling the problem of collinearity is rather rare and
when it occurs the reason is commonly because the modeller has ignored
relevant measurement information (see Chapter 26) related to the data
chosen. For example, in the case where an accounting identitq' holds among
some of the xrs.
It is important to note that the problem of collinearity is defined relative
to a given parametrisation. The presence of collinearity among the columns
of X, however, does not preclude the possibility of estimating another
of the
statistical
GM.
parametrisation restriction
One such
parametrisation is provided by a particular linear combination of the
columns of X based on the eigenvalues and eigenvectors of (X'X).
.jr
Let
P(x?X)P'
diagti-l, 2c,
2,,,,0, 0,
(20.110)
0)
(20.111)
The new data matrix X* can be viewed as referring to the values of the
Departures
(X'X)pf
fOr
1, 2,
X* HE (X*1:X1),
#*
Decomposing
(iii) for
t
1, 2,
m + 1,
X1
0.
as y
(20.112)
P'# conformably
in the form
#*
=(a',
/)'
(20.114)
X*a + u,
,:2
- =(X/'Xt)-1X1'y
Moreover, in
view of
#= P#*
any linear
1-2
=-
y'(I -X1(Xt'X1)-1X1)y.
(20.115)
relationship
P1 + P2)'
combination
c'b=c'PIa
and
the
we can rewrite
c'# of
+c'Pcy
# is
estimable if c'Pa
0 since
=c'P1a,
(20.117)
section.
20.6
Near' collinearity
lf we define collinearity as the situation where the rank of (X'X) is less than
collinearity refers to the situation where (X'X) is
k,
singular or
ill-conditioned as is known in numerical analysis. The effect of this near
singularity is that the solution of the system
dnear'
tnearly'
(X'X)j
X'y
(20.118)
to derive j is highly sensitive to small changes in (X'X) and X'y. That is,
small changes in (X'X) or X y can lead to big changes in q. As with
20.6
tNear' collinearity
'near'
knear'
ttrue'
'near'
taccuracy'
Cov(#)
c (X X)
(20.119)
(xfrv
(
kit
(-YI'!
.ff)
-ff
1)2
1, 2,
k.
(20.120)
(b)
yluxlrv
regressions
436
Departures
and a high value of the multiple correlation coefficient from this regression,
Rl, is used as a criterion for tnear'
collinearity. This is motivated by the
following form of the covariance of h :
covt/ h )
c2(x'x)j-;,1
c2g( 1
Rjzlxkx
(20.122)
RJ
xkX -,(X'-p,X
-,)
1X'-y:x,
(c)
(20.123)
xhxh
Condition numbers
c (X'X)-
c2(PA - 1P/)
k
0.2
;::1
.x..z
y
1
9-6j
2,.
(20.124)
and thus
Var(#f)
-
.g
pij
'E ;j
f=1
1, 2,
(20. 125)
(see Silvey (1969:. This suggests that the variance of each estimated
coefficient /f depends on a1l the eigenvalues of (X'X). The presence of a
relatively small eigenvalue ki will
these variances. For this
reason we look at the condition numbers
tdominate'
&j(X'X)
2max
i
ki
,
1, 2,
k,
(20. 126)
%near'
20.6
tNear' collinearity
(20.127)
tJ'l i
Covtx'flrf),
rrlx f (Xft),
=
1/
plx,
2, 3,
(20.128)
k.
ln such a case
1x'
- (x'x)
i bi
=
>
Vartj)
-1
(xJx)
(20.129)
with the estimator as well as its variance being effected only by the th
regressor. This represents a very robust estimated statistical model because
changes in the behaviour of one regressor over time will affect no other
coefficient estimator but its own. Moreover, by increasing the number of
regressors the model remains unchanged. This situation can be reached by
design in the case of the Gauss linear model (seeChapter 18). Although such
an option is not directly available in the context of the linear regression
model because we do not control the values of Xl, we can achieve a similar
effect via reparametrisation. Given that the statistical parameters of interest
0 rarely coincide with the theoretical parameters of interest ( we can tackle
the problem at the stage of reparametrising the estimated statistical GM in
order to derive an empirical econometric model. It is important to note that
statistical GM is postulated only as a crude approximation of the actual
DGP purporting to summarise the sample information in a particular way,
as suggested by the estimable model. Issues of efficient estimation utilising
a1l a priori information do not arise at this stage. Hence, the presence of
tnear' collinearity in the context of the statistical model should be viewed as
providing us with important information relating to the adequacy of the
sample information for the estimation of 0. This information will help us to
model by
construct a much more robust empirical econometric
estimated
statistical
GM
which
provide
in
reparametrising the
us with
ways
without,
which
close
orthogonal
transformed regressors
to being
are
however, sacrificing its theoretical meaning. This is possible because there is
no unique way to reparametrise a statistical GM into an empirical
econometric model and one of the objectives in constructing the latter is to
438
Departures
'nearly'
tnear'
and
h is mapped into
.b't
X, is mapped into X)
tk1 ),t + c1
(20.130)
zlcxf
(20.131)
+ c2,
Z,
'v
Zt
EB
matrix.
(20.132)
N(m, E),
where
l'f
Xt
ml
m2
(r1 1
J1 2
J1 1
X22
(20.133)
the statistic
(2,S),
=-
)(
-
Zf
and
)( (z,
l=1
-2)(z,
-2)'
(20.134)
-.+
A2
+c
(20.135)
ASA',
(20.136)
and
S
-->
where
A
(130)
j
(/4)1
) (cr
x0
2
al
transformations
Ammc,
-+
(20.137)
AEA'.
--+
on the
(2
kl 1 )
(20.138)
2).
(seeequation (15.39)for
measuresare directly related
(#2
2- j)
yz :$( 1
via
-
2-
(20.139)
(seeTheil (1971)),where
partial
correlation
=g(2,
(/Z, S) g(A2
=
g1(2, S) =gj(A2
+c, ASA')
+c, ASA').
(20.140)
(20.141)
That is, the sample multiple and partial correlation coefficients are
invariant to the group of linear transformations on the data. lndeed it can
be shown that 112 is a maximal invariant to this group of transformations
That is, any invariant statistic is a function of 2.
(see Muirhead (1982)).
The multiple and partial correlation coefficients together with the simple
correlation coefficients can be used as guides' to the construction of
empirical models with nearly orthogonal
regressors. The multiple
used
overall
picture of the relative
particular
correlation in
to get an
can be
statistical
of
various
GM and the
contribution
the
regressors (in both the
440
Departures
from assumptions
empirical econometric
statistical
GM
1, 2,
k - 1,
(20.142)
(20. 143)
which Theil ( 1971) called the multicollinearity effect. ln the present context
such an interpretation should be viewed as coincidental to the main aim of
constructing robust empirical econometric models. It is important to
remember that the computation of the multiple correlation coefficient
differs from one computer package to another and it is rarely the one used
above.
To conclude, note that the various so-called
to the problem of
such
adding
collinearity
dropping
as
or
near
regressors or supplementing
the model with a priori information are simply ways to introduce
alternative reparametrisations,
not solutions to thc original problem.
isolutions'
Important
concepts
Questions
Compare and contrast
(i)
F().'f X,
(ii)
'''''
1
j= (X X) X y, p+
(iii)
42
xE7lyfc(X,));
xf),
''-'
',
t*2
1
=--
(.)??')
.?
.1
'
.4
.t
.
y,
*?*,
yt
20.6
Near' collinearity
'
'
u#
Discuss.
Explain informally how you would go about constructing
an
exogeneity test. Discuss the difficulties associated with such a test.
MLE'S
of p and c2:
Compare the constrained and unconstrained
)= j-(x'X)- 1 R'
j=(x'x)-1x'y,
42
1 (y
g.2
-x#
r
-.
1
T
(y
(y-x#,
-
x jj (y- X#)
,
a test for
go about constructing
against
S1 : Rp+ r
.f
Hv H(j)
0 against
rH(/)
Hj : H(j) # 0.
1 instead of
! I
tcollinearity-
Snear-collinearity'
'ill-conditioned-
442
- 1
E=D
A-
A-FE- 1F'
1F'
- EIB
-B'A-
EF
c2)j 1 and
derive g1w(#,
compare its
p,
and Cc2 in (56).
Verify the distribution of
-
--FE-
A- IB
various
elements with C1 j C1 2
,
($-
in(56).
vCr ify the equality
cuu'
-'
(R/ -r)'gR(X'X) - IR') - 1(Rj
For the null hypothesis Hv: Rp= r against ff1 : R## r use the Wald,
Lagrange multiplier and likelihood ratio test procedures to derive the
following test statistics:
?U(y)
==
-r)
( 1
i-
y- k
FNT(Y)' L A4tY)
==
.-
LRy)
T log
(7-
--CIVKGCXI
) j. suzty)
,
mzty)
1 + T-i
+ z), zb 0)
(see Evans
Additional references
al.
and
g7l
0- (j1c2)are time-invariant.
ln each of the Sections 2-5 the above assumptions will be relaxed one at a
time, retaining the others, and the following interrelated questions will be
discussed :
what are the implications of the departures considered?
(a)
(b)
how do we detect such departures'?, and
(c)
how do we proceed if departures are detected?
It is important to note at the outset that the following discussion which
considers individual assumptions being relaxed separately limits the scope
of misspecification analysis because it is rather rate to encounter such
conditions in practice. More often than not various assumptions are invalid
simultaneously. This is considered in more detail in Section 1. Section 6
discusses the problem of structural change which constitutes a particularly
important form of departure from E7).
21.1
Misspification
Misspecication
testing refers to the testing of the assumptions underlying
statistical
model.
In its context the null hypothesis is uniquely defined as
a
the assumptionts) in question being valid. The alternative takes a particular
fonn of departure from the null which is invariably non-unique. This is
443
444
Departures
from assumptions
probability
model
because departures from a given assumption can take numerous forms with
the specified alternative being only one such form. Moreover, most
misspecification tests are based on the questionable presupposition that the
of the model are valid. This is because joint
other assumptions
misspecification testing is considerably more involved. For these reasons
the choice in a misspecification test is between rejecting and not rejecting
the null; accepting the alternative should be excluded at this stage.
An important implication for the question on how to proceed if the null is
rejected is that before any action is taken the restllts of the other
misspecification tests should also be considered. lt is often the case that a
particular form of departure from one assumption might also affect other
assumptions. For example when the assumption of sample independence
(8))is invalid the other misspecification
tests are influenced (see Chapterzz).
In general the way to proceed when any of the assumptions 1(6(1-1(8)
are
invalid is first to narrow down the source of the departures by relating them
back to the NIID assumption of LZ,, f (E Ir and then respecify the model
of the
taking into account the departure from NllD. The respecification
of the reduction from D(Z1 Za,
Zw,' #)
model involves a reconsideration
to D(y,/Xf; p) so as to account for the departures from the assumptions
involved. As argued in Chapters 19-20 this reduction coming in the form of:
,
D(Z j
Z 7. ; #)
Dt Zrs' #)
J'''.j
.....
tg'ID( yf 'X?
) D(Xr
'
,'
#a)
With the above discussion in mind 1et us consider the question of general
procedures for the derivation of misspecification tests. In cases where the
alternative in a misspecification test is given a specific parametric form the
various procedures encountered in specification testing (F-type tests, Wald,
21.1
Misspecification
testing
445
Lagrange mtlltiplier
-omitted
statistical GM's.
was based on the comparison of two
This was because the information sets underlying the latter were different. lt
was argued, however, that the argument could be reformulated by
postulating the same sample information sets. ln particular if both
Zw; #) by using
parametrisations can be derived from D(Z1 Zc,
statistical
redtlction
GM's
then
the
arguments
alternative
two
can be made
comparable.
Sttchastic
Let ftzf l G 1 ) bo a kector
process defined on the probability
which
includes the stochastic variables of interest. In
#( ))
space (S,
'non-comparable'
..t
'
17 it was argued
Chapter
.y t
E(. .
'
,/
t/'.) +
,
uj
I/t
'
).'!- E .J',
with
(2 1.4)
t)
properties
by construction
including the
=0,
#'xr +
/tt* j'x
=
(2 1.6)
tt r ,
LI*
f
rk'r lX,
x,
#'x
f,
(2 1.7)
'
)
.
When any of the assumptions in NllD are invalid, however, the various
properties of pt and lIt no longer hold for pl and ul. ln particular the
446
Departures
orthogonality condition
Eplutt)
(5)is invalid.
The non-orthogonality
#0,
(21.9)
g*(x,)-f=1
(b)
(21.0)
k
#(x,)=t1 +
j'
'-V
J,bixit+ )( )cijxitxjt
1
jp 1
+
l )wil lkdi-lxirx./fxl
i-
'
'
'
'
/yj
ltf
#'0Xf
''o%*
(21.12)
bLxt+ i''vzl + cf
Zf
1,
(21. 13)
ffe: yll 0,
=
with
A direct comparison
-
li (
1
between
regression
#)'X,
+ SZ)
ut (#o
whoseoperational form
=
HL
),e # 0.
(13)
(21.14)
and
(21. 15)
+ cf,
/)?x+ ylzlp + c,
(21.16)
can be used to test ( 14) directly. The most obvious test is the F-type test
discussed in Sections 19.5 and 20.3. The F-test will take the general form
FF(y)
( 1
vgss
sy
(21.17)
447
21.2 Normality
where RRSS and URSS refer to the residuals sum of squares from (6)and
( 16) (or ( 13/, respectively; k* being the number of parameters in (13) and m
the number of restrictions.
This procedure could be easily extended to the highercentral moments of
) ?t /'Xt s
Elut/xt
(21.18)
r y 2.
x,),
21.2
(1985b).
Normality
As argued above, the assumptions underlying the probability model are all
interrelated and they stem from the fact that Dt-pf,Xf; /) is assumed to be
multivariate normal. When D(),,, Xf,' ) is assumed to be some Other
multivariate distribution the regression function takes a more general fonn
(not necessarily linear),
ElhAt
X!)
11(*,X!),
x,)
=g(k,
x,).
(21.20)
assuming
(J'f,/Xf
X)
'v
D(FX,,
(T2),
'
(1)
Consequences of
non-normality
448
from assumptions
Departures
--
probability
model
'apparently'
hto)
minV ),,,lk't
-
(2 1.22)
psf.l r
It is interesting to note that this method was tirst suggested by Gauss in 1794
as an alternative to maximising what we, nowadays, call the log-likelihood
function under the normality assumption (seeSection 13. 1for more details).
In an attempt to motivate the Ieast-squares method he argued that:
the most probable value of the desired parameters will be that in which the
sum of the squares of differences between the actually observed and
computed values multiplied by numbers that measure the degree of
precision.is a minimum
.
warned that:
Despite this forceful argument let us consider the estimation of the linear
regression model without assuming normality, but retaining linearity and
homoskedasticity as in (21).
The least-squares method suggests minimising
6$)
w (), #,x,)2
---j.
'
=1
(21.23)
21.2 Normality
V9
or, equivalently:
F
f(#)
=
el
0# = -
Z (A't=
#'x,)2 (y-X#)'(y
2X
(y-X#)
(2 1.24)
-X#),
(2 1.25)
0.
that rankl)
Solving the system of normal equations (25)(assuming
get the ordinary least-squares (OLS) estimator of #
b (X'X) - 1X'y.
of c2 is
1 /(b)
k) we
(21.26)
T- k
T- k
(y-
Xb) , (y Xb).
(2 1.27)
of b and
in view of the
.92
'
Gausr-Markov
theorem
Under the assumption (21), b, the OLS estimator of #, has minimum variance
among the class of linear and unbiased estimators (fora proof see Judge et
aI. ( 1982/.
.1
450
Departures
0=p, c2) we need the distribution of the OLS estimators b and J2. Thus,
unless we specify the form of Dxt, c2), no test or/and confidence interval
statistics can be derived. The question which naturally arises is to what
theory' can at least provide us with large sample results.
extent
'asymptotic
.1
(21),
Unft?r assumption
x/zb p) x(0,czo.y-
lim
.2
X'X
u-
-+
1)
(2 1.28)
(2 1.29)
Qx
is flniteand non-singular.
Lemma 21
.2
(21)we
Under
x''w(J2
c2)
x(0,
-.
tfl-y
-
1)c4),
lzlomcnr
of 1)(.J,/Xf;
(2 1.30)
04assumed
P-43
4
N/
G'
7w(.2-c2)
Lemma 21
Under
x(0,2c4).
(2 1.3 1)
.3
(21)
P
b p
-+
and
(!/limw-.(X'X)=0)
(2 1.32)
(2 1.33)
a2
S
-+
2
.
From the above lemmas we can see that although the asymptotic
distribution of b coincides with the asymptotic distribution of the MLE this
is not the case with ?l. The asymptotic distribution of b does not depend on
Normality
21.2
4f1
l
1)(yt,/X,; 04but that of does via p4. The question which naturally arises is
to what extent the various results related to tests about 0- (#,c2) (see
Section 19.5) are at least asymptotically justifiable.Let us consider the Ftest for Ho R#= r against HL : R## r. From lemma 2 1.1 we can deduce that
1R') - 1) which implies that
under Hv : V'F(Rb - r) N(0,c2(RQx'v
(Rb
r)
ERIX/X)- 1 R'(l G
Ho
(Rb
r)
'w
zwty) (Rb
(21.35)
gR(X'X) - IR'II- 1
(Rb
r)' .
c
z2(n'l).
m.
r)
1
x.
z2(r#
under Hv, and thus the F-test is robust with respect to the non-normality
assumption (21) above. Although the asymptotic distribution of zwty) is chisquare, in practice the F-distribution provides a better approximation for a
small T (seeSection 19.5)).This is particularly true when D(#'xt, c2) has
heavy tails. The significance r-test being a special case of the F-test,
c=-,,--,.--2,
zty)
i;
sv E(xk-')j
,
/,.-0
(2 1.37)
(2)
Testing
452
Departures
(a)
Non-parametric
The Kolmoqorotwsmirnov
test
tlf/xf,
C1
(21.38)
.05
.01
a
(21.39)
ca 1.23 1.36 1.67 .
For a most illuminating discussion of this and similar tests see Durbin
(1973).
The Shapiro-Wilk
test
This test is based on the ratio of two different estimators of the variance c2.
2
1V=
r
where
li41)
Z
=
tilrrtl'r-r.f.
li(t))
1
'
'
/Z l
I
=
(21.40)
G lo
if T is even
=-
% lcj G
F
1)
T- 1
if T is odd,
ty:1z:'< c.)
cu are
(1965)for
(21.41)
Parametric
tests
The skewness-kurtosis
test
The most widely used parametric test for normality is the skewnesskurtosis. The parametric alternative in this test comes in the form of the
Pearson family of densities.
The Pearson family of distributions is based on the differential equation
d ln
dz
-/)
.(z)
(z
c +cjz-hczzz
(21.42)
21.2 Normality
453
cl
(a4+
(..2)
3)(a3/c-
(21.43)
(414 33)c2/J,
12as ca (2a4 3a3 6) J, J= (1014.
(see Kendall and Stuart (1969:.These parameters
co
(21.44)
18)
(21.45)
=0,a4
u't>
=0,
(b)
'v
g2(r1),
aa
23
(21.46)
=0,
'v
..::0
'.w
)
( ) (--11
jZ,)
z,
m--s
(2 1.47)
(/t3/c3)
and
=0
=(p4/c*)
a4
3.
(21.48)
'
gz) E1
=
3z) +
Wta,j.3)(z*-6z2
-
+ 3)q(I)(z)
within the
(21.49)
454
from assumptions
Departures
model
probability
multiplier test:
T1(y)
F
=
6
where
ti=3+
,- g()
-
((i4 3)2
24
'w
s.--
--)((-i
tljl-i
z2(2)
(2 1.50)
,i/)'j
,z.--
,.-gt-,)
,z.-j
H'
,.'',,?)2j.
(2 1.5 1)
C1
zlty)
1)y:
dz2(2) a.
>cl,
Ca
v'''r 43
Ho
(21.54)
x(0,6)
x,7r(4.-3)
-
With
N,
24).
(2 1.55)
(13
R2
=0.995
,
F= 80,
(ij=
42
=0.995,
((i4.- 3)2
0.005,
1og L
=0.0393,
=0.
(21.56)
147.4,
145.
Thus, zj(y)
and since ca 5.99 for a
we can deduce that under
the assumption that the other assumptions underlying the linear regression
and a4 3 is not rejected for
model are valid the null hypothesis HvL aa
=0.55
=0.5
=0
0.05.
21.2
Normality
455
Outliers.
(3)
Tackling non-normality
(2 1.57)
j=
(ii)
=0.5,
1, Z*
Z*
Z(Z/
reciprocal',
(2 1.58)
- square root;
(21.59)
Departures
= 0,
Z*
logg Z - logarithmic
ote'. 1imZ*
log.
(2 1.60)
z).
J-0
The first two cases are not commonly used in econometric modelling
because of the diliculties involved in interpreting Z* in the context of an
model.
empirical econometric
Often, however. the square-root
transformation might be convenient as a homoskedasticity inducing
transformation. This is because certain economic time-series exhibit
variances which change with its trending mean (%),i.e. Var(Z,) pl,c2, tczz1,
T: In such cases the square-root transformation can be used as a
2,
variance-stabilising one (seeAppendix 2 1.1) since Var(Z)) r!, c2.
The logarithmic
tranAformation
is of considerable
interest in
econometric modelling for a variety of reasons. Firstly, for a random
variable Zf Whose distribution is closer to the 1og normal, gamma or chisquare (i.e.positively skewed), the distribution of loge Zt is approximately
nonual (seeJohnson and Kotz ( 1970:. The loge transformation induces
'near symmetry' to the original skewed distribution and allows Zt to take
negative values even though Z could not. For economic data which take
only positive values this can be a useful transformation to achieve near
normality. Secondly, the loge transformation can be used as a variancestabilising transformation in the case where the heteroskedasticity takes the
form
=
Varty /X t
t
xf)
c,2
(/t,/c2
!,
1, 2,
'fJ
(21.61)
xt) c2, t= 1, 2,
T Thirdly, the log
yt, Vartyl/xf
transformation can be used to define useful economic concepts such as
elasticities and growth rates. For example, in the case of the money
equation considered above the variables are a11in logarithmic form and the
estimated coefficients can be interpreted as lasticities (assumingthat the
estimated equation constitutes a well-defined statistical model; a doubtful
assumption). Moreover, the growth rate of Zt defined by Z* (zf -Zt- 1)/
Zt 1 can be approximated
by A loge Zt H loge Zt
Z,- j because
Alogezt logt 1+Zt*) ::x Zt*
In practice the Box-cox transformation can be used with unspecified
and let the data determine its value (seeZarembka (1974/.For the money
equation the original variables Mf, 'F;,Pt and It were used in the Box-cox
transformed equation:
For y)
xloge
-log
(Ms-
1)-/,1
+fzLY,-
1)+/,z(#2j-
1)+,.42-,
1)+u,
(21.6a)
457
21.3 tnearity
and allowed the data to determine the
chosen was J=0.530 and
JI
J2
J3
(0.223)
(0.119)
of
The estimated j
J4=
-0.000
=0.005,
=0.865,
0.252,
value
(0.0001)
value
07.
(0.00002)
transformation
is
this mean that the original logarithmic
inappropriate'?' The answer is, not necessarily. This is because the estimated
value of depends on the estimated equation being a well-delined statistical
GM (nomisspecification). ln the money equation example there is enough
evidence to suggest that various forms of misspecification are indeed
present (seealso Sections 21.3-7 and Chapter 22).
The alternative way to tackle non-linearity by postulating a more
appropriate form for the distriution of Zf remains largely unexplored.
Most of the results in this direction are limited to multivariate distributions
closely related to the normal such as the elliptical family of distributions (see
Section 21.3 below). On the question of robust estimation see Amemiya
(1985).
tDoes
21.3
Linearity
xt)
(2 1.63)
#'Xt,
where
#=Xa-21J2j
'v
2
ca + c 1 r + ca l ..j.
(,tl (a,
.s.9:
costt')
.j. (ym (m
sintt')
)r+ )r),
-,,.
(2 1.64)
,v
458
Departures
s,4(0(,)
s/lacal)
(x-p.,)
)
(gc)
(2 1.65)
implies that
.E'(y,,'X! xf)
-
,'1
ztz-lxt
(2 1.66)
and
Vart)'f/xf
X2)
(/(Xt.)(t7'1
5'!
1 -
2X2-21J21).
(21.67)
(1)
Implications
of non-linearity
'()'f/'Xt
Xf)
/l(N),
(21.68)
)'f
#'xf+
'tyr/xf
=x,)
With
-p1
=j'xf
and ul
x,) c2. The
=yt
Elutlt
0 and
yr /)(xt)+
=
(21.69)
ut,
ttrue'
E (!t,/Xt=xf)
statistical GM,
=0,
(21.70)
cf,
and ct
where pt
'(yl,/Xl= xf)
f()4/X? xf). Comparing (69)
and (70) we can see that the error term in the former is no longer
/I(x2)
white noise but ut
+8, Hg(xf)
+ st. Moreover,
=/?(xf)
=.pr
.--p'xt
=y',
.-p'xt
21.3 t-inearity
Eutxt
x.,)
JJ(utlt/x t
ln
view of
459
# 0 and
Eptutq
=:(xf),
2
xf ) gtxf ) + c/.
=
these properties of
ut we can
g(x,..))',
e - (f?(x1),g(x2),
#) #+ (X X) X e' ,,
.
(2 1.72)
and
J)s2)
(7.2
+ e'
Mx
w- k
e+ c
2
,
Mx
l .-XIX X)
,
(2 1.73)
,
,
--+
-+
'too'
'h
#*
min c2(#)
where c2(j)
Elutlj.
(21.74)
::::c0
,$2
we can show
(2)
that
Testing
a.S.
#-+#* and
.s2
a,S.
-+
c2(#*)
for non-linearity
460
Departures
against
J.f1 : El)'tjxt
xf)
=/l(x2).
(21.76)
)'f
where
#Xt
Szf
+ 713/32+
second-order
injzzz2, 3,
and
(21.77)
:t
terms
k,
(21.78)
xitx-itxtt,>./>
(2 1.79)
k.
y2
0 and
HL :
yz + 0
or
ya # 0
= (#0-
#)'X
+ T'2z, +
'kst
equivalent
(21.80)
Ef
ffe
LM(y)
FR2
ty
restrictions
LM(y)
>
ca )
;l(q)
(21.8 1)
a.
Cu
For small T the F-type test is preferable in practice because of the degrees of
freedom adjustment; see Section 19.5.
Using the polynomial in pt we can postulate the alternative GM of the
form:
yt
where pt
p'sxt+
czptl + cspt3 +
+ cmp? +
p?
between
(21.82)
21.3
Linearity
461
'
e'
b. #) xt +
Z cipt +
zx
's
14,
pt
x,.
(2 1.83)
Let us apply these tests to the money equation estimated in Section 19.4.
The F-test based on (77)with terms up to third order (but excluding )
yielded :
because of collinearity with
.)
477
0.1 17 520
().:4j4,7-/
-0.045
FF(F)
(-j-167
11.72.
FT(y)
(:-1-
35. l3.
(3)
Tackling non-linearity
ClA,'''Xf
Xr)
(xf).
(21.84)
Choosing the form of Dtl?f, Xt: #) will determine both the form of the
conditional expectation as well as the conditional variance (seeChapter 7).
An alternative way to proceed is to use some normalising transformation
462
Departures
=x))=#*'x
(21.85)
and
Vart-vtl/rx)
x))
c2.
(21.86)
zkri
it
12
(2 1.87)
f
(see Box and Tidwell
(1962:.
E-vbt/xt
xr)
gls, T).
(2 1.88)
#(N, $ +
(21.89)
l/f
of
(2 1.90)
463
21.4 Homoskedasticity
sl
1 Z (-p,-/x,, f))2.
w- k
(21.91)
21.4
Homoskedasticity
(1)
of keteroskedasicity
Implications
/ and
.:2
are concerned
(i)
Ep
estimator
f)
diagtcf,
c2a,
of
(21.92)
c;/)
c2A,
/-
-+
p i e. b-is a
,
estimator of
consistent
#.
where
then
#.
/(l)=(y-X#)'n-
'(y
-X,),
(21.93)
464
P(#)
P# =0
#=(x f) x)
,
lx/o
- ly
-((),)(--'-,)')-'
)):(---'-,)('t).
Given that
and
(2 1.94)
(21.95)
Cov(/) y Cov(#)
(21.96)
Vartyle/''xle
x))
c2,
t= 1,
(21.97)
y* Hy
=
and
X*
HX
where
H'H =A -
1.
(21.98)
variables
y* X*#+ u*
(21.99)
and the linear regression assumptions are valid for )'1 and X). Indeed, it can
be verified that
#- (x+'x*)=
1x*'
is
unnecessary
(J 1.104))
(21.101)
21.4 Homoskedasticity
465
tsolve'
Wr
F,1
li/xrxt',
(21.102)
for which information accrues as T-+ x. White (1980) showed that under
certain regularity restrictions
j
and
--+
(21 103)
(X'f1X).
(21.104)
it .S.
Ww
-+
c2(X'X) - 1
(21.105)
'modelling'
(2)
Testing departures
from homoskedasticity
466
Departures
(2 1.106)
(106)can be
('(l?r2) czlxyxj',
(21.107)
;=1
1 jr (2
t
wt 1
dzjxtxt',
(2 1.108)
-'.,
where
-45/
(21.109)
2,,
/p,rl', klt xitxjt,
, (/1,,
k l 12
i /' 1 2
.
between
gklk
n'l
1)
.
Kolmogorov-Gabor polynomial
suggest the test statistic
zy)
where
F
jl1
tl
zl'f
bw-1
1
--
)
-
tl
,llt
(2 1.110)
I,.--
ll
l
-42)2(,
-#w) kt -#w)',
1
#v-y f Z1 #,.
(2 1. 111)
cl
Jt
y: zy)
> ca),
zy)
,vz2(m)
(2 1.112)
Homoskedasticity
21.4
regression equation
l
acj + a j
/ 1t +
.v
ac/
zt +
(2 1.113)
+ am/mt.
of homoskedasticity,
z2(n1),
(21.114)
co + y'#! +
lll
yielded R2 0.190, FF(y) 2.8 and FR2 15.2. ln view of the fact that
the null hypothesis of
./6.73) 2.73 and z2(6) 12.6 for a
homoskedasticity is rejected by both tests.
The most important feature of the above White heteroskedasticity test
that
is
no particular form of heteroskedasticpy is postulated.
subsection
ln
(3)below, however, it is demonstrated that the White test is an
exact test in the case where D(Z,; #) is assumed to be multivariate t. ln this
case the conditional mean is pt #'xt but the variance takes the form:
=
=0.05
kapparently'
c/
c2
(21.115)
f
+x'Qxf.
variables'
xf) + lll We Can
Using the
argument for ul Eluijxt
This suggests
derive the above auxiliary regression (seeSpanos (1985b)).
that although the test is likely to have positive power for various forms of
heteroskedasticity it will have highest power for alternatives in the
multivariate t direction. That is, multivariate distributions for D(Zr; #)
which are symmetric but have heavier tails than the normal.
ln practice it is advisable to use the White test in conjunction with other
tests based on particular fonns of heteroskedasticity. ln particular, tests
which allow first and higher-order terms to enter the auxiliary regression,
such as the Breusch-pagan test (see(128)below).
lmportant examples of heteroskedasticity considered in the econometric
literature (seeJudge et al. (1985),Harvey (1981)) are:
tomitted
Cr2
c2'x);
(21.116)
>'
468
Departures
lii)
Gl
(iii)
c2l
from assumptions
probability
model
c2('x*)2.
t
exp('x*)!
.
!'
tother'
oj2
/;(a'x)),
(2 1.119)
aa
'
ap,=0,
(aj)
c2.
The 1og
21.4 Homoskedasticity
likelihood function
logL,
(retainingnormality,
x)
,'
469
const
1.
jl log c,2
t 1
-j
-.j.
'-'
j c;t 1
2(.p
#'xf)2,(21.120)
uM-L.t-j
oeIogz-tyl,
log,,(),I(p)-1
test
(2 1.121)
where #refersto the constrained MLE of 0- (j, ). Given that only a subset
of the parameters 0 is constrained the above form reduces to
'
LM
nx
(21.122)
(see Chapter 16). In the above case the score and the information matrix
evaluated under Hv take the forms
(2 1. 123)
(2 1. 124)
andlc1
42
LM test statistic is
where
=(),
LM
=(1
j7- 1
F)
tl
)(
)(
xlw,
=j
Hence, the
H (;
)(
xlw,
klm
'v
''.:t
xlxl'
R1
- 1),
(21.125)
(21.126)
',y- wy.z
1--bv4*
'
xlxl'
xl'w,
'
xlwf
'
(21.127)
w'?2
is asymptotically
equivalent
Departures
from assumptions
probability
model
Ho
'v
z2(m
-
(1979),Harvey (1981)1.
lf we apply this test to the estimated money equation with x,l > (x,, c?.
#af) (See (78) and (79)) xlr, x excluded because of collinearity) the
auxiliary regression
:2
--2
k'
yielded R2
/ 1X
+ 7'2 zt + T'3
3t
L't
FF(y)
1) 19.675 and
Given that TR2
F(11, 68) 1.94, the null hypothesis of homoskedasticity is relected by both
test statistics.
=0.250,
=20,z2(1
=2.055.
(3)
Tackling heteroskedasticity
y* Hy,
=
X*
for
HX
H'H
A - 1.
21.4
Homoskedasticity
471
is multivariate
*,0)
y,
Xf zw&
cj
o.1 c
az3
Y22
(21.13 1)
lt turns out that the conditional mean is identical to the case of normality
(largelybecause of the similarity of the shape with the normal) but the
conditionalvariance is heteroskedastic, i.e.
ft.Ft/''Xf Xf) J1 2Y2-11X/
(21.132)
and
=
(see Zellner ( 197 1)). As we can see, the conditional mean is identical to the
one under normality but the conditional variance is heteroskedastic. ln
particular the conditional variance is a quadratic function of the observed
values of Xl. ln cases where linearity is a valid assumption and some form of
heteroskedasticity is present the multivariate l-assumption
seems an
obvious choice. Moreover, testing for heteroskedasticity based on
H0:
0-2
f
(7.2
tzzzz1, 2,
D(.)Xr;
#)
D(),/X,;
#1) D(Xt;
'
(21.134)
a)
are no longer variation free (seeChapter 19 and Engle et al. ( 1983) for more
details) because /1 = (Jj cE2-21 cl 1 J1 2Ec-21 o.a1, Ea-a1)and /2 EEE(Eac) and
the constant in the conditional variance depends on the dimensionality of
Xl. This shows that /1 and /2 are no longer variation free.
The linear regression model based on a multivariate sdistribution but
with homoskedastic conditional variance of the form
-
Vart?&/xl
xt)
1.,()c
(J,'0
2)
/ and
:2 are
from assumptions
Departures
- probability
21.5
Parameter
time invariance
(1)
Parameter
time dependence
An important
)'f
assumption
#'xt+
underlying
the linear
model
regression
statistical GM
(21.136)
ut,
is that the parameters of interest 0 HE (#,c2) are time nlllnknr, where #>
1
tj.latc-alcaj
The time invariance of these parameters
1222
- J 21 and c2 cj j
is a consequence of the identically distributed component of the assumption
=
At
'v
N(0, Y),
i.e.
tzr,r e:T)
is Nl1D.
(21.137)
This assumption,
N(m(!), E(r)),
(21.138)
of several
(i)
mt
mt eXP )
=
(iii)
n2t
(21.139)
ao + al 1,'
ll
tl + a1
+ aj 11',.
(1 -
e-'
(21.140)
(21.14 1)
1),
(xA'f,)
-
;)2at(r/)))
(.)'j'(t,')t
((m'''
1at('/))
(21.142)
time inuriance
21.5 Parameter
& vt
-
Xt
.'
Vart
where
T'l
x t)
,/x
#'x*
t
l
xt )
c,2
1', l-kt,,/(1.,,),/1,
',.zltJ'),
#(l)f Ycct/)=
r&zcl
=
1(r)
,.1
/.',11(r)
x)
2(l)Y22(l) -
,1
2(r)Ec.a(r)-
',...,1t.?'4-
t1, x;)'
'cc1(r).
Geliminated'
Klargely'
474
Departures
(2)
Testing
for parameter
time dependence
Under (21.138) the implied conditional moments are

E(y_t/X_t = x_t) = β_t'x_t and Var(y_t/X_t = x_t) = σ_t²,   (21.145)

and the implied statistical GM takes the form

y_t = β_t'x_t + u_t,   (21.146)

with θ_t ≡ (β_t, σ_t²) being the statistical parameters of interest. If we compare (146) with (136) we can see that the null hypothesis for parameter time invariance for a sample of size T is

H₀: β₁ = β₂ = ⋯ = β_T = β and σ₁² = σ₂² = ⋯ = σ_T² = σ²

against

H₁: β_t ≠ β or σ_t² ≠ σ² for any t = 1, 2, …, T.
The log likelihood function is

log L = const − (1/2) Σ_{t=1}^T log σ_t² − (1/2) Σ_{t=1}^T σ_t^{−2}(y_t − β_t'x_t)²,   (21.147)

with first-order conditions

∂ log L/∂β_t = σ_t^{−2}(y_t − β_t'x_t)x_t = 0,  ∂ log L/∂σ_t² = −1/(2σ_t²) + (y_t − β_t'x_t)²/(2σ_t⁴) = 0,   (21.148)

which cannot be solved for all t = 1, 2, …, T because the number of unknown parameters exceeds the number of observations. The coefficients can, however, be estimated recursively by

β̂_t = (X_t°'X_t°)^{−1}X_t°'y_t°,  t = k, k+1, …, T,   (21.149)

where X_t° ≡ (x₁, …, x_t)' and y_t° ≡ (y₁, …, y_t)'; the corresponding residuals are

û_t = y_t − β̂_t'x_t,  t = k+1, …, T.   (21.150)

Moreover, for t ≥ k+1, the one-step-ahead prediction errors

v̂_t = y_t − β̂_{t−1}'x_t = u_t + x_t'(β_t − β̂_{t−1}),  t = k+1, …, T,   (21.151)

could conceivably be used to assess the constancy of θ_t. Each new recursive estimator β̂_t can be derived from the previous one by the updating formula

β̂_t = β̂_{t−1} + (X_t°'X_t°)^{−1}x_t v̂_t,  t = k+1, …, T,   (21.152)

and substituting (21.149) into (21.151), v̂_t can be expressed in terms of the underlying parameters (β_i, σ_i², i ≤ t),   (21.153)–(21.155)

(see exercise 7). Hence, under H₀, E(v̂_t) = 0 and E(v̂_t²) = σ²[1 + x_t'(X_{t−1}°'X_{t−1}°)^{−1}x_t].
This implies that the standardised recursive residuals

w_t = v̂_t/[1 + x_t'(X_{t−1}°'X_{t−1}°)^{−1}x_t]^{1/2},  t = k+1, …, T,   (21.156)

satisfy, under H₀,

w ≡ (w_{k+1}, …, w_T)' ~ N(0, σ²I_{T−k}),   (21.157)

and under H₁,

w ~ N(δ, C),   (21.158)

where δ ≡ (d_{k+1}, …, d_T)' and C ≡ [c_{ts}, t, s = k+1, …, T], with

d_t = x_t'[β_t − (X_{t−1}°'X_{t−1}°)^{−1} Σ_{i=1}^{t−1} x_i x_i'β_i]/[1 + x_t'(X_{t−1}°'X_{t−1}°)^{−1}x_t]^{1/2},  t = k+1, …, T.   (21.159)
When the σ_t²s vary with t the w_ts are no longer homoskedastic or uncorrelated. In particular,

Var(w_t) = {σ_t² + x_t'(X_{t−1}°'X_{t−1}°)^{−1}[Σ_{i=1}^{t−1} x_i x_i'σ_i²](X_{t−1}°'X_{t−1}°)^{−1}x_t}/[1 + x_t'(X_{t−1}°'X_{t−1}°)^{−1}x_t],   (21.160)

and

Cov(w_t, w_s) ≠ 0,   (21.161)

for t < s, t, s = k+1, …, T (see exercise 8).
If we separate H₀ into

H₀(1): β_t = β for all t = 1, 2, …, T, and H₀(2): σ_t² = σ² for all t = 1, 2, …, T,

then when H₀(1) holds but H₀(2) does not,

w ~ N(0, C),   (21.162)

but when H₀(2) holds and H₀(1) does not,

w ~ N(δ, σ²I_{T−k}).   (21.163)

This shows that coefficient time dependence only affects the mean of w and variance time dependence affects its covariance. The implication from these results is that we could construct a test for H₀(1), given that H₀(2) holds, against H₁(1): β_t ≠ β for any t = 1, 2, …, T, based on the chi-square distribution. In view of (163) we can deduce that
(1/σ²) Σ_{t=k+1}^T [w_t − E(w_t)]² ~ χ²(T−k).   (21.164)

This result implies that testing for H₀(1), given H₀(2) is valid, is equivalent to testing for E(w_t) = 0 against E(w_t) ≠ 0. Before we can use (164) as the basis of a test statistic we need to estimate σ². A natural estimator for σ² is

s_w² = [1/(T−k−1)] Σ_{t=k+1}^T (w_t − w̄)²,  where w̄ = [1/(T−k)] Σ_{t=k+1}^T w_t   (21.165)

(note that w̄ ≠ 0 when H₀(1) is not valid). The resulting test statistic is

τ(y) = √(T−k)(w̄/s_w) ~ t(T−k−1) under H₀,   (21.166)
with rejection region

C₁ = {y: |τ(y)| ≥ c_α},  ∫_{c_α}^∞ dt(T−k−1) = α/2,   (21.167)

(see Harvey (1981)). Under H₀(2) the test based on (166) and (167) is UMP unbiased (see Lehmann (1959)). On the other hand, when H₀(2) does not hold, E(s_w²) > σ², and this can reduce the power of the test significantly (see Dufour (1982)).
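The recursive residuals (21.156) and the t-test (21.166) are straightforward to compute. A minimal sketch, assuming hypothetical arrays y (T,) and X (T, k); index conventions follow the book's t = k+1, …, T:

```python
import numpy as np
from scipy import stats

def recursive_residuals(y, X):
    """Standardised recursive residuals w_t, t = k+1, ..., T (eq. 21.156)."""
    T, k = X.shape
    w = []
    for t in range(k, T):
        Xt, yt = X[:t], y[:t]                          # data up to t-1
        b = np.linalg.lstsq(Xt, yt, rcond=None)[0]     # beta-hat_{t-1}
        f = 1 + X[t] @ np.linalg.inv(Xt.T @ Xt) @ X[t]
        w.append((y[t] - X[t] @ b) / np.sqrt(f))
    return np.array(w)

def tau_test(w, alpha=0.05):
    """Harvey-type t-test (eq. 21.166) on the recursive residuals."""
    n = len(w)                                         # n = T - k
    tau = np.sqrt(n) * w.mean() / w.std(ddof=1)
    return tau, stats.t.ppf(1 - alpha / 2, n - 1)      # reject if |tau| >= c
```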
Another test, related to H₀(1) conditional on H₀(2) being valid, was suggested by Brown, Durbin and Evans (1975). The CUSUM test is based on the test statistic

W_t = (1/s) Σ_{j=k+1}^t w_j,  t = k+1, …, T,   (21.168)

where s² = [1/(T−k)] Σ_{t=k+1}^T û_t². They showed that under H₀ the distribution of W_t can be approximated by N(0, t−k) (W_t being an approximate Brownian motion). This led to the rejection region

C₁ = {y: |W_t| ≥ a[(T−k)^{1/2} + 2(t−k)(T−k)^{−1/2}]},   (21.169)

with a depending on the size α of the test. For α = 0.01, 0.05, 0.10, a = 1.143, 0.948, 0.850, respectively. The underlying intuition of this test is that if H₀ is invalid there will be some systematic changes in the β_ts which will give rise to a disproportionate number of w_ts having the same sign. Hopefully, these will be detected via their cumulative effects W_t, t = k+1, …, T.
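A minimal sketch of the CUSUM statistic (21.168) with the bounds (21.169), taking the standardised recursive residuals w from the sketch above (inputs hypothetical):

```python
import numpy as np

def cusum(w, alpha=0.05):
    a = {0.01: 1.143, 0.05: 0.948, 0.10: 0.850}[alpha]
    n = len(w)                                    # n = T - k
    s = np.sqrt(w @ w / n)                        # s^2 = RSS/(T-k), since sum(w^2) = RSS
    W = np.cumsum(w) / s                          # W_t, t = k+1, ..., T
    t = np.arange(1, n + 1)                       # t - k
    bound = a * (np.sqrt(n) + 2 * t / np.sqrt(n))
    return W, bound, bool(np.any(np.abs(W) >= bound))   # True => reject H0
```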
Brown et al. (1975) suggested a second test based on the test statistic

WW_t = [Σ_{j=k+1}^t w_j²]/[Σ_{j=k+1}^T w_j²],  t = k+1, …, T,   (21.170)

for which

E(WW_t) = (t−k)/(T−k),   (21.171)

and under H₀

[(1 − WW_t)/WW_t][(t−k)/(T−t)] ~ F(T−t, t−k).   (21.172)
This enables us to use the F-test rejection region whose tabulated values are more commonly available.

Looking at the three test statistics (166), (168) and (170) we can see that one way to construct a test for H₀(1) is to compare the squared prediction error (w_t²) with the average over the previous periods, i.e. use the test statistics

z²(y_t°) = w_t²/{[1/(t−1−k)] Σ_{i=k+1}^{t−1} w_i²},  t = k+2, …, T,   (21.173)

which is compared with the F-distribution. Under H₀,

w_t²/σ² ~ χ²(1) and (1/σ²) Σ_{i=k+1}^{t−1} w_i² ~ χ²(t−1−k),   (21.174)

and the two random variables are independent. These imply that under H₀,

z²(y_t°) ~ F(1, t−1−k),  z(y_t°) ~ t(t−1−k).   (21.175)

It must be noted that z(y_t°) provides a test statistic for β_t = β_{t−1} assuming that σ_t² = σ²_{t−1}; see Section 21.6 for some additional comments. For an overall test of H₀ we should use a multiple comparison procedure based on the Bonferroni and related inequalities (see Savin (1984)).
One important point related to all the above tests based on the standardised recursive residuals is that the implicit null hypothesis is not quite H₀ but a closely related hypothesis. If we return to equation (156) we can see that the implicit null is

E(w_t) = 0,   (21.176)

which is satisfied if x_t'(β_t − β_{t−1}) = 0, and not only when β_t = β_{t−1} = β.

In practice, the above tests should be used in conjunction with the time paths of the recursive estimators β̂_it, i = 1, 2, …, k, and the standardised recursive residuals w_t, t = k+1, …, T. If we ignore the first few values of these series their time paths can give us a lot of information relating to the time invariance of the parameters of interest.

In the case of the estimated money equation discussed above, the time paths of β̂_1t, β̂_2t, β̂_3t, β̂_4t are shown in Fig. 21.1(a)–(d) for the period t = 20, …, 80. As we can see, the graphs suggest the presence of some time dependence in the estimated coefficients, re-enforcing the variance time dependence detected in Section 21.4 using the heteroskedasticity tests.
(3)    Tackling parameter time dependence

When the time dependence detected is attributable to the time-heterogeneity of the underlying process, an obvious way to tackle it is to transform the original series so as to achieve time homogeneity, e.g. by differencing:

Δ ln(M/P)_t = ln(M/P)_t − ln(M/P)_{t−1},   (21.177)

with, for the period in question,

Var[Δ ln(M/P)_t] = 0.000 574,  Var[Δ² ln(M/P)_t] = 0.001 354.   (21.178)
(2 1.178)
480
Departurcs
from assumptiow
- probability
model
(b)
Fig. 2 1.1(fz). The time path of the recursive estimate of
p3t- the coefcient
time path of the recursive estimate of pzt the
21.6
Parameter
structural
481
change
0.100
0.075
0.050
0.025
it
0
-0.025
-0.050
-0.075
-0.1
1968
1970
1972
1974
1976
1978
1980
1982
1976
1978
1980
1982
Time
(c)
4
2
0
-2
Jl!
-4
-6
-8
1
-. O1968
1970
1972
1974
Time
(d)
Fig. 21. l(c). The time path of the recursive estimate of pt - the coeftkient
of the recursive estimate of #4, the coefficient of (.
In this case the way to tackle the time dependence is to respecify μ_t in order to take account of the additional information present.
21.6    Parameter structural change

[Fig. 21.2: the time path of Δ ln(M/P)_t. Fig. 21.3: the time path of Δ² ln(M/P)_t.]

Parameter structural change refers to the case where prior information
related to the point of change is available. For example, in the case of the money equation estimated in Chapter 19 and discussed in the previous sections we know that some change in monetary policy occurred in 1971 which might have induced a structural change.

The statistical GM for the case where only one structural change has occurred at t = T₁ (T₁ > k) takes the general form

y_t = β₁'x_t + u_{1t},  t = 1, 2, …, T₁,   (21.179)

y_t = β₂'x_t + u_{2t},  t = T₁+1, …, T,   (21.180)

with θ₁ ≡ (β₁, σ₁²) and θ₂ ≡ (β₂, σ₂²) being the underlying parameters of interest, respectively. For the case where the sample period is t = 1, 2, …, T₁, T₁+1, …, T, the distribution of the sample takes the form
(y₁/X₁, y₂/X₂) ~ N([X₁β₁; X₂β₂], diag(σ₁²I_{T₁}, σ₂²I_{T₂})),   (21.181)

where T₂ = T − T₁. The null hypothesis of no structural change,

H₀: β₁ = β₂ and σ₁² = σ₂²,

is conveniently separated into

H₀(1): σ₁² = σ₂² and H₀(2): β₁ = β₂,

considered separately.
Case 1 (T₂ > k)

In this case both β₁ and β₂ can be estimated and H₀(2) can be tested using the test statistic

τ₂(y) = [(RSS − RSS₁ − RSS₂)/(RSS₁ + RSS₂)]·[(T − 2k)/k],   (21.182)

where RSS, RSS₁ and RSS₂ refer to the residual sums of squares for the whole sample, subperiod 1 (t = 1, 2, …, T₁) and subperiod 2 (t = T₁+1, …, T), respectively. The test statistic (182) can be used to construct a UMP invariant test (see Lehmann (1959)) for H₀(2) against H₁ ∩ H₀(1). That is, it can be used to construct an 'optimal' test for β₁ ≠ β₂ with σ₁² = σ₂². The test statistic is distributed as follows:

τ₂(y) ~ F(k, T − 2k) under H₀(2) ∩ H₀(1),   (21.183)

τ₂(y) ~ F(k, T − 2k; δ) under H₁ ∩ H₀(1),   (21.184)
484
Departures
where
-(#1
-#c)'
.z
HL''can be used
ly:T2(y)
(#1-#2).
Gl,
>
(21.185)
(21 186)
.
This test raises the same issues as the time invariance tests considered
in Section 21.6 where c21 o'2a had to be explicitly tor implicitly)
assumed in constructing a coefficient time invariant test. There is no
HLI'
is valid when Hvl might not be. A test
reason to believe, however, that
for Hvlb against S1 can be based on the test statistic comparing the two
estimated variances (seeChapter 19):
=
.s1
1'3(b p' EEE
1
RSSZ
T,
T3(b
F(T2
'v
T2 k
RSSL
- k, T1 k)
under
zaty) F(Tc k, T1 k; )
'v
(2 1.87)
HLlI,
under Hj
(2 1.188)
(2 1.189)
(c2a/c2j)
is the non-centrality parameter which ortunately' does
where
not depend on #1or #2.This turns out to be critical because the test for Hlvt)
against Sl defined by the rejection region,
=
Cl
ty:z:'(y)>
ca),
(2 1.190)
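Both statistics are simple functions of three residual sums of squares. A minimal sketch, assuming a known break point T1 and hypothetical arrays y and X (with X including the constant, so k counts it):

```python
import numpy as np
from scipy import stats

def rss(y, X):
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def structural_change_tests(y, X, T1, alpha=0.05):
    T, k = X.shape
    T2 = T - T1
    rss0, rss1, rss2 = rss(y, X), rss(y[:T1], X[:T1]), rss(y[T1:], X[T1:])
    tau2 = ((rss0 - rss1 - rss2) / (rss1 + rss2)) * (T - 2 * k) / k   # (21.182)
    tau3 = (rss2 / (T2 - k)) / (rss1 / (T1 - k))                      # (21.187)
    return (tau2, stats.f.ppf(1 - alpha, k, T - 2 * k)), \
           (tau3, stats.f.ppf(1 - alpha, T2 - k, T1 - k))
```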
Let us apply the above tests to the money equation considered in the previous sections. As mentioned above, a structural change is suspected to have occurred in 1971 because of important changes in monetary policy. Estimation of the money equation for the first subperiod (1963i–1970iv, T₁ = 32) yielded

m_t = −0.793 + 1.050y_t + 0.305p_t − 0.007t + û_t,
     (2.055)  (0.208)

R² = 0.968, R̄² = 0.964, log L = 90.08, s₁ = 0.0155, RSS₁ = 0.006 72.   (21.191)

Estimation for the second subperiod (1971i–1982iv, T₂ = 48) yielded

m_t = 6.397 + 1.641y_t − 0.076t + 0.784p_t + û_t,
     (1.875)  (0.191)

R² = 0.994, R̄² = 0.993, log L = 95.37, s₂ = 0.0346, RSS₂ = 0.052 84.   (21.192)

Testing for H₀(1) against H₁ yielded

τ₃(y) = (0.052 84/44)/(0.006 72/28) = 5.004.   (21.193)

For a size α = 0.05, c_α = 1.81, and given that τ₃(y) > c_α, H₀(1) is rejected. At this stage there is not much point in proceeding with testing H₀(2) against H₁ ∩ H₀(1) given that H₀(1) has been rejected. For illustration purposes, however, let us consider the test regardless.

The residual sum of squares for the whole period is RSS = 0.117 52 (see Chapter 19). Hence, testing for H₀(2) against H₁ ∩ H₀(1) the test statistic is

τ₂(y) = {[0.117 52 − (0.006 72 + 0.052 84)]/[0.006 72 + 0.052 84]} × (72/4) = 17.516.   (21.194)

Given that c_α = 2.5 for a size α = 0.05, H₀(2) is also strongly rejected.

It is important to note that in the case of the test for H₀ the size is not α but 1 − (1−α)², i.e. for the above example the overall size is 0.0975. This is because H₀ was tested as two independent hypotheses in a multiple hypothesis testing context (see Savin (1984)).
Case 2 (T₂ < k)

In this case β₂ cannot be estimated from the second subperiod and the F-test above is not applicable. The statistic z(y_t°) of Section 21.5, however, suggests comparing the post-change squared prediction errors with the pre-change residuals via

CH = {[Σ_{t=T₁+1}^T w_t²]/T₂}/{[Σ_{t=k+1}^{T₁} w_t²]/(T₁ − k)}.   (21.195)

This is based on the equalities RSS_t = RSS_{t−1} + w_t², t ≥ k+1, so that RSS₁ = Σ_{t=k+1}^{T₁} w_t² and RSS − RSS₁ = Σ_{t=T₁+1}^T w_t² (see exercise 10). The test statistic CH, known as the Chow test (see Chow (1960)), as in the case of z(y_t°) in Section 21.5, can be used to construct a UMP invariant test not of H₀ against H₁ but of H₀* against H₁* ∩ H₀(1), where H₁*: X₂(β₁ − β₂) ≠ 0. This is not surprising given that β₂ is not estimable and thus we need to rely on (y₂ − X₂β̂₁) = u₂ − X₂(β̂₁ − β₂) (a direct analogue to the recursive residuals of Section 21.5) in order to construct a test for H₀(2). The CH-test statistic is distributed as follows:

CH ~ F(T₂, T₁ − k) under H₀(2) ∩ H₀(1),   (21.198)

CH ~ F(T₂, T₁ − k; δ) under H₁ ∩ H₀(1),   (21.199)

where

δ = (1/σ²)[X₂(β₁ − β₂)]'[I_{T₂} + X₂(X₁'X₁)^{−1}X₂']^{−1}[X₂(β₁ − β₂)].   (21.200)
An apparently natural alternative statistic is

τ₄(y) = (y₂ − X₂β̂₁)'(y₂ − X₂β̂₁)/s₁²,  where s₁² = RSS₁/(T₁ − k).   (21.201)

It is not very difficult to see, however, that the numerator and denominator of this statistic are not independent and hence the ratio is not F-distributed. Asymptotically, however,

s₁² → σ² under H₀(1), and thus τ₄(y) ~ χ²(T₂) asymptotically under H₀(2) ∩ H₀(1),   (21.202)

and an asymptotic size α test is defined by the rejection region

C₁ = {y: τ₄(y) ≥ c_α},  ∫_{c_α}^∞ dχ²(T₂) = α.   (21.203)

Let us apply the above tests to the money equation discussed above. Estimation of this equation for the period 1963–1980 yielded:

m_t = 2.685 + 0.713y_t + 0.852p_t − 0.052t + û_t,   (21.204)
     (1.055)  (0.107)   (0.022)   (0.014)

R² = 0.994, R̄² = 0.994, log L = 138.52, s₁ = 0.0392, RSS₁ = 0.109 23.
Appendix 21.1    Variance stabilising transformations

Let h(·) be a well-behaved (continuously differentiable) function of y_t. Taking a first-order Taylor expansion of h(y_t) around E(y_t) = μ_t,

h(y_t) ≃ h(μ_t) + (y_t − μ_t)h'(μ_t),
in order to approximate the variance of h(y_t):

Var[h(y_t)/X_t = x_t] ≃ Var[(y_t − μ_t)h'(μ_t)/X_t = x_t] = Var(y_t/X_t = x_t)[h'(μ_t)]².

The idea is to choose h(·) such that [h'(μ_t)]² cancels the dependence of Var(y_t/X_t = x_t) on μ_t. For example, when the standard deviation is proportional to the level μ_t, the choice h(y_t) = ln y_t, with h'(μ_t) = 1/μ_t, renders the conditional variance (approximately) free of μ_t. In Fig. 21.4(a) the time path of the 7 days' interest rate is shown, which exhibits a variance increasing with the level μ_t; its log transformation is shown in Fig. 21.4(b).

[Fig. 21.4(a): time graph of the 7 days' interest rate I_t; (b): time graph of ln I_t.]
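A small simulation illustrates the point: when the standard deviation is proportional to the level, the variance of y grows with μ² while the variance of ln y stays roughly constant. The numbers below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
for mu in (5.0, 10.0, 20.0):
    y = mu * (1 + 0.1 * rng.standard_normal(100_000))   # sd proportional to the level mu
    print(mu, round(y.var(), 2), round(np.log(y).var(), 4))
# Var(y) grows roughly like (0.1*mu)^2, while Var(ln y) stays near 0.01 = 0.1^2.
```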
Important concepts

Heteroskedasticity, variance stabilising transformations, parameter time dependence, recursive estimators, standardised recursive residuals, CUSUM tests, structural change.

Questions

1. Explain the relationship between normality, linearity and homoskedasticity.
2. Explain the intuition underlying the 'omitted variables' type misspecification test.
3. State the finite sample properties of the OLS estimator b = (X'X)^{−1}X'y in the context of the linear regression model.
4. How can we test for non-linearity?
5. Explain the role of the recursive estimators β̂_t = β̂_{t−1} + (X_t°'X_t°)^{−1}x_t(y_t − β̂_{t−1}'x_t), t = k+1, …, T, in assessing parameter time invariance.
Exercises

1. Derive E(b) and Var(b) for b = (X'X)^{−1}X'y when u_t ~ N(0, σ²), and compare them with the same quantities when u_t ~ D(0, σ_t²) and the form of D(·) is unknown.
2. Show that b = (X'X)^{−1}X'y has minimum variance among the class of unbiased estimators of the form b* = (L + C)y, where L = (X'X)^{−1}X' and C is a k × T arbitrary matrix.
3. Show that under the assumption (y_t/X_t = x_t) ~ D(h(x_t), σ²), b is biased and E(s²) ≠ σ².
4. Show that Var(b) = σ²(X'X)^{−1}X'ΛX(X'X)^{−1} under Cov(u/X) = σ²Λ is essentially the same as under the assumption of exercise 3.
7. Verify the decomposition of the prediction error v̂_t = u_t + x_t'(β_t − β̂_{t−1}) underlying (21.151)–(21.155).
8. Show that (X_t°'X_t°) = Σ_{i=1}^t x_i x_i' and derive the corresponding updating formulae.
9. Verify the expressions for Var(w_t) and Cov(w_t, w_s) of Section 21.5.
10. Show that w_t² = RSS_t − RSS_{t−1}, t ≥ k+1, where RSS_t = Σ_{i=1}^t (y_i − β̂_t'x_i)².

Additional references

Bickel and Doksum (1981); Gourieroux et al. (1980); Ramsey (1974); White and MacDonald (1980); Zarembka (1985).
CHAPTER 22

Departures from the sampling model assumption

One of the most crucial assumptions underlying the linear regression model is the sampling model assumption that y ≡ (y₁, y₂, …, y_T)' constitutes an independent sample sequentially drawn from D(y_t/X_t; θ), t = 1, 2, …, T, respectively. This assumption enables us to define the likelihood function to be

L(θ; y) = c(y) ∏_{t=1}^T D(y_t/X_t; θ).   (22.1)

This chapter considers the implications of departures from this assumption.
22.1    Implications of a non-random sample

In the context of the linear regression model the obvious way to make the independent sample assumption inappropriate is to assume that {Z_t, t ∈ T} is a (dependent) stochastic process. In particular (because we do not want to lose the convenience of normality) we assume that

Z_t ~ N(m(t), Σ(t, t)),  Cov(Z_t, Z_s) = Σ(t, s),  t, s ∈ T.   (22.2)
If we return to the joint distribution of the sample, Z ≡ (Z₁', Z₂', …, Z_T')' is now

Z ~ N([m(1)', m(2)', …, m(T)']', [Σ(t, s); t, s = 1, 2, …, T]),   (22.3)

where the T(k+1) × T(k+1) covariance matrix is no longer block diagonal. Under the independent sample assumption the relevant conditional distribution is D(y_t/X_t; θ), for which

(i)   E(y_t/X_t = x_t) = β'x_t — linear in x_t;
(ii)  Var(y_t/X_t = x_t) = σ² — homoskedastic and time invariant;
(iii) y ≡ (y₁, …, y_T)' is an independent sample sequentially drawn from D(y_t/X_t; θ), t = 1, 2, …, T, respectively.

Under (2), however, the relevant conditional distribution becomes D(y_t/Z_{t−1}°, X_t; ψ(t)), where Z_{t−1}° ≡ (Z_{t−1}, Z_{t−2}, …, Z₁), for which, by 'construction':

(i)'   E(y_t/σ(Z_{t−1}°), X_t = x_t) = β₀(t) + β_t(t)'x_t + Σ_{i=1}^{t−1}[α_i(t)y_{t−i} + β_i(t)'x_{t−i}] — linear in x_t and the 'present' and past history of the process;
(ii)'  Var(y_t/σ(Z_{t−1}°), X_t = x_t) = σ₀²(t) — free of the conditioning variables but time dependent;
(iii)' y ≡ (y₁, …, y_T)' is a non-random sample sequentially drawn from D(y_t/Z_{t−1}°, X_t; ψ(t)), t = 1, 2, …, T, respectively,

where the past history of the process comes into both distributions because it contains relevant information for y_t as well as X_t. Although the algebra in
the non-IID case is rather involved, the underlying argument is the same as in the IID case. The role of D(y_t/X_t; θ) is taken over by D(y_t/Z_{t−1}°, X_t; ψ(t)) and the probability and sampling models in the non-IID case need to be defined in terms of the latter conditional distribution. A closer look at this distribution, however, reveals that

E(y_t/σ(Y_{t−1}°), X_t° = x_t°) = β₀(t) + β_t(t)'x_t + Σ_{i=1}^{t−1}[α_i(t)y_{t−i} + β_i(t)'x_{t−i}],  Var(y_t/σ(Y_{t−1}°), X_t° = x_t°) = σ₀²(t),   (22.4)

where X_t° ≡ (x₁, x₂, …, x_t). Firstly, although the conditional mean is linear in the conditioning variables, the number of unknown parameters in ψ(t) increases with t, so that no estimation is possible without further restrictions.
(2)    The respecification approach

The obvious way to 'solve' this incidental parameters problem is to restrict the time-heterogeneity of the process {Z_t, t ∈ T}. Assuming stationarity,

E(Z_t) = m,  Cov(Z_t, Z_{t+τ}) = Σ(τ),  τ = 0, 1, 2, …,   (22.7)

the temporal covariances satisfy

Cov(Z_t, Z_s) = Σ(|t − s|),  t, s = 1, 2, …, T.   (22.8)

That is, the Z_ts have identical means and variances and their temporal covariances depend only on the absolute value of the distance between them. This reduces the above T(k+1) × T(k+1) sample covariance matrix to a block Toeplitz matrix (see Akaike (1974)). This restricts the original covariance matrix considerably by inducing symmetry and reducing the number of distinct (k+1) × (k+1) matrices making up these covariances from T² to T (the lags τ = 0, 1, …, T−1). A closer look at stationarity reveals that it is a direct extension of the identically distributed assumption to the case of a dependent sequence of random variables. In terms of observed data the sample realisation of a stationary process exemplifies no systematic changes either in mean or variance, and any τ-period section of the realisation should look like any other τ-period section. That is, if we slide a τ-period 'window' along the time axis over the realisation the picture should not differ systematically. Examples of such realisations are given in Figs. 21.2 and 21.3.
Assuming that {Z_t, t ∈ T} is a normal stationary process implies that, as far as the conditional distribution is concerned, its parameters

θ* ≡ (β₀, α₁, …, α_{t−1}, β₁, …, β_{t−1}, σ₀²)   (22.10)

no longer depend on t, but their number still increases with t.   (22.11)

What is needed, in addition, is a restriction on the 'memory' of the process, such as asymptotic independence, which ensures that for some finite m

E(Z_t/σ(Z_{t−1}°)) ≃ E(Z_t/σ(Z_{t−1}, Z_{t−2}, …, Z_{t−m})).   (22.12)

Under stationarity and asymptotic independence the non-systematic component becomes

u_t = y_t − E(y_t/σ(Y_{t−1}°), X_t° = x_t°),   (22.14)

in an obvious notation; the σ-field notation is dropped below for notational convenience (implicitly included in the conditioning).
It must be noted at this stage that the assumptions of stationarity and asymptotic independence for {Z_t, t ∈ T} are not the least restrictive assumptions for the results which follow. For example, asymptotic independence can be weakened to ergodicity (see Section 8.3) without affecting any of the asymptotic results which follow. Moreover, by 'strengthening' the memory restriction to that of φ-mixing, some time-heterogeneity might be allowed without affecting the asymptotic results (see White (1984)).

It is also important to note that the maximum lag m in (15) does not represent the maximum memory lag of the process {y_t/Z_{t−1}°, X_t, t ∈ T}, as in the case of m-dependence (see Section 8.3). Although there is a duality result relating an m-dependent process with an mth-order Markov process, in the case of the latter the memory is considerably longer than m (see Chapter 8). This is one of the reasons why the AR representation is preferred in the present context. The maximum memory lag is determined by the solution of the lag polynomial

a(L) = 1 − Σ_{i=1}^m a_iL^i = 0.   (22.17)
Consider now the implications of a non-random sample for the results of Chapter 19, derived under the independent sample assumption. In an obvious notation, with the omitted temporal information collected in Z* and γ its coefficients:

(i)   E(b) ≠ β — b is a biased estimator of β;
(ii)  MSE(b) ≡ σ²(X'X)^{−1} + (X'X)^{−1}X'Z*γγ'Z*'X(X'X)^{−1} ≠ σ²(X'X)^{−1};
(iii) b is also an inconsistent estimator of β;
(iv)  E(s²) = σ² + γ'(Z*'M_X Z*)γ/(T − k) ≠ σ² — s² is a biased estimator of σ²;
(v)   s² is an inconsistent estimator of σ²;
(vi)  s²(X'X)^{−1} is a biased and inconsistent estimator of MSE(b).

These results follow because a comparison of the statistical GM

y_t = β'x_t + u_t   (22.20)

with the general statistical GM

y_t = E(y_t/σ(Y_{t−1}°), X_t° = x_t°) + u_t   (22.21)

reveals that both statistical GM's are special cases of the general statistical GM; they constitute 'reductions' of D(Z₁, …, Z_T; φ) under alternative assumptions, with (20) 'omitting' the temporal information in σ(Y_{t−1}°), X_{t−1}°.   (22.23)

(3)    The autocorrelation approach

An alternative way of accommodating the dependence in the sample is to model it through
the error terms. This is contrary to the logic of the approach of statistical
model specification propounded in the present book (seeChapter 17). The
approach, however, is important for various
reasons. Firstly, the
approach is very illuminating for both
comparison with the respecification
approach
approaches. Secondly, the autocorrelation
dominates the
textbook econometric literature and as a consequence it provides the basis
for most misspecification tests of the independent sample assumption.
The systematic component for the autocorrelation
approach is exactly
the same as the one under the independent sample assumption. That is,
assuming a certain fol'm of temporal dependence (see (41)).
j)
yt > .E'(y, V',
#'x,.
(22.24)
This implies that the temporal dependence in the sample will be left in the
error term:
(22.25)
;f yt - E (.)/.67)).
.%.as defined in (23).In view of this the error term will satisfy the following
properties:
=
(i)
Elst/.@-,)
E (cJs/.f? r)
(22.26)
c2(p,
t,(r,
,s),
Vw>0.
in
(22.28)
J is an
unbiased estimator
(i)'
Ep
(ii)'
(iiil'
P
-+
# if limr..o.(X'VwX'''F<
=
)s2)
(iv)'
2X
,5,2
and non-singular;
is an inconsistent estimator
S 2(X/X) -
#;
'
gc2/(T'-/()j
(7.2
y.
of
Of
c2.
,
503
In view of (iii)'–(vi)' we can conclude that the testing results derived in Chapter 19 are also invalid.

The important difference between the results (i)–(vi) and (i)'–(vi)' is that b is not such a 'bad' estimator in the latter case. This is not surprising, however, given that we retained the systematic component of the linear regression model. On the other hand, the results based on Cov(y/X = X) = σ²I_T are inappropriate. The only undesirable property of b in the present context is said to be its relative inefficiency with respect to the proper MLE of β when V_T is assumed known. That is, b is said to be an inefficient estimator relative to the GLS estimator

β̃ = (X'V_T^{−1}X)^{−1}X'V_T^{−1}y   (22.29)
(see Judge et al. (1985)). A very similar situation was encountered in the case of heteroskedasticity and the same comment applies here as well. This efficiency comparison is largely irrelevant. In order to be able to make justifiable efficiency comparisons we should be able to compare estimators based on the same information set. It is well known, however, that in the case where V_T is unknown no consistent estimator of the parameters of interest exists, and the information matrix could not be used to define a full efficiency lower bound.
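For a known V_T the GLS estimator (22.29), and the OLS covariance (ii)' above, can be computed directly. A minimal sketch with hypothetical inputs:

```python
import numpy as np

def gls(y, X, V):
    """GLS estimator (22.29): (X'V^{-1}X)^{-1} X'V^{-1} y, for known V_T."""
    Vi = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vi @ X, X.T @ Vi @ y)

def ols_cov_under_dependence(X, V, sigma2=1.0):
    """Cov(b) = sigma^2 (X'X)^{-1} X'V X (X'X)^{-1}, property (ii)' above."""
    XtX_inv = np.linalg.inv(X.T @ X)
    return sigma2 * XtX_inv @ X.T @ V @ X @ XtX_inv
```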
22.2    Tackling temporal dependence

The question 'How do we proceed when the independent sample assumption is invalid?' will be considered before the testing of this assumption, because in the autocorrelation approach the two are inextricably bound up and the testing becomes easier to understand when the above question is considered first. This is because most misspecification tests of sample independence in the autocorrelation approach consider particular forms of departure from the independence assumption, which we will discuss in the present section. This approach, however, will be considered after the respecification approach because, as mentioned above, the former is a special case of the latter. Moreover, the respecification approach provides a most illuminating framework in the context of which the autocorrelation approach can be thoroughly discussed.
(1)    The respecification approach

Under the non-random sample assumption the components of the statistical GM need to be redefined. The systematic component is assumed to be

μ_t = E(y_t/σ(Y_{t−1}°), X_t° = x_t°),   (22.30)

and the non-systematic component is defined by

u_t = y_t − E(y_t/σ(Y_{t−1}°), X_t° = x_t°),   (22.31)

so that

E(u_t/σ(Y_{t−1}°), X_t° = x_t°) = 0,  t > m,   (22.32)

E(u_t²/σ(Y_{t−1}°), X_t° = x_t°) = σ₀².   (22.33)

Under stationarity and asymptotic independence the statistical GM takes the form

y_t = β₀'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}) + u_t,  t > m.   (22.34)

The probability distribution underlying (34) comes in the form of D(y_t/Z_{t−1}°, X_t; φ₁), which is related to the original sequential decomposition

D(Z*; φ) = ∏_{t=1}^T D(Z_t/Z_{t−1}°, Z₁; φ)   (22.38)

via the further decomposition

D(Z_t/Z_{t−1}°; φ) = D(y_t/Z_{t−1}°, X_t; φ₁)·D(X_t/Z_{t−1}°; φ₂).   (22.39)

When φ₁ and φ₂ are variation free and the parameters of interest are functions of φ₁ only, X_t is weakly exogenous with respect to them; the additional condition

D(X_t/Z_{t−1}°; φ₂) = D(X_t/X_{t−1}°; φ₂),   (22.40)
i.e. Y_{t−1}° does not Granger cause X_t (see Engle et al. (1983)), strengthens this. When weak exogeneity is supplemented with Granger non-causality we say that X_t is strongly exogenous with respect to θ.

The above changes to the linear regression model due to the non-random sample, taken together, amount to specifying a new statistical model which we call the dynamic linear regression model. Because of its importance in econometric modelling, the specification, estimation, testing and prediction in the context of the dynamic linear regression model will not be considered here but in a separate chapter (see Chapter 23).
(2)    The autocorrelation approach

In the context of the autocorrelation approach the temporal dependence is modelled 'largely' through the error term, i.e.

Cov(ε_t, ε_s) = σ²v(|t − s|),  t, s = 1, 2, …, T.   (22.41)

The most widely used formulation is the AR(1) error model

y_t = β'x_t + ε_t,   (22.42)

ε_t = ρε_{t−1} + u_t,  |ρ| < 1,   (22.43)

u_t ~ NI(0, σ_u²),  t = 1, 2, …, T.   (22.44)

Under (42)–(44) the log likelihood function takes the form

log L(θ; y) = const − (T/2) log σ_u² + (1/2) log(1 − ρ²) − (1/2σ_u²)(y − Xβ)'V_T(ρ)^{−1}(y − Xβ),   (22.46)

with first-order conditions

∂ log L/∂β = (1/σ_u²) X'V_T(ρ)^{−1}(y − Xβ) = 0,   (22.47)

∂ log L/∂σ_u² = −T/(2σ_u²) + (1/2σ_u⁴)(y − Xβ)'V_T(ρ)^{−1}(y − Xβ) = 0,   (22.48)

∂ log L/∂ρ = −ρ/(1 − ρ²) + (1/σ_u²) Σ_{t=2}^T (ε_t − ρε_{t−1})ε_{t−1} = 0.   (22.49)

The last equation is non-linear in ρ, and the system has to be solved iteratively.
(3)    The two approaches compared – the common factor restrictions

Substituting the AR(m) error formulation ε_t = Σ_{i=1}^m ρ_iε_{t−i} + u_t into (42) gives

y_t = β'x_t + Σ_{i=1}^m ρ_i(y_{t−i} − β'x_{t−i}) + u_t,   (22.51)

which is a special case of the respecification GM (34) with the common factor restrictions

α_i = ρ_i,  β_i = −ρ_iβ,  i = 1, 2, …, m,   (22.52)

imposed a priori.
In its general form the respecification approach models the dependence directly via

y_t = β₀'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}) + u_t,   (22.53)

whereas the autocorrelation approach models it via the error term,

ε_t = E(ε_t/σ(E_{t−1}°)) + u_t = Σ_{i=1}^m a_iε_{t−i} + u_t,   (22.54)–(22.56)

where E_{t−1}° ≡ (ε_{t−1}, …, ε₁). Substituting the latter into y_t = β'x_t + ε_t yields a statistical GM which is identical to (53) only under the restrictions (52). Hence, the common factors are the result of 'modelling' the dependence in the sample in terms of the error term and not in terms of the observable random variables directly. The result of both approaches is a more general statistical GM which 'models' the dependence in the sample in two different but related ways. The restrictiveness of the autocorrelation approach can be seen by relating the common factor restrictions (52) to the parameters of the original AR(m) representation of
{Z_t, t ∈ T}:

Z_t = Σ_{i=1}^m A(i)'Z_{t−i} + E_t,  A(i) = [a₁₁(i)  a₂₁(i)'; a₁₂(i)  A₂₂(i)],   (22.57)

in terms of which, in an obvious notation, the implied coefficients of the conditional decomposition are functions of a₁₁(i), a₂₁(i), a₁₂(i), A₂₂(i) and Ω.   (22.58)

The common factor restrictions (52) are satisfied when

a₁₂(i) = 0 and A₂₂(i) = a₁₁(i)I_k, for all i = 1, 2, …, m.   (22.59)

That is, the common factor restrictions hold when Granger non-causality holds among all the Z_its and an identical form of temporal self-dependence exists for all the Z_its, i = 1, …, m, t > m (see Spanos (1985a)). These are very unrealistic restrictions to impose a priori. In principle the restrictions (59) can be tested indirectly in the context of the general AR(m) representation.
A direct test for these restrictions can be formulated as a specification test in the context of the respecification approach. In order to illustrate how the common factor restrictions can be tested let us return to the money equation estimated in Chapter 19 and consider the case where m = 1. The statistical GMs of the money equation for the respecification and autocorrelation approaches are, respectively,

m_t = β₀ + β₁y_t + β₂p_t + β₃i_t + α₁m_{t−1} + α₂y_{t−1} + α₃p_{t−1} + α₄i_{t−1} + u_t,   (22.60)

and

m_t = β₀ + β₁y_t + β₂p_t + β₃i_t + ε_t,  ε_t = α₁ε_{t−1} + u_t,  |α₁| < 1.   (22.61)

Rewriting (61) in the form

(1 − α₁L)m_t = (1 − α₁L)(β₀ + β₁y_t + β₂p_t + β₃i_t) + u_t,   (22.62)

we can see that the two sides of (62) have the common factor (1 − α₁L), which can be eliminated by dividing both sides by the common factor. This will give rise to (61).
Hence (61) coincides with (60) under the null hypothesis

H₀: α₂ = −α₁β₁, α₃ = −α₁β₂, α₄ = −α₁β₃,

which is tested against

H₁: α₂ ≠ −α₁β₁ or α₃ ≠ −α₁β₂ or α₄ ≠ −α₁β₃.
Although the Wald test procedure (see Chapter 16) is theoretically much more attractive, given that estimation under H₀ is considerably easier (see Mizon (1977), Sargan (1980) on Wald tests), in our example the likelihood ratio test is more convenient because most computer packages provide the log likelihood. The rejection region is based on the asymptotic likelihood ratio test statistic (see Chapter 16),

−2 log λ(y) = 2[log L(θ̂; y) − log L(θ̃; y)] ~ χ²(k − 1) asymptotically under H₀,   (22.63)

where θ̂ and θ̃ refer to the unrestricted and restricted MLEs of θ, respectively. Estimation of the unrestricted model (60) yielded

m_t = 0.766 + 0.793m_{t−1} + 0.160p_t + 0.038y_t + 0.240i_t + …,   (22.64)–(22.65)
     (0.220)                (0.169)   (0.182)   (0.208)

R² = 0.999, R̄² = 0.999, log L = 209.25,

and estimation of the restricted model (61) yielded

m_t = … + ε̂_t,  ε̂_t = 0.819ε̂_{t−1} + û_t,   (22.66)
              (0.064)

R² = 0.998, R̄² = 0.998, log L = 187.73, s = 0.0223.

Hence

−2 log λ(y) = 2(209.25 − 187.73) = 43.04.

Given that c_α = 7.815 for α = 0.05 and three degrees of freedom, H₀ is strongly rejected. As mentioned above, the validity of the result of a common factor test depends on the validity of the assumptions underlying the unrestricted model.
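The likelihood-ratio comparison is trivial to compute once the two log likelihoods are available. A minimal sketch, using the standard equivalence between the Gaussian log likelihood and the residual sum of squares (an assumption about how the log likelihoods are obtained, not the book's own code):

```python
import numpy as np
from scipy import stats

def gaussian_loglik(rss, T):
    """Concentrated Gaussian log likelihood: -(T/2)(log(2*pi*RSS/T) + 1)."""
    return -0.5 * T * (np.log(2 * np.pi * rss / T) + 1)

def lr_test(loglik_unrestricted, loglik_restricted, df, alpha=0.05):
    lr = 2 * (loglik_unrestricted - loglik_restricted)
    return lr, stats.chi2.ppf(1 - alpha, df)

# lr_test(209.25, 187.73, 3) reproduces -2 log lambda = 43.04 > 7.815 above.
```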
22.3    Testing the independent sample assumption

(1)    The respecification approach

In Section 22.1 it was argued that the statistical results related to the linear regression model (see Chapter 19) under the independent sample assumption are invalidated by the non-independence of the sample. For this reason it is of paramount importance to be able to test for independence. Comparing the two statistical GMs,

y_t = β'x_t + u_t,  t = 1, 2, …, T,   (22.67)

y_t = β₀'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}) + u_t,  t > m,   (22.68)

respectively, implies that a test of independence can be constructed based on the significance of the parameters γ ≡ (α₁, …, α_m, β₁', …, β_m')', i.e. H₀: γ = 0 against H₁: γ ≠ 0, using the F-type test statistic

τ(y) = [(RRSS − URSS)/URSS]·[(T − k(m+1))/mk].   (22.69)
A test based on the unrestricted R² alone would not be justifiable. This is because a statistic of the form τ*(y) increases with the number of regressors — a feature which is particularly problematical in the present case because of the choice of m. On the other hand, the test statistic (69) does not necessarily increase when m increases. In practice we use the test based on the rejection region C₁ = {y: τ(y) ≥ c_α}, where c_α is defined by the corresponding F-distribution.   (22.70)

For the money equation, estimation of the restricted model yielded
m_t = 0.706 + 0.589m_{t−1} + …,   (22.71)
     (0.815)  (0.132)

R² = 0.995, R̄² = 0.995, log L = 138.425, RSS = 0.040 22,

and estimation of the unrestricted model (with m = 4) yielded

m_t = … − 0.018m_{t−2} − 0.046m_{t−3} + 0.214m_{t−4} + 0.191y_t + 0.518y_{t−1} − 0.253y_{t−2} − 0.116y_{t−3} − 0.060p_t + 0.606p_{t−1} − 0.479p_{t−2} − 0.025i_{t−4} + … + û_t,   (22.72)

R² = 0.999, R̄² = 0.999, log L = 210.033, RSS = 0.017 78.
Under H₀ an asymptotically equivalent LM test is

LM(y) = TR² ~ χ²(mk),   (22.73)

where R² refers to the auxiliary regression of the restricted residuals on the regressors of (68), t = m+1, …, T.   (22.74)

In the case of the money equation the R² for this auxiliary regression is 0.848, which implies that LM(y) = 64.45. Given that c_α = 26.296 for α = 0.05 and 16 degrees of freedom, H₀ is again strongly rejected.

It is interesting to note that LM(y) can be expressed in the form

LM(y) = TR² = T[(RRSS − URSS)/RRSS]   (22.75)

(see exercise 4). This form of the test statistic suggests that the test suffers from the same problem as the one based on the statistic τ*(y), given that R² increases with the number of regressors (see Section 19.4).
(2)    The autocorrelation approach

As argued in Section 22.2 above, tackling the non-independence of the sample in the context of the autocorrelation approach before testing the appropriateness of the implied common factors is not the correct strategy to adopt. In testing the common factors implied by adopting an error autocorrelation formulation, however, we need to refer back to the respecification approach. Hence, the question arises, 'how useful is a test of the independence assumption in the context of the autocorrelation approach, given that the test is based on an assumption which is likely to be erroneous?' In order to answer this question it is instructive to compare the statistical GMs of the two approaches:

y_t = β₀'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}) + u_t,  t > m,   (22.76)

and

y_t = β'x_t + ε_t,  a(L)ε_t = b(L)u_t,   (22.77)

where a(L) and b(L) are pth- and qth-order polynomials in L. That is, the postulated model for the error term is an ARMA(p, q) time series formulation (see Section 8.4).
The error term ε_t interpreted in the context of (76) takes the form

ε_t = (β₀ − β)'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}) + u_t.   (22.78)

Hence, a crude 'endorsement' of the error autocorrelation formulation

y_t = β'x_t + ε_t,  ε_t = [a(L)]^{−1}b(L)u_t,   (22.79)–(22.80)

constitutes a special case of (76). The particular aspect of the process {ε_t, t ∈ T} we are interested in is its temporal structure. The null hypothesis of interest in the present context is that {ε_t, t ∈ T} is a white-noise process (uncorrelated over time), and the alternative is that it is a dependent stochastic process. The natural way to assess this is via the autocorrelation coefficients

ρ_l = Cov(ε_t, ε_{t+l})/[Var(ε_t)Var(ε_{t+l})]^{1/2}   (22.82)

which, under stationarity, reduce to

ρ_l = Cov(ε_t, ε_{t+l})/Var(ε_t).   (22.83)
Intuition suggests that a test of the sample independence assumption in the present context should consider whether the values of ρ̂_l, for some l = 1, 2, …, m, say, are significantly different from zero. In the next subsection we consider tests based on l = 1 and then generalise the results to l = m > 1.

The Durbin–Watson test (l = 1)
The most widely used test for first-order dependence is based on the formulation

y_t = β'x_t + ε_t,  ε_t = ρε_{t−1} + u_t,   (22.84)

and on the Durbin–Watson test statistic

τ₁(y) = [Σ_{t=2}^T (ε̂_t − ε̂_{t−1})²]/[Σ_{t=1}^T ε̂_t²] = (ε̂'Aε̂)/(ε̂'ε̂),   (22.85)

where A is the tridiagonal matrix

A = [ 1 −1  0 …  0;
     −1  2 −1 …  0;
       ⋱  ⋱  ⋱
      0 … −1  2 −1;
      0 …  0 −1  1].   (22.86)

The relationship between τ₁(y) and the AR(1) formulation stems from the approximation

V_T(ρ)^{−1} ≃ (1 − ρ)²I_T + ρA + ρC,  C = diag(1, 0, …, 0, 1),   (22.87)–(22.88)

used in testing H₀: ρ = 0 against H₁: ρ > 0, with rejection region

C₁ = {y: τ₁(y) ≤ c_α},   (22.89)

where c_α refers to the critical value for a size α test, determined by the distribution of the test statistic (85) under H₀. This distribution, however, is inextricably bound up with the observed data matrix X, given that ε̂ = M_Xε, M_X = I_T − X(X'X)^{−1}X' (see Chapter 19), and

τ₁(y) = (ε'M_X A M_X ε)/(ε'M_X ε).   (22.90)
Writing (90) in its eigenvalue form,

τ₁(y) = (Σ_{i=1}^{T−k} ν_i u_i²)/(Σ_{i=1}^{T−k} u_i²),   (22.91)

where

u ≡ H'ε ~ N(0, σ²I_T)   (22.92)

and the ν_i are the non-zero eigenvalues of M_X A M_X, the critical values implied by

Pr(τ₁(y) ≤ c_α) = α   (22.93)–(22.94)

depend on X. In view of the approximation

τ₁(y) ≃ 2(1 − ρ̂₁),   (22.95)

values of τ₁(y) well below 2 indicate positive first-order autocorrelation. For the money equation estimated above,

τ₁(y) = 0.376,   (22.96)

and the independence assumption is strongly rejected, however the critical-value problem is 'solved'.
To sidestep the dependence of the critical values on X, Durbin and Watson tabulated bounds d_L and d_U such that

d_L ≤ c_α ≤ d_U,   (22.97)–(22.98)

with the rejection region

C₀ = {y: τ₁(y) ≥ d_U}   (22.99)

for accepting H₀. In the case where d_L ≤ τ₁(y) ≤ d_U the test is inconclusive (see Maddala (1977) for a detailed discussion of the inconclusive region). For the case H₀: ρ = 0 against H₁: ρ < 0 the test statistic 4 − τ₁(y) should be used in (99).

In view of the discussion at the beginning of this section, the Durbin–Watson (DW) test as a test of the independent sample assumption is useful in so far as it is based on the first-order autocorrelation coefficient ρ̂₁. Because of the relationship (95) it is reasonable to assume that the test will have adequate power against other forms of first-order dependence such as MA(1) (see King (1983) for an excellent survey of the DW and related tests). Hence, in practice the test should be used not as a test related to an AR(1) error autocorrelation only but as a general first-order dependence test. Moreover, the DW-test is likely to have power against higher-order dependence in so far as the first-order autocorrelation coefficient 'captures' part of this temporal dependence.
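A minimal sketch of the DW statistic (22.85) and its approximation (22.95), computed from OLS residuals of hypothetical y and X:

```python
import numpy as np

def durbin_watson(y, X):
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]   # OLS residuals
    dw = np.sum(np.diff(e) ** 2) / (e @ e)             # tau_1(y), eq. (22.85)
    r1 = (e[1:] @ e[:-1]) / (e @ e)                    # first-order residual correlation
    return dw, 2 * (1 - r1)                            # dw and 2(1 - r1), eq. (22.95)
```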
Higher-order tests

Consider first the AR(1) case written in its common factor form

y_t = β'x_t + ρy_{t−1} − ρβ'x_{t−1} + u_t,   (22.100)
22.3
against
is Hv : p
as can be easily verified from (84).The null hypothesis
easier than its
under
much
is
of
Hv
estimation
100)
:
the
Because
(
0.
+
l'fl p
estimation under H3 the f-M-test procedure is eomputationally preferable
to both the Wald and the likelihood ratio test procedures.
The efficient score form of the faAf-test statistic is
=0
( log
z-A,1(y)
=
Lt,' y)
-
pp
I y.( #)-
log
-.-.
--
z-(#.,
)))
-.--
(')0
tA.ftyl
lj nlc2 j j.z1) .. j
-
log Lo
where 0 E:E
z)
oo1
(22.102)
(46)
2)
( 1 - /?
0
lv(04
(X'Vw-1 X)
(22.103)
(22.104)
(22.105)
and
LMy)
(22.106)
TI z2(1)
--
multiplier
Departures
rejection
region
C'1
.ty:
LMy)
ca )
>
wherex
dz2( 1).
(22.107)
C'(y
The form of the test statistic in (106) makes a lot of intuitive sense given that the first-order residual correlation coefficient is the best measure of first-order dependence among the residuals. Thus it should come as no surprise to learn that for mth-order temporal dependence, say an AR(m) process

ε_t = Σ_{i=1}^m ρ_iε_{t−i} + u_t,   (22.108)

with

ρ̂_l = (Σ_{t=l+1}^T ε̂_tε̂_{t−l})/(Σ_{t=1}^T ε̂_t²),  l = 1, 2, …, m, m < T,   (22.109)–(22.110)

or an MA(m) process

ε_t = u_t + Σ_{i=1}^m γ_i u_{t−i},   (22.111)

against H₀: ρ_i = 0 (respectively γ_i = 0) for all i = 1, 2, …, m, the LM test statistic takes the form

LM(y) = T Σ_{l=1}^m ρ̂_l² ~ χ²(m) asymptotically under H₀,   (22.112)

with rejection region

C₁ = {y: LM(y) ≥ c_α},  ∫_{c_α}^∞ dχ²(m) = α.   (22.113)
For more general error processes the LM statistic cannot be derived in this simple form, but an asymptotically equivalent LM-type test can be based on the auxiliary regression of ε̂_t on x_t and ε̂_{t−1}, …, ε̂_{t−m}, using

LM(y) = TR² ~ χ²(m) asymptotically under H₀,   (22.114)

with the rejection region

C₁ = {y: TR² ≥ c_α},  ∫_{c_α}^∞ dχ²(m) = α.   (22.115)–(22.116)

For the estimated money equation, with m = 6, the auxiliary regression yielded

TR² = …   (22.117)

Given that c_α = 12.592 for α = 0.05, the null hypothesis H₀: γ = 0 is strongly rejected in favour of H₁: γ ≠ 0. Hence, the estimated money equation has failed every single misspecification test for independence, showing clearly that the independent sample assumption was grossly inappropriate. The above discussion also suggests that the departures from linearity, homoskedasticity and parameter time invariance detected in Chapter 21 might be related to the dependence in the sample as well.
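The auxiliary-regression form (22.114) is the familiar Breusch–Godfrey construction (named plainly; inputs are hypothetical). A minimal sketch:

```python
import numpy as np
from scipy import stats

def lm_autocorrelation(y, X, m, alpha=0.05):
    """TR^2 from regressing OLS residuals on X and m lagged residuals."""
    T = len(y)
    e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    lags = np.column_stack([np.concatenate([np.zeros(i), e[:-i]])
                            for i in range(1, m + 1)])   # lagged residuals, zero-padded
    Z = np.column_stack([X, lags])
    v = e - Z @ np.linalg.lstsq(Z, e, rcond=None)[0]
    R2 = 1 - (v @ v) / ((e - e.mean()) @ (e - e.mean()))
    return T * R2, stats.chi2.ppf(1 - alpha, m)
```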
22.4    Looking back

The statistical parameters θ and the theoretical parameters of interest ζ are related via a system of equations

G(θ, ζ) = 0.   (22.118)

The reparametrisation in terms of ζ is possible only when the system of equations provides a unique solution for ζ in terms of θ:

ζ = H(θ).   (22.119)
Appendix 22.1    Deriving the conditional expectation

For the normal process {Z_t, t ∈ T} with

Z ≡ (Z₁', Z₂', …, Z_T')' ~ N([m(1)', m(2)', …, m(T)']', [Σ(t, s); t, s = 1, 2, …, T]),

the conditional distribution of y_t given σ(Z_{t−1}°) and X_t = x_t is normal, with

E(y_t/σ(Z_{t−1}°), X_t = x_t) = β₀(t) + β_t(t)'x_t + Σ_{i=1}^{t−1}[α_i(t)y_{t−i} + β_i(t)'x_{t−i}],

Var(y_t/σ(Z_{t−1}°), X_t = x_t) = σ₀²(t) = σ₁₁(t) − σ₁₂(t)Ω₂₂(t)^{−1}σ₂₁(t),

where, in an obvious notation, the coefficients β₀(t), β_i(t), α_i(t) are the corresponding functions of the mean vectors m(t), m_x(t) and the partitioned covariance blocks; e.g. β₀(t) = m₁(t) − σ₁₂(t)Ω₂₂(t)^{−1}m₂(t).
Important concepts

Error autocorrelation, stationary process, ergodicity, mixing, an innovation process, non-random sample.

Questions

1. Compare the interpretation of the statistical GM under the respecification and autocorrelation approaches to modelling a non-random sample.

Exercises

1. Verify the implications of a non-random sample for b and s², as given by (i)–(vi) and (i)'–(vi)' of Section 22.1, in the context of the respecification and autocorrelation approaches, respectively.
2. Show that [∂ log L(θ; y)/∂ρ] = 0, where log L(θ; y) is given in equation (49), is non-linear in ρ.
3. Derive the Wald test for the common factor restrictions in the case where l = 2 and m = 1.
4. Verify the formula TR² = T[(RRSS − URSS)/RRSS] given in equation (75); see Engle (1984).
5. Derive the LM-test statistic for H₀: ρ = 0 against H₁: ρ ≠ 0 in the case where ε_t = ρε_{t−1} + u_t, as given in equation (106).

Additional references
misspecification and specification testing in the context of the DLR model, respectively. Section 23.5 considers the problem of prediction. In Section 23.6 the empirical econometric model constructed in Section 23.4 is used to 'explain' the misspecification results derived in Chapters 19–22.

23.1    Specification
The starting point of the dynamic linear regression (DLR) model is the AR(m) representation of the vector process {Z_t, t ∈ T}:

Z_t = Σ_{i=1}^m A(i)'Z_{t−i} + E_t,   (23.1)

where

E_t = Z_t − E(Z_t/σ(Z_{t−1}°)),   (23.2)

and, conformably with Z_t ≡ (y_t, X_t')',

A(i) = [a₁₁(i)  a₂₁(i)'; a₁₂(i)  A₂₂(i)],   (23.3)

with {E_t, σ(Z_{t−1}°), t > 0} defining a vector martingale difference process, which is also an innovation process (see Chapter 8), such that

(E_t/Z_{t−1}°) ~ N(0, Ω),  Ω > 0.   (23.4)

Decomposing D(Z_t/Z_{t−1}°; ψ) into the conditional distribution of y_t given (σ(Z_{t−1}°), X_t = x_t) and the marginal distribution of X_t given Z_{t−1}° gives rise to the systematic component

E(y_t/σ(Z_{t−1}°), X_t = x_t) = β₀'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}),   (23.5)

where

β₀' = ω₁₂Ω₂₂^{−1},  α_i = a₁₁(i) − ω₁₂Ω₂₂^{−1}a₁₂(i),  β_i' = a₂₁(i)' − ω₁₂Ω₂₂^{−1}A₂₂(i)',  i = 1, 2, …, m.   (23.6)
The non-systematic component u_t = y_t − E(y_t/σ(Z_{t−1}°), X_t = x_t) satisfies the following properties:

[ET1] E(u_t/σ(Z_{t−1}°), X_t = x_t) = 0;
[ET2] E(u_tu_s/σ(Z_{t−1}°), X_t = x_t) = σ₀² for t = s and 0 for t ≠ s, where σ₀² = ω₁₁ − ω₁₂Ω₂₂^{−1}ω₂₁;   (23.9)–(23.10)
[ET3] E(μ_tu_t/σ(Z_{t−1}°), X_t = x_t) = 0.   (23.11)

The properties ET1–ET3 can be verified directly using the properties of the conditional expectation discussed in Section 7.2. In view of the equality

u_t = y_t − E(y_t/σ(Z_{t−1}°), X_t = x_t),   (23.12)

we can deduce that

[ET4] E(u_t/σ(U_{t−1}°)) = 0,   (23.13)

i.e. u_t cannot be predicted from its own past.
Turning to the marginal distribution of X_t given the past,

D(X_t/Z_{t−1}°; ψ₂),   (23.14)

under Granger non-causality it reduces to the more detailed form

D(X_t/X_{t−1}°; ψ₂),  t = m+1, …, T.   (23.15)
This suggests the statistical GM

y_t = β₀'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}) + u_t,  t > m,   (23.16)

where the coefficients (α₁, α₂, …, α_m) are restricted so that the roots of

z^m − α₁z^{m−1} − α₂z^{m−2} − ⋯ − α_m = 0   (23.17)

lie inside the unit circle, which ensures that Cov(y_t, y_{t+τ}) → 0 as τ → ∞ (see Chapter 8). It is important to note that in the case where {Z_t, t ∈ T} is assumed to be both stationary and asymptotically independent the above restriction on the roots of the polynomial in (17) is satisfied automatically.

For notational convenience let us rewrite the statistical GM (16) in the more concise form

y_t = β*'X_t* + u_t,  t > m,   (23.18)

or

y = X*β* + u,  t = m+1, …, T,   (23.19)
where y: (T−m) × 1 and X*: (T−m) × k(m+1). Note that x_t is k × 1 because it includes the constant as well, but the x_{t−i}, i = 1, 2, …, m, are (k−1) × 1 vectors; this convention is adopted to simplify the notation. Looking at (18) and (19) the discerning reader will have noticed a purposeful attempt to use notation which relates the dynamic linear regression model to the linear and stochastic linear regression models. Indeed, the statistical GM in (18) and (19) is a hybrid of the statistical GMs of these models. The part Σ_{i=1}^m α_i y_{t−i} is directly related to the stochastic linear regression model, in view of the conditioning on the σ-field σ(Y_{t−1}°), and the rest of the systematic component is a direct extension of that of the linear regression model.

In direct analogy to the linear and stochastic linear regression models we need to assume that X* as defined above is of full rank, i.e. rank(X*) = k(m+1), for all the observable values of Y_{T−1}° ≡ (y_{m+1}, …, y_{T−1})'. The probability model underlying (16) comes in the form of the product of the sequentially conditional normal distributions D(y_t/Z_{t−1}°, X_t; ψ₁), t > m. For the sample period t = m+1, …, T the distribution of the sample is

D*(y; ψ₁) = ∏_{t=m+1}^T D(y_t/Z_{t−1}°, X_t; ψ₁).   (23.20)

The sampling model is specified to be a non-random sample from D*(y; ψ₁). Equivalently, y ≡ (y_{m+1}, y_{m+2}, …, y_T)' can be viewed as a non-random sample sequentially drawn from D(y_t/Z_{t−1}°, X_t; ψ₁), t = m+1, …, T, respectively. As argued above, asymptotically the effect of the initial conditions, summarised by D(Z₁; φ), can be ignored.
The dynamic linear regression model – specification

(I)   The statistical GM:

y_t = β₀'x_t + Σ_{i=1}^m (α_i y_{t−i} + β_i'x_{t−i}) + u_t,  t > m.   (23.22)

[1] μ_t = E(y_t/σ(Z_{t−1}°), X_t = x_t), u_t = y_t − μ_t.   (23.23)
[2] The statistical parameters of interest are θ* ≡ (α₁, …, α_m, β₀, β₁, …, β_m, σ₀²).
[3] X_t is strongly exogenous with respect to θ*.
[4] The roots of z^m − Σ_{i=1}^m α_iz^{m−i} = 0 lie inside the unit circle.
[5] rank(X*) = k(m+1) for all observable values of Y_{T−1}°.

(II)  The probability model:

Φ = {D(y_t/Z_{t−1}°, X_t; ψ*) = [1/(σ₀√(2π))] exp{−(1/2σ₀²)(y_t − β*'X_t*)²}, ψ* ∈ Ψ},   (23.24)

[6] D(y_t/Z_{t−1}°, X_t; ψ*) is (i) normal, (ii) with mean linear in X_t*, and (iii) homoskedastic (free of X_t*) variance.
[7] θ* is time invariant.

(III) The sampling model:

[8] y ≡ (y_{m+1}, y_{m+2}, …, y_T)' is a stationary, asymptotically independent sample sequentially drawn from D(y_t/Z_{t−1}°, X_t; ψ*), t = m+1, m+2, …, T, respectively.

Note that the above specification is based on D(y_t/Z_{t−1}°, X_t; ψ*), t = m+1, …, T, directly and not on D(Z_t/Z_{t−1}°; φ). This is the reason why we need assumption [4] in order to ensure the asymptotic independence of {y_t/Z_{t−1}°, X_t, t > m}.
23.2    Estimation

Under the assumptions [6]–[8] the approximate MLEs of β* and σ₀² are

β̂* = (X*'X*)^{−1}X*'y,   (23.25)–(23.26)

derived from the log likelihood

log L(θ*; y) = const − [(T−m)/2] log σ₀² − (1/2σ₀²)(y − X*β*)'(y − X*β*),   (23.27)

and

σ̂₀² = [1/(T−m)] û'û,   (23.28)
where û = y − X*β̂*. The estimators β̂* and σ̂₀² are said to be approximate maximum likelihood estimators (MLEs) of β* and σ₀², respectively, because the initial conditions have been ignored. The formulae for these estimators bring out the similarity between the dynamic, linear and stochastic linear regression models. Moreover, the similarity does not end with the formulae. Given that the statistical GM for the dynamic linear regression model can be viewed as a hybrid of the other two models, we can deduce that, in direct analogy to the stochastic linear regression model, the finite sample distributions of β̂* and σ̂₀² are likely to be largely intractable. One important difference between the dynamic and stochastic linear regression models, however, is that in the former case, although the orthogonality between μ_t and u_t (μ_t ⊥ u_t) holds in the decomposition

y_t = μ_t + u_t, for each t > m,   (23.29)

it does not extend to the sample period formulation

y = μ + u,   (23.30)

because

E(u_ty_{t−τ}) ≠ 0 for τ < 0.   (23.31)

One important implication of this is that β̂* is a biased estimator of β*, i.e.

E(β̂*) ≠ β*.   (23.32)
Asymptotic properties
of
#y
Using the analogy between the dynamic and stochastic linear regression
models we can argue that the asymptotic properties of 11depend crucially
on the order of magnitude (seeChapter 10) of the information matrix Iv0*)
dcfined by
'(X*'X*)
Ir(#*)
cp
(23.33)
This can be verified directly from (27)and (28).lf E(X*'X*) is of order Op(T)
then Iw(p*) Op(F) and thus the asymptotic information matrix I.,(p *)
limw.-.((1/F)Iw(p*)j <
Moreover, if GwEB S(X*?X*/T) is also nontsufciently
large', then the asymptotic properties of
singular for all T
as
of
MLE
be
Let us consider the argument in more detail.
0*
deduced.
can
a
If we define the information set
(c(Y)L1), X,0 x/)) we can deduce
1
that the non-systematic component as defined above takes the form
=
,'.'.f,
,6'-
nr
=
)'? Ef.t,ih-
1),
.f(/.r,
(23.34)
hrj)
.%-
(,
*
tf we
ian
j*)
(X*'X*
'j-
ttat
ensue
yj,x-* x
'
hy
.0
a4a .-ss
then
we can use
( ),
X*'tl
the strong
$7%A%
taw of targenumbes fo
mattqtates
to stow
Estimation
23.2
(-X*
'u
-j
a'S.
--,
(23.37)
().
Hence,
a %.
.
'*+
...#
p.
E(Xt3ut/.%
1)
Xtt.qElut/eh
-
:)
0, i
1, 2,
m(k + 1).
defines a
(23.38)
This suggests that the main assumption underlying the strong consisteney
or p-*is the order of magnitude of F(X*'X*) and the non-singularity of
F(X*'X*/F) for T > m (k+ 1). Given that X) EEE(.J.7, 1 yf 2,
.J7f-?n, N, xf 1
conditions
F(X*'X*)
satisfies
if the crossthese
we can ensure that
.
products involved satisfy the restrictions. As far as the cross-products which
ipvolve the y,-fS are concerned the restrictions are ensured by assumption
afz'n - i) 0. For the xts we need to assume
(4J on the roots of (2n j7--11
,
,xt-.)
that:
<
c, i
lxffl
1, 2,
k, t (EFT, C being a constant;
.E1,/(T-z)q
limw7--Jx,x;-,= Qzexists for z > 1 and in particular
non-singular.
also
Qo is
(i)
(ii)
a S
. .
a S.
.
ri'g
-+
(23.39)
c.
by multiplying
(#1 0w) with x? T (the order of its standard
normality
deviation) asymptotic
can be deduced:
Moreover,
/''
' T( #+ #+)
-
'
1
x(() j (p*)- )
(23.40)
That is,
...'
'r(:
o
(
' Tdo
-, v )
-
j!
c())
x
J
x(0,cpG - ),
(23.41)
4.
-N(0,2co).
(23.42)
The
lwc(7/tlconsistency
of
#w*
as
an estimator
of 0*, i.e.
#*
F
--+
(23.43)
p*
Example

The tests for independence applied to the money equation in Chapter 22 showed that the assumption is invalid. Moreover, the erratic behaviour of the recursive estimators and the rejection of the linearity and homoskedasticity assumptions in Chapter 21 confirmed the invalidity of conditioning on the current observed values of X_t only. In such a case the natural way to proceed is to respecify the statistical model appropriate for the modelling of the money equation so as to take into consideration the time dependence in the sample.

In view of the discussion of the assumption of stationarity, as well as economic theoretical reasons, the dependent variable chosen for the postulated statistical GM is m_t ≡ ln(M_t/P_t) (see Fig. 19.2). The value of the maximum lag postulated is m = 4, mainly because previous studies demonstrated this to be a 'memory' restriction adequate to characterise similar economic time series (see Hendry (1980)). The postulated statistical GM is of the form

m_t = δ₀ + Σ_{i=1}^4 φ_im_{t−i} + Σ_{i=0}^4 (β_{1i}y_{t−i} + β_{2i}p_{t−i} + β_{3i}i_{t−i}) + Σ_j δ_jd_{jt} + u_t,   (23.44)

where the d_{jt} are dummy variables accounting for certain 'unusual' events during the sample period.
The 'unusual' events in question are: the introduction of the new credit control arrangements (Q? 197?); the suspension of the ceilings on bank loans, when the Bank of England used special deposits ('the corset') to control the banks' lending (Q? 1975); and the introduction of M1 as a monetary target.
The estimated coefficients for the period 1964i–1982iv are shown in Table 23.1. Estimation of (44) with Δm_t as the dependent variable changed only the goodness-of-fit measures as given by R² and R̄², to R² = 0.796 and R̄² = 0.711. The change measures the loss of goodness of fit due to the presence of a trend (compare Fig. 19.2 with 21.2). The parameters θ ≡ (φ_i, β_{1i}, β_{2i}, β_{3i}, i = 0, 1, 2, 3, 4; σ₀²) in terms of which the statistical GM is defined are the statistical and not the (economic) theoretical parameters of interest. In order to be able to determine the latter (using specification testing), we need to ensure first that the estimated statistical GM is well defined. That is, that the assumptions underlying the statistical model are indeed valid. Testing for these assumptions is the task of misspecification testing in the context of the dynamic linear regression model, considered in the next section.

Looking at the time graph of the actual (y_t) and fitted (ŷ_t) values of the dependent variable (see Fig. 23.1), we can see that ŷ_t 'tracks' y_t quite closely for the estimation period. This is partly confirmed by the value of the variance of the regression, which is nearly one-tenth of that of the money equation estimated in Chapter 19; see Fig. 23.2 showing the two sets of residuals. This reduction is even more impressive given that it has been achieved using the same sample information set. In terms of the R̄² we can see that this also confirms the improvement in the goodness of fit, being more than double the original one with m_t as the dependent variable and y_t, p_t and i_t the only regressors (see Chapter 19).
[Fig. 23.1: actual and fitted values of the dependent variable, 1964–1982. Fig. 23.2: the residuals of the dynamic money equation compared with those of the static regression (19.66).]
Before we proceed with the misspecification testing of the above estimated statistical GM it is important to comment on the presence of the three dummy variables. Their presence brings out the importance of the sample period's economic 'history' in econometric modelling. Unless the modeller is aware of this background the modelling can very easily go astray. In the above case, leaving the three dummies out, the normality assumption will certainly be affected. It is also important to remember that such dummy variables are only employed when the modeller believes that certain events had only a temporary effect on the relationship, without any lasting changes in the relationship. In the case of longer-term effects we need to model them, not just pick them up using dummies. Moreover, the number of such dummies should be restricted to be relatively small. A liberal use of dummy variables can certainly achieve wonders in terms of goodness of fit but very little else. Indeed, a dummy variable for each observation would yield a perfect fit but no 'explanation' of any kind. In a certain sense the coefficients of dummy variables represent a measure of our 'ignorance'.
23.3    Misspecification testing

(1)    Assumptions underlying the statistical GM

Assumption [1] postulates the systematic component

μ_t = β₀'x_t + Σ_{i=1}^m (α_iy_{t−i} + β_i'x_{t−i}).   (23.45)

If the maximum lag m is chosen 'too small' relative to the 'true' value m*, then the omitted lagged Z_ts will form part of the unmodelled part of y_t, and the error term

ε_t = y_t − μ_t   (23.46)

is no longer non-systematic, since

E(ε_t/ℱ_{t−1}) ≠ 0.   (23.47)

That is, the 'true' statistical GM is

y_t = β₀'x_t + Σ_{i=1}^{m*} (α_iy_{t−i} + β_i'x_{t−i}) + u_t.   (23.48)
This implies that m < m* can be tested using the null hypothesis

H₀: α* = 0 and β* = 0, against H₁: α* ≠ 0 or β* ≠ 0,

where α* ≡ (α_{m+1}, …, α_{m*})' and β* ≡ (β_{m+1}, …, β_{m*}). The obvious test for such a hypothesis will be analogous to the F-type test for independence suggested in Chapter 22. Alternatively, we could use an asymptotically equivalent test based on TR² from the auxiliary regression

û_t = β₀'x_t + Σ_{i=m+1}^{m*} (α_iy_{t−i} + β_i'x_{t−i}) + ε_t,   (23.50)
where û_t are the residuals of the GM estimated with lag length m (see (23.51)); the rejection region is

C₁ = {y: TR² ≥ c_α},  ∫_{c_α}^∞ dχ²((m* − m)(k+1)) = α.   (23.52)

For the money equation estimated in Section 23.2, the F-type version of this test yielded

FT(y) = [(0.010 49 − 0.009 65)/0.009 65] × (43/8) ≃ 0.467,   (23.53)

well below the relevant critical value, so the choice m = 4 is accepted.

The Durbin–Watson (DW) test is not applicable in the present context because that test depends crucially on the non-stochastic nature of the matrix X. Durbin (1970) proposed a test for AR(1) errors (ε_t = ρε_{t−1} + u_t) in the context of the DLR model based on the so-called h-test, defined by the statistic

h = ρ̂₁ √[(T−m)/(1 − (T−m)V̂ar(α̂₁))] ~ N(0, 1) asymptotically under H₀: ρ = 0,   (23.54)

where α̂₁ is the estimated coefficient of y_{t−1}, with rejection region

C₁ = {y: |h| ≥ c_α},  ∫_{c_α}^∞ dN(0, 1) = α/2.   (23.55)

The statistic breaks down when

(T−m)V̂ar(α̂₁) ≥ 1,   (23.56)

and ρ̂₁ can be approximated using

ρ̂₁ ≃ 1 − (DW/2)   (23.57)

(see Harvey (1981)).
For higher-order dependence we can use the auxiliary regression

ε̂_t = β'X_t* + Σ_{i=1}^l ρ_iε̂_{t−i} + η_t,   (23.58)

testing

H₀: ρ₁ = ρ₂ = ⋯ = ρ_l = 0 against H₁: ρ_i ≠ 0 for any i = 1, 2, …, l.   (23.59)

In practice it is preferable to use the F-test approximation, which includes the degrees of freedom correction term, instead of its asymptotic chi-square form. The autocorrelation error tests will be particularly useful in cases where the F-test based on (48) cannot be applied because the degrees of freedom are at a premium.
In the case of the estimated money equation in Table 23.1 the above test statistics for α = 0.05 yielded:

h = 1.19,  c_α = 1.96;
LM(2): FF(y) = 0.318,  c_α = 3.18;   (23.60)
LM(3): FF(y) = 0.231,  c_α = 2.80;   (23.61)
LM(4): FF(y) = 0.350,  c_α = 2.51.   (23.62)

As we can see from the above results, in all cases the null hypothesis is accepted, confirming (53) above.
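A minimal sketch of Durbin's h-statistic (23.54), assuming the first-order residual autocorrelation rho1, the estimated variance var_a1 of the coefficient of y_{t−1}, and the effective sample size n = T − m are already available (all inputs hypothetical):

```python
import numpy as np

def durbin_h(rho1, var_a1, n):
    """Durbin's h (eq. 23.54); returns None when the statistic breaks down (23.56)."""
    if n * var_a1 >= 1:
        return None                                   # condition (23.56) violated
    return rho1 * np.sqrt(n / (1 - n * var_a1))       # compare with N(0,1), e.g. 1.96
```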
The above tests can be viewed as indirect ways to test the assumption postulating the adequacy of the maximum lag m. The question which naturally arises is whether m can be determined directly by the data. In the statistical time-series literature this question has been considered extensively and various formal procedures have been suggested, such as Akaike's AIC and BIC or Parzen's CAT criteria (see Priestley (1981) for a readable summary of these procedures). In econometric practice, however, it might be preferable to postulate m on a priori grounds and then use the above indirect tests for its adequacy.
Assumption [2] specifies the statistical parameters of interest as being θ* ≡ (α₁, …, α_m, β₀, β₁, …, β_m, σ₀²). These parameters provide us with an opportunity to consider two issues we only discussed in passing. The first issue is related to the distinction made in Chapter 17 between the statistical and (economic) theoretical parameters of interest. In the present context θ* as defined above has very little, if any, economic interpretation. Hence, θ* represents the statistical parameters of interest. These parameters enable us to specify a well-defined statistical model which can provide the basis of the 'design' for an empirical econometric model. As argued in Chapter 17, the estimated statistical GM could be viewed as a sufficient statistic for the theoretical parameters of interest. The statistical parameters of interest provide only a statistically 'adequate' (sufficient) parametrisation, with the theoretical parameters of interest being defined as functions of the former. This is because a theoretical parameter is well defined (statistically) only when it is directly related to a well-defined statistical parameter. The determination of the theoretical parameters of interest will be considered in Section 23.4 on specification testing. The second related issue is concerned with the presence of 'near' collinearity. In Section 20.6 it was argued that collinearity is defined relative to a given parametrisation and information set. In the present context it is likely that 'near' collinearity (or insufficient data information) might be a problem relative to the parametrisation based on θ*. The problem, however, can be easily overcome in determining the theoretical parameters of interest so as to 'design' a parsimonious as well as 'robust' theoretical parametrisation. Both issues will be considered in Section 23.6 below in relation to the statistical GM estimated in Section 23.2.
Assumption [3] postulates the strong exogeneity of X_t with respect to the parameters θ* for t = m+1, …, T. As far as the weak exogeneity component of this assumption is concerned, it will be treated as a non-testable presupposition, as in the context of the linear regression model (see Section 20.3). The Granger non-causality component, however, is testable in the context of the general autoregressive representation

X_t = Σ_{i=1}^m [a₁₂(i)y_{t−i} + A₂₂(i)'x_{t−i}] + E_{2t},   (23.63)

by testing the significance of the coefficients of the lagged y_ts, using the F-type test statistic

W(y) = [(RRSS − URSS)/URSS]·[(T* − q)/m],   (23.64)

in an obvious notation, where URSS and RRSS refer to the residual sums of squares of the regressions with and without the y_{t−i}s, i = 1, 2, …, m, respectively. The rejection region takes the form

C₁ = {y: W(y) ≥ c_α}.   (23.65)

The Wald test statistic can be viewed as an F-type test statistic, and thus a natural way to generalise it to the case where k > 1 is to use an F-type test of significance in the context of the multivariate linear regression model (see Chapter 24). For a comprehensive survey of Granger non-causality tests see Geweke (1984).
Assumption [4] refers to the restrictions on α needed to ensure that {y_t, t ∈ T}, as generated by the statistical GM

y_t = β₀'x_t + Σ_{i=1}^m (α_iy_{t−i} + β_i'x_{t−i}) + u_t,   (23.66)

is an asymptotically independent process. Writing the autoregressive part of (66) in the form

y_t − Σ_{i=1}^m α_iy_{t−i} = w_t,  where w_t ≡ β₀'x_t + Σ_{i=1}^m β_i'x_{t−i} + u_t,   (23.67)–(23.68)

the general solution of this difference equation includes the homogeneous component

g(t) = c₁λ₁^t + c₂λ₂^t + ⋯ + c_mλ_m^t,   (23.69)

where λ₁, λ₂, …, λ_m are the roots of the polynomial

z^m − α₁z^{m−1} − ⋯ − α_m = 0.   (23.70)

In order to ensure the asymptotic independence (stationarity) of {y_t, t ∈ T} we need this component to decay to zero as t → ∞, in order for {y_t, t ∈ T} to 'forget' the initial conditions (see Priestley (1981)). For this to be the case the roots should satisfy the restrictions

|λ_i| < 1,  i = 1, 2, …, m,   (23.71)

which ensures that

lim_{t→∞} g(t) = 0.   (23.72)

To see what can go wrong when this condition fails, consider the case m = 1 with α₁ = 1:

y_t = α₀ + y_{t−1} + u_t,   (23.73)

for which

E(y_t) = α₀t + E(y₀) and Cov(y_t, y_{t+τ}) = σ₀²t, τ ≥ 0.   (23.74)

These suggest that both the mean and covariance of {y_t, t ∈ T} increase with t, and thus as t → ∞ they become unbounded. Thus {y_t, t ∈ T}, as generated by (73), is not only non-stationary (with its mean and covariance varying with t) but its 'memory' remains constant. When |α₁| < 1, by contrast,

E(y_t) = α₀(1 − α₁^t)/(1 − α₁) and Cov(y_t, y_{t+τ}) → σ₀²α₁^τ/(1 − α₁²),

so that the process is asymptotically stationary with exponentially decaying memory.
much larger than the optimum maximum 1ag m* then the way to proceed is
to reduce rrl. 1f, however, the problem is due to the rank of the submatrix
xwl' then we need to reparametrise (seeChapter 20). In
X Htxj,
x2,
either case the problem is relatively easy to detect. What is more difficult to
collinearity which might be particularly relevant in the
detect is
As argued above, however, the problem is relative to a
context.
present
and thus can be tackled alongside
the
given parametrisation
reparametrisation of (48) in our attempt to design' an empirical
econometric model based on the estimated form of (48)(seeSection 20.6).
.
%near'
(2)
Assumptions underlying
the probability
model
distribution:
D(Z1 ; /)
1-1Dyt//=
1,
X,;
4),
(23.77)
#f
#lxr+
l-1
Z
=
ai-vr-f +
Z #;xt-f+ ut
=
(23.78)
pr J'x
=
+ y.4 + t',
(23.79)
=0.05,
548
the dependence in the sample) was the main reason for rejecting the
homoskedasticity assumption based on the results of the Whitc test. An
obvious way to refute such a conjecture is to use the same regressors /1,.
/6f in an auxiliary regression where tIt refers to the residuals of the
.
dynamic money equation estimated in Section 23.3 above. This auxiliao
regression yielded:
.
TR2
5. 11,
FF(y)
0.830.
(23.80)
The values of both test statistics for the significance of the coefficients of the
kits reject the alternative (heteroskedasticity)most strongly; their critical
values being ca 12.6 and t'a 2.2 respectively for
0.05. The time path of
the residuals shown in Fig. 23.345) exemplifies no obvious systematic
,x
variation.
ln the present context heteroskedasticity takes the general form
Vart r,/'c(Y0,,-j),
x,0
=
li2
t
t'tj
(.':
tl.
+ cztl.- a +
+ cgth
gt
(J,3.8.'2)
=4
23.4
tshort'
Spification
testing
23.4 Specification
testing
0.05
0.00
-0.05
-0.10
1964
1967
1970
1973
1976
1979
1982
T i me
(b)
Time
model
1.25
,00
0.75
lat
0.50
0.25
-0.25
1973
1976
1979
1982
1979
1982
Time
(c)
0.100
0.075
0.050
0.025
l4t
0
-0.025
-0.050
-0.075
-0.1
1973
1976
Time
(d)
55l
against
Sj
(23.83)
R#* # r,
FT*(y)=
1R/1 -
qdl
RRSS - URSS
URSS
1(RJ*r)
-
T-k*
q
(23.84)
(see Chapter 20). The problem, however, is that since the distribution of #-*
is no longer normal we cannot deduce the distribution of FT*(y) as
Fq, T- k*) under Ho. On the other hand, we could use the asymptotic
distribution of #*, i.e.
v'T(/*
N(0,c2G-1)
-,*)
(23.85)
qyl-sy) z2(t?).
-
Using this result we couldjustify the use of the F-type test statistic (84)as an
approximation of the chi-square test based on (85).lndeed, Kiviet (1981)
has shown that in practice the F-type approximate test might be preferable
in the present context because of the presence of a large number of
regressors.
In order to illustrate the wide applicability of the above F-type
specification test 1et us consider the simplest form of the DLR model's
statistical GM where k 1 and m 1:
=
A'f
=
JOXI
Jlxf 1 +J1X-
+'
lfr.
(23.86)
of empirical
k,f
fvxt+
ttt.
u j yl
+ ut
.,
()1
regression
..j
.I1
0)..
(23.87)
(23.88)
(23.89)
Case 4.
(23.90)
Case 5.
#oxf+ ()1 xr - 1 +
.r
Partial adjustment
(vx;+
#'f
=
x 1 )'f
Case 8. Dead-start'
'r
).
Jl1 -Y
==
+ ut
f -
-i- t: j )'f
0)..
(23.92)
+ x1
/.71
)(x, 1 - )'r - 1 ) +
(/.() 0).'
al
model
j.
/A7oJt.?/
(/.()+
poAx, + ( 1 -
n-ltpt/c/ (jj
Error--correction
A)',
t1t
ur.
1)..
(23 9 3)
.
-h 1/f
(23.94)
For the above eight cases the restrictions imposed are a1l linear restrictions
which can be tested using the test statistic (84) in conjunction with the
rejection region
C1
.t
y : F r(y)>
(',
where ca is determined by
x
dkq, F- k*).
=
Q'y
yf jax, +
=
t;,,
t/-rt?r
c,
553
model
al;,
(a1/0+
/JI
0)..
(23.95)
l,.
For further discussion on a1l nine cases see Hendry and Richard ( 1983), and
Hendry, Pagan and Sargan (1984).
ln practice, the construction ('design')of an empirical econometric model
taxes the ingenuity and craftsmanship of the applied econometrician more
than any other part of econometric modelling. There are no rules or
reduce any well-defined
established procedures which automatically
empirical
estimated
statistical
GM
specified)
to a
('correctly'
economic
mainly
model.
both
theory
This
is
because
econometric
as well as
role
choice
sample
data
play
in
the
properties
of
the
the
('design')of the
a
statistical
GM for a
this,
order
ln
let us return to the
latter.
to illustrate
estimated
23.3
this
equation
estimated
ln
in Section 23.2.
Section
money
and
misspecifications
possible
equation was tested for any
none of the
natural
question
rejected.
The
to ask at
underlying assumptions tested was
constitutes
this
estimated
statistical
GM
that
this stage is
a wellspecify
statistical
model,
how do we proceed to
(choose) an
defined
empirical econometric model?'
As it stands, the money equation estimated in Section 23.2 does not have
any direct economic interpretation. The estimated parameters can only be
viewed as well-defined statistical parameters. ln order to be able to proceed
of an empirical econometric model we need to consider the
with the
question of the estimable form of the theoretical model, in view of the
observed data used to estimate the statistical GM (see Chapter 1). In the
case of the money equation estimated in Section 23.2 we need to decide
whether th theoretical model of a transactions demand for money
considered in Chapter 19 could coincide with the estimable model. Demand
in the context of a theoretical model is a theoretical concept which refers to
to a range of hypothetical
the intentions of economic agents corresponding
values for the variables affecting their intentions. On the other hand, the
observed data chosen refer to actual money stock M 1and there is no reason
why the two should coincide for all time periods. Moreover, the other
variables used in the context of the theoretical model are again theoretical
constructs and should not be uncritically assumed to coincide with the
observed data chosen. In view of these comments one should be careful in
isearching' for a demand for money function. ln particular the assumption
that the theory accotlnts for al1 the information in the data apart from a
white-noise term is highly questionable in the present case.
In the case of the estimated money equation the special case of a static
demand for money equation can be easily tested by testing for the
significance of the coefficients of all the lagged variables using the F-type
'proper'
'assuming
'design'
554
0 and #f 0,
test considered above. That is, test the null hypothesis S0:
: #0 or
i 1, 2, 3, 4, against
#f# 0, i 1, 2, 3, 4. The test statistic for this
hypothesis is
=
.#.fj
F'Ijy)
0.11650
53
53 1(-fI1-
-0.010
(23.96)
28.072.
().()j() 53
Given that
the observed data referto are not intentions orhypothetical range of values,
but realisations. That is, what we observe is in effect the actual adjustment
process for money stock M 1 and not the original intentions. Hence, without
any further information the estimable form of the model could only be a
money adjustment equation which can be dominated by the demand,
supply or even institutional factors. The latter question can only be decided
by the data in conjunction with further a priori information.
Having decided that the estimable model is likely to be an adjustment
process rather than a demand function we could proceed with the design' of
thc empirical econometric model. Using previous studies rclated to
adjustment equations, without actually calling them as such (seeDavidson
Hendry and Ungern-sternberg (1981), Hendl'y
et al. (1978),Hendry (1980),
and
Richard
(1983), Hendry
(1983/,the following empirical econometric
chosen:
model was
In(->-M),
y.''j
-0.134
In(->-M),.f)
(j1
-0.474
(0.130)
(0.02)
(Intyl
-e.196
l-1
(0.022)
-0.80 1 ln Pt -0.059 ln ff
j) (
-0.025
f=1
(0.145)
(0.007)
=0.758,
log 1=223.318,
42 0.725,
=
1)f ln It
(0.008)
+ lif,
-0.045Qjt+0.059Qct+0.053Q31
(0.014)
(0.014) (0.015)
R2
(23.97)
=0.0137,
R&S=0.012 47,
F= 76.
53
(23.98)
=0.753.
=4
=0.01
=0.012
1 342 56
0.012 379
:
().()1 1 y4a
-0.01
FT(y)
with
c.= 2.1 1, x
1-
0.640
0.05,
177 62
-0.012
FT(y)
0.012 379
0.012 177
(23.99)
0.514
test. With
-0.2788
z(y)
F
=-
4a +-
(b)
0.685,
.t3,
tl
rpsrs Jr Iinearity
Misspecscaion
1.18,
cx
2.76.
1.75.
ARCH test:
FF(y)=
0.53 1, cz=2.51.
of normality
homoskedasticity
ca
and
.*
White test:
FF(y)
(23.100)
24
at a = 0.05.
is not
model
D1
for t # 36, f
F T) (y)
0.0 16 00
0.0 14 549
1l
.
HLlt:clj
5, 6,
80.
c2a yielded
(23.103)
FF2(y)
(11-0.0517,
()s
j?
13 678 3 60
?- g'yjj
(23.104)
)
is strongly accepted. Using
given that c?, 2.254 for x 0.05, HL3
the same procedure for Tz 51.
=
FFI
0.016 278
c'a= 1.83,
and
FF2(y)
1.22,
=0.05
(23.105)
.---
cu 2.254,
#1 #2 and
cf
Hence, Hv
0.685,
0.05.
o'q is accepted
(23.106)
at
1-
-a)2
(1
0.0975.
ln Chapter 2 1 we used the distinction between structural change and
parameter invariance with the former referring to the case where the point
of change is known a priori. Let us eonsider time invariance in relation to
(97) as well.
A very important property for empirical econometric models when
needed for prediction or policy analysis is the time invariance of the
estimated coefficients. For this reason it is necessary to consider the time
invariance of the model in order to check whether the original time
invariance exemplified by the estimated coefficients of the statistical GM
the empirical econometric model we try
has been lost or not. In
to capture the invariant features of the observable phenomena in order for
'designing'
23.4 Specification
testing
the model to have a certain value for prediction and policy analysis
purposes. Hence, if the model has been designed at the expense of the time
invariance the estimated statistical GM will be of very little value.
The recursive estimates of Lj7- 1 Atn:f ..j -pt -j)1, (Fl, - 1 - pt 1 - yt 1),
1)J'jf
-y), AR, it and
in Fig. 23.4(4J)-(./')
1 (
..j are shown
() AJ'l
respectively for the period 1969/-1982/:. Apart from some initial volatility
due to insufficient sample information these estimates show remarkable
time constancy. The estimated theoretical parameters of interest have
indeed preserved the time invariance exemplified by statistical parameters
of interest in Section 23.3.
lt is important to note that the above misspecification tests are not
iproper' tests in the same sense as in the context of the statistical GM. They
checks' in order to ensure that the
should be interpreted as
determination of the empirical econometric model was not achieved at the
expense of the correct specification assumption. This is because in
the empirical econometric model from the estimated statistical
GM we need to maintain the statistical properties which ensure that the end
meaningful
estimated equation but a
product is not just an economically
well-defined statistical model as well. Having satisfied ourselves that (97)is
indeed well defined statistically we can proceed with its economic meaning
as a money adjustment equation.
The main terms of the estimated adjustment equation are:
(i)
(1y
j7, j A ln(M P4t -f) - the average annual rate of growth of real
money stock;
(ln(M/#)t- 1 -ln F) 1) -- the error-correction
(ii)
term (see Hendry
( 1980)).,
j)
Vz-ot)
(1.3
the average annual rate of growth of real
1 A ln Fp
consumers' expenditure'.
(iv)
A ln Pt - inflation rate:
ln
lt - interest rate (7 days- deposit accountl;
(v)
(vi)
jt) j ( - 1)iln 1:- annual polynomial lag for interest rate.
Interpreting (97)as a money adjustment equation we can see that both the
rate of interest and consumers' expenditure play a important role in the
determination of tht? changes in real money stock. As far as inflation is
concerned we can see that the restriction for its coefficient to be equal to
minus one against being less than minus one (one-sidedtest) is not rejected
1.67 and the test statistic is
at a 0.05 given that (',
l7-
(ZJ?-
Sdiagnostic
Sdesigning'
z(y)
-.-
- 0.80 1 + 1.00
tj.j4j
-0.
137.
(23. 107)
This implies that in effect the inflation rate term cancels out from both sides
558
regression
model
a 1t
(a)
T ime
estimates of the
1 A ln 1-j), ln lt
(1F
M
PF
Aj -
0.301(
j .j. gj -
4..087
'
(23. 108)
559
ihow
Z)-,
560
0
ast
Time
Time
and jt) 1 ( 1)i ln It f. This suggests that if we were to assume that the
supply side is perfectly elastic then the equilibrium state, where
inherent
tendency to change' exists, can be related directly to ( 108). Hence, in view of
the perfect elasticity of supply ( 108) can be interpreted as a transactions
demand for money equation. For a more extensive discussion of adjustment
processes, equilibrium and demand, supply functions, see Hendry and
Spanos (1980).
-
'no
23.4 Specification
Re-estimation
of
testing
561
empirical model:
(23.109)
.R2ccz(j.'y:9,
s 0.01384=
1og L= 222.25,
The above estimated coefficients can be interpreted as estimates of the
theoretical parameters of interest defining the money adjustment equation.
These parameters are simple functions of the statistical parameters of
interest defining the statistical GM. An interesting issue in the context of
this distinction between theoretical and statistical parameters of interest is
collinearity or/nd short data (collectively
related to the problem of
called insufficient data information) raised in Chapter 20.
ln view of the large number of estimated parameters involved in the
money statistical GM one might be forgiven for suspecting that insufficient
problems
might
data information
be affecting the statistical
parametrisation estimated in Section 23.3, One of the aims of econometric
modelling, however, is to design' l'tlbusl estimated coefficients which are
directly related to the theoretical parameters of interest. For this reason it
will be interesting to consider the correlation matrix of the estimated
coefficients as a rough guide to such robustness, see Table 23.2.
The correlations among the coefficients of the regressors are relatively
small; none is greater than 0.8 1 with only one greater than 0.68. These
correlations suggest that most of the esttmated (theoretical)parameters of
design'.
interest are nearly orthogonal; an important criterion of a
The first column of Table 23.2 shows the partial correlation coefticients (see
Section 20.6) between Am and the regressors with the numbers in
parentheses underneath referring to the simple correlation coefficients.
coefficients show that every
The values of the partial correlation
contributes
substantially
the
explanation
of Arnfwith the errorto
regressor
correction term and the interest rate playing a particularly important role.
above (see
Another important feature of the empirical model
bnear'
tgood
tdesigned'
(??1t1 - Pt
/'
YkxAnA7K
'he
562
1 -
.1L-
llel:
0.4 13
( - 0.053)
- 0.754
( - 0.4 18)
-3 j=)
mWt
leresm
0.422
(0.056)
A y,j..j
.
-0.70 l
( -0.080)
j'
-0.077
p;)@-j
pt..j
pt-j
it ..j
0.4 13
27
4.=2
0.354
0
O
0
-0.025
0.025
on
(97)
.
--.
j=j
0. 196
0.80 1
-0.80 1
-0.059
based
estimates
-
j=Q
0.093
j=?
0
13
- 0.4
0
0.025
j=4
0.158
0
0
-0.025
23.5
Prediction
23.5
Prediction
563
(a)
(b)
Fig. 23.5(a)-(:'). The time paths of 40 observation
the coefficients of ( 109) apart from the constant.
window estimates of
564
Time
(c)
0
565
Prediction
23.5
ffgt/
us assume that
0* EEE(al aa,
predictor of
.
',f..1
tv
HF
x:.,.I)'
pm,
am. j(), j1,
is given by its systematic component,
.
)
=
0y
I
i.e.
B1
N1
xw,
c2a)
0w
0w)
E'(4,w+ l ,/c(Y X
Moreover,
'
Xi l y
I-
Va
- t)
the predietion
error
F;X
(23.110)
is
(23. 111)
(23. 112)
and
/f( l
-.
c(Y01 ) X?.
1
x0T 4.
0.
?N
)((
pr-hz-alpwsl
afy,.+a-..+
z=
=
#;xw+?-f.
566
model
and
p1
p1
pv-l- l
i
cf/zwx./-i
#;xw-,-,
j(2
0
/-rn+
1, ,n+2,
(23.115)
The above predictors were derived on the assumption that 0* was known
a priori 't-a grossly unrealistic assumption. In the case where 0* is not known
we use 0*, the approximate MLE and the predictor of )'v+l takes the form
/-1
p- v + l
j
i
m
.:i'
i
1
pr
l- i
+
i
(:i.
p v+
j ..
l-
j p-'i x v
1*=
g..
l 2 3,
=
m
(23. 116)
.
and
m
F-
'r
jl
=
(i'
yv+
t- i +
)
=
p-k'J
.
m+ 1
(23.117)
tguessestimated'
x7-
xow..ll
czt).
is given
(23.118)
Similarly,
(23. l 19)
and
I 2 3,
(23.120)
=:
a,?c()2+
j
= ij 1
i
=
r?T +
cj
1
l m+ 1
=
(23 12 1)
.
Looking back
567
is given by
distribution
N 0, )( a/coz
(.pw+,-w-,)
-
('
1- 1, 2,
(23.122)
r,,,
23.6
Looking back
'explain'
hmt
124 -0.485
-0.
(0.018) (0.130)
Aln1r-j
pt-j)
J=1
(0.022)
(23. 123)
(0.007)
(0.0 14)
kl
.R2
=0.709,
1ogL= 222.25,
=0.675,
s=0.013
RSS=0.012
832,
84,
F= 76.
Note that small letters are used to denote the natural logs of the variables
represented by the capital letters.
Rz
0.995,
log L= 138.425,
+0.862pf
-0.053t
Rl 0.995,
=
RSS
=0.
s=0.040
(23.124)
22,
1165,
equation
568
Time
(123)and (124).
569
23.6 Looking back
model
empirical
econometric
and
Richard
1982)
the
by
Hendry
introduced
(
l 124). Encompassing refers to the ability of a particular
( 123)
estimated statistical model to explain the results of rival models (seeMizon
( 1984) for an extensive discussion). The comparison of ( 123) and (124) above
ln this case, howevers a
was in effect a simple exercise in encompassing.
formal discussion of encompassing seems rather unnecessary given that
( 124) could not be seriously entertained as an empirical econometric
model given that it has failed almost all the misspecifieation tests applied.
Although the statistical fotlndations of ( 123) seem reasonably
sound, its
theoretical underpinnings are less obvious. This, however. it not surprising
given that economic theory is rather embarrassingly slent on estimable
adjustment equations. At this early stage it might be advisable to rely more
heavily on data-based specifications which might provide the basis for more
realistic theoretical formulations of estimable models. Muellbauer ( 1986)
provides an excellent example of how' economic theoretical arguments can
and interpret successful data-based empirical
be used to
econometric models. This seems a most promising way forward with
each other;
economic theory and data-based specifications complementing
1985).
Nickell
and
Engle
1985).
Pagan
1985).
Granger
(
(
(
see
The above discussion exemplifies the dangers of using an inappropriate
statistical model in economuttric modelling. At the outset it must have been
obvious that the sampling model assumption of independence associated
with the linear regression model was highly suspect for the kind of data
chosen. Despite that, we adopted an inappropriate statistical model in
order to illustrate the importance of the decision to adopt one in preference
of other statistical models. Throughout the discussion in Chapters 17-23
every attempt has been made to persuade the reader that the nature and
statistical properties of the observed data chosen have an important role to
play in econometric modelling. These should be taken into consideration
when the decision to adopt a particular statistical model in preference to the
others is made. In a certain sense the choice of the statistical model to be
used for the particular case of cconometric modelling under consideration
is one of the most important dccisions to be taken by the modeller. Once an
inappropriate choice is made quite a few misleading conclusions can be
drawn unless the modellel' is knowledgeable enough to put the estimated
statistical GM through a battery of misspecification tests before embarking
on specification tcstilg and prediction. lf, however, the modeller follows the
mtxim that
theory accounts for all the
naive methodological
information in the data f irrespecti: e of the choice of the data) apart from a
theoretical information is real information-,
white-noise error term' or
then the misspecifieation testing seems only of secondary importance and
misleading conclusions arc more than likely.
wencompasses-
trationalise'
'the
'only
concepts
Questions
Explain the role of the stationarity and asymptotic independence of the
stochastic proccss .tZ!, l (E T) in the context of the specification of the
DLR model.
Is the stationarity
of )Z,, t c T) necessary for the autoregressive
representation of the process?
Define the concept of strong exogeneity and explain its role in the
context of the DLR model.
statistical GM of the DLR model is a hybrid of those for the linear
4.
and stochastic linear regression models.' Explain.
Discuss the difference between the exact and approximate MLE'S of 0*.
:Do they have the same asymptotic properties?' Explain your answer.
State the asymptotic properties of the approximate MLE #* of p*.
l-low do we test whether the maximum lag postulated in the
specification of the statistical GM is too large'?'
tl-low do we interpret residual autocorrelation in the context of an
estimated statistical GM in the DLR model'?'
iWhy is the Durbin-Watson test inappropriate in testing for
an AR(1)
error term in the context of the DLR model'?'
-f'he
Additional references
(1983);Mann
The multivariate
Introduction
24.1
B'x ! + u t
fixt+
uit,
1, 2,
(1) is effectively
n1, l 6 T,
(24.2)
with B (#1,#c,
pm).
ln direct analogy with the m 1 case (seeChapter 19) the multivariate
linear regression model will be derived from first principles based on the
joint distribution of the observable random variables involved, D(Zf; #)
+ k) x 1. Assuming that Z, is an I1D normally
where Zt BE (y;,X;)', (?'?
distributed vector, i.e.
=
(xY',)
-
)
((0())()1a
2a))
Yxal
and
tl,
xl)
B'x,,
yt - C(y!/Xt x,),
components
Ec-21E2:
(24.5)
'
The multivariate
(i)
'(u,)
Stulu'sl
Ftpruf'l
where D
19.2).
E1 j
model
regression
by construction,
Moreover,
(iii)
linear
'EZ'(u,/X! x,)(l
=
ErEtuu's Xf
'ES(pu;,/Xf
E21
El 2Ea-21
Xf)1
x,)(1
0,'
fl
0
E
'X, x,)()
rprluf',
=
(compare these
with
0,
The similarity between the m 1 case and the general case allows us to
consider several loose ends left in Chapter 19. The first is the use of thejoint
distribution D(Zr; #) in defining the model instead of concentrating
exclusively on Dtyf X!; /1). The loss of generality in postulating the form of
the joint distribution is more than compensated for by the additional
insight provided. In practice it is often easier to judge' the plausibility of
assumptions relating to the nature of D(Z,; #) rather than D(yf/Xr; j).
analysis the relationship
Moreover, in misspecification
between the
assumptions underlying the model and those underlying the random vector
of the nature of the
process (Z,, t e: T) enhances our understanding
possible departures. An interesting example of this is the relationship of the
assumption that pZ,, t e: T) is a
normal (N);
( 1)
independent (J); and
(2)
identically distributed (1D) process; and
(3)
D(y!,/X,; 1) is normal;
!61
(i)
Eyt/'xt
x,) is linear in x,;
(ii)
(iii)
Covtyf/'x, xr) is homoskedastic (free of xr),'
p>(B, f1) are time-invariant;
g7q
g8(I (.J,'l,,'Xl,t (E T) is an independent process.
=
The relationship
is shown diagrammatically
below:
(.f) I8I
--+
The question which naturally arises is whether (i)-(iii)imply (N) or not. The
following lemma shows that if (i)-(iii)are supplemented by the assumption
24.1
that X,
Introduction
implication
holds.
Lemma 24.1
Zf
kV(0,E) jr
'v
only
(/'
(,*)
(1'1')
E (yly'xf
'v
Xt
Covtyf/Xf
4' and
t e: T
Xt;
) X 1 2Y2-J
=
Xl
X1 1
X j 2X2--/ X 2 1
(24.6)
XB + U,
+ uf
1, 2,
viewed
tn
I'th regression in
(2).ln order to define
represents al1 T observations on the
#I)
the
need
special notation of
conditional
D(Y
X;
distribution
the
we
Appendix
2).
Using
notation
this
the matrix
Kronecker products (see
written
in
form
the
distribution can be
(Y
where f
X)
.3'=
,v
the covariance of
(:& Ir represents
y
vec (Y )
(24.8)
'2
'F?'nx 1
.
y,,l
The vectoring operator vect ) transforms a matrix into a column vector by
stacking the columns of the matrix one beneath the other. Using the
vectoring operator we can express (6) in the form
'
vectYj
y* Xyjv
=
in an obvious notation.
+ uv
(24.10)
The multivariate
f 0,I)
These
0,
equations
1, 2,
be
can
(24.11)
p.
interpreted
providing
as
an
alternative
24.2
Spification
and estimation
Statistical GM: y,
y, : m x 1
x, :
B'xr + ut,
B: k x m
kx 1
x,)
B'x,,
1) the multivariate
lsT
components
u,
are:
y, - .E'(y,/Xt x,),
=
and by construction
ut) E EE(u,,''Xf xrll
=
0,
(4J
(5q
g31
24.2 Specification
(11)
Probability
alxl estimation
model
--i
tll
D(yt/'Xl; #)=
f1)
(det
CXP)
mjz
(2zr)
'--llyf
B Xf) D .y
,
R'nkx frt
tGT
D(y,//Xt,' #) - normal;
(i)
JJty; X, x,) B'xr - linear in x,;
(ii)
(iii)
Covlyf/xr= N) f - homoskedastic (free of xp);
0 is time invariant.
=
(111)
Mmpling
model
g81
Y=
(y1,y2,
v)=c(Y)
z-(p;
1-1o(y,,/x,;04
=
logtdet f1) - s
2
2
B'x,)
(24.12)
(24.13)
log L= const
t?log L
PB
= (X Y -X XBID
,
p log L T fl
l
pfl- .= Y
0.
'-'r
(24.14)
=0.
(24.15)
't C?''denotes the space of all real positive definite symmetric matrices of rank m.
The multivariate
Iinear
regression
model
(24. 16)
!!
p-i (x'x)
=
lx'y
1, 2,
'
(24.18)
/3.7.
y,
+r,
1, 2.
z'x,
z'xf
,rn.
(24. 19)
I -((;''l'))(Y'v)
.,
(Y'Y - l')l'))(Y'Y) -
'.
(24.20)
The matrix G varies between the identity matrix when l-J 0 and zero when
Y U (no explanation). In order to reduce this matrix goodness-of-fit
measure to a scalar we can usc the trace or the determinant
=
(24.2 1)
(see Hooper ( 1959)).
ln terms of the eigenvalues (21 ;.1,
goodness of fit take the form
.
t/1
N7
)-q;.i
l',1 i .
--
tn
and
(lz
- 1
(24.22)
.j.
covt () f1)=0,
(24.23)
24.2 Specification
and estimation
t#'
fl
and
prtppgl-ll'
o(Y X; p)
D(f l .X; p)
=exp)
-
1
.--YTYII
tr f1- EY'Y
-(Y -Yo)'XB -
(,
B'x'(Y -Y())(l)
of 0 if Y'Y =Y'Y() and Y'X
is independent
z(Y)
(24.24)
Y'X
(X'X) -
lz
2 (Y)',
(24.26)
ln order to discuss the other properties
distributions.
Since
2 B + (X'X) - IX'U
of
L=(X'X)- 1X'
=B + LU,
(24.27)
').
where
(24.28)
x,
(24.29)
Wz,,,(f1,r- /()
1 T
E(Tf1)= c2(F-
ti'li and
/&j.
properties of the
The Wishart distribution enjoys most of the attractive
normal
distribution (seeAppendix 1#.In direct analogy to (30),
multivariate
=(T-k);n,
A'I)('.r'f)
(24.31)
The multivariate
'
-+
lr(#)
(5)
(24.32)
j
- j
- (Eldll )
(d1!
in
(19).
and f-1
(i)
( Q B, f X f1)
Consistency:
-+
'4(1)
limw-w Covtfll
0, Lh % n.
lim (X'X)w-
2min(X'X)w-.+
1
,.'.'c
2 (x'x)wmaX
tr(X?X)w-1
-+
are equivalent:
0,'
F-* x
(d)
=f1
-+
0',
operator
'(
.
) is relative to the
underlying
probability
579
information
24.3 A priori
1
where 2min(X'X)r and 2max(X'X)w-refer to the smallest and largest
eigenvalue of (X'X)w and its inverse respectively', see Amemiya (1985).
Strong consistencyz
lim(XlX).w-x
(*
an d
=0
'
a..S.
-.+
B)
maX
(X'X)
F
.
<
2min(XX)w
,
a.. S.
for
(iii)
c, then
-->
Asymptotic normality
1).
N(0,I.(p)
For this result to apply, however, we need the boundedness of
zxclimr...
js(1/T)Ir(#) as well as its non-singularity. ln the present case
I.(p)
the asymptotlc information matrix is bounded and non-singular (fullrank)
Under this condition we can
if limw- w(X'X) F= Qx< %. and non-singular.
-
deduce that
T't -B)
1)
(1
v7'r(fz-n) x(0,2(n
-
(see Rothenberg
(24.33)
x(0,n () Qx-
(24.34)
() n))
(1973/.
24.3
asymptotically
A priori information
580
The multivariate
the present context arises partly because it allows us to derive tests which
testing and partly because this
can be usefully employed in misspecification
will provide the link between the multivariate linear regression model and
the simultaneous equations model to be considered in Chapter 25.
(1)
Lineav restrictions
kelated' to X,
to be considered
is
(24.35)
0,
D1
(0,Ik
),
2
CI
B1
(24.36)
(24.37)
R#= r,
Rj+
(24.38)
r,
BF1 + A1
(24.39)
0.
y, B'x,
=
(24.40)
+ ur,
1, 2,
k.
we can see that they are directly related to the regressors Xit,
The easiest way to take (35)into consideration in the estimation of pH(B, f)
the system (35)for B and substitute the
is to
into (40).ln
order to do that we define two arbitrary matrices D1' :4/f -p) x k, ranktDl)
=
'solve'
'solution'
A priori information
24.3
I
-p,
and Cf
(k -p)
(35) into
DB+C=0
(24.41)
D=(D1,D1),
C=
B= - D -- C
where G
yields
(GI
EEE
Y*
G1')
GICI
D-
(24.42)
GICI',
+
1.
tc?
us to solve
k enables
/'C1
Substituting
this into
(40)for r 1, 2,
=
X*C1'+ U,
(G/'X'XG
GjCj),
(24.44)
MLE of B is
1G1''X'X(
-G1Cj),
+ L(
1')
- G1 Cj ).
+ G/(G/'X'XGt)-
P(
P= I
(24.46)
lu.
(24.45)
1. =G1'(G/'X'XG1')- IGI'X'X
where
where
-GIC1)
1G1'X'X'(
= (GI'X'XGI)
-
model
=GjC1
(24,43)
1
C1 (X*'X*) - X*'Y*'
2,,,zGICI
projections)
(24.47)
E= -(x'x)since D1G j
1l,
D' 1 LD 1 (x'X)-
I- 1(D 1
f1= f f
'r-.,
=(1
v.
(fj
c1),
(24.48)
MLE of fl is
)'(x'x)(:-).
(24.49)
582
form
Tbe multivariate
(35)with
D1
(0,lk - 1),
B=
(j.1,B(1)),
(2)
kelated' to yt
Linear restrictions
0.
BFl
+ A1
to be considered is
=0,
(24.50)
where
k x q are known matrices with ranktr'l)
q.
m x q (qGn1) and
The restrictions in (50) represent
linear between-equations restrictions
F1 :
A1:
because the ith row of B represents the fth coelicient on a11 equations.
lnterpreted in the context of (35)these restrictions are directly related to the
yffs.This implies that if we follow the procedure used for the restrictions in
(38) we have to be much more careful because the form of the underlying
probability model might be affected. Richard (1979) shows how this
procedurecan give rise to the restricted MLE'S of B and f1. For expositional
purposes we will adopt the Lagrange multiplier procedure. The Lagrangian
function is
!(B,f), M)
T
=
logtdet f1)
--j-
--)
tr f
.j
(Y
-tr(A'(BF1 +A1)q,
XB)
(v
.xs)
(24.51)
(X Y -X XBID
,
PB =
el T f) -#Y
1
po - = Y
Pl
-(BFj
PA =
B)
asj
.()
,
-XB)'(Y -xB)=0,
+ A1)=0
(52)we
(24.53)
(24.54)
AF'1f1.
(24.55)
(X'X)(F1
which in view of
A
-BF1)(F'jQF1)-
1,
(24.56)
(54)becomes
(X'X)(rj
A1)(F'1f1F1)- 1.
(24.57)
A priori information
24.3
583
1
f=- T f; fT=f
,
+-
1
T
(:-
of B and fl are
1F'1f1,
(24.58)
)(x x)(:-)
(24.59)
+ A:)(l7fFj)-
(F1
MLE'S
(see Richard ( 1979/. lf we compare (58)with (48)we can see that the main
difference is that ( enters the MLE estimator of B in view of the fact that the
restrictions (50)affect the form of the probability model. It is interesting to
note that if we premultiply (58)by Fj it yields (54).The above formulae,ts8),
(59), will be of considerable value in Chapter 25.
(3)
Linear
frela/ed'
restvictions
to both y, and X,
(38)and (50)
(24.60)
C= 0,
p,
I(B,f1,A)
--j-
logtdet f1)
-trgA'(D1BF1
j
tr f - (Y -XB) (Y -XB)
--i'
(24.61)
C)q,
MLE'S
are
-(x'x)-
c)(r'IfF1)-
'rlf,
(24.62)
1
f= (1+ F (B-E) (x x)(E- ).
,
An alternative
derive
w'ay to
(24.63)
(62)and (63)is to
consider
(24.64)
D1B* + C= 0
Y* =XB* + E.
B* BF1 and E= UFj.
where Y* =YFI,
The linear restrictions in (60) in vector form can be written as
=
vec(D1BFj
or
(F'1 @)D1)/+
r,
+ vectc)
(24.66)
(24.67)
The multivariate
584
Iinear
regression
model
R#v
(24.68)
r,
R1
Rc
'
and
rc
r..
ranktRf)
restrictions
(24.69)
R.j
0
R : pi x k,
Exclusion
r1
pi
?-f:
pf x 1.
restrictions
XB + U,
5'+=X+#+
(24.70)
formulation
vectorised
(24.71)
+u+,
X+
j + (#'1#'2
=
and
where
pm'
)': lnk
x 1
u+
(1p,L X) : Tm x mk,
(u'I
,
u'2
,
u;
)': bn
x 1
Iw): Tm x Fn.
f1+ (f1 Lx?l
=
The Lagrangian
Ip., fl+, 2)
T
=
--y
--jyv
logtdet f1,y)
2.'(Rj+ - r),
vector
2: p x 1 of multipliers'.
-X+#+)'f1+ -
(y,,-X+#+)
(24.72)
24.4
t?!
-
b.
X'+f1+-1(y+
t?I
- 2 (R#+
=
r)
X+j+) - R'2
7.0. = - 'i- o.
- 1
+o.
formulations
=
585
0,
(24.73)
1
(y.- x . p. )jy. .xs#s),os- (24.,74)
0.
(24.75)
Looking at the above first-order conditions (73)-(75)we can see that they
equations which cannot be solved
constitute a system of non-linear
explicitly unless fl is assumed to be known. In the latter case (73)and (75)
imply that
J #.*
k
and
F.
*,(24.76)
(24.77)
(X'.f1+-lx
(24.78)
'
)- 1X'* j1-*
ly
*
24.4
formulations
slalinvaud
restrictions
(24.79)
are particularly
The multivariate
two-equation case
(.p1!j-(#11
)?cf #l2
.X1 f
#a1
pz5
(i)
Exclusion restrictions..
(ii)
Across-equation
x.:,
#11
(24.80)
c,
.X3t
=0,
tl/jyl
,j,
+.
=0,.
pz5
#21 #j 2.
lt turns out that in these two cases the restrictions can be accommodated
directly into a reformulation of the statistical GM and no constrained
optimisation is necessary. The purpose of this section is to discuss the
estimation of #+ under these two forms of restrictions and derive explicit
formulae which will prove useful in Chapter 25.
Let us consider the exclusion restrictions first. The vectorised form of
Y =XB + U,
(24.81)
y2
.'
,
#1
#2
y. =X+#+
u1
u2
(24.82)
#.
um
(24.83)
+ uv
X1
yl
y2
0
=
)'+ X1#:
'.
X2
0
,
0
+ u1,
Xpl
#*1
2
#*
#*
m
u1
u2
+
'
(24.84)
um
(24.85)
where Xf refers to the regressor data matrix for the h equation and p? the
corresponding coefficients vector. ln the case of the example in (80)with the
=
24.4
restrictions jl
587
0,
formulatios
xh)(',))+(--))'
Where
(ry))-()
(24.86)
of
Xj
H(x2,
#:= (X:'(D-
(:&Iw)y+.
(24.87)
(24.88)
o *- 1
lim X!'
w--w
X:
and non-singular,
taking the form
Q+<
'.yt
distribution of
the asymptotic
V'y j#-+ jv )
-
'
Q*-
y (()
'
'
(24.89)
(87)and (88) coincide,
(24.90)
X1
Xc
'
X,u X:
and
f)= diagl?:
&* jv
=
ac.
(x+xv ?
.&+
yv
mv,)
(24.91)
The multivariate
Iinear
model
regression
X)
x'1!
x' 2?
(24.92)
x'mt
(24.93)
+ uf
p .zj
Y 1t
X 1t
-Y2t
X=+
) x+?n-
1x*
(24.94)
MLE of
F
=
p5j
#2c
The constrained
-+
#+
#*
*
and
jg1 x*'nr
(24.95)
l'
>+
jj x t
j')'G
- 1
jl x t
() i- 1 y!
j g
/
(; 4 jj 6)
r
where l refers to the number of iterations which is either chosen a priori or
determined by some convergence criterion such as
Ft
=:
lj) +1
4: /
-....-
==
-j))
< ;
for some
l >0.
#-:=
.1
)()x)'f1=
'x)
- 1
wheref1=(1,/r)'l7,
(J =Y -X.
e.g,
=0.00
defined by
1.
(96)coincides
(24.97)
with
1'
jl x)'f1=
ly
r.
(24.98)
589
testing
Spification
ln the context of the linear regression model the F-type test proved to be by
far the most useful test in both specification as well as misspecification
analysis', see Chapters 19-22. The question which naturally arises is
whether the F-type test can be extended to the multivariate linear
regression model. The main purpose of this section is to derive an extended
F-type test which serves the same purpose as the F-test in Chapters 19-22.
From Section 24.2 we know that for the MLE'S
fkand D
(i)
1)*,
N(B, fl @ (X'X)-
'w
(24.99)
and
T(1
(ii)
(24.100)
'v
Using these results we can deduce that, in the case where we consider one
regression from the system, say the fth,
y
the
MLE'S
of
bi
(24. 10 1)
X/f V uf'
jf and
(X X)
are
ii
X yf
o'bii-,--
i-
y -X#i
(24.102)
Nbi, t.t)if(X'X)- : )
#iN
and
oh
T - .
t?f 1
'v
z (y,.j).
15)
(a4.j()?)
These results ensure that, in the case of linear restrictions related to jf of the
form Hv Rzjf rj against Hk : Rsjf + rf where Rf and rf are pi x k and pi x 1
known matrices and ranklR/) ;)i, the F-test based on the test statistic
=
'
'
'rtyy
--
'
'rlyi) ---j,,-s
-.-L-jx..
i (x x )
T
--
'/-'-..
'
(a4.j()4)
(24.105)
,) 1
590
The multivariate
Let us now consider the derivation of a test for the null hypothesis:
So: DB
against
=0
Sl
DB
C# 0
(24.106)
(0,Ikc): k2 X k, B
B1
and
B2
MLE'S
(x'X)- 1D'gD(x'x)
and
f
=fl
+-
1
T
( -)'(X'X)(2
ID'II- 1(D
0: kz x
ranktD)
=p.
,n,
of B and fl under
(24.107)
c)
(24.108)
-),
where
MLE'S
on the distance
hSD:
-CS(
(24.109)
The closer this distance is to zero the more the support for Ho. lf we
normalise this distance by defining the matrix quadratic form
f1-1(D
-c)'gD(X'X)
fT'f;=
=417'17
-
-c)'ED(x'x)-
(110) can be
(24.110)
(104)is a1l too apparent.
1D'j - 1(D
-c)
(24.111)
f.J'I'))(I')'fJ)-1,
(24.112)
U'U
Gmtf1, r- k),
(24.113)
where U'U U'MXU, Mx I -X(X'X) - 1X'. Moreover, in view of (112)the
chidistribution of 0/0 0'U is a direct extension of the non-central
'v
,v
(24.114)
SpecifKation
24.5
where
A= f1- IIDB
(0/17
where
Mo
C)'ED(X'X)-
is the non-centrality
testing
(24.115)
This is because
parameter.
0/U)= U'MOU
(24.116)
MoMx
(24.117)
0,
we can also deduce that U'MXU and U'MPU are independently distributed
(see Chapter 15).The analogy between the F-test statistic,
FF(y)
'll
S0
T-k
P
'-/
=
,
,.w
F(p, T- k),
(24.118)
17,1-.))(f.J,fJ)-1q,
I-J'I'.))(I')'(J)-
1q.
(24. 119)
(24.120)
J'tY :
z i (Y)>
(.'x
12
1.=
(24. 12 1)
we need the distribution of the test statistics r1(Y) and za(Y). These
distributions can be derived from thejoint distribution of the eigenvalues of
21, 2c,
kt where l mintm. p). because
, say,
.
(24. 122)
The distribution of 2 > (21 k.
;.!)' was derived by Constantine (1963)
and James (1964)in terms of a zonal polynomial expansion. Constantine
(1966) went on to derive the distribution of z2(y) in terms of generalised
Laguerre polynomials which is rather complicated to be used directly. For
this reason several approximations in terms of the non-central chi-square
distribution have been suggested in the statistical literature; see Muirhead
(1982) for an excellent discussion. Based on such approximations several
.
592
The multivariate
z1(y)
=
(24.12.3
T-k
Ho
':ltyl zzlmp),
-
z?(y)
-
(F- k)'r1(y).
(24.125)
The test
be interpreted as arising from the
Wald test procedure discussed in Chapter 16. Using the other two test
procedures, the likelihood ratio and Lagrange multiplier procedures, we
can construct alternative tests for Hv against Sj. The Iikelihood ratio test
procedure gives rise to the test statistic
L( Y)
L(4., Y)
.I-.RIYI
ln terms
dettf'i)
2'F
'
of the eigenvalues of
1,R(Y)=
C this test
1j.
(24.127)
region
is defined by
(24.126)
detg(')(U'fJ)
1-11+ ki
=
dettf'i)
(24.128)
- v.
where F*
log LR(y)
z2(,np),
(24. 129)
gF-k
+ 1)j (;)y?.n)', see Davis (1979)for tables of
upper percentage points ca.
The Lagrange multiplier test statistic based on the function
/(p,2)
-.(?'n
F
=
-p
-j-
dettfl)
(ss
.c))
(,4. jyt)l
24.5 Specification
testing
z-M(v)=tr(G),
(24.131)
where
(7 (f;'I7 - fJ'U)(fJ'U)- 1
(24.132)
This test statistic is known in the statistical literature as Pillai's trace test
statistic because it was suggested by Pillai (1955), but not as a Lagrange
multiplier test statistic. In terms of the eigenvalues 21, 2a,
2,,, this
statistic takes the form
=
z,Af(v)
'i
-1V;.i
(24.133)
'
(T'- k)f-M(Y)
z2(mp).
A similar
dettG).
(24.134)
is defined as the
(24.135)
'wi
(24.136)
1+ ;.i
fv'v
f-r'(;)('''Y)
B2
0.
H 1 : Bc # 0.
and
XI
(24.137)
B1 + XcBa + U.
(24.138)
(24.139)
where :1
residuals
the restricted
by U =Y -Xj21
IXIY
view
(X'jX1)C as the multivariate multiple correlation
we can
Defining
The multivariate
594
U =XIBI
+ XcBc + V.
(24.140)
All the tests based on the test statistics mentioned so far are unbiased but
no uniformly most powerful test exists for Hv against Sl; see Giri ( 1977),
Hart and Money (1976)for power comparisons.
A particularly important special case of Hv : DB C 0 against S1 :
DB C # 0 is the case where the sample peliod is divided into two sub71) and T2 (74+ 1,
F), where F- rl F2
samples, say, T1
2,
and '.TkF2 > k. If we allow the conditional means for the two sub-periods to
be different, i.e. for
-
=41,
+ U1
(24.141)
XaBc + U2,
(24.142)
=XIBI
r e: T1 : Y1
t G T2:
Y2
E (Yf/.?; Xf) f
=
(24.143)
(x)1.6,
and
=(l,
-Ik),
j, c
B1
B=
B2
('t))-()
x0-)(Bs))+(7)).
()
(24. 144)
constancy test.
A natural way to proceed in order to test the hypothesis Ho f11 f12
against Sl : f11
is to generalise the F-test derived in Chapter 21. That
scalar
function of the
is, use a
=
c#f12
'ratio':
=f1
f- 1
(24.145)
>
where
fl- i
1
Fl
U'U
f,
i
such as
d,
(ii)
dettflafll-
1),-
dz trtflcf-il- 1).
=
1 2.
,
596
The
24.6
Misspilication
multiYariate
lu
E u'n
t
-
)3q
and
x 4,,,,
zJ(u'f1
t
2
3.m
where
0ts
)
=
)2
(24157)
jj jg gt3s
(24. 158)
'n--
==
a,m
lu
j x
l
s,
=,
t s
'
12
IL
(24. 159)
:23.,n
g2(/),
-t(;n1tpn
+ ljjj?y+ J)
(24. 160)
and
F
(4.j.m -n1(?'n
+ 2)
8n1(,.:1
Hv
+2))2
'v
'
-6
(i2a
,x.
z2(lj
and
(24.161)
1:
(.i4 3)2
-
24
z2(1),
Hu
'v
g2j1).
(24.162)
Misspification
24.6
testing
Using ( 160) and ( 161) we can define separate tests for the hypothesis
f1g1J:
and
az
x 4., n
1'1t2):
0
0 against
pntr?l
S$1 ):
tya
?yj
+0
+ 2)
IL
(24.163)
= (Bo - )'x,+ F'kt + f)f r 1, 2.
where 1/:1 EEE(/1/,
/pr)' are the higher-order terms related to the
Kolmogorov-Gabor or RESET type polynomials (see (2 1.10) and (2 1.11)).
The hypothesis to be tested in the present case takes the form
t
Sa: F
(24.164)
H3 : F# 0.
against
This hypothesis can be tested using any one of the tests zf(Y), i
LR(Y) or 1wM(Y) discussed in Section 24.4.
1, 2, 3,
Homoskedasticitv.
4-r c'0
+ C' # + c,
where
and
,.
(24.165)
a.
()-l
1.
li
i!
liyr
k: j
4:,)
=0
1 2,
.
n1,
Aan-ltr?-l
+ 1).
(24.166)
can be based on
HL :
(24.167)
C# 0.
cross-products
of the
598
The multiuriate
Independence. Using the analogy with the m= 1 case we can argue that
X;)' is assumed to be a normal, stationary and lthwhen )Z,, t e: -T), Zt > (y'r:
order Markov process (seeChapter 8), the statistical GM takes the form
l
=B7xr +
yt
)
=
Aiyf -f +
1
zB;x,-
(24.168)
:,,
.tc,,
yf Fxr
(24.169)
)'x,+ j
1
ln particular,
(168)with the
+ u,,
we compare
assumption
gA/fyf f + B'fxf fl +
-
(24.170)
Et.
S0: Af
and
for all i
1, 2,
against
S1
Af # 0
Bf # 0
for any i
1,
1.
'
'
'
'
'
599
24.7 Prediction
24.7
ln
Prediction
view of
that
the assumption
yt B'xt
=
+ ur,
(24.172)
c: T,
Twere
the best predictor of yw-hg,given that the observations t 1, 2,
used to estimate B and n, can only be its conditional expectation
=
1= 1, 2,
'x,+,,
v-vl
(24.173)
yv-l
(2
w+l=
(24.174)
B)'xr+,+uw+I.
1=:,
S=
'- k
fi/0
(24.176)
'(yw+I
Ss
=s(1
test statistic
prediction
+ xy..I(X'X)-
(24.177)
w+I),
(24.178)
lxw+/).
(F-/(
-rn
+ 1)S
'v
(F-k)m
F(m, T-k
-n
+ 1),
(24.179)
and this can be used to test hypotheses about the predictions or construct
prediction regions.
24.8
0)
ln
Statistical GM
(24.180)
600
The multivariate
linear
model
regression
E1(1 pt Eyt c(Yr0-1), X,0 x,) and uf y, Elyt c(Yt0- j ), x,0 x,O).
B/,f1())are the statistical parameters
g2(l 0. v (A1,. At, B(),B1,
interest.
E3q Xf is strongly exogenous with respect to 0*.
=
(4(1
of
211- ) AfJ -f
f=1
I(5j
Y -.,, X, X
(11)
Probablity
-
1),
k'b
model
otyr/zl'-1
(p
,.
p*)
(det n())-'ix
1
= (27$=7
expt
B +,x)),o
yy, -
()
ljyf
s+,x)))y,
(24.181)
(i)
(ii)
(iii)
E'7(I
0* is time invariant.
(111)
Sampling model
g8j
B*'
(A?1 A'2 s
EEE
X* EH (rt
yt
''
'h
B')I
'h
-,)-
..2,
xr
tt -I, xf- xr - l
The estimation, misspecification and specification testing in the context
of this statistical model follows closely that of the multivariate linear
regression model considered
in Sections 24.2-24.5 above. The
modifications to these results needed to apply to the MDLR model are
analogous to the ones considered in the context of the m 1 case (see
Chapter 23). In particular the approximate MLE'S of 0*
-
.1.,
(x''x#)- 1x*'Y
(24.182)
24.8
The MDLR
model
and
o- 0
I.T *
U*
t.
'
&,-
(24. 183)
behave
asymptotically
autoregression form
y'h A1'y)=
(24. 184)
+ B1t'Z) + u),
where
.'.$.
1
()
4a
'
*
1
I.
1)
- I tt?
.*.2
(24 l 85)
(24. 18 6)
This is k ne S.Nn i n t htl
51f) B'l
=
*1, B
=
')'
ric 1 teratu re as
1-10l-net
tlco
.fnal
./i?rnk,
with
(24. 187)
..$.
t he
')
'
.
:'
12
.
(24 188 )
.
The
rt??'??7nlultipliers of delay z respectively.
t-ayrfl'/ll-l'l,//'n
with
so-called
also
Ionjl-run
the
t1s
ll
602
The multiuriate
L=
BIAI:T
1.
=B1(I -A1)-
(24.189)
z=0
The elements of this matrix lij refer to the total expected response of thelth
endogenous variable to the sustained unit change in the fth exogenous
variable, holding the other exogenous variables constant (see Schmidt
(1973),(1979),
for a discussion of the statistical analysis of these multipliers).
Returning to the question of prediction we can see that the natural
predictor for yw+1 given by y1,
yv and xj
xw+ , is
.
+ 2f'Z1+ 1
(24.190)
v. l St'yv
well
ln order to predict yw..z we need to know xw+ 1, xw..c as
as yw+1.
Assuming that xr+ l and xwo.care available we can use the predictor of yv+ j,
Jw.l in order to get yw..c in
=
v-vz
S'yv,v 1 +
2?'Z1+2
*
*
*j
*w
= A 1 (A 1 y v + B Z +
'
'-'
Hence,
'
'
)+
'-'
''h
B *1 Z w*+ 2
'
'-'
+
*1 2
*1
*1
+
= (A ) y w + A B Z w+ ( + B 1 Z w+ a
t
rl
(24.191)
'r
-(')'yw+
v..z
j=1
(?')'-')'zy+,,
1,2,
(24.192)
will provide predictions for future values of yt assuming that the values
taken by the regressors are available. For the asymptotic covariance matrix
of the prediction error (yw..z
see Schmidt (1974).
Prediction in the context of the multivariate (dynamic)
linear regression
model is particularly important in econometric modelling because of its
relationship with the simultaneous equation model discussed in the next
chapter. The above discussion of prediction carries over to the
simultaneous equation model with minor modifications and the concepts of
impact, interim and long-run multipliers are useful in simulations and
policy analysis.
'wsz)
D(S; 0)
ctdet
S)EtF-N
ldet
'w
1'/21
fIIEX/X
expt
-J
tr fl - 1S)
Appendix 24.2
603
where
D
Fn
2,
,=
t'j
1 /'4
l 1Ilntn- )1 )
r(
) being
Pvoperties
.
XV 1
(seePress (1972:.
Wishart distribution
of the
Sk are
T;), i 1, 2,
14.7:(Q,
If S1,
tj-.
n random
)(
'v
k, then
Sf
f
I4$(n, T),
where
If S
W'(f1,T) and M is a /( x
'w
MSM'
--
IXIMQM',
h?
r)
fllll.
(see Muirhead ( 1982), Press ( 1972) inter
W;,tf,
enable
S
T) and S and fl are
These results
us to deduce that if
conformally
partitioned
as
'.w
S11 Slc
Scl Sa2
(1
then
(a)
n11 n12
f21 nc2
where v : + n J, n, S j l : n 1 x n 1 Sc :
S lf W';,f(Dff,T), i 1, 2,
c
n2 x n a
S1: and Sj c are independent if fll 2 0.
2f12-c1f121
T- na) and is inde(S11 S1 2Sc-21S21
) l4$,(f1:1 - f11
of
and
Sa2.
pendent S1a
( : cfl LzS a 2 (f1l l fl afl z-z flc 1 ) (& S z c).
(S1 2 'Sc a)
=
'v
(b)
@)
'v
is'
A ()l B
rz11 B
x: CB
: 2 1B
:x
JmIB
j,,B
za B
.
am,,B
604
The multivariate
matrices
(viii)
(ix)
(x)
matrices;
Usejl
derlvatives
' logtdet A)
1
.4A - )
-- dA
,.
t?trtAB
h
'-.
4y-7
1E3;
( tr(A'B)
PB
'
'
= A;
t7tr(X'AXB)
= AXB
px
tr(.?')
(v)
(.7.4
n.h -
t?vec(AXB)
.
( vec X
Important
.s,
+ A'XB''
'
'
'
s x.
concepts
Appendix 24.2
605
Questions
Compare
statistical
models.
linear regression
and multivariate
-pa,,
=(X'X)- IX'Y
5.
I - (5'''5')-
f,l/f.'
t##'
(1 =
Jt
S.
B.
imply that :
Give two examples for each of the following forms of restrictions:
D 1 B + C 1 0,'
(i)
ii)
BF
0
1 + A:
(
Discuss the differences between them.
10. Explain how the linear restrictions formulation R#v r generalises
and (ii) in question 9.
-.+
(i)
2 ti (
=
12.
formulations'?
1
I- 1 + A 1 )fF'1f1F1 )- F'1(1
(rl
+ Al )(F'1fF1) -
'
rlf.
606
14.
The multivariate
15.
against
=0
ff1 : DB
C # 0.
16. Discuss the question of testing for departures from normality and
compare it with the same test for the m 1 case.
Explain
how you would go about
testing for linearity,
homoskedasticity and independence in the context of the multivariate
linear regression model.
18. 'Misspecification testing in the context of the multivariate dynamic
linear regression model is related to that of the non-dynamic model in
the same way as the dynamic linear regression is related to the linear
regression model.' Discuss.
19. Explain the concepts of impact, interim and long-run multipliers and
discuss their usefulness.
=
Exercises
where
D1
D1'
(Gj, Gt)=
1:
- !
(58),(59)and (62),(63.).
)'z=Ibzxtt +/$'22x2,
fszxst+/'/icx4t
2)
l?2t.
Appendix 24.2
607
(i)
lii)
(iii)
/.z1 ja2.
=
,= xiyt--i
B'fxt-f +.,,
?r-(B()
-())'xr
because X'
B)x,-f +v,,
Afyl-f +
+
=
Discuss.
Construct a 1
x prediction
Additional references
Anderson (1984),. Kendall and Stuart ( 1968); Mardia et al. ( 1979)) Morrison (197$,.
Srivastava and Khatri (1979).
The simultaneous
25.1
equations model
Introduction
where l??p, f,, ;)t. )',, )t refer to (the logg of) the theoretical variables, money,
interest rate, price level, income and government
budget deficit,
respectively. For expositional purposes 1et us assume that there exist
observed data series which correspond one-to-one to these theoretical
variables. That is, ( 1)-(2) is also an estimable model (see Chapter 1). The
question which naturally arises is to what extent the estimable model ( 1)-(2)
can be statistically analysed in the context of the multivariate linear
regression model discussed in Chapter 24. A moment's reflection suggests
that the presence of the so-called endoqenous variables it and mt on the RHS
of ( 1) and (2), respectively, raises new problems. The alternative
608
25.1
Introduction
609
formulation'.
(25.3)
(25.4)
'reparametrising'
(25.6)
and then derive the uijs by imposing restrictions on the aijs such as Jyl
0,
tzsc 0. ln view of this it is important to emphasise at the outset that the
simultaneous equations formulation should be interpreted as a theoretical
parametrisation of particular interest in econometric modelling because it
lmodels' the co-determination of behaviour, and not as a statistical model.
ln Section 25.2 the relationship between the multivariate linear
regression and the simultaneous equation formulation is explicitly derived
in an attempt to introduce the problem of reparametrisation
and
overparametrisation. The latter problem raises the issue of identification
which is considered in Section 25.3. The specification of the simultaneous
equation model as an extension of the multivariate linear regression model
where the statistical parameters of interest do not coincide with the
theoretical (structural) parameters of interest is discussed in Section 25.4.
The estimation of the theoretical parameters of interest by the method of
maximum likelihood is considered in Section 25.5. Section 25.6 considers
two least-squares estimators in an attempt to enhance our understanding of
the problem of simultaneity and its implications. These estimators are
related to the instrumental variables method in Section 25.7. In Section 25.8
we consider misspecification testing at three different but interrelated levels.
Section 25.9 discusses the issues of spccification testing and model selection.
In Section 25.10 the problem of prediction is briefly discussed.
lt is important to note at the outset that even though the dynamic
=
610
::.t
yr
1,
yt
l , X/', Xt - 1 ,
Xt
t)
,
,
25.2
(25.7)
((.Vytt)
y''x,x,)
-
((s#)
(&'.)
,)
X)x'
jl
f1)2a))
(25.8)
e(c(yj1)),
H
f(.y1t
Xf xf)
=
l-:''y)1' + A'lxr,
..'/')'')
(25.9)
(25.10)
whererf
#j - B(1)n1(z).:1. The systematic component
by
defined (10)can be used to construct the statistical GM
=
nc-cllajand A1
+
1
y1 l ro'yjl)
=
'
1xt
+ c:,,
(25.11)
and simultaneous
equatiols
models
'(.p:,/,.*-11)).
;1,
A'1,
-.:/11')1=0,
E'E.E'IEIII
E(;:,)-
(i)
p11
.EEF(7.t1,;1,.'.Fj1))!
'(#1,;1,)
(iii)
11
)1 2
nc-al?cl
=0,
zit + al
.a1
3p!
+ al
(25.12)
4).r
parameters
we call
(r0,A,
1 )'
1
(25. 13)
2j
l t
(25.14)
E(
are simple functions of B and f1. That is. they constitute an alternative
parametrisation of 0, in Dtyt/Xf; 0), based on the decomposition
D(y,/Xt; 04 D(.J'1r''yl1', Xl; 41) o(y;'',/X,; 4(1)().
-
(25.15)
Dt't Xr; 04
givingrise to
D(-J',.'yji',
a statistical
x,;,l(fj).
(25.16)
GM:
A'x
i t +cfl.
t +
l''ft = Fgdytfl
I
(25.17)
model
F'y + A'x t +
'
'1.
'.)
.1-EEEE((1- *1'.z
EEE
ar (l1,,c2f,
0,
t:t =
.1-
'
...:'.
z:'i /1
,'.'.$ 'z
',',',',',',',',','! ,
t'.
'.)
,
smtj'
(I-f is essentially F;9 with - 1 added as its fth element), is not a well-defined
of (7).Fol'
statistical GM because it constitutes an over-reparametrisation
equation,
reparametrisation
first,
the
the
say
one
EEE
<1 (r01A 1
,
d?1 1
2)
441 ) = (B(1 ) f1.,2.
x ttm =
f
tlj
is not.
A particular case where the cartesian product X7- 1 tli is a proper
reparametrisation of 0 is when there exists a natural ordering of the yfts such
1, i.e.
that yjt depends only on the pas up to
.j
Eyjt
'.../-.
rU') =
E( v-jfc()'f! i
,
1 2,
,j
-
1), Xt
x,),
1, 2,
rn.
(25.19)
ln this case the distribution D(yr,'''X;;0j can be decomposed
in the form
p1
.l)(yt,''EXt,0)
lL/q-it/''.vkt.),'2,-
'-'
)'i -
1f,
X,,-<).
(25.20)
This decomposition gives rise to a lower triangular matrix F and the system
( 18) is then a well-defined statistical GM known as a recursive system (see
below).
given in (18)is defined in
In the non-recursive case the parametrisation
unknown
?? H (F, A, V)
1)
mk
of
1)
+
+
parameters
terms
n mm
-hJ(n:
'(:fE;), and
EEE
well-defined
statistical
only
mk
1)
V
+
there are
where
+Jn1(r?
?!
constitutes
shortfall
of
parameters in 0,'a
mm - 1) parameters. Given that
of 0 there can only be mk +.ntn1 + 1) well-defined
a reparametrisation
parameters in tl. ln order to see the relationship between 0 and ?? let us
premultiply ( 18) by (F') - 1 (assumedto be non-singular):
=
(r')-
11)
l
'
lf we compare (21) with (7) we deduce that 0 and ?Jare related via
BF + A
0,
F'flr.
(25.22)
25.2
equations models
The first thing to note about this system of equations is that they are not a
priori restrictions of the form considered in Chapter 24 where 1-and A are
assumed known. The system of equations in (22)
the parameters 4
in terms of 0. As it stands, however, the system allows an infinity of
solutions for 4 given p.
The parameters 0 and ?; will be referred to as statistical and structural
parameters, respectively. The system of equations (22) enables us to
determine only a subset 41 of <(4H (4j:42)) for any given set of well-defined
statistical parameters 0. That is, (22) can be solved for mk +-tanltm + 1)
structural parameters <, in the form
tdefines'
(25.23)
41 G(P, 42).
=
tlz
Without any
(25.24)
detttl)
dettt! 1)
being the /th leading diagonal matrix.
vL, =
EI
(25.25)
X7'-1 l;f
614
, -
+ st,
Y-fjy 1Yf()
X/YO
l'
m=
ti
i- 1
and
1, 2,
parameters
v0, 1Av
i-
X'X
rn,
Jf, l7ff) by
(y,9,
tti =
v0,
*
i - 1Yi
X'y i
(25.27)
-Y,9ii T (yf
=-
a
1).,9
-XJf)
(yf-
Yf(; 1 yr
-
25.3
Identification
.j
xgf j
MLE'S
of
tti.
of the statistical
(25.29)
system,
(25.30)
V= F'f1F.
(25.31)
=0
=0,
where
H
(B, 1)
and
.-(r,)
(25.32)
25.3 ldentification
615
I'1v
product
notation
(seeAppendix
0,
(25.33)
0,
(25.34)
('))--e
for
(25.35)
Dejlnition 1
The structural
are said
parameters
H*
rank *
nytny +
rankt*A*l
1).
(f
(25.36)
I1+
(197$),we
+mk
rank(*A*)
m(m
1).
(25.37)
(25.38)
616
unrestricted
structural
model is
mt
=
(25.39)
(25.40)
As can be seen, the two equations are indistinguishable given that they differ
arises because
only by a normalisation condition. The overparametrisation
statistical
underlying
GM
effect
the
(39) and (40) is in
+ #21 + #31pt + l.4l(/r+ 1,/1,,
1
(25.41)
/.1
mt
.rf
(25.42)
and the two parametrisations
1
/.12
/71
/21 #22
#3 1 /' 3 2
/4.1 I4.,
B and (r,
J1 a
''
l 1
(621
)'12
21
2c
:3.1
J.l
,3
'0
,c
J4c
(25.43)
which cannot be solved for F and A uniquely, given that there are only eight
pijs and ten ijs and (bijs.
A natural way to
the identification problem is to impose exclusion
These restrictions
restrictions on (39) and (40) such as 4l
J22
enable us to distinguish between the money and interest rate equations.
Equivalently, the restrictions enable us to get a unique solution for (712,
),a1,
(5
al) given (py, i 1.
4- j 1, 2).
l .,
1 2, J1
1
2l
lThe exclusion restrictions on the fth structural equation can be expressed
in the form
esolve'
=0,
.3,
*.j
=0.
0,
(25.44)
mj
=40,
0, 0,0, 0, 1)
ma
(0,0, 0, 1, 0, 0).
rank
H*
(hi
ranktmfA*)
m+ k
m
1,
'
1, 2,
1, 2,
???.
(25.45)
as the
(25.47)
25.3
BF
ldentification
+ A.j
(25.48)
restrictions',
omit (n1
+ (/4 k1) exelusion
and impose (m endogenous and (/ - k:) exogenous variables from the first equation. ln this
case we do not need to define *1 explicitly and consider (46)because we can
substitute the restrictions directly into (48).Re-arranging the variables so as
to have the excluded ones last, the resbricted structural parameters Ft and
-n11)
1111)
A1' are
#ll
/21
B1z
Ba2
where
jl
1:
kl
jc
1:
(k -
J1
)x
0-
0,
B1 c : kl x
x 1,
J1
o 0
(25.52)
=0,
B2zrl
'y1
Bzsz
+ B1 c71 +
- / 11
-/21 +
.- 1
Bla'j
J 1 : k1 x 1
(m1-
B 2 a : (k - l
) x (tn 1
1)
,
B z :! : (l - k 1 ) x (m
m 1)
.
(25.53)
rnl - 1.
ln view of the result that the ranktBaz) mintk - pti nlj - 1) we can deduce
that a necessal'q' t't??7t/frl't?n for identification under exclusion restrictions is
that
=
(25.54)
ranklm:
,b
.=
??? ..
(2f
.55)
Tlle simultaneous
equations model
rank(*1A*)
rankto,
0)
(25.56)
=0,
Dehni tion 2
-4 particular
to be:
equation
c/?-p?
-$
saitl
under identified if
rank(*jA*)
< r??-
1,'
(25.57)
iii)
over identified
1,.
(25.58)
(f'
ranktmfA*)
The system
ranklmf)
(.)) is said to
??
m - 1,
identi.fled
ranktmf)
t'Jevery
>
equation
1.
.$
(25.59)
identilied.
Exatnple
Consider
restrictions
imposed:
(25.60)
712.F1/ -
(25.61)
-1.'2,
.F3f+
l 3.X1,
(123.X2,
:3,,
(25.62)
25.4 Specilkation
619
0 0
*:! (0, 1, 0,
=
*a
0 0
0 0
0
0
0, 0)
1),
2a
*aA*
(0, 0,
1).
3.
2 but ranktmj)
The first equation is overidentified since ranktmjA*)
underidentified
ranktmcA*)
second
equation
1
because
is
The
even
though ranklmz) 2 (the order condition holds). This is because the first
equation satisfied all the restrictions of the second equation rendering these
The third equation is underdentified
condtons
as well, beeause
rankt*aA*l
l < 2.
lt is important to note that when certain equations (af the structural form
(30) are overidentified then not only the structural parameters of interest are
uniquely defined by (31) but the statistical parameters 0 are themselves
restricted. That is, overdentifying restrctions imply that 0 G (6))where (.)1is
a subset of the parameter space (i).An important implication of this is that
the identifying restrictions cannot be tested but the overidentifying ones
can (see Section 25.9).
The above discussion of the identification problem depends crucially on
the assumption that the statistical parameters of interest 0n (B, f) are well
defined the assumptions
g1q-g8()underlying the multivariate linear
regression model are valid. However, in the case where some assumption is
changes we need to reconsider the
invalid and the parametrisation
identification of the system. For example, in the case where the
independenee assumption is invalid and the statistical GM takes the form
=
'phoney'.
B'x
i
j -
+ ut
!,
(25.63)
25.4
Spification
The simultaneous
model is viewed as a
620
Tlle simultaneous
equations model
0)
Statistical GM
y, Bk,
=
g1j
(25.64)
+ u,,
pt
xJ)
g3j
g4j
g5(I
(11)
Probability
(p
model
Dlyt ,,/XfP)
,'
(det f1)(27:)
eXP )
2--'--.rj-
lg
yt - B
Xr)
.1
)
()'f- B XJy
0 (E R'n
m5
t( T
(25.65)
D( yf/'Xt ; 0) is no rmal
(i)
(ii)
J)y;,/'Xr xf) B'x, - linear in x,;
(iii)
Covty,/'''xf xr) f'l homoskedastic
>
(B.
f1) is time invariant.
0
',
(111)
Sampling model
g8j
(free of x,);
y2.
frona D(yr,/X?;#), l
25.5
likelihood estimation
Maximum
importance
in econometric
equations,
likelihood estimation
Maximum
D/in i rffpl3
I n r/lt.zcase J4,/? eve ( is
r-itlvlkrftl,ll i r.s (indirect)
Iikelihood estirnator (Iz%'
/.$
/?)'
ILE ) (Ie./i?7tp(/
n'Iaxirnurn
.jlk.$
(-
jjj
jj
0- (
fl.
(r
(x'x)- lx'Y
v -x.
(25.67)
lLsbased
t?s/rf'nTf/rtal-
fl
ln the simpler
0.
f1= (
J1
('?n
lk).
case of exclusion
jj la): ;
- ll1 +
'k'
restrictions
the system
(66) is
()
(2,j.69)
= 0,
(25.70)
(25.71)
J1
j:
: -
B : a B c-jj B c :
matrix.
given that B2a is a sqtlare (nl - 1) x (mj 1) non-singular
The IMLE can be N iewed as an unconstrained MLE of the theoretical
for testing any
parameters of interest which might provide the
-
'benchmark'
622
The simtlltaneous
equations model
=0,
f=
f+-
1 (:
-*)
(x x)(
-)
(25.75)
-(r(j)
+ A(j))(F(j)'f(r'()
lrtf
and
1 A(
v(t)-F(t)'fF(-.,jk
Z ZA(
,
(25.77)
a
and
and
z-tv,x)
Richard (1983)).
(see Hendry
Substituting (76)and (77)into the log
of
multivariate
likelihood function
the
linear regression model log Lttl; Y)
likelihood function log f-((., Y) defined by
we get the
Cconcentrated'
1og f,tt; Y)
const T
logldet fla
log(det((F'Y'MxYF')
- 1(A'Z'ZA)j).
(25.78)
This function can be viewed as the objective function for estimating ( using
12/ (, f) as opposed to the direct estimation of (by solving (76)and (77)for
1.The log likelihood function (78)has the advantage that it provides us with
Iikelihood estimation
Maximum
25.5
'solution'
12
,
(2 5 80)
.
X'Z'ZX
t'h'.
T --....
tr tf-'ff-l-itf--,.,--(f''P)t?(i
?A
v
)-t
Z'Z
1,
(25.82)
=0.
of ( and
The system of equations (82)could be used to derive the MLE'S
t-i,
derived
asymptotically
equivalent
be
An
form
f.
and
can
hencef', 2
using
fl
1
F
r/n
PF
ti
(7
- A'z/z
JA
--.
C'1
(f
0 and
--+
t7A
A'Z/XH ..0
f)
(i
tL'ff'
1
)- .i'Z'Xf1
JA
Ptf
=0,
1, 2,
p.
(B : I),
EE
equivalent
(25.83)
form
(25.84)
For the details of the derivation see Hendry ( 1976), Hendry and Richard
system (84)is particularly intuitive because it can
( 1983). The reformulated
iding
interpreted
an estimator of ( in terms of the sufficient
as pro:
be
statistics and
H(,
1-,,,=
and f
f of the fonn
fl,
(25.85)
The simultaneous
624
equations model
of the minimal
sufficient statistics
(25.86)
EZ
and
'X t
tt, where cf
It can be shown
1
-T
xr)
-
I'l'xr
I)'
x?
lk
(25.87)
(25.88)
A'z'xl
--.
0,
(25.89)
A'Z/ZA,
-(r+
H=(B:
A)'(r'fr)
I),
-
'
rv!1.
(25.91)
(25.92)
When V and H are given, (90)is linear in A and an explicit solution can be
derived. On the other hand, when A is given, V and 11can be derived easily.
This suggests that (90)-492)can be used to generate estimators of A for
different estimators of V and n. lndeed, Hendry ( 1976) showed that most of
the conventional estimators in the econometric literature such as threestage Ieast-squares (3SLS), two-stage least-squares (2SLS), can be easily
Hendry referred to (90)as the estimator generating
generated by (90)V92).
j'
25.5
Maximum
likelihood estimation
625
equation (.EGE). The usefulness of the EGE is that it unifies and summarises
a huge literature in econometrics by showing that:
(a)
proposed.
A(r(j)
'
(76)
(25.93)
A(t)F(()
'
(25.94)
qit)
(7lo L
t(
1/()
E!q(()q(t)'1
(25.96)
(see Chapter 13), where E'( ) is with respect to the underlying probability
model distribution D4y, Xf,' 0). Using the asymptotic information matrix
.
626
The simultaneous
equations model
defined by
.,''j j'-
tlslxj- x((),I
x
(() - ).
(25.98)
A more explicit form of 1.(() will be given in the next section in the case
exist, for the 3SLS (three-stage leastwhere only exclusion restrictions
squares) estimator.
One important feature of the above derivation of (84)and (90)-(92)is that
it does not depend on F being ln x rn and non-singular. These results hold
true with r being m x q ((/%?n). In this case the structural formulation is said
to be incomplete (seeRichard ( 1984:. For f- m x r?11 (:n3< ?zl),
(84)
gives rise to a direct sub-system generalisation of the llmited information
1) (see Richard ( 1979:.
maximum likelihood (LIML) estimator (with -1
'solving'
25.6
Least-squares
estimation
(1)
Two-stage least-squares
(2N15)
ZSLS is by far the most widely used method of estimation for the structural
parameters of interest in a particular equation. For expositional purposes
1et us consider the first structural equation of the system
25.6
Least-squares estimation
''yrt
o(y,,/Xl; 04 D ).,.r
=
X,,' 1/1 )
of the probability
otyjl',,-'x
'
model:
(25.10 1)
141))
p''j,
=
))
.E().,
and non-systematic
;1,
=
(25.102)
component
J)).,//,F)1
.J'! -
,#7'J'' (c(y)1'),Xr
'),
(25.103)
xr).
This suggests that, because of the normality of Dt)'t Xf; 0), ?!j and 4(1) are
variationfree and thus yjl' is wctpkly exogenous with respect to ?;: (see
Chapter 20). If no restrictions are placed on
-(F)
1
A1
its )1 1-1! is
*1 (Z'(1)Z(1))- 'Z'(1)y1.
(25.104)
1'
il
'1
Z(1):1
(25.105)
-0
(25.106)
't?yrl'l'c-rt?p7s.
.D(y1 t
''y(
X, ; <*1
)
.l)lyl
1 )1,
1 )1
xf; 44*1)).
(25. 107)
628
The simultaneous
equations model
1 )f
),1 : (n j
where
1) x 1
.-
1 )1 +
(25. 108)
clf,
1) x 1
.
and
(25.109)
(25. 110)
Let us consider
system
via
Bl-/ + A1t
=0.
coefficient vectors
At'
(#1
#c1 B22
B21
r 1 + J 1 #1 1
=
Bc2r1
!
-
g'j
h.t,
'j
.j.
.g,0.-j
(25. 113)
) q07
(25. 114)
(25. 115)
=#21
.
These equations
#1
y,
Bca7
(25.112)
(J'1 0)',
.-
B1 a-h
B1 2
/1 l
#21
and
B(j)
B21
B31
B22
B3c
In the case where the equation ( 110) isjust identljled k k : rl1 - 1) Bca is
(,'n1 1) x (,,n1 1) square matrix with rank rn1 1. Hence, (114) and ( 115)
can be solved uniquely for )!1 and J1 :
=
r1
B 2-cl #,
J1
p1 ,
B c 1 B z-, pa 1
Looking at (116) we can see that in this case 4j and tlj ) are still variation
free. Moreover, no structural parameter estimator is needed because these
can be estimated via the indirect MLE's..
71
: 22- j #21
-
21
(25.1 17)
25.6
629
estimation
Least-squares
when
However,
n1
f.51 3
12
t.?a
f122
f23 2
l'l
32
c?g 1
) (1 j
y
=u
j'j
D22
as
well as
ln the
f2 ! 3
3
ft) 3 1
(25. 120)
f132
rr,)
23
f12 3
(25. 121)
t),
(25. 122)
( 114), ( 115).
nz-alc)z j
B2-21
pz1
(25.123)
and yl as defined by ( 121) and ( 115) coincide. On the other hand, in the
overidentified case ( 12 1) and ( 115) define ),l differently with ( 115)
free condition,
invalidating the N'ariation
is in terms of the
An alternativ'e
u'ay to view the above argument
non-systematic
lf
components.
we define plj in ( 110) by
systematie and
:1::
then
lt
J( ).: : c( y 1 ? ). X 1 t
x j f ),
(25.124)
(25. 125)
(25.126)
630
where
cl',
1'1t
ftl'lf/'cty:
), X 1
xl
,)
(see Spanos (1985$). This suggests that the natural way to proceed in order
to estimate ()'1,Jj t'11) is to find a way to impose the restrietions (114)-( 115)
and then construct an estimator which preserves the orthogonality between
pl and t?st. The LIML estimator briefly considered in the previous section is
indeed such an estimator tseeTheil ( l97 1/. Another estimator based on the
same argument is the so-called two-stage least-squares (ZSLS) estimator.
Let us consider its derivation in some detail.
The orthogonality in ( 126) suggests that
,
.r1t
=
7'1y1r+J'1x1f + zt
(25.127)
is equivalent to
y1,-B'1zxl,
(1 15). In
(25.128)
lg,,
+Brz.zx(1)fA-ulr
(25.129)
into ( 127):
I!
3.'1,r'1(B', cxlf
-
+ B'22x(1),+ ul
f)
+J'1 xj,
+ t?st
+
= (/1B?12+J'l)x1, +J'1B)cx(1), +
l'-l'u, l/jr - y/lulf, (130)becomes
ltf.
'y'jul,
Given
lt,
(25.130)
(25.131)
3.'1:(7'1B'12 +J?1)x1t+
=
/1B'2cx(1 ), +
(25.132)
l,flt.
yl =(X1B12 +X(1)Bc2)71
-FXIJI
ll
(25.133)
explicitly:
(133)estimate
(25.134)
25.6
estimation
Least-squares
631
XB
+ U1
using
(X'X) - X'Y
(25.135)
(25.136)
into
y1
Z 1x
+ u
(25.137)
+ u?
*1 =
*1
+ U
'
Z1
1,
(25. 138)
(Y l ; X ! )
Estimate a*1 by
,a.
.a.
1
x*
m.
,'<
)
(Z:ZI)
f1
J1 zsu.s
(:.!5 139)
zlyl
i''
'
X'
''
x,1$, x,'
..
-'
y'
'
x,'
..
1
'
3.1
it /21+
+ ?2l,
Izsh + /724.::
zzpt +
( 1:
J 21 t
(x 3 j
pt +
J41
into
j21 + jazp; + qzyt + fz.bgt
),: + ul
Applying OLS to this equation yields the ZSLS estimator of aj 1, a21 a31
and ayl.
Given that ( 133)preserves the orthogonality between the systematic and
non-systematic component by redefining the latter, the 2SLS method uses
the equivalence between ( 127) and (130) and via an operational form of the
,
--.+
as T-+
-.
(25.14 1)
-..c
.
For a given Tl however. f (2,':ul) #0 and thus &1'is a biased estimator, i.e.
Ell'l
z,.tz
(25.142)
tz1:.
$-1uluu
vv
:= X1A 1
1
directly
p r 1 () 1 v x j
,
,k
XIXI
- l
with
(v j
(.T
J)y
j ),y j
,
Xlyl
(2 5 14 3)
.
632
The simultaneous
which, in
view of
V'IX j
equations model
i''1i' I
Y'jX j
Y'1Y 1
Ory
where
0
T1
1 mx
.-. 1
f*is not
(25.144)
71
(25.145)
'
-0
,v
fJ'l 0 1
0 S)
$-711 yj 1 y 1.r1
J* --o
0/ u
v 0
f1
171
Y 01
(yj v j )
M1
I -XI(X')X1) --1X'1
Y'1 Y 1
Y'1 X 1
Y1 1
x 1x 1
'
'
ranktml)
ranktmjA*)
(25. 146)
1,
AI'+ rt
where
=0,
(25.147)
- 1
= (X'X) - IX/Y,
r?
(25.148)
zl
0
fzst.s=(Y'1(Px
ZSLS
Px.)Y1) --1Y'1(Px
Px, )y1
fcsus),
(25.149)
(25. 150)
!'
.Yj
25.6
form
7zsus g.?2- z 17 2 1
yr::z
where
estimation
Least-squares
B''
W01 v
(25 15 1)
.
W'12
11
(),
w 21 w 22
vo vo
j
'
3l
yr 1
(25. 152)
x)
N(B0j
'xf
x,
Doj)
(25.153)
YO/QYO
1
1 where Q
where B01 EEr (#'j B',c)', we can see that the distribution of
Px - Pxj is an idempotent matrix must be a matrix extension to the noncentral chi-square (see Appendix 24.1). That is.
.
wo1
11J
??1j
(n01 w.M0)1
5
>
M0
no -
)Y(')'Q)Y0).
i
similarlyas
f(k)=(W22
(25.156)
-l*Sc1).
(25.157)
fuIs1u=(5V22 -f*S22)-'(M'21
where
Sy
1 (Y
-
?-
XBO:
)'(Y0j xB:))
-
SE
S1 1
S21
S1 2
S2 1
(25, 158)
(see Mariano ( 1982)). These results suggest that the same methods for the
derivation of the distribution of fcsps
can be used to derive those of ftkjand
fujul-. ln Chapter 6. several ways to derive the distribution of Borel
functions of random vectors were discussed and the general impression was
that such deri: ations can be very difcult exercises in multivariate calculus.
In the case of fl abo: k?. the derivations are further complicated by the fact
that the distribtltion ftlnction of the non-central Wishart is a highly
complicated function based on infinite series (see Muirhead (1982)). The
complexity of the distribution increases rapidly with the ranktMoj). This is
basically the reason why the econometric literature on the distribution of
these estimators concentrated mainly on the case where 1Al1 1 (see
Basmann (1974) for an excellent survey). ln the general case of n1 > 1 (see
Phillips ( 1980)) the distribution of # is so complicated so as to be almost
=
The simultaneous
model
equations
F( :*1 x *1)zs
I-s
X(0
x.
*1
?-? 1 D
(25.159)
where
D 11
B/ 2 Qx B
.
B' 2 Q1
The asymptotic
x'x 1
lim
-,
Q1 j
--
Q11
X'X
.X
X'1 X 1
--.T
lim
w..- y
E(ss1 f 1;'y1 t ).
of *1 can be estimated
covariance
*
*
Cov(41)=
f11
I7*
11 a
(25. 160)
1im
Qx F-+
Q1
Q, B.c
ir'1 i' 1
i''1 X 1
X1Y1
X1X:
by
(25. 161)
where
,1
v 1f - x 1
1
non-singular
if
25.6
Least-squares
estimation
that
v'' F(k - 1)
--+
0.
(25.162)
t?/P,
W),
(25.163)
fl
(25.164)
where
P 55E
X'u 1 X'U 1
T
apply
(25.165)
(seePhillips ( 1980a)).
The asymptotic
+ Op(F-),
(25.166)
(2)
(3S+5)
636
suggests that the same formulation should lead us to a more efficient system
estimator than the ZSLS estimator.
Expressing all m equations of (99) in the same form as equation one n
( 110) the SURE formulation in the present context gives rise to the systenl
y1
Z1
ya
()
za
x1
c*1
12
P ':2
'
ym
where
Zm
7f
('' f : xf)
Ai
Jf
12
x FX
(25.167)
'
:*
.P1
'
n''l
,
'+ Z++
=
is an obvious
the system
notation.
y+ =
(25.168)
The ZSLS estimator amounts
a,y + c:.
(25.169)
to applying OLS to
EEE
(25.170)
'v
N(0, V () 1w),
(25.172)
where Vis estimated from the ZSLS residuals applied to all the equations of
the system via
l''t'ij
''?A
2)
I
--' 2'-'A
'
l,
'
..1
12
,
(25. 173)
For obvious
reasons this estimator of a,j is called the three-stage leastestimator, first suygested by Zellner and Theil ( 1962).
(3SLS)
squares
l
It is important to note that for(Z'+(V
- (@)Iw)2 * ) in ( 189) to be invertible
k,v must be of full column rank which requires that each 2f (ij. Xf), i 1,
=
Instrumental
2,
variables
comprising
rn, i.e. a11 equations
when the system is identified
Moreover.
1 (;'
(# l
-T * -
(g)
Iy );
-.a
v3st.s
P
*
....+
(;j.
1
) N ((j D - )
x
j.74)
,
(25.175)
(see Schmidt ( 1976)). Sargan (1964/) showed that when only exclusion
restrictions are present and the system is identified, 3SLS and FIML are
asymptotically equivalent, i.e. D 1.((),. see Section 25.3.
the result related to the
ln relation to the finite sample distribution of Vysus
moments of the ZSLS applies to this ease as well. That is, moments exist up
1he
nonto the order of overidentification (see Sargan ( 1978)). Moreover.
the L1M L estimator extends to the FIML
existence of moments
estima-tor (see Sargan ( 1970)). As argued above. howeNrer. existence or nonexistence of moments should not be considered as a criterion for choosing
between estimators in general.
=
.for
25.7
Instrumental
variables
Ktextbook'
pf I'X
=
where X,: p
that
(25.176)
+ Tr.
x 1 vector
'(X;c,) # 0.
of
variables
such
(25.177)
The simultaneous
equations model
projection
(OLS) estimator
:=(X'X)- 1X'y
we can see that this estimator is in general biased and inconsistent since
i x + (X'X) - 1X/c and S)(X'X)- 1X'E) # 0
=
IV method
introducing
not go
a new vector
'(Z; G)
to
'solves'
of instruments
((Z/ c,/F)
Eaa
((Z'Z,'F)
,$
problem b)
F such that:
0)
-->
Cov(Z,)
Eaa)
-+
(iii)
((Z'X,/F)
Eaa)
-+
P
'(Iv)
and
91:.
-+
if in addition to
Moreover,
y'
(iv)
(Z v#x/ T)
X(0,
'w
'
(i)-(iii)we
azXaa)
then
/'F(4Iv
N/
a)
'.w
- 1 E3aEaa
- 1
N(0, c 2 N.zs
)
than
variable
fs)
Sargan
estimator
(GIVE)
&)j=(X'PaX) - IX'P y
(25.179)
'optimal
Covt:lv)
2
c2(HYca) - 1(H'ZaaH)(N'aaH')- 1
Instrumental
25.7
variables
639
(Z*'X) - Z*'y
1
1
Given that for /(H) (HEaa) - (H'EasH)(E'asH') ' /(H) /(AH) for any
matrix
minimising
we can choose H by
m x m non-singular
Of
:lt
(H'Ea aH)
subject to HE2a
1.
E 5-1 E a a(N aa E -l E aa )- l
the
equivalent
1
P :: XIX'P X):5
toptimal'
set of instruments
is
and thus
,'
jjrgjjf
jjj.f
....'
.
..
lj,
''
(# x.
N(0, c (EazEa........a Ea a) ... )
The GIVE estimator as derived above is not just consistent but also
asymptotically efficient. Because of this it should come as no surprise to
learn that a number of well-known estimators in econometrics including
OLS, GLS, ZSLS, 3SLS. LIML and FIML can be viewed as GIVE
estimators (see Bowden and Turkington ( 1984), furt/l' alia).
Norp that in the case where m /?, g)i). I& as defined in (178).
The main problem in implementing the IV method in practice is the
choice of the instruments Z,. The above argument begins with the
presupposition that such a set of instruments exists. As for the conditions
(i)-(iv), they are of rather limited value in practice because the basic
ln order to
orthogonality condition (i), as it stands, is non-operational.
make it operational we need to specify explicitly the distribution in terms of
which E ) in ( 176) and (i)-(iv) are defined. lf we look at the argument
leading to the GIVE estimator ( 179) we can see that the whole argument
around an (implicit)distribution which esomehow'
involves y't, Xf
and Z,. ln order to see this let us assume that the underlying distribution is
Dt-pr,Xp; #) (assumed to be normal for simplicity). lf we interpret the
systematic and non-systematic components of ( 176) as in Chapters 17 and
20, i.e.
.
F()
'
Krevolves'
and
c-al
x E o'c :
=
'(Xf';r) 0.
=
(E a c
by
t?t =
Covtxf )
yf - E ()'r,6'Xf))
o'c 1
Co v(X,,
(25.180)
-pf)),
Construction
and
9,:u:(X'X) ' 'X'
fully efficient.
and
640
The simultaneous
equations model
(v)
Cov(Z,,.pr)
/3 1
# 0.
A glance at the IV estimators ( 178) and ( 179) confirm that the latter
condition is needed in order to ensure the existence of these estimators, in
addition to (i)-(iii).This, however, requires Zf to be correlated with pt but
do we
uncorrelated with a random variable cf. directly related to yr.
resolve this apparent paradox?'
ln addition to the above raised questions the discerning reader might be
estimator of x in
wondering how the GIVE estimator ( 179) can be a
176)
when
apparently
the
has
latter
(
parameter
to do' with Zf. As
emphasised throughout Chapters 19-24 there is nothing coincidental
estimator. For
about the form of the parameter to be estimated and its
1X'y
(X'X)estimator
of
is a
example the fact that &
x Ea-alokj is no
that
the
natural
given
sample
analogues
of
and
J21 are X'X
E22
accident,
respectively.
analogy
in
of
and X'y
Using the same
i),. we can see
the case
estimator of must be:
that the parameter this is a
Sl-low
'good'
'nothing
'best'
'good'
'good'
(25. 18 1)
D( J..,, Xr/'Zr;0j.
(25. 182)
Let us consider how this choice of the implicit distribution could help us
resolve the above raised issues.
Firstly, assuming that the parametrisation
of interest in ( 176) is I*, as
defined in ( 18l), we can see how the non-orthogonality
(177) arises. lf we
ignore ( 182) and instead treat f'( ) in ( 177) as defined in terms of D()'t, Xt; #)
.
then
'(X;k)
'(X;(yf- x'Xt))
(c21-
E22t:* )# 0
25.7
Ilstrumental
variables
E(X;y,./c(Z,))
E'IXCfR,c(X,)1/'c(Z,))
*'Xf)/'c(Zf)1/'c(X/))
= F)X;JJg(yf
Z/Ya-.tEa aa*l c(Xt) )
f Ygi as 1
t f
= E$X'LZ
-
= E 23 E-3 3
f(Xf'G 'c(Zl))
65 1
-E
2.3
t- 3 3IE 32 x*
in ( 177)
for
as defined in (181). ln other words, the non-orthogonality
expectation
of
the
x*
in
1
82)
but
in
defined
terms
terms
(.
was
arose because
Xl.,
D(
p,,
of
#).
The above discussion suggests that the instruments Zf are random
variables jointly distributed with the pt and X, but treated (implicitly)as
specification ( 176),
ln defining the statistical
conditioning variables.
explicitly.
include
That
natural statistical
is,
the
Z,
however, we did not
*
GM
yl a'()Xt+ SZ,
(25.184)
+ ut
where
atl
(25.185)
X2370 0
=
x*
(25.186)
ll
(X'M,X) - X'5'1Z y
(25.187)
642
The simultaneous
equations model
and
(Z'Z)
- 1Z'y -(Z'Z)
.
- 1(Z'X)()
Pz= Z(Z'Z)- lZ', Mz=I - Pz. The estimator ('10 of (), however, is not an
estimator of the parameters of interest a*. On the other hand, when Zf in
(184) is dropped without taking into account its indirect effect (throughthe
parametrisation) the estimator #= (X'X) - 'X'y is inappropriate. This is
because (i introduces a non-existent orthogonality between the estimated
'systematic' and
components. That is, for #= X:= Pxy and
-X4=
Mxy
sample equivalent to ( 176) induced by
that
the
wecan
see
=y
knon-systematic'
is
pxy + Mxy
(25.188)
Intuition
t:1
zt
(p) -p,)
5't E (Jtf'rqZ,))
=
it
f(/lf,/fT(Zr))()+ c,
(25.189)
from cf to define
(25.190)
0.
with Eptttl)
(190) is different from ( 184) in so far as the original parametrisation has
been retained
It is interesting to note that the above
can be motivated
=
're-design'
25.7
Instrumental
variables
643
'('gZ'f)f/J(Zf)1)
(see Chapter 7)
which in turn is
if
(25.19 1)
for
valid
Ept c(Zf>
(i)holds
we can
E (lt J(Zl))
E (ZIEI;, c(Z,)))
(25.192)
c(Zr))
'(.)?,
Vchoice'
The conditions ( 19 1) and (192) are equivalent to ( 189) and thus the
of Z, so as to ensure condition (i) is equivalent to the above re-design' of
( 176).
ln order to understand how the IV estimator (179) is indeed the most
appropriate estimator of x* in the context of ( 176) 1et us derive the sample
equivalent of (190).Using the well-known analogy between conditional
expectations and orthogonal projections we can deduce the sample
F (T > m + p):
equivalents to (189) and (190) for t 1, 2,
=
p * P z xa*
=
and
c*
PzXa* + Mzxa
1) +
M z X()
(25.193)
+ c.
Sre-designs'
the form:
y Xa*
=
+ M,X(a() - a*) + c
(25.194)
644
model
(186) we can see that Hv is an indirect test of the hypothesis that the
coefficient of the conditioning variables Zt is zero, i.e.
covt)?t,ZXt)
yl
Zll
+ zl3
(25.195)
PXYITI
J'1
XIJI
Y' P x Y 1
Y'1 P x X 1
XIPXY 1
X'IX 1
,1
Iv
+ MxY17 +
:1'
(25.196)
Y'1 P xyl
X'1yl
of al is:
'
.,)
(aj. j oy)
25.8
Misspecification testing
(ii)
M-ut,
(25.198)
25.8
Misspecification
testing
(1)
equation,
7'1y1, -F-ti'1X1? +
1'1,
The statistical
system
645
'l/f
(25.200)
level
(25.201)
(see Chapter 24). This suggests that if the identification problem was
in terms of 0 in f 198) we need to reconsider it in terms of 0* in (199).Given
that (198) and ( 199) coincide under
isolved'
Hv : Bf
0,
-'.f
0-
1.=
1, 2,
/,
(25.202)
646
restrictions.
In the
we need to account for these implicitly imposed
restrictions
context of (198)the restrictions in (202)are viewed as
which fail the rank condition (seeSection 25.3) because a11 equations satisfy
these restrictions. When Ho is tested and rejected, however, we need to
reconsider our estimable model (seeChapters 1 and 26) in view of the fact
that the original model allowed for no lags in the variables involved. When
a situation like this arises in practice the modelling is commonly done
equation by equation and not in terms of the system as a whole because of
the relative uncertainty about which variables enter which equation. For
this reason it is important to consider misspecification testing in terms of
individual equations as well.
itestable'
tphoney'
(2)
The unrestricted
single structural
equation level
The restricted
single structural
Ievel
equation
25.8
Misspecilkation
647
testing
ln Sections 25.6 and 25.7 it was argued that (200)can be viewed in various
alternative ways which are useful for different purposes. The two
intemretations we are interested in here are:
(25.203)
(25.204)
Given that
E (. . k'1 f
'o-l
y1
.'
X1t
:),
)?'jy j t + J'1x 1 r
x j t)
aE-tylf,''X, x!)
-
Bz1cxl, + B'lcxtl
*j
),'ju j t +
?.fj t
)?,
we can see that we can go from (204)to (203)by subtracting y'jyjt from both
sides of (204).For estimation ptlrposes (204)is preferable because of the
orthogonality between the systematic and non-systematic components (see
Section 25.6). For misspecification testing- howey'er,it is more convenient to
use (203).
Tbe
oI' D(
??t?,-n'la/!'r.
'1,..y1,,
of the skewness-kurtosis
J: t
.yl
, -
f;
j..y
, -
t,
l 2
,
(25.205)
whereflv,tsvrefer to
f( 1':,
xl!)
?'Z1,,
(25.206)
x'Ifl'
and 1'H (/1,J'1) can be tested using the F-type test
where Z1t EEE(y'1p,
discussed in the context of the linear regression model (see Chapter 2 1).
That is, generate the higher-order tenns #r,using Z1f or powers of ft),and test
the null hypothesis
He: d
regression
in the auxiliary
)',1
/
H 1: d# 0
a01 +
Z1(at'
tlzd
:0
1
$) ) + Td
(25.207)
+
&'
(25.208)
The simultaneous
The homoskedasticitv
Var()'1 t/o-ly
equations model
of the conditional
), X j r
xl
r)
variance
(25.209)
r: 1
can be tested by extending the White and related tests discussed in Chapter
2 1 to the present context. If we assume that no heteroskedasticity is present
in the simultaneous equation model as a whole we can test (209)against
t/c(y1,),
Vartyl
using Hv cl
X 1,
xlr)
/'l(z 1 ,),
(25.210)
regression
(25.211)
(see Kelejian ( 1982)). This can be tested using the F-type test or its
asymptotic variants (e.g. T'R2) discussed in Chapter 21. ln the case where
the assumption related to the presence of heteroskedasticiLy
in the rest of the
appropriate,
need
modify
F-type
is
because the
the
tests
not
system
to
we
asymptotic distribution of > ((),1) is not
Ft
-c)
N(0, c
x
J
Q/ )
(25.212)
but
j.'
,. T( -c)
x.
J
N(0, 2c #.(Q-
'
f(Z1?l1',)
-..
L'#) +
%1
(25.213)
L )),
2LQcj
limits of
respectively
(,Fj
#)#)'q,
and L is the
(25.214)
(seePagan
xr)
649
(25.215)
ftxt),
(25.2 16)
0r
l
*1 t
(25.217)
+ vl,
*1t (01
-&*
I l''
)'z 1 t + Y c'0
f
1t - i
+v
(25.218)
t5
Spaification testing
whereylt: (n11(1)
+J'I
X1t
against Sl
yl #
(25.220)
lj f,
yj Under
i,'1
,.
x(1)r given
y1f B'12X 1 t +
=
1) x 1, x1f : kl x 1, (k
(25.219)
-kj)
Testing Ho : l'j
'l
l11,
X1,
H3 however, y1,
,
that
B'22X(1)t
+ ulf
(25.221)
yf
X1l1' + X(l)/(1)
+V1
(25.222)
650
The simultaneous
equations model
T-k
FT(y1)=
'f
kz
H
'--
F- k),
F(2,
(25.223)
Lk
t)-l
.ftyl:
FT(yt)
>
ca),
df-lkc, F
-k).
(25.224)
t7t
(2)
r against S1 : Ra/ # r
with
x'/T*lsus
*)
.N'(0,t?1 1 D 1-11)
t'
(25.225)
(see Section 25.6), we can use the same intuitive argument as in the case of
the F-test (seeChapter 20) to suggest that a natural choice for a test statistic
in the present context is
(R&lst.s r)'gR(2'121) - 1 R'(J- ltR&lsus
-r)
FT'(y1)
where
f1:1
In practice
(25.226)
&o
z2(.
it is preferable
approximation
(25.227)
to use the F-type form
(226)based
on the
(25.228)
25.9 Specification
651
testing
test are:
(i)
(ii)
which
(3)
and
restrlctions
-k:
H3 : B22r1 #21#0.
1.o: 82271 #21
enables us to test Ho is the statistical
which
A convenient formulation
sub-model:
=0,
A'lt
/'11 xl t + b',j.x(1), +
1./1t
+ Br2cxll), + u1,
y1, B'1ax,,
-
Multiplying the second equation by y'2and subtracting from the first we can
define the unrestricted structural formof y1, as :
(#11 -B12T1)'x1f
7'1y1l+
-P1, =
(/21
-
B2271)/x(jjt +
:1,.
(25.229)
This form presents itself as the natural formulation in the context of which
Hv can be tested. If we compare (229)with restricted structural form(219)
we can derive the auxiliary regression:
tt
or its
t#11
operational
Jt,
J?'x1
The obvious
test statistic
B12r1 -J1)'x1, +
form:
:
J!l'lxllj, +
way to test Hv
II
LM
rR2
(#c1-B2c),1)'x(l), +c1,
x
7
g2((y).
!?f.
(25.230)
(230)to define the LM
(25.231)
652
The simultaneous
An asymptotically
for small F is:
RRSS
FF
model
equatio>
T-k
URSS
fo
'v
URSS
where
RRSS=(y1
URSS
ylflv
appx
FM1,T-k)
-Y1fIv
-X1JIv)'(y1
-Y1fIv)'(Mx
=(y1
(1960)as better
(25.232)
Xl#lv)
Mx,)(yl -Y1iIv).
(i)
=YI)4
XIJI+X2J1
r1
:1
J1
against
=0
H3 : J1
(25.233)
'0.
FXp'(r1)
URSS;V
T- k
where
and
URSS;..
(y1
(25.234)
URSS;V
Y1f hp,
X1J*1,pXc#*c,p.)'
-
*1
p )
(y 1 - Y 1 f , p X 1IL'p. - X 2J*w
FT',wlyl)
z2(>.
be preferable
FT;Jy1)
aPPX
Flq, T-k),
(25.235)
653
with the usual rejection region. For a readable survey of the finite sample
results related to the test statistics (226)and (234)see Phillips (1983).
The above tests can be generalised directly to the system as a whole using
either the 3SLS formulation in conjunction with the F-type tests or the
FIML formulation using the likelihood ratio test procedure.
for exogeneity
specification of (219) above,
Testing
(4)
variables
y1 =Y171
XIJI
+ MxY1(y1
(25.236)
r1)+ :1.
ln this formulation we can see that y1t and x1, in (219)are treated similarly
when ),t yl 0. Hence, an obvious way to parametrise the endogeneity' of
yj, is in the form of:
0, Sl : (7? 71) #0.
Hl :(T1-T1)
(25.237)
=
RRss -
usss
'Rss
T-2(rn1
,,n1
1)-I
1
Hu
-
Ft'''l
1, T'-2@1
1'+k1)
(25.238)
where RRSS and URSS refer to the RSS from (219) and (236)respectively,
both estimated by OLS (seeWu (1973),Hausman (1978), inter ali.
The above test can be easily extended to the case where only the
exogeneity of a subset of the variables in y1f is tested by re-arranging (236)
yl
(see Hausman
PxYt7? +Y171
''FXIJI
MxY1(y -y!)
+ MxYtyf
+Et
(25.239)
and Taylor
The simultaneous
equations model
25.10
Prediction
lf we assume that the observed value xw+, of the exogenous random vector
Xw./ is available, the obvious way to predict yw../ is to use the statistical
GM,
yt B'xf + u,,
t G T,
(25.240)
yt B(()'xt
=
(25.241)
+ u?,
and the MLE of 0 and ( coincide only in the case where the system of
structural equations (withthe identifying restrictions imposed),
F*'yt + A*'xr + st*
0,
(25.242)
(25.243)
.
- trt(-)where A(j) and F(j) refer to some asymptotically efficient estimator f of the
structural parameters (, such as FIML or 3SLS. lt can be shown that in the
'
B(-
overidentified case:
x,7''z-(B(j)-B(t))
x//'rt
with (Fo
-B)
,v
N(e, Fa),
x(e,F0),
(25.244)
(25.245)
25.10 Prediction
655
.v'
B(())
r )-
N(0, F2),
method
(25.246)
and both (F() - Fz) and (F2 F:5) are positive semi-definite (see Dhrymes
( 1978)). lt is not very difficult to see that this asymptotic efficiency in
estimation carries over to prediction as well given that the prediction error
defined by
-
(w+/ - yr+/)
(2
B)'Xw+!
(25.247)
+uw+/
-
B) and
n.
concepts
of interest,
Reparametrisation,
theoretical (structural) parameters
equations,
overparametrisation,
recursive
of
simultaneity,
system
restrictions,
restrictions,
exclusion
identification, linear homogeneous
order and rank conditions for identification. underidentification, just and
overidentification, indirect MLE, estimator generating equation, full
informatio' n maximum likelihood (FIML), limited information maximum
likelihood (LIML), two-stage least squares (2SLS), k-class estimator, noncentral Wishart distribution, instrumental variables, three-stage least-
squares (3SLS).
Questions
3.
r:
'
t' )
yt + Ajx, +
/
ij,
Explain.
of the variation
'Endogeneity arises because of the violation
free
exogeneity
induced by the identifying restrictions.'
condition of
Discuss.
656
t'l-he
yl =Z
+:* 1,
25.10 Prediction
657
..<
..x
B(t)
will provide
A(()F(t)
efficient predictors
more
estimators
defined via
for yw-hj.'Discuss.
Exerclses
a 1 1 + a 1 zit + a 1 5pt +
it
(1)
(2)
J 1 4y1,.
Express equations
B'x f +u
forms
'
f'
t1
=0.
defined.
Verify that in the case of a recursive simultaneous
1 IFft= y0?y9
1t
+J'xi
ilf
1, 2,
and
model
n'l,
equations
X)):
1
ii T
=
--.
if
ii
yi
Zf&,9, i
1 2,
,
,9
MLE'S
of
and nf.
are indeed the
equations
structural
Consider the following
imposed'.
restrictions
+ 7 4 1 .V41+
- 1.'1 l + 7 2 1 )' 2f
+
l
4.1' 1
-h
-/'
24.-.21
.F4t
-1-
1 1 X 1t
lcxl,
l 3x1,
1 4.,X1 t
the exclusion
3 1 .X 3/
l1
;3,,-
2cx2t
+ aaxa,
+
t;
(524..X2/
fl4t.
(ii)
-/4.3.:4.1
with
The simultaneous
658
muationsmodel
Show that in the case of ajust identified equation the ZSLS and IMLE
estimators coincide.
Show that l-((-)'fF(t-)
( 1/F).d(()'Z'ZA((-) in Section 25.5.
Derive the 2SLS estimator of al' in yl Z1al + 1)1' and explain why
where * (y-.irj sus -X1J2sus)is an inconsistent
!7tl (1/F)t'/
estimator of tll'j
Compare your answer in 6 with the derivation of the 2SLS estimator as
an instrumental variables estimator.
t'T'he GM yl Pxzlf
+ ct* (see Section 25.6) can be viewed as a
equivalent
sample
for the sample
to y3t y'1'(y1,/c(X,)) + J':xlf + k:T*
T' Explain.
period t 1, 2,
Derive a test for the exogeneity of a subset y2, of y1f in the context of the
single equation
=
+ tt
Additional references
Anderson
( 1982);
Hausman
( 1983);
26.1
A methodologist's
of
critical eye
.
and philosophy
with
the integration of economic methodology
econometrics. Methodologists have generally skirted the issue of
methodological foundations of econometric theory, and the few
econometricianswho have addressed philosophical issues have seldom
gone beyond gratuitous references to such figures as Feigl or Carnap
.
Epilogue
660
iobjective
'comply'
'illegitimate'
'illegitimate'
'vague'
'estimable'
iillegitimate'
ibest'
'best'
26.2
Formalising
661
a methodology
for them. Often, research workers felt compelled, and referees encouraged
them, not to report their modelling strategy but to devise a new theoretical
hypothetico-deductive model using some form of dynamic optimisation
method and pretend that that was their theoretical model all along (see
Ward ( 1972/. As argued in Chapter 1, research workers are driven to use
procedures because of the
the textbook
methodology forces them to wear. Moreover, most of these
procedures become the natural way to proceed in the context of the
alternative methodology sketched in Chapter 1. In order to see this let us
consider a brief formalisation of this methodology in the context of the
philosophy of science.
'straightjacket'
Sillegitimate'
'illegitimate'
26.2
Econometric
662
Epilogue
to explain observed quantity or and price changes over time, then the
actual DGP should refer to the actual market process giving rise to the
observed data. lt should be noted that the intended scope of the theory is
used to determine the choice of the observable data to be used. This will be
taken up in the discussion of the theory-dependence of observation.
,4 theory is defined as a conceptual construct which purports to provide
an idealised description of the phenomena within its intended scope (the
actual DGP). A theory is not intended as a carbon copy of
providing an exact description of the observable phenomena in its intended
is much too complicated for such an exact copy
scope. Economic
to be comprehensible and thus useful in explaining the phenomena in
question. A theory provides only an idealised projected image of reality in
tenns of certain abstracted features of the phenomena in its intended scope.
These abstracted features, referred to as concepts, provide the means by
descriptions are possible. To some extent
which such generalised (idealised)
the theory assumes that the phenomena within its intended scope can be
iadequately explained' in terms of the proposed idealised replicas viewed as
isolated systems built in terms of the devised concepts.
ln the context of the centuries-old dichotomy of instrumentalism versus
realism the above view of theory borrows elements from both. lt is
instrumentalist in so far as it assumes that a theory does not purport to
provide an exact picture of reality. Moreover, concepts are not viewed as
describing entities of the real world which exist independently of any
theory. It is also realist in two senses. Firstly, it is realist in so far as underthe
circumstances assumed by the theory (as an isolated system) its
can be ascertained. There is something realistic about a demand schedule in
so faras it can be established ornot under the circumstances assumed by the
theory of demand. Secondly, it is realist because its main aim is to provide
explanation of the phenomena in its intended scope. As such,
an
of
adequacy
the
a theory can be evaluated by how successful it is in coming
the
reality it purports to explain. Theories are viewed as
with
grips
to
providing
to reality and they are judged by the extent to
approximations
enhance our understanding of the phenomena
such
which
We
question.
cannot, however, appraise theories by the extent to which
in
provide
exact pictures of reality
they
treality'
treality'
:validity'
tadequate'
kapproximations'
.
theories in a way that
descriptions
.
(See Chalmers
(1982),p.
163.)
This stems from the view that observation itself presupposes some theory
providing the tenus of reference. Hence, in the context of the adopted
26.2 Formalising
663
a methodology
664
Epilogue
26.2 Formalising
a methodology
665
ft
(b)
( -.
'
l
present
'
!b*
z.
1 12
future Z
(c)
)
Such information is useful in relation to important concepts underlying the
specification of a statistical model such as exogeneity, Granger-causality,
i
(
--
,. -
666
Epilogue
structural invariance
,
data
26.2 Formalising
a methodology
667
tnestable'
668
Epilogue
tgood'
'isolated
26.2
Formalising
a methodology
669
Sdrafting'
in local surgery' by
assumption. What we do not do is to
the alternative hypothesis of a misspecification test into an otherwise
unchanged statistical model such as postulating an AR( 1) error because the
kengage
Durbin--Watson
assumption,
in the context of
the linear regression model; see Hendry (1983) for a similar viewpoint. Once
a well-defined estimated statistical model is reached we can proceed to
determine (construct) the empirical econometric model.
Starting from a well-defined statistical model we can proceed to test any
theoretical restrictions which can be related to the statistical formulation.
The specification
bchecking'.
..
670
Epilogue
parsimony
(vi)
(see Hendry and Richard (1982),
( 1983), Hendry (1983), Hendry and Wallis
( 1984)).
ln cases where the observed data can be relied upon as accurate
measurements of the underlying variables, there is something realistic
about an empirical econometric model in so far as it can be a
or a
bad' approximation to the actual DGP. ln such cases the instrumentalists'
interpretations of such an empirical econometric model is not very useful
because it will stop the modeller seeking to improve the approximation.
The realistic interpretation, however, should not be viewed as implying the
econometric modelling is aiming at; we have
existence of the ultimate
criteria
outside
our theoretical perspective to establish ultimate
no
This should not stop us from seeking better empirical econometric models
which provide us with additional insight in our attempt to explain
observable economic phenomena of interest. In particular, an empirical
econometric model which explains why and how other empirical studies
have reached the conclusions they did is invaluable. ln such a case we say
that the empirical econometric model encompasses others purporting to
explain the same observable phenomenon; on the subject of encompassing
see Hendry and Richard (1982), (1983),and Mizon ( 1984). Hendry and
Richard ( 1982), in their attempt to formalise the concept of a well-defined
empirical model, include the encompassing of all rival models as one of the
important conditions for what they call tentatively adequate conditional
data characterisation'
(TACDI;for further discussion on the similarities
and differences between the two approaches see Spanos ( 1985).
The empirical econometric model is to some extent as close as we can get
to an actual DGP within the framework of an underlying theory and the
available observed data chosen. As argued above, its form takes account of
and sample.
all three main sources of information - theory, measurement
As such, the above-discussed
methodology differs from both extreme
approaches to econometric modelling where only theory or data are used
for the specification of empirical models. The first extreme approach
requires the statistical model to coincide with the theoretical model apart
from a white-noise error tenn. The second approach ignores the theoretical
model altogether and uses only the structure of the observed data chosen as
the only source of information fbr the determination of the empirical model;
see Bos and Jenkins (197$., see Spanos ( 1985).
This concludes the formalisation of the alternative
methodology
sketched in Chapter 1. A main feature of the new methodology is the
broadening of the intended scope of econometrics. Econometric modelling
is viewed not as the estimation of theoretical relationships nor as a
of economic theories, but as an
procedure in establishing the
tgood'
'truth'
ttruth'.
'a
'trueness'
26.3 Conclusion
endeavour to understand observable economic phenomena of interest
using observed data in conjunction with some underlying theory in the
context of a statistical framework.
26.3
Conclusion
.
as there is no bcst wal' to listen to a Tchaikovsky symphony,
book, or to raise a child, there is no best way to investigate
.
social reality.
Epilogue
Yet methodology
has a role to play in all of this. By showing that science is
not objectives rigorous, intellectual endeavour it was once thought to be.
and by demonstrating that this need not lead to anarchy, that critical
discourse still has a place, the hope is held out that a true picture of the
strengths and limitations of scientific practice will emerge, And with luck.
this insight may lead to a better and certainly more honest, science.
(See ibid, p. 252.)
Additional references
Blaug
( 1980);
Boland
(1982);Caldwell ( 1984);
( 1984).
Index
l 53
asymptotic stationarity
asymptotic test procedu res 328- 35
14 1
asympto t ic unco rrelatedness
autoco rrelation 134
at1toco rrttlat ion erro rs 50 l - 3 505- 11
tests for. 5 l3...2 1
autocovariance. 134
autoproduct moment, 134
auxiliary regressions. 446- 7, 460- 1 467
470
.
Bassman
efficiency. 247
nonnalit) 246
7
unbiasedness
asymptotic. properties
consistcncy. 327
.
-n4
Iocally UM P. 328
UMP, 327 8
unbiasedness. 327
test. 652
of tests. 326-8
690
Index
CAN estimators, 27 1
canonical correlations, 3 14
Carleman's conditionp 74
Cauchy distribution. 70-1 105
causality (setpGranger non-causality)
central limit theorem
cross-correlation, 135
cross-covariance, 135
cross-section data. 342-3
cumulants, 74
cumulative distribution function (.$&&
distribution function)
cumulative freqtlency, 25
CUSUM test, 477
CUSLIMSQtest, 477
De Moivre-l-aplace,
64s 165
Liapounov, 174
Lindeberg-Feller,
Lindeberg-l-evy.
174
173
characteristic ftlnction, 73--4
Chebyshev's inequality, 73
chi-square distribution, 98-.9, 108, 1 l l
non-central 108, l 1l
Chow test 487-8
collinearity. exact. 432....4
&near',434.-40
common factor restrictions, 507- 1l
condition ntlmbers, 436
conditional distriblltions, 89--94
exponentials 92
logistic. 9 l
normal. 93
Pareto, 92
conditional cxpectation. 12 1..7
wl'f a tr-field l25-7
w'?'ran observed vtlue, 12 l 5
propertics, 122, 125, l26 -7
conditional moments. 122- 5
mean. 122.-3
variances 122--3
conditional probability, 43-4
confidence region. 303-6
conllucnce analysis, 12
consistcncy, wcak, 244..-6
strong, 246
constant. in linear regression, 370- l 4 10
constrained MLE*s, 423-4
continuous rvfs. 56
185.-8
convergence, mathematical,
of a function, l 85-6
of a sequence, l 85
pointwise, 187
uniform, 187
convergence, modes of, 188-92
almost sure, J 88, 167
in distribution. 189, 167
in probability. 189. l66
in l'th mean 188
convergence of moments, 192.-4
corrclation coefficient, 119
covariance, 119
matrix, 3 12-3
Cramer-Rao,
lower botlnd, 237
regularity conditions, 237
Cramer-Wold lemma, 19 l
expansion,
206-7
Index
691
505-7
error, autocorrelation,
error bounds, Berry-Esseen, 202-3
error-correction model, 554
errors-in-variables, 12
also nonerror term. 349-50. 374-5 (.st?t?
systematic component)
estimable model. 23, 668
estimate- 23 l
estimation. methods, 252-84
estimation. properties of estimators,
23 1-.49
estimators,
CAN. 27 l
FIML, 625
GLS, 463. 503, 587-8
IV 637-.-44
,
k-class, 632
LIML, 629, 63 1, 633
OLS, 449-52
3SLS, 639.-40
ZSLS, 635-7
estimator generation cquation. 624-6
events, 38
elementary. 38
impossible, 39
mutually exclusive. 44
sure, 39
exclusion restrictions (.t?t>a priori
restrictions)
exogeneity, weak, 273, 376.7, 42 1-2
strong, 505, 629-30
tests for, 653
expectation, 68-9
conditional, 12 1-7
properties of, 70- 1, 116-20
experimental design, 366-7
exponential distribution, 76, 92, 124
exponentiai family of distributons. 68, 299
F distribution. centrals 104, 108, 113, 3 19
non-central i 13 3 19. 320, 324
F test. 398--402 425-,
553
power of. 40 1
.
F-type misspecification
test. 446
FIML,
625
final form, 60 1
finite sampie properties. 232.44
efficiency. 234--8
linearity. 238
sufficiency, 242...-4
unbiasedness, 232--4
692
Index
lag operator,
155, 16 1, 509
multiplier test procedure, 330,
333....4,430-2
in misspecilication testing, 446, 453, 460.
466, 468-9, 5 19-2 1
Laguerre polynomials, 208
law of large numbers (see WLLN, SLLN)
leading indicator model, 554
Lagrange
test)
Liapunov's
CLT,
174
648
locally UMP test, 335
logical empiricism, 662. 3
logical positivism. 3, 662-3
logistic distribution, 9 1 125
log-likelihood. 258-60
log-normal distribution. 283. 457
long-run equilibrium solution, 558-9
long-run multipliers, 602
lower bound (scc Cramer-Rao lower
bound)
,
Index
693
theorem. 296
Neyman-pearson
non-centrality parameter. 108, 111-13
in the F-test, 399, 40 1
nonlinear model. 46 1- 3
non-linear restrictions tstyt.'
a priori
restrictions)
non-parametric inference. 2 18
non-parametric processes, 146-52
non-parametric tests, 453
non-random sample. 2 18. 343. 494-7
357
non-stochastic N'ariables.
non-systematic component. 350, 370, 374,
376
64 6
normal dislribution
bivariate. 83 4. 88
mean of. 7()
multivariate. 3 l j 24
standard. 68
variance. 7 l
normality. 447 57
misspecification tests lbr. 45 1-j
4j5 7
normalising transformations.
normal (Gaussian ) stochastic process.
135--6
time homogeneitl rcstrictions on. 139
.
nuisance parameters.
null hypothesis. 286
4 14
observation space. 2 17
ogive, 27
omitted variables bias argument, 4 19-2 1
and auxiliary regressions. 446, 458-6 1
468, 47 1, 502, 515, 523
reformulated, 445-7
OLS, 448-51
0, o notation, 195--6
Op, op notation, 196-9
order condition fsee identification)
order of magnitude, 171, 174, 179, 194-8
orthogonal projection. 38 1 41 1, 642
orthogonality, 118
between systematic and non-systematic
components, 350, 358. 371, 38 1
overdiflkrencing, 479
overidentifying restrictions. 6 18
test for, 65 1-3
overparametrisation, 612-13
,
panel data, 42
parameter space, 60
parameter structural change, 48 1-7
parameter time-invariance, 378, 472-8 1
parametric family of densities (see
probability model)
parametric processes, 146
Pareto distribution, 6 1 339....41
partial adjustment model, 552
partial correlation coefficient, 314, 318,
323, 439-40
Pearson family of densities, 28, 452-3
Pearson paradigm, 7-8
Pillai's trace test, 593
pivots, 295
,
Political
Arithmetik, 4-5
power function, 29 1
power of a test, 290
power set, 39
predetennined variables, 610
prediction, 22 1, 247-9, 30*9
in the linear regression model, 402-5
in the multivariate linear regression
model. 599, 601-2, 654-5
principal components, 434
probability, definition,
axiomatic, 43
classical, 34
frequency, 34
subjective, 35
probability limit (,T?E, convergence)
probability model, 60-1, 214
694
Index
correlation
coefficient)
random experiment, 37
random matrix, 135
random sample, 2 16-17
random variable, 48-76
continuous, 56
defnition, 50
discrete, 56
functions of, 97, 99-1 10
minimal o'-lield generated by, 50
random vector, 78-93
rank condition (.st?t?
identication)
Rao-Blackwell
lemma, 243
realism, 663
recursive estimator, 407, 474-78
recursive system, 6 12- 14
regression curve, 122-4
regressors, order of magnitude, 39 1-2
rejection region, 286
reparametrisation/restriction, 2 1, 352
RESET type tests, 446, 460-1, 555, 597
residuals, 405-8
BLUS, 407
recursive, 407, 474-8
residual sum of squares (RSS), 428
respecification approach, 498-502, 505-9
restrictions see a priori restrictions)
Russian school, 36, 64
sample information, 352, 667
sample moments. 227
sample space, 38
sampling model, 2 15-19
score function, 260
second order stochastic process, 138-9
selection matrices, 6 19
sequential conditioning, 273-4, 495
serial correlation (see autocorrelation)
Shapiro-Wilk
test, 452
c-field. definition, 40
generated by a r.v., 50-1
generated by a set, 40
increasing sequence of, 51
simple random sampling, 343
simultaneous equation model, 608-58
singular normal distribution, 406
size of a test, 29 1
skedasticity, 123-4
skewness, 26, 73
170-2
Student's t distribution)
f distribution t.st?t?
t test, 364, 396-7, 392
test, definition, 294
test statistic, 289
testing, 22 1, 285-303
Theil's inequality coefcient, 405
theoretical parameters of interest. 349, 351,
553, 569, 61 3-14, 620, 669
theoretical model, 2 1, 667
theory, 20, 662-6
3SLS estimator, 635-7
time-homogeneity restrictions, 137-9
time series data, 130, 342
Toeplitz matrix, 495, 505
total sum of squares, 382
2SLS estimator, 626-35
type I error, 286
type 11 error, 286
UMA region, 305
UMP test, 291
Index
695
Convcrgence
(.$&E?
Convergence)
in
probability)
weakly stationary process (sczcsecond
order stationarity)
Weibull distributiona 105
white-noise process, 150, 151
White test for homoskedastcity, 465-7
Wilk's ratio tcst. 593
Weak
(.?t? Convergence
Convergence
Window. z-period,
40, 562
Wishart distribtltion, 32 1 577, 602-3
WLLN
168--9
Wold decomposition, 159
.
Yule-Walker
equationse
157
Zero-one
restrictions
restrictions)
tst??exclusion
(1) Symbols
Nq;1,c2)
normal
random
c-field
distribution
experiment
p and variance
with mean
c2
'...-r
.# - Borel field
k..p- union
to - intersection
- - complementation
subset of
(F - belongs to
( - does not belong to
r3' - empty set
c(z4) - minimal c-field generated by
# - for all
t.zrt:z,
j) - uniform distribution between a and j
5(,7,J?)- binomial distribution with parameters n and p
* : ) probability model
z2(??) chi-square distribution with n degrees of freedom
Ftnj na) - F distribution with nl and na degrees of freedom
-4
--+
- convergence
D
in probability
-+ - convergence in distribution
a S
. .
-+
'
almost
Sure
Convorgence
--+- convergence in
R - the real line (
?'th
:y-
mean
.
:y-
List of abbreviations
.-
'v - asymptotically
E(
E(
Vart
distributed as
,v - distributed
.
) - expected
under
value
) - variance
(2) Abbreviations
DGP - data generating process
CLT - central limit theorem
DF - distribution function
pdf - probability density function
WLLN - weak law' of large numbers
SLLN - strong law of large numbers
residual sum of squares
URSS - unrestricted
RRSS - restricted residual sum of squares
DLR - dynamic linear regression
MLR - mtlltivariate linear regression
MDLR - multivariate dynamic linear regression
MSE - mean square error
UMP - uniformly most powerftll
llD - independent and identically distributed
MLE maximum likelihood estimator
xxiii
List of abbreviations
OLS
GLS
AR
MA
ARMA
ARIMA
ARCH
LR
LM
GM
- ordinary least-squares
- generalised least-squares
.- autoregressive
moving average
.autoregressive,
moving average
-autoregressive,
integrated, moving average
autoregressive,
conditional
heteroskedasticity
likelihood
ratio
test
multiplier
- Lagrange
generating mechanism
.-
rv -
random
variable
IV instrumental variable
GIVE - generalised instrumental variables estimator
ZSLS - two stage least-squares
LIML - limited information maximum likelihood
FIML - full information maximum likelihood
3SLS - three stage least-squares
wrt with respect to
IMLE -- indirect maximum likelihood estimator
CAN - consistent, asymptotically normal
NllD - normal 11D
UMA - uniformly most accurate
MAE - mean absolute error
regression
specification error test
RESET
BLUE
best linear unbiased estimator
BLUS best linear unbiased scalar (residuals)