Simulation-Based Econometric Methods PDF
CORE Lectures
SIMULATION-BASED
ECONOMETRIC METHODS
CHRISTIAN GOURIEROUX
and
ALAIN MONFORT
OXFORD UNIVERSITY PRESS
This book has been printed digitally and produced in a standard specification
in order to ensure its continuing availability
Contents

2.3.1 Simulators
2.3.2 Definition of the MSM estimators
2.3.3 Asymptotic properties of the MSM
2.3.4 Optimal MSM
2.3.5 An extension of the MSM
Appendix 2A: Proofs of the Asymptotic Properties of the MSM Estimator
2A.1 Consistency
2A.2 Asymptotic normality
4 Indirect Inference
4.1 The Principle
4.1.1 Instrumental model
4.1.2 Estimation based on the score
4.1.3 Extensions to other estimation methods
4.2 Properties of the Indirect Inference Estimators
4.2.1 The dimension of the auxiliary parameter
4.2.2
4.2.3 Asymptotic properties
4.2.4
4.3 Examples
4.3.1
4.3.2 Application to macroeconometrics
4.3.3
4.4
4.4.1
4.4.2
4A.1
4A.2 Asymptotic expansions
Computation of I(θ)
5.1
5.1.1
5.1.2 Simulated methods
5.1.3 Different simulators
5.2
5.3
5.3.2
5.3.3
5.4 Empirical Studies
5.4.1
5.4.2
6.1.1 The principle
6.1.2
6.1.3
6.2
6.3 Factor Models
6.3.1 Discrete time factor models
6.3.2
6.3.3
6.3.4
7.1.1
7.1.2
7.2
References
Index
In Section 1.3, we describe different problems for either individual data, time
series, or panel data, in which the usual criterion functions contain integrals.
Finally, in Section 1.4 we give general forms for the models we are interested in,
i.e. models for which it is possible to simulate the observations.
In the following sections we assume that there is no feedback between the y and
the z variables. More precisely, we impose:
f0(zt / zt−1, ..., z1, yt−1, ..., y1, y0) = f0(zt / zt−1, ..., z1).
Note that we have assumed that the conditional p.d.f. f0(yt/xt) does not depend
on t; more precisely, we assume that the process (yt, zt) is stationary. Also note
that, in cross sections or panel data models, the index t will be replaced by i and
the yi's will be independent conditionally on the zi's.
To summarize, we are essentially interested in a part (1.2) of the true unknown
distribution of all the observations, since the whole distribution consists
of f0(y1, ..., yT/z1, ..., zT, y0) and of the true unknown marginal distribution
f0(z1, ..., zT, y0) of z1, ..., zT, y0.
In order to make an inference about the conditional distribution, i.e. f0(yt/xt), we
introduce a conditional parametric model. This model M is a set of conditional
distributions indexed by a parameter θ, whose dimension is p:
(1.4)
and identifiable; i.e., there exists a unique (unknown) value θ0 such that:
In practice, the function ψT is generally differentiable and the estimator is deduced
from the p-dimensional system of first order conditions:
With few exceptions, this system does not admit an analytical solution, and the
estimate is obtained via a numerical algorithm. The most usual ones are based
on the Gauss-Newton approach. The initial system is replaced by an approximated
one, deduced from (1.7) by considering a first order expansion around some
value θq:
As soon as the sequence θq converges, the limit is a solution θT of the first order
conditions.
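As an illustration of this iterative scheme, here is a minimal sketch in Python. Only the update rule θq+1 = θq − [∂²ψ/∂θ∂θ′]⁻¹ ∂ψ/∂θ comes from the text; the quadratic toy criterion and all function names are hypothetical:

```python
import numpy as np

def newton_solve(grad, hess, theta0, tol=1e-10, max_iter=100):
    """Solve grad(theta) = 0 by the iteration
    theta_{q+1} = theta_q - hess(theta_q)^{-1} grad(theta_q)."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(max_iter):
        step = np.linalg.solve(np.atleast_2d(hess(theta)), np.atleast_1d(grad(theta)))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# toy criterion psi(theta) = -(theta - 2)^2, with maximum at theta = 2
grad = lambda th: np.array([-2.0 * (th[0] - 2.0)])
hess = lambda th: np.array([[-2.0]])
print(newton_solve(grad, hess, [0.0]))  # converges to [2.]
```

For a quadratic criterion the iteration converges in one step; for a general criterion the expansion is only local, so convergence depends on the starting value θq.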
The criterion function may be chosen in various ways depending on the properties
that are wanted for the estimator in terms of efficiency, robustness to some misspecifications and computability. We describe below some of the usual estimation
methods included in this framework.
Example 1.1 (Conditional) maximum likelihood
The criterion function is the (conditional) log likelihood function:
Such pseudo-models are often based on the normal family of distributions. Let us
consider a well-specified form of the first and second order conditional moments:
(say).
which implies:
for any function a. The idea of the GMM is to look for a value θT such that the
empirical counterparts of the constraints (1.14) are approximately satisfied. More
precisely, let us introduce r functions aj, j = 1, ..., r, and denote
A = (a1, ..., ar). Let us also introduce a nonnegative symmetric matrix Ω of size
(r x r). Then the estimator is:
The elements of matrix A(xt) are instrumental variables (with respect to the constraints).
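A minimal sketch of this estimation principle; the linear model, the instruments z and z³, the identity distance matrix, and the grid search are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 5000, 1.5
z = rng.normal(size=n)
y = theta0 * z + rng.normal(size=n)           # linear model, E[y|z] = theta0 * z

def gbar(theta):
    """Empirical counterpart of the orthogonality conditions
    E[a_j(z)(y - theta z)] = 0 with instruments a1 = z, a2 = z^3."""
    u = y - theta * z
    return np.array([np.mean(z * u), np.mean(z ** 3 * u)])

omega = np.eye(2)                              # distance matrix
grid = np.linspace(0.5, 2.5, 2001)
crit = [gbar(t) @ omega @ gbar(t) for t in grid]
theta_hat = grid[int(np.argmin(crit))]
print(theta_hat)                               # close to 1.5
```

The quadratic form in gbar is minimized over a grid only for transparency; in practice a numerical optimizer would be used, as described above.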
Example 1.5 Extended methods
It is possible to extend all the previous examples in the following way. Even if
we are interested only in the parameters θ, we may introduce in the criterion function
some additional (nuisance) parameters α. Then we can consider the solutions θT,
αT of a program of the form:
Equivalently, we have:
and if this limit function has a unique maximum θ0*, then the estimator θT =
arg maxθ ψT(θ) converges to this value. Therefore in practice three conditions
have to be fulfilled:
(i) the uniform convergence of the normalized criterion function;
(ii) the uniqueness of θ0* (identifiability condition of θ with respect to the criterion function);
(iii) the equality between this solution θ0* (often called the pseudo-true value) and
the true value θ0.
Asymptotic normality. Whenever the estimator is consistent, we may expand the
first order conditions around the true value θ0. We get:
The criterion function is often such that there exist normalizing factors HT, h*T,
with:
Therefore we get:
We deduce that:
Optimal choice of the criterion function. Some estimation approaches naturally
lead to a class of estimators. For instance, in the GMM approach we can choose in
different ways the instrumental variables A and the distance matrix Ω. Therefore
the estimators θT(A, Ω) and their asymptotic variance-covariance matrices
are naturally indexed by A and Ω. So, one may look for the existence of an optimal
choice of this couple, i.e. for a couple A*, Ω* such that:
where ≪ is the usual ordering on symmetric matrices. For the usual estimation
methods, such optimal estimators generally exist.
such as the Newton-Raphson algorithms, cannot be used directly, since they require
a closed form of the criterion function ψT. Generally this difficulty arises
because of the partial observability of some endogenous variables; this lack of
observability introduces some multidimensional integrals into the expression of
the different functions, and these integrals can be replaced by approximations
based on simulations. In such models it is usual to distinguish the underlying
endogenous variables (called the latent variables), which are not totally observable,
from the observable variables. The latent variables will be denoted with an asterisk.
As can be clearly seen from the examples below, this computational problem
appears for a large variety of applications concerning either individual data, time
series, or panel data models, in microeconomics, macroeconomics, insurance,
finance, etc.
1.3.1
Uij > Uil, ∀ l ≠ j.
The selection model is obtained in two steps. We first describe the latent variables
(i.e. the utility levels) as functions of explanatory variables z, using a linear
Gaussian specification:
Uij = zij bj + vij,  j = 1, ..., M, i = 1, ..., n,
or, in matrix form:
Ui = Zi b + vi,  i = 1, ..., n,
where Ui = (Ui1, ..., UiM)', vi = (vi1, ..., viM)', and vi ~ N(0, Σ).
Then in a second step we express the observable choices in term of the utility
levels:
In such a framework the endogenous observable variable is a set of dichotomous
qualitative variables. It admits a discrete distribution, whose probabilities are:
where (u1it, u2it) are i.i.d. normal N(0, Σ), it is directly seen that the distribution
of the duration has probabilities defined by the multidimensional integrals:
P[Di = d] = P[y*1i1 < y*2i1, ..., y*1i,d−1 < y*2i,d−1, y*1id > y*2id].
The maximal dimension of these integrals may be very large; since the probabilities
appearing in the likelihood function correspond to the observed values of the
durations, this dimension is equal to max_{i=1,...,n} di, where n is the number of
individuals.
The same remark applies for the other examples of sequential choices, in particular
for prepayment analysis (see Frachot and Gourieroux 1994). In this framework
the latent variables are an observed interest rate and a reservation interest rate.
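The two-step structure of this kind of latent utility model can be simulated directly; the parameter values, the covariance matrix, and the regressor design below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 1000, 3
b = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
L = np.linalg.cholesky(Sigma)

z = rng.normal(size=(n, M, 2))                 # alternative-specific regressors
U = z @ b + rng.standard_normal((n, M)) @ L.T  # latent utilities Ui = Zi b + vi, vi ~ N(0, Sigma)
choice = U.argmax(axis=1)                      # individual i selects the alternative with highest utility
freq = np.bincount(choice, minlength=M) / n    # observed choice frequencies
print(freq)
```

Both the latent utilities U and the observed choices are produced in one pass, which is exactly the joint simulation of latent and observable variables discussed in Section 1.4.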
1.3.2
Aggregation effect
where zt are some exogenous factors, εdt, εst some macroeconomic error terms,
which are the same for the different markets, and εdn, εsn some error terms specific
to each market. We assume that (εdn, εsn) are i.i.d., independent of zt, εdt, εst.
We now consider some observations of the exchanged quantity at a macro level.
If N is large, we get:
where Wt is a Brownian motion, μ(yt, θ) is the drift term, and σ(yt, θ) is the
volatility term. However, the available observations correspond to discrete dates;
they are denoted by y1, y2, ..., yt, yt+1, .... The distribution of these observable
variables generally does not admit an explicit analytical form (see Chapter 6),
but appears as the solution of some integral equation. This difficulty may be
seen in another way. Let us approximate the continuous time model (1.19) by
its discrete time counterpart with a small time unit 1/n. We introduce the Euler
approximation y(n) of the process y.
This process is defined for dates t = k/n and satisfies the recursive equation:
where (εk) is a standard Gaussian white noise. The distribution of y1, y2, ..., yt,
yt+1, ... may be approximated by the distribution of y1(n), y2(n), ..., yt(n), yt+1(n), ....
We note that the process y(n) is Markovian of order one, and we deduce that it
is sufficient to determine the distribution of y(n) at date (k+1)/n conditional on its
value at date k/n to deduce the distribution we are looking for. Let us introduce
this conditional distribution: f(y(n)(k+1)/n / y(n)k/n), say. From (1.20), this distribution
is normal with mean y(n)k/n + (1/n) μ[y(n)k/n; θ] and variance (1/n) σ²(y(n)k/n; θ).
The conditional distribution of y(n)t+1 given y(n)t is given by:
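The Euler approximation (1.20) can be sketched as follows; the Ornstein-Uhlenbeck-type drift and constant volatility chosen here are only an illustration:

```python
import numpy as np

def euler_path(mu, sigma, y0, T, n, rng):
    """Euler approximation with time step 1/n:
    y_{(k+1)/n} = y_{k/n} + (1/n) mu(y_{k/n}) + (1/sqrt(n)) sigma(y_{k/n}) eps_k."""
    y = np.empty(T * n + 1)
    y[0] = y0
    eps = rng.standard_normal(T * n)
    h = 1.0 / n
    for k in range(T * n):
        y[k + 1] = y[k] + h * mu(y[k]) + np.sqrt(h) * sigma(y[k]) * eps[k]
    return y[::n]   # keep only the integer dates 0, 1, ..., T

rng = np.random.default_rng(2)
# mean-reverting drift and constant volatility, purely for illustration
path = euler_path(lambda y: 0.5 * (1.0 - y), lambda y: 0.2, 0.0, T=10, n=100, rng=rng)
print(path.shape)  # (11,)
```

Only the values at integer dates are kept, mimicking the discretely observed process; the intermediate dates k/n serve to make the Gaussian one-step transition an accurate approximation of the diffusion.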
1.3.3
Unobserved heterogeneity¹
where ui, i = 1, ..., n, are i.i.d. variables, for instance distributed as N(0, σ²), and
independent of the error terms wi. Then the conditional probability distribution
function (p.d.f.) of the endogenous variable yi, given the exogenous variables zi,
is:
where K is the number of explanatory variables and φ the p.d.f. of the standard
normal distribution.
¹See Gourieroux and Monfort (1991, 1993a).
where for instance the ui are i.i.d. variables, with given p.d.f. g. Then the conditional
distribution of y, given z, is:
Except for some very special and unrealistic choices of the couple of distributions
(f, g) (for instance Poisson-gamma for count data, exponential-gamma for
duration data), the integral cannot be computed analytically.
Two remarks are in order. First, the dimension of the integral is equal to the
number of underlying scoring functions and is not very large in practice. The
numerical problems arise from the large number of such integrals that have to be
evaluated, since this number, which is equal to the number of individuals, may
reach 100,000 to 1,000,000 in insurance problems, for instance.
Second, a model with unobserved heterogeneity may be considered a random
parameter model, in which only the constant term coefficient has been considered
random.
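A sketch of the simulation approach to such an integral, for a hypothetical probit with a Gaussian random intercept; this special case happens to have the closed form Φ(zb/√(1 + σ²)), which makes the Monte Carlo approximation checkable:

```python
import numpy as np
from math import erf, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal c.d.f.

rng = np.random.default_rng(3)
S, zb, sigma = 100000, 0.7, 0.8

# P(y = 1 | z) = integral of Phi(zb + sigma*u) phi(u) du, approximated over S draws of u
u = rng.standard_normal(S)
p_sim = np.mean([Phi(zb + sigma * ui) for ui in u])

# closed form available in this special case, used only as a check
p_exact = Phi(zb / sqrt(1.0 + sigma ** 2))
print(p_sim, p_exact)
```

One such average is cheap; the numerical burden discussed above comes from repeating it for every individual i at every trial value of the parameters.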
1.3.4
where (ε1t), (ε2t) are independent i.i.d. error terms the distributions of which are
known (see 1.4.2), and (zt) is an observable process of exogenous variables.
The process (zt) is assumed to be independent of the process (ε1t, ε2t). yt−1
denotes the set of past values yt−1, yt−2, ... of the process y. (yt) is the observable
endogenous process and (y*t) is the unobservable endogenous latent factor.
If the functions r1(zt, yt−1, y*t, ε1t; θ) and r2(zt, yt−1, y*t−1, ε2t; θ) are one to one,
the previous equations define the conditional p.d.f.s f(yt/zt, yt−1, y*t; θ) and
f(y*t/zt, yt−1, y*t−1; θ). Therefore the p.d.f. of yT, y*T given zT (and some initial
values) is Π_{t=1}^T f(yt/zt, yt−1, y*t; θ) f(y*t/zt, yt−1, y*t−1; θ), and the likelihood
function, i.e. the p.d.f. of yT, appears as the multivariate integral
where yt, λ are m-dimensional vectors and C is a lower triangular (m x m) matrix.
We introduce here a single factor y*t, which is assumed to satisfy an ARCH
evolution:
Example 1.13 Stochastic volatility model (Harvey et al. 1994, Danielsson and
Richard 1993, Danielsson 1994)
The simplest model of this kind is
where (ε1t, ε2t) is a standard bivariate Gaussian white noise. In this kind of model
the conditional distribution of yt given y*t is N(0, exp y*t) and the latent factor y*t
is an AR(1) process.
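A path simulation of such a stochastic volatility model might look as follows; the AR(1) parameter names (omega, phi, sigma), their values, and the stationary-mean initialization are illustrative assumptions:

```python
import numpy as np

def simulate_sv(T, omega, phi, sigma, rng):
    """Simulate y_t = exp(ystar_t / 2) * eps1_t with the latent AR(1) factor
    ystar_t = omega + phi * ystar_{t-1} + sigma * eps2_t,
    where (eps1_t, eps2_t) is standard bivariate Gaussian white noise."""
    eps1 = rng.standard_normal(T)
    eps2 = rng.standard_normal(T)
    ystar = np.empty(T)
    ystar[0] = omega / (1.0 - phi)          # start at the stationary mean of the factor
    for t in range(1, T):
        ystar[t] = omega + phi * ystar[t - 1] + sigma * eps2[t]
    y = np.exp(ystar / 2.0) * eps1          # conditional distribution of y_t given ystar_t is N(0, exp ystar_t)
    return y, ystar

rng = np.random.default_rng(4)
y, ystar = simulate_sv(1000, omega=-0.1, phi=0.95, sigma=0.2, rng=rng)
print(y.shape, ystar.shape)
```

Note that simulating paths is easy even though the likelihood of the observed (yt) alone involves a T-dimensional integral over the latent factor: this is the typical situation motivating the methods of this book.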
Example 1.14 Switching state space models (Shephard 1994, Kim 1994, Billio
and Monfort 1995)
These models are:
where {εt}, {ηt} are independent standard Gaussian white noises and {ut} is a
white noise, independent of {εt}, {ηt}, whose marginal distribution is U[0,1], the
uniform distribution on [0, 1].
In this model the first factor y*1t is a quantitative state variable, whereas y*2t is a
binary regime indicator. (The case of more than two regimes is a straightforward
generalization.) This general framework contains many particular cases:
switching ARMA models, switching factor models, dynamic switching regressions,
deformed time models, models with endogenously missing data, etc.
Example 1.15 Dynamic disequilibrium models (Laroque and Salanié 1993,
Lee 1995)
In this kind of model the latent factors are demand and supply, whereas the observed
variable yt is the minimum of the demand and the supply. Such a model is:
Note that no random error appears in r1; this implies that the model is, in some way,
degenerate and that the general formula given above for the likelihood function is
not valid. However, it is easily shown (see Chapter 7) that the likelihood function
appears as a sum of 2^T T-dimensional integrals.
1.4 Simulation
1.4.1 Two kinds of simulation
For a given parametric model, it is possible to define the distribution of y1, ..., yT
conditional on z1, ..., zT, y0: f(·/z1, ..., zT, y0; θ), say, and the distribution of
yt conditional on xt = (zt, yt−1): f(·/xt; θ), say. It is particularly important in
the sequel to distinguish two kinds of simulation.
Path simulations correspond to a set of artificial values (yst(θ), t = 1, ..., T) such
that the distribution of ys1(θ), ..., ysT(θ) conditional on z1, ..., zT, y0 is equal to
f(·/z1, ..., zT, y0; θ).
Conditional simulations correspond to a set of artificial values (yst(θ), t =
1, ..., T) such that the distribution of yst(θ) conditional on xt = (zt, yt−1) is
equal to f(·/xt; θ), and this for any t.
It is important to note that these simulations may be performed for different values
of the parameter, and that the conditional distributions of the simulations will
depend on these values.
Moreover, it is possible to perform several independent replications of such a set of
simulations. More precisely, for path simulations we can build several sets ys(θ) =
(yst(θ), t = 1, ..., T), s = 1, ..., S, such that the variables ys(θ) are independent
conditionally on z1, ..., zT, y0 and y1, ..., yT. This possibility is the basis of
simulated techniques using path simulations, since the empirical distribution of
the ys(θ), s = 1, ..., S, will provide for large S a good approximation of the
intractable conditional distribution f(·/z1, ..., zT, y0; θ).
Similarly, for conditional simulations we can build several sets ys(θ) = (yst(θ),
t = 1, ..., T), s = 1, ..., S, such that the variables yst(θ), t = 1, ..., T, s =
1, ..., S, are independent conditionally on z1, ..., zT, y0. This possibility is the
basis of simulated techniques using conditional simulations, since the empirical
distribution of yst(θ), s = 1, ..., S, will provide for large S a good approximation
of the intractable conditional distribution f(·/zt, yt−1; θ), and this for any t.
Example 1.16 Inversion technique
If e* is an error term with a unidimensional distribution whose cumulative distribution function (c.d.f.) Fg(-), parameterized by 9, is continuous and strictly
increasing, we know that the variable
Weibull distribution:
Cauchy distribution:
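A sketch of the inversion technique for these two families; the Weibull parameterization F(x) = 1 − exp(−(x/b)^a) is one common convention, not necessarily the one used in the text:

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.uniform(size=100000)                  # u uniform on [0, 1)

# Weibull with c.d.f. F(x) = 1 - exp(-(x/b)**a):  x = b * (-log(1 - u))**(1/a)
a, b = 2.0, 1.5
x_weibull = b * (-np.log(1.0 - u)) ** (1.0 / a)

# Cauchy with c.d.f. F(x) = 1/2 + arctan(x)/pi:  x = tan(pi * (u - 1/2))
x_cauchy = np.tan(np.pi * (u - 0.5))

# sanity check on the Weibull median: F^{-1}(1/2) = b * (log 2)**(1/a)
print(np.median(x_weibull), b * np.log(2.0) ** (1.0 / a))
```

Both draws use the same uniform variates, illustrating that the inversion x = F_θ^{-1}(u) turns one stream of uniforms into draws from any continuous, strictly increasing c.d.f.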
Path simulations
The models of the previous subsections are often defined in several steps from
some latent variables with structural interpretations. We may jointly simulate the
latent and the observable endogenous variables: for instance, demand, supply,
and exchanged quantity in the dynamic disequilibrium model (Example 1.15);
the underlying factor and the observed vector in the ARCH model (Example 1.12);
the utility levels and the observed alternatives in the multivariate probit model
(Example 1.6).
After the preliminary transformation of the error terms, the models have the following
structure:
Conditional simulations
For a general dynamic model with unobservable latent variables such as (1.21),
it is not in general possible to draw in the conditional distribution of yt given
z1, ..., zT, yt−1. However, this possibility exists if the model admits a reduced
form of the kind:
where (εt) is a white noise with a known distribution. The conditional simulations
are defined by:
which are computed conditionally on the simulated values and not on the observed
ones.
into two subvectors such that the p.d.f. conditional on a path (zt, ut) has a closed
form. This means that the integration with respect to the remaining errors wt is
simple. We have:
for any fixed path (yt). Therefore we can approximate the unknown conditional
p.d.f. by:
yi = r(ui; θ0),  i = 1, ..., n,    (2.1)
where θ0 is the unknown true value of a scalar parameter, (ui) are i.i.d. variables
with known p.d.f. g(u), and r is a given function.
We introduce the first order moment of the endogenous variable:
and assume that the function k does not have a closed form.
Now we can replace the unobservable errors by simulations drawn independently
from the distribution g. If usi, i = 1, ..., n, are such simulations, we deduce
simulated values of the endogenous variables associated with a value θ of the
parameter by computing
Path calibration
Under usual regularity conditions, this estimator tends asymptotically to the solution θ∞* of the limit problem:
2.1.2
Moment calibration
As soon as k is one to one, this limit problem has the unique solution θ∞* = θ0,
and the estimator is consistent.
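A sketch of moment calibration by simulation; the model y = exp(θu) and the grid search are illustrative assumptions (this particular k(θ) actually has a closed form, but it is treated here as if it did not):

```python
import numpy as np

rng = np.random.default_rng(6)
theta0, n, S = 0.8, 20000, 20000

r = lambda u, th: np.exp(th * u)        # y = r(u; theta), k(theta) = E[r(u; theta)]
y = r(rng.standard_normal(n), theta0)   # observed sample generated at the true value
ybar = y.mean()

u_sim = rng.standard_normal(S)          # fixed simulation draws, common across theta
def k_sim(th):
    """Simulated counterpart of the first order moment k(theta)."""
    return r(u_sim, th).mean()

# calibrate: pick the theta whose simulated moment matches the sample mean
grid = np.linspace(0.1, 1.5, 1401)
theta_hat = grid[int(np.argmin([(ybar - k_sim(t)) ** 2 for t in grid]))]
print(theta_hat)
```

Keeping the draws u_sim fixed while θ varies makes the simulated moment a smooth deterministic function of θ, so the calibration problem is well behaved.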
2.2
This estimation method has already been introduced in Chapter 1. Here we briefly
recall its main properties, making a distinction between the static and the dynamic
case. The GMM will be approximated by the method of simulated moments
(MSM). In particular, the expression of the asymptotic variance-covariance matrix
of the GMM estimator will serve as a benchmark for measuring the efficiency loss
arising from simulations.
2.2.1
where E0 is the expectation for the true distribution of (y, z), and θ0 is the true
value of the parameter, whose size is p.
Now let Zi be a matrix function of zi with size (K, q), where K ≥ p. The elements
of Zi may be seen as instrumental variables, since they satisfy the orthogonality
conditions:
The GMM estimators are based on the empirical counterpart of the above orthogonality
conditions. If Ω is a (K, K) symmetric positive semi-definite matrix, the
estimator is defined by:
Proposition 2.1
Proposition 2.2
Then
and:
where θn is any consistent estimator of θ0, for instance the GMM estimator with
Ω = Id.
2.2.2
When lagged endogenous variables are present in the model, we have to distinguish
two cases.
and has the same asymptotic properties as in the static case after replacing zt by xt.
GMM based on static conditional moments
Let us now introduce estimating constraints based on static moments, i.e. conditional
only on the current and lagged exogenous variables zt:
Then the instrumental variables also have to depend only on zt: Zt = Z(zt).
The estimator solution of
where
In practice, the choice between dynamic and static conditional moments will be
based on the type of models considered. It is clear that more information is contained in the dynamic conditional moments than in the static conditional moments,
but static conditional moments may be easier to compute, for instance in models
of type (M*) (see 1.21) where unobservable endogenous variables are present.
The distinction is particularly important for pure time series models, i.e. when
exogenous variables z are absent. Introducing as endogenous variable
where (εi) has a known distribution; then we deduce from the definition of the
conditional moment,
conditional moment,
that
We say that k[zi, εsi; θ] = K[r(zi, εsi, θ), zi], where εsi, drawn from the distribution
of εi, is a (conditionally) unbiased simulator of k(zi; θ). This natural
simulator may have drawbacks in terms of precision or in terms of discontinuity
¹McFadden 1989, Pakes and Pollard 1989.
with respect to θ (see Example 2.2), so it may be useful to look for other unbiased
simulators k(zi, ui; θ), where ui has a known distribution such that:
Therefore:
where ui, drawn from the distribution with p.d.f. φ, is a (conditionally) unbiased
simulator. As a matter of fact, we have exhibited a class of simulators, depending
on the choice of the function φ, called an importance function. The precision of the
MSM estimators defined below will depend on the choice of the simulator, i.e.,
in the previous framework, of the function φ.
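The importance-function idea can be sketched on a toy moment; the target E[max(ε, 0)] with ε ~ N(0, 1) and the N(1, 1) importance density are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
S = 200000

phi = lambda x, m: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)  # N(m, 1) p.d.f.

# target moment: k = E[max(eps, 0)] with eps ~ N(0, 1); true value 1/sqrt(2*pi)
u = rng.standard_normal(S) + 1.0               # draws from the importance density N(1, 1)
weights = phi(u, 0.0) / phi(u, 1.0)            # density ratio g(u) / phi(u)
k_hat = np.mean(np.maximum(u, 0.0) * weights)  # unbiased importance-sampling simulator
print(k_hat, 1.0 / np.sqrt(2.0 * np.pi))
```

The simulator stays unbiased for any importance density φ that dominates g; what changes with φ is the variance, which is why the precision of the MSM estimator depends on this choice.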
Finally, we can note that such a choice may depend on the value of the conditioning
variable: we may introduce a conditional known distribution φ(ui/zi) and the
associated simulator:
yi = (yi1, ..., yiM)',
where:
However, this simulator, called the frequency simulator, is not differentiable (and
not even continuous) with respect to the parameters bj and the elements of Σ. It may
be replaced by another simulator based on importance functions. Indeed, let us
consider the variables
Vjl = Uij − Uil,
measuring the differences between the utility levels. The distribution of
Vj1, ..., Vj,j−1, Vj,j+1, ..., VjM is a normal distribution.
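For comparison, a frequency simulator of a choice probability in the two-alternative case, where the probability has a closed form that can be used as a check; the utility specification is an illustrative assumption:

```python
import numpy as np
from math import erf, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal c.d.f.

rng = np.random.default_rng(8)
S = 100000

# two alternatives: U1 = 0.4 + v1, U2 = v2, with (v1, v2) ~ N(0, I)
v = rng.standard_normal((S, 2))
U = np.column_stack([0.4 + v[:, 0], v[:, 1]])

# frequency simulator of P[alternative 1 is chosen] = P[U1 > U2]
p_freq = np.mean(U[:, 0] > U[:, 1])

# exact value: U1 - U2 ~ N(0.4, 2), so P = Phi(0.4 / sqrt(2))
print(p_freq, Phi(0.4 / sqrt(2.0)))
```

The estimate is an average of indicator functions, so as a function of the utility parameters it is a step function: this is the non-differentiability noted above that motivates smooth simulators based on importance functions.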
Let us denote by fj(vj/zi, θ) the associated p.d.f. We have:
where:
Such estimators depend on the moments K that are retained, on the instruments, on
the matrix Ω, on the choice of the simulator, and on the number S of replications.
When S tends to infinity, (1/S) Σ_{s=1}^S k(zi, usi; θ) tends to E[k(zi, u; θ)/zi] = k(zi; θ),
and the estimator coincides with the GMM estimator.
Dynamic case
As before, we distinguish the cases of dynamic and static conditional moments,
even if the objective functions are similar.
Dynamic conditional moment. Let us consider the dynamic moment condition:
and an unbiased simulator of k(xt; θ); this simulator k(xt, u; θ) is such that
E[k(xt, u; θ)/xt] = k(xt; θ), where the distribution of u given xt is known. Then
the simulated moment estimator is defined by:
where:
where:
2.3.3 Asymptotic properties of the MSM
Proposition 2.3
with:
where
Moreover, this additional effect of the simulation depends on the quality of the
simulator V(k/z). For instance, if we consider a simulator with two random
generators, k(z, u1, u2; θ), and the simulator obtained by integrating out u2,
k(z, u1; θ) = E[k(z, u1, u2; θ)/u1], we have:
Therefore the result directly follows by replacing the variance V0[Z(k(z, u; θ0) − k(z; θ0))]
by this decomposition in the second expression of QS(Ω) given in Proposition 2.3.
[Table 2.1: values of (1 + 1/S)^(-1) and (1 + 1/S)^(1/2) for different numbers of replications S]
In this case, the asymptotic relative efficiency of the MSM estimator, defined as
the smallest eigenvalue of
is, under the condition of Corollary 2.2, larger than (1 + 1/S)^(-1). It is interesting to
note that the efficiency loss is not large even with a small number of replications.
We give in Table 2.1 the values of (1 + 1/S)^(-1) (lower bound of the asymptotic relative
efficiency) and of (1 + 1/S)^(1/2), corresponding to the maximal relative increase in
the length of confidence intervals.
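The two quantities reported in Table 2.1 can be recomputed directly for a few values of S:

```python
# (1 + 1/S)^(-1): lower bound of the asymptotic relative efficiency;
# (1 + 1/S)^(1/2): maximal relative increase in confidence-interval length
rows = [(S, round((1 + 1 / S) ** -1, 3), round((1 + 1 / S) ** 0.5, 3))
        for S in (1, 2, 5, 10, 20)]
for S, eff, length in rows:
    print(S, eff, length)
```

Even S = 1 guarantees at least 50 per cent relative efficiency, and S = 10 already brings the maximal confidence-interval inflation below 5 per cent, which is the point made in the text.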
Dynamic case
For the MSM based on dynamic conditional moments, the results are identical to
the previous ones, and we do not repeat them.
For static conditional moments (or unconditional moments for pure time series),
the results are modified. More precisely, if (yst(θ)) is a simulated path of the
endogenous process, and if the simulator used for Eθ[K(yt, zt)/zt] is K[yst(θ), zt],
then the asymptotic variance-covariance matrix of the MSM estimator is (1 + 1/S)
times that of the GMM estimator (see Duffie and Singleton 1993).
From the Gauss-Markov theorem, we know that the nonnegative symmetric matrix
(D'ΩD)^(-1) D'Ω Σ0 Ω D (D'ΩD)^(-1) is minimized for Ω = Σ0^(-1). We deduce the
following result.
Proposition 2.4
where
As usual, the optimal matrix depends on the unknown distribution and has to be
consistently estimated. Let us consider the first term, for instance. We have:
where θn is a consistent estimator of θ0.
This approximation is consistent; since k does not have a closed form, it has to
be approximated using the simulator. To get a good approximation of k, it is
necessary to have a large number of replications S2. Let us denote by us2i, s =
1, ..., S2, some other simulated values of the random term with known distribution;
then we can form the matrix
Then we can use a classical result on optimal instruments. If A and C are random
matrices of suitable dimensions, are functions of z, and are such that C is square
and positive definite, then the matrix
E0(A'Z')[E0(ZCZ')]^(-1) E0(ZA)
is maximized for Z = A'C^(-1), and the maximum is E0(A'C^(-1)A).
Proposition 2.5: The optimal instruments are:
When S goes to infinity, we find the well-known optimal instruments for the GMM:
Also note that when k = K, i.e. when the frequency simulator is used, we have:
and, therefore, the optimal instruments are identical in the MSM and the GMM.
2.3.5 An extension of the MSM
In the usual presentation of the GMM (see Hansen 1982), the true value of the
parameter is defined by a set of estimating constraints of the form:
where g is a given function of size q; we consider the static case for notational
convenience. It is important to note that in the previous sections we have considered
a specific form of these estimating constraints, i.e.
What happens if we now consider the general form (2.22)? As before, we may
introduce an unbiased simulator of g(yi, zi; θ). This simulator g(yi, zi, ui; θ),
which depends on an auxiliary random term ui with a known and fixed distribution
conditional on yi, zi, is based on a function g with a tractable form and satisfies
the unbiasedness condition:
where usi, i = 1, ..., n, s = 1, ..., S, are independent drawings in the distribution
of u, and Zi are instrumental variable functions of zi.
Proposition 2.6
where
and is a GMM estimator based on the estimating constraints associated with the
score function:
Therefore it is natural to consider the previous equality as an unbiasedness condition
and to propose the unbiased simulator (∂ log f*/∂θ)(y*st/zt; θ), where y*st
is drawn in the conditional distribution of y*t given yt, zt. If this drawing can be
summarized by a relation (in distribution), i.e.
In practice such a simulator will be used only if the function b has a simple form.
This is the case for some limited dependent variable models (see Chapter 5).
Example 2.5 Derivation of the asymptotic variance-covariance matrix for a
simulated score estimator
In the special case of the simulated score, the size of g is equal to the size of the
parameter and D is a square matrix. Therefore the asymptotic variance-covariance
matrix given in Proposition 2.6 does not depend on Ω and is equal to:
and, since:
we get:
In particular, if the instruments are Z = Id (these are the optimal instruments for
the GMM based on the score function), and if the simulator is based on the latent
where I and I* are respectively the information matrices of the observable model
and of the latent model.
The price that must be paid for the simulations is (1/S) I^(-1)(I* − I) I^(-1); as usual, it
decreases as 1/S when S increases and, moreover, it is proportional to the information
difference I* − I between the latent model and the observable model.
When S is fixed and n goes to infinity, the simulated criterion ψSn(θ) converges almost surely to:
It follows that:
with D = E0[Z (∂g/∂θ')(y, z; θ0)], where E0 is the expectation with respect to the true
distribution of (y, z).
When n goes to infinity:
with
Particular case
In the particular case where
g(y, z, us; θ) = K(y, z) − k(z, us; θ)
and g(y, z; θ) = K(y, z) − k(z; θ),
we get the results of Proposition 2.3. Since the general form of QS(Ω) directly
reduces to the third form given in this proposition with D = E0[Z (∂k/∂θ')(z; θ0)], the
two other forms are obtained immediately.
where f(yT/zT; θ) is the conditional p.d.f. of yT = (y1, ..., yT), given zT =
(z1, ..., zT) and some initial conditions.
We are interested in problems where this p.d.f. has an intractable form, and it is
important to distinguish two cases. In the first case it is possible to find unbiased
simulators of each conditional p.d.f. f(yt/yt−1, zt; θ), also denoted f(yt/xt; θ),
appearing in the decomposition:
This first case often occurs if the model has a well-defined reduced form
(see (1.22)):
However, such simulators do not generally exist in dynamic models with unobservable factors (see Section 1.3.4), defined as
In this second case other approaches must be found (see Section 3.1.5).
and where the conditional distribution of u given xt, yt is known. In practice this
distribution is often independent of xt, yt.
Then we may draw independently, for each index t, S simulated values ust, s =
1 , . . . , S, of the auxiliary random term u.
Definition 3.1 A simulated maximum likelihood estimator of θ is:
It is obtained after replacement of the intractable conditional p.d.f. with an unbiased
approximation based on the simulator.
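A sketch of an SML estimator for a hypothetical model with an unobservable additive component; the Gaussian specification and the grid search are illustrative assumptions, and the mixture density is deliberately treated as intractable:

```python
import numpy as np

rng = np.random.default_rng(9)
n, S, theta0 = 2000, 50, 1.0

# model: y_i = theta0 + u_i + w_i, with u_i unobserved, so f(y; theta) is a
# mixture integral, simulated here by averaging the conditional p.d.f. over draws of u
u_true = rng.standard_normal(n)
y = theta0 + u_true + rng.standard_normal(n)

phi = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
u_sim = rng.standard_normal((n, S))          # S draws of u per observation, fixed in theta

def simulated_loglik(theta):
    # unbiased simulator of f(y_i; theta): (1/S) sum_s phi(y_i - theta - u_i^s)
    f_hat = phi(y[:, None] - theta - u_sim).mean(axis=1)
    return np.sum(np.log(f_hat))

grid = np.linspace(0.0, 2.0, 401)
theta_sml = grid[int(np.argmax([simulated_loglik(t) for t in grid]))]
print(theta_sml)
```

Note that the log is taken after averaging, so the simulated log likelihood is a biased estimator of the true log likelihood for fixed S, which is exactly the source of the inconsistency discussed next.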
where g is the p.d.f. of u. We know that the true value of the parameter θ0 is
the solution of max_θ E0 log f(y/x; θ), but it is not in general a solution of the
maximization of (3.3), since the log and the integral do not commute, and θST is
not consistent. This inconsistency comes from the choice of the simulator f(y, x, u; θ)
as an unbiased simulator of f. If log f(yt, xt, u; θ) were an unbiased simulator of
log f(yt/xt; θ), the limit function ψ∞ would have been
When we compare Proposition 3.1 with the property of consistency of the MSM,
valid even for fixed S, the SML approaches may appear uninteresting. However:
(a) In practice it is sufficient to retain a number S of replications such that
θST ≈ lim_{S→∞} θST, and such a number is often of moderate size.
(b) The MSM also requires a large number of simulations in order to compute
the standard errors of the estimator.
(c) In finite samples, what really matters is the relative magnitude of the square
of the bias and of the variance of the estimator. As discussed below, the
magnitude of the bias may be reduced by the choice of suitable simulators, or
by the introduction of some correction terms, while the variance of an SML
estimator (close to the variance of the efficient ML estimator) is generally
smaller than the variance of an MSM estimator (close to the variance of an
inefficient GMM estimator).
(d) Finally, whereas the GMM approach may be preferred to the ML approach
since it requires fewer distributional assumptions (GMM is a semi-parametric
approach), this argument fails for the MSM approach, which requires the
complete specification of the distribution for simulation purposes.
When the number of replications S tends to infinity, the simulation step may have
an effect on the asymptotic covariance matrix of the simulated ML estimator,
except if the speed of divergence of S is sufficiently large. The following result is
proved in Gourieroux and Monfort (1991).
Proposition 3.2
3.1.3
As expected, the bias depends on the choice of the simulator, and may be reduced
by a sensible selection of f(y, x, u; θ). Moreover, the square of the bias compared
with the variance may be measured by:
Therefore it is small when the underlying ML estimator is precise, i.e. when I^(-1)(θ0)
is small.
and:
3.1.4 Conditioning
In a number of examples described in Chapter 1, the conditional p.d.f. has an
integral form:
where u is a subvector of the error term ε appearing in the reduced form of the
model. In such a case it is possible to introduce the simulator:
f̃(y, x, u; θ) = f*(y/x, u; θ), where u has a distribution with p.d.f. g. (3.6)
where u_i, w_it are independent variables with standard normal distributions. The
observable endogenous variables are either
y_it = 1_(y*_it > 0), probit model (σ_w may be constrained to 1),
or y_it = y*_it 1_(y*_it > 0), Tobit model.
The simulators may be based on the conditional distribution of y given z and
the individual effect u, since it is easy to integrate out the other random terms w
because the w_it, t = 1, ..., T, are independent. We get:
(a) probit model:
Example 3.2 Panel probit (or Tobit) model with individual effect and serially
correlated disturbances (Hajivassiliou and McFadden 1990, Keane 1990a, b,
Stern 1992, Gourieroux and Monfort 1993a)
The latent variables satisfy:
where u_i, w_it are independent variables with standard normal distribution. The
error term (ε_it) satisfies an autoregressive formulation and the initial value ε_i1 is
assumed to follow the marginal distribution associated with this autoregressive
scheme. For a probit (or Tobit) model based on this latent variable, the p.d.f. of
y conditional on z and u is no longer tractable (since the ε_it, t = 1, ..., T, are
correlated), but another conditioning is available. Let us consider the global error
term for individual i. It is a T-dimensional vector:
where e is the T-dimensional vector whose components are equal to one. The variance-covariance matrix of this error term is:
where the entries of J_T are equal to one, and the generic entry of Ω is ω_ij = [σ²/(1 − ρ²)] ρ^|i−j|.
Lemma 3.1
It has a monotonic form with a minimum value σ²(1 + |ρ|)^{-2}, and the result
follows.
QED
and
where ε_t = (ε′_1t, ε′_2t)′ is a white noise whose distribution is known. If ε_1t and
ε_2t are contemporaneously independent, the function f(y_t/z_t, y_{t−1}, y*_t; θ) (resp.
f(y*_t/z_t, y_{t−1}, y*_{t−1}; θ)) appearing in the previous integral is the p.d.f. of the image distribution of the distribution of ε_1t (resp. ε_2t) by r_1(z_t, y_{t−1}, y*_t, ·; θ) (resp.
r_2(z_t, y_{t−1}, y*_{t−1}, ·; θ)).
For this kind of likelihood function the ML method is untractable; moreover,
the previous SML methods do not apply either. In this context three kinds of
solution have been proposed. The first one is based on numerical approximations and will not be described here (see Kitagawa 1987, and Chapter 6 below),
the second one is based on simulations of the whole likelihood function using
the importance sampling technique, and the third one is based on simulations of
E[log f(y_T, y*_T/z_T, y_0, y*_0; θ)/y_T] in the Expectation Maximization (EM) algorithm.
Importance sampling methods¹
As previously seen, the likelihood function naturally appears as the expectation of the function ∏_{t=1}^{T} f(y_t/z_t, y_{t−1}, y*_t; θ) with respect to the p.d.f.
∏_{t=1}^{T} f(y*_t/z_t, y_{t−1}, y*_{t−1}; θ), where z and y_{t−1} are observed values. It is important to note that this p.d.f. is neither f(y*_T/z_T, y_0, y*_0; θ) (except if y_{t−1}
does not appear in f(y*_t/z_t, y_{t−1}, y*_{t−1}; θ), i.e. if (y_t) does not cause (y*_t)), nor
f(y*_T/y_T, z_T, y_0, y*_0; θ). However, it may be easy to draw in this p.d.f.; for instance, in the M* model such a drawing is recursively obtained by using the
formula
where ε^s_2t, t = 1, ..., T, are independent drawings in the distribution of ε_2t. Therefore an unbiased simulator of the whole likelihood function f(y_T/z_T, y_0, y*_0; θ)
is:
where the y*^s_t(θ) are drawn in the auxiliary p.d.f. mentioned above. This method
is clearly an importance sampling method.
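A sketch of this path simulator for an illustrative linear state space model (our assumption, not the text's M* model): y*_t = ρ y*_{t−1} + ε_2t, y_t = y*_t + ε_1t, with standard normal errors. Latent paths are drawn from the transition density and each path is weighted by the product of measurement densities, giving an unbiased simulator of the likelihood.

```python
import math
import random

random.seed(0)

def phi(x, sd=1.0):
    """Univariate normal density with mean 0."""
    return math.exp(-(x / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))

# Illustrative state space model (an assumption for this sketch):
#   y*_t = rho * y*_{t-1} + eps_2t,   y_t = y*_t + eps_1t,   eps ~ N(0, 1).
def simulate_likelihood(y_obs, rho, S=500, ystar0=0.0):
    """Unbiased importance-sampling simulator of the likelihood: draw S
    latent paths from the transition density and weight each path by the
    product of the measurement densities along it."""
    total = 0.0
    for _ in range(S):
        ystar, w = ystar0, 1.0
        for y in y_obs:
            ystar = rho * ystar + random.gauss(0, 1)  # draw from f(y*_t / y*_{t-1})
            w *= phi(y - ystar)                       # weight by f(y_t / y*_t)
        total += w
    return total / S

like = simulate_likelihood([0.5, -0.2, 0.1], rho=0.8)
```

For T = 1 the simulator averages φ(y_1 − y*) over y* ~ N(0, 1), so it converges to the N(0, 2) density at y_1; for longer samples the variance of the path weights grows quickly, which is exactly the slowness the accelerated methods mentioned below address.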
This basic importance sampling method may be very slow, in the sense that the
simulator may have a large variance; therefore accelerated versions of this method
have been proposed, in particular the Accelerated Gaussian Importance Sampling
method in the Gaussian case (see Danielsson and Richard 1993), and the Sequentially Optimal Sampling methods of various orders in the switching state space
models (see Billio and Monfort 1995, and Chapter 7 below).
¹See Danielsson and Richard (1993); Billio and Monfort (1995).
Since the LHS of the equation does not depend on y*_T, we have, for any value θ^(i)
of the parameter,
Let us define θ^(i+1) as the value maximizing the first term of the RHS with respect
to θ. Using the Kullback inequality, it is easily seen that the θ^(i+1) thus obtained
is such that:
This is the principle of the EM algorithm, which is an increasing algorithm such that
the sequence θ^(i) converges to the ML estimator. The problem with this algorithm
is that, although log f(y_T, y*_T/z_T, y_0, y*_0; θ) has in general a closed form, the same
is not true for its conditional expectation:
and the PML1 method based on this normal family is the nonlinear least squares
method.
Example 3.4 Multivariate normal family
This kind of family is suitable for multivariate endogenous variables. The p.d.f.
associated with the normal distribution N[m, Σ], Σ fixed, is:
Example 3.5
Poisson family
We have:
and can be computed only if the conditional mean m(x_t; θ) is always strictly
positive.
Some other linear exponential families include the binomial, negative binomial,
multinomial, and gamma families.
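A minimal sketch of the PML1 idea with the Poisson family (the data generating process below is an illustrative assumption, deliberately not Poisson): maximizing the Poisson pseudo-likelihood consistently estimates θ as long as the conditional mean, here m(x; θ) = exp(θx), is correctly specified, whatever the true conditional distribution.

```python
import math
import random

random.seed(4)

# Illustrative data: y_t = chi2(1) * exp(theta0 * x_t), so the conditional
# mean is exp(theta0 * x_t) but the distribution is continuous and
# overdispersed -- not Poisson.
T, theta0 = 4000, 0.7
xs = [random.uniform(0, 1) for _ in range(T)]
ys = [random.gauss(0, 1) ** 2 * math.exp(theta0 * x) for x in xs]

def score(theta):
    """Derivative of the Poisson pseudo-log-likelihood
    sum_t [ y_t * theta * x_t - exp(theta * x_t) ] in theta."""
    return sum((y - math.exp(theta * x)) * x for x, y in zip(xs, ys))

# The score is decreasing in theta: solve score(theta) = 0 by bisection.
lo, hi = -3.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
theta_pml = (lo + hi) / 2
```

Despite the misspecified pseudo-family, theta_pml concentrates around θ_0 = 0.7; only its standard errors need the sandwich form given in Proposition 3.6.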
Pseudo-maximum likelihood of order 2
A similar approach may be followed if we specify both first and second order
dynamic conditional moments:
Proposition 3.6
The PML1 and PML2 approaches may be applied as before; the condition for
consistency, i.e. the choice of pseudo-family, remains the same, but the form of
the asymptotic covariance matrix of the PML estimator has to take into account
the serial correlation of the pseudo-score vector ∂ log f_t/∂θ. We have:
where:
3.2.2
We are now interested in a parametric model whose likelihood function is untractable. When the likelihood function has no closed form, the same is likely to
be true for dynamic or static conditional moments, and the exact PML methods
cannot be used. In such a case we may extend the PML approaches by introducing
approximations of first (and second) order conditional moments in the expression
of the pseudo-log-likelihood function. These approximations are based on simulations. The simulations will be conditional on y_{t−1}, z_t in the case of dynamic
conditional moments, and will be path simulations (conditional on z_1, ..., z_T, y_0)
in the case of static conditional moments.
Example 3.7 Simulated nonlinear least squares based on the dynamic conditional mean
We consider an unbiased simulator of the first order dynamic conditional moment,
m̃(x_t, u; θ), where u has a known distribution (conditionally on x_t), and such that:
and:
3.3.1
Asymptotically, i.e. when T tends to infinity, S fixed, the first order conditions are:
or:
We note that the solution θ_{S∞} is different from the true value θ_0; therefore the
SNLS estimator is inconsistent for fixed S. Moreover, we see that the asymptotic
bias is of order 1/S. This bias comes from the correlation introduced between the
'instruments' ∂r/∂θ(z_t, u^s_t; θ_{S∞}) and the residuals y_t − r(z_t, u^s_t; θ_{S∞}).
This correlation vanishes if different simulations are used for the instruments and
for the moments, i.e. if the estimator is defined as a solution of:
where u^s_t and ū^s_t are independent drawings in the distribution of u. Note that this
modified estimator is an MSM estimator, since we have the moment condition:
3.3.2
This method is analogous to the one presented in Section 3.1.3 for the SML
approach, but since the criterion function is quadratic it will provide an exact bias
correction. Let us consider the limit of the criterion function:
There are three terms in the previous decomposition. It is clear that θ = θ_0 gives
the minimum of the sum of the first two terms and that the asymptotic bias is created
by the third term. An idea proposed by Bierings and Sneek (1989) and Laffont
et al. (1991) consists in modifying the criterion function in order to eliminate this
term. Let us consider the following estimator:
with S ≥ 2. The limit objective function is now E_0 V_0(y_t/z_t) + E_0[m(z_t; θ_0) −
m(z_t; θ)]², and θ̂_ST is consistent for S fixed, T tending to infinity.
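A numerical sketch of this correction, with an illustrative simulator (our assumption): r(u; θ) = (θ + u)², u ~ N(0, 1), so that m(θ) = θ² + 1 and v(θ) = Var[r] = 2 + 4θ². With S = 2 the naive simulated NLS criterion is pulled away from θ_0 by the variance term v(θ)/S, while subtracting the unbiased within-simulation variance estimate restores consistency.

```python
import math
import random

random.seed(5)

# Illustrative model: r(u; theta) = (theta + u)^2, u ~ N(0, 1),
# so m(theta) = theta^2 + 1; observations y_t = m(theta0) + e_t.
theta0, T, S = 1.5, 2000, 2
ys = [theta0 ** 2 + 1 + random.gauss(0, 0.5) for _ in range(T)]
us = [[random.gauss(0, 1) for _ in range(S)] for _ in range(T)]

def criterion(theta, corrected):
    total = 0.0
    for y, u in zip(ys, us):
        r = [(theta + v) ** 2 for v in u]
        mbar = sum(r) / S
        total += (y - mbar) ** 2
        if corrected:
            # unbiased estimate of Var(mbar) = v(theta)/S, removed from the criterion
            total -= sum((rv - mbar) ** 2 for rv in r) / (S * (S - 1))
    return total / T

def argmin(corrected):
    grid = [k / 100 for k in range(80, 221)]      # theta in [0.8, 2.2]
    return min(grid, key=lambda th: criterion(th, corrected))

theta_naive, theta_corrected = argmin(False), argmin(True)
```

Here the naive minimizer concentrates near sqrt(θ_0² − 2/S) ≈ 1.12 rather than 1.5, while the corrected one recovers θ_0 up to sampling noise; as S grows the two coincide.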
_ ^(T) otherwise.
QED
Using this lemma we can show the basic property of the MH algorithm.
Proposition 3A.1 Consider an MH algorithm (in which the p.d.f.s g(·/x) have
the same support as f). The MH algorithm defines a
Markov chain admitting P as an invariant distribution.
Proof. The Markov chain defined by the MH algorithm is such that
q(y/x) = ρ(y, x) g(y/x),
where ρ(y, x) is the acceptance probability. Therefore, we have:
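As an illustration of the proposition, a minimal random-walk sketch (a special case of the MH algorithm in which the proposal g(·/x) is symmetric, so the acceptance probability reduces to min(1, f(y)/f(x)); the standard normal target below is our arbitrary choice):

```python
import math
import random

random.seed(2)

def mh_chain(logf, proposal, x0, n):
    """Metropolis-Hastings with a symmetric proposal: move from x to the
    proposed y with probability min(1, f(y)/f(x)), else stay at x."""
    x, out = x0, []
    for _ in range(n):
        y = proposal(x)
        if math.log(random.random()) < logf(y) - logf(x):
            x = y
        out.append(x)
    return out

# Target: N(0, 1) (unnormalized log-density suffices); random-walk proposal.
chain = mh_chain(lambda v: -v * v / 2,
                 lambda v: v + random.uniform(-1, 1), 0.0, 30_000)
```

The empirical mean and variance of the chain approach 0 and 1, the moments of the invariant distribution, which is the content of Proposition 3A.1.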
Indirect Inference
4.1 The Principle1
4.1.1 Instrumental model
When a model leads to a complicated structural or reduced form and to untractable
likelihood functions, a usual practice consists in replacing the initial model (M)
with an approximated one (Ma) which is easier to handle, and to replace the
infeasible ML estimator of the initial model,
In parallel, we simulate values of the endogenous variables y^s_t(θ) using model (M)
and a value θ of the parameter. As in previous chapters, we replicate S times such
simulations. Then we estimate the parameter β of the instrumental model from
these simulations. We get:
and is equal to zero for the PML estimator of β. An approach proposed by Gallant
and Tauchen (1996) selects a value of θ such that:
where:
either
with
k = 0, ..., S − 1, h = 1, ..., T.
However, the last equivalence requires an additive decomposition for the derivative
of the criterion function
(since for such a choice the criterion function takes the minimal possible
value 0 for T sufficiently large) and therefore it is independent of Ω.
(ii) Similarly, in the just identified case, θ̂_ST(Ω) is the solution of the system:
and is independent of Ω.
(iii) Finally, if
has a unique solution in β̂_T, we deduce that this solution is β̂_ST(θ̂_ST), and
from (ii) that it is equal to β̂_T. From (i) we know that β̂_T = β̂_ST(θ̃_ST), and
therefore the two estimators coincide: θ̂_ST = θ̃_ST.
QED
4.2.2 Which moments to match?
The title of this subsection refers to the paper by Gallant and Tauchen (1996). The
question is: What is the underlying parameter on which the estimation process is
based?
This value may be specified only if we consider the asymptotic optimization problem. Let us consider for instance the PML methods used in Section 4.1.1. The
criterion is asymptotically:
which gives the proximity between the two conditional distributions f(y_t/x_t; θ)
and f^a(y_t/x_t; β); f^a(y_t/x_t; b(θ)) corresponds to the distribution of (Ma) that is
the closest to f(y_t/x_t; θ).
In some specific cases the parameter b(9) may admit an interpretation in terms of
moments, but in general it has a much more complicated interpretation.
Moreover, as shown in the following example, MSM methods on static conditional
moments are particular cases of indirect inference methods.
and:
is an MSM estimator based on the static conditional moment E[k(y_t, z_t)/z_T] and
on the identity instrument matrix. In the pure time series case, i.e. when no
exogenous variables are present, we find the MSM method proposed by Duffie
and Singleton (1993).
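The mechanics of matching an auxiliary statistic can be made concrete with a one-parameter sketch. The structural model below, y = exp(θ + z) with z ~ N(0, 1), and the auxiliary statistic, the sample mean, are illustrative assumptions: the statistic is computed on the observed data and on data simulated under θ with frozen draws, and θ is adjusted until the two match, here by bisection, since the problem is just identified and one-dimensional.

```python
import math
import random

random.seed(3)

# Illustrative "structural" model: y = exp(theta + z), z ~ N(0, 1),
# so the binding function is b(theta) = E[y] = exp(theta + 1/2).
def simulate(theta, zs):
    return [math.exp(theta + z) for z in zs]

def beta_hat(ys):                        # auxiliary statistic: sample mean
    return sum(ys) / len(ys)

theta0, T, S = 0.5, 2000, 10
data = simulate(theta0, [random.gauss(0, 1) for _ in range(T)])
target = beta_hat(data)

# Common random numbers: S*T latent draws, kept fixed across candidate thetas.
zs = [random.gauss(0, 1) for _ in range(S * T)]

def gap(theta):
    """Auxiliary statistic on simulated data minus the observed one."""
    return beta_hat(simulate(theta, zs)) - target

lo, hi = -2.0, 3.0                       # gap is increasing in theta: bisect
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gap(mid) < 0 else (lo, mid)
theta_ii = (lo + hi) / 2
```

Because the auxiliary statistic here is a static moment with identity instruments, this is exactly the MSM-as-indirect-inference case just discussed.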
4.2.3
Asymptotic properties
The asymptotic properties of the indirect inference estimators are given below for
a general criterion function such that
converges to a deterministic limit denoted by ψ∞(θ, β), and such that the binding
function b(θ) = arg min_β ψ∞(θ, β) is injective. We introduce the two matrices:
Proposition 4.2
Similar results may be derived for estimators based on the score. They are direct
consequences of the following result (see Appendix 4A).
Proposition 4.3
The indirect inference estimators θ̂_ST(Ω) form a class of estimators indexed by the
matrix Ω. The optimal choice of this matrix is Ω = Ω*.
Proposition 4.4
it implies:
A consistent estimator of this matrix can be obtained by replacing ψ∞ by ψ_T, b(θ_0)
by β̂_T, and I_0 by a consistent estimator based on simulations (see Gourieroux et
al. 1993: Appendix 2). Significance and specification tests can also be based on
the indirect inference method (see Gourieroux et al. 1993).
4.2.4
Some symmetrical calibration procedures might also have been introduced. For
instance, we might have considered:
However, it can be checked that these methods, while consistent, are generally less
efficient than the optimal ones we have described in the previous subsection.
4.3 Examples
The main step in the indirect inference approach is the determination of a good
instrumental model (Ma) or a good auxiliary criterion ψ_T (which may be an approximation of the log likelihood function). We now give some examples of the
determination of instrumental models in which the major part of the initial modelling has been kept. Some other examples, specific to limited dependent variable
models and to financial problems, are given in Chapters 5 and 6 respectively.
and for r = 2:
Mean    Standard deviation    Root mean square error
0.481   0.105                 0.106
0.491   0.065                 0.066
0.497   0.053                 0.053
0.504   0.061                 0.061
4.3.2 Application to macroeconometrics
or, with respect to u_t and y_{t−1}, around zero and a long-run equilibrium value ȳ
(often taken equal to the historical average (1/T) Σ_{t=1}^{T} y_t in practice):
We may apply the indirect inference approach using either (4.19) or (4.20) as
auxiliary model and β as auxiliary parameter. In such a case, the approach corrects
for the linearization bias.
We may also apply indirect inference from (4.20) with the auxiliary parameter
(β′, ȳ′)′. In this case we simultaneously estimate the 'implicit' long-run equilibrium associated with the linearized version.
Finally, there is no strong reason for expanding around u = 0 rather than around
another point ū. This means that another approximated model is:
Let us assume that the model can be put in the nonlinear state space form:
where y*_t is a state vector which is in general (partially) unobserved and (ε′_t, η′_t)′ is
a Gaussian white noise. The extended Kalman filter (Anderson and Moore 1979)
could be used to compute an approximate log likelihood function, but this estimator is inconsistent. It could also be used as a first step estimator in the indirect
estimation procedure, which in a second step provides a consistent and asymptotically normal estimator. In this example it is directly a criterion function (i.e. an
algorithm) that is used, without looking for a tractable instrumental model.
where:
where ȳ_T is the sample mean, σ̂_T the sample standard error, ℓ a logistic map, and
P is a polynomial in v_t whose coefficients are polynomials in y*_{t−1}, ..., y*_{t−p}.
4.4
In this section we have gathered some additional theoretical results concerning the
indirect inference approach. They concern the second order expansion of indirect
inference estimators and particularly their ability to reduce the finite sample bias,
and a definition of the indirect information on the parameter of interest contained
in the auxiliary model.
where A(ν; θ_0), B(ν; θ_0) are random vectors, depending on some asymptotic random term ν, and where the equality is in a distribution sense. We have previously
seen that the first order term A(ν; θ_0) follows a zero-mean normal distribution.
Considering, for the sake of simplicity, the pure time series framework, i.e. without
explanatory variables, similar expansions are also valid for the first step estimators
based on simulated data. We consider S replications and, for each index s, T
simulated values {y^s_t(θ), t = 1, ..., T}. The first step estimator associated with
this set of simulated values, β̂^s_T(θ) say, is such that:
Let us now consider an indirect inference estimator θ̂_ST defined as the solution of
the system:
The identification of the first and second order terms of the two members of the
equality provides the following terms in the expansion for
Proposition 4.5
From these expressions, we can deduce the second order bias of the indirect inference estimator defined by (4.25). We have:
It may be noted that, whereas the initial second order bias of the first step estimator
is:
the second order bias of the indirect inference estimator no longer depends on the
second order terms B(·; θ_0).
The case of a consistent first step estimator
Even if indirect inference is useful mainly when the initial model is untractable, it
might also be used when a consistent estimator of θ is easily available, i.e. when
there exists a consistent first step estimator θ̂_T(θ_0) such that b(θ_0) = θ_0. In such
a framework we will see (as noted by Gourieroux et al. (1994) and MacKinnon and
Smith (1995)) that indirect inference is a possible approach for correcting for finite
sample bias. In some sense the correction is similar to the one based on the median
proposed by Andrews (1993), or to a kind of bootstrap approach.
By applying the previous proposition to this particular case, we obtain some simplifications on the second order expansion of the indirect inference estimator.
Proposition 4.6
If b(θ) = θ, then
The indirect inference estimator is simply equivalent to the initial estimator corrected for the second order bias. When the number of replications S is finite, the
second order bias of the indirect inference estimator is smaller in absolute value
than the one associated with the first step estimator as soon as:
or:
In Figure 4.7 we give the p.d.f. of σ̂²_T/σ²_0 and of σ̂²_ST/σ²_0 for T = 20 and S = 10.
In the limit case, S = +∞, we get (T − 1)σ̂²_∞T/σ²_0 ~ χ²(T − 1), and σ̂²_∞T is
unbiased.
Of course, when S is small the gain in the bias is balanced by a loss in the variance.
In the example, the exact first and second order moments are:
Figure 4.7: The p.d.f. of the estimator with and without correction by indirect
inference, T = 20, S = 10: χ²(T − 1) and F[T − 1, S(T − 1)].
The mean square errors MSE_T and MSE_ST are:
4.4.2
Let us go back to the context of Sections 4.1.1 and 4.1.2, i.e. to the case where the
criterion 1/^7- is equal to:
where f^a(y_t/x_t; β) is the conditional p.d.f. of some instrumental model (Ma); in
other words, ψ_T is equal to (1/T) L_T(β), where L_T(β) is the
log likelihood function of (Ma). We still assume that the true conditional p.d.f.
belongs to the family f(y_t/x_t; θ) associated with model (M).
Let us introduce the usual Fisher information matrices of (M) and (Ma):
Note that in the expression of I^a(β) the variance-covariance matrix is taken with
respect to the distribution defined by the f^a(y_t/x_t; β), t = 1, ..., T.
We assume that both of them are invertible, which implies the local identifiability
of both models.
It is natural to say that (M) is locally indirectly identifiable from (Ma) if the binding
function b(0) is locally injective, and to introduce the indirect information matrix
of (M) based on (Ma) as the asymptotic variance-covariance matrix (under M)
of the vector obtained by the orthogonal projection of the components of:
is equal to zero because of the definition of b(θ), but that, in general, the conditional
expectation
Proposition 4.7
where:
(ii) If the matrix I_I(θ) is invertible, (M) is locally indirectly identifiable from (Ma).
Proof. See Appendix 4B.
The first part of this property shows that, when the criterion of the indirect inference method is the log likelihood of an auxiliary model, the asymptotic variance-covariance matrix of the optimal indirect inference estimators is (see Proposition 4.4):
(A4) The only solution of the asymptotic first order conditions is b(θ):
and
(as soon as
is positive definite)
This implies:
(say).
Asymptotic expansions of β̂_T and β̂_ST(θ_0)
These are directly deduced from the first order conditions. For instance, we have:
or
Finally, we get:
(with f_t(θ) = f(y_t/x_t; θ), f^a_t(β) = f^a(y_t/x_t; β)). This matrix is denoted by:
with:
and:
with:
is:
or:
where f_0(θ) is the conditional p.d.f. of y_0 given the exogenous variables. Differentiating with respect to θ, we get:
The last limit is zero under the usual mixing assumption, and therefore:
and
Finally, we get:
If I_I(θ) is invertible, ∂b/∂θ′ is of full column rank and the second part of Proposition 4.7 follows from the implicit function theorem.
Applications to Limited
Dependent Variable Models
5.1 MSM and SML Applied to Qualitative Models
5.1.1 Discrete choice model
The problem of discrete choices made by the agents is at the origin of the method
of simulated moments (McFadden 1989). In this subsection we first recall the form
of the model (see Section 1.3.1), the expression of the log likelihood function, and
the first order conditions. The model is defined from the underlying utility levels:
With the notations of Chapter 2, we have k(y_i, z_i) = y_i, k(z_i; θ) = P(z_i; θ). The
log likelihood function is given by:
or, since:
where θ̂ is a consistent estimator of θ_0, and this replacement has no effect on the
asymptotic distribution of the estimator.
5.1.2
Simulated methods
As seen in Chapter 2, we first have to select a set of instruments Z_ij(z_i); then the
MSM estimator is the solution of an optimization problem:
SML
The SML estimator is the solution of:
is not an instrument, since it depends on the simulated values u^s_ij, which introduce
a correlation between Z_ij(θ_0) and
Simulated instruments
It has been proposed to apply the MSM with instruments close to the Z appearing in
the likelihood equations. To destroy the correlation previously discussed between
Z and y − p̃^S, the idea is to consider:
where:
The v^s_ij are drawn independently of the u^s_ij and in the same distribution. Then we
can look for the estimator θ̂, solution of:
where θ̂_Sn is a consistent estimator of θ when n → ∞ and S is fixed. With such
a choice, and with S* and S sufficiently large, we may be close to the asymptotic
efficiency.
where v = (v_1, ..., v_m)′ ~ N(0, Σ). The values a_j, b_j and the variance-covariance
matrix Σ are known as soon as we know the values of the explanatory
variables and the values of the parameters. Therefore they depend on i and θ.
To build an unbiased simulator of the probability of the rectangle D =
∏_{j=1}^{m} [a_j, b_j] for the normal distribution N(0, Σ), Stern (1992) proposed to decompose the variance-covariance matrix Σ. Let us introduce the smallest eigenvalue λ
of the matrix Σ; we have Σ − λ Id_m ≥ 0, and therefore we can write:
Therefore:
in the u-space the domain D* has the form shown in Figure 5.1.
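Stern's decomposition above can be sketched directly (the interface, passing λ and a matrix A with A A′ = Σ − λ Id, is our illustrative choice): writing v = √λ u + A w with u, w independent N(0, Id_m), the rectangle probability becomes a smooth expectation over w of a product of univariate normal c.d.f. differences, estimated by a plain Monte Carlo average.

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()
random.seed(6)

def stern_simulator(A, lam, a, b, n_draws=5000):
    """Stern's unbiased simulator of P(a <= v <= b), v ~ N(0, Sigma),
    with Sigma = lam * Id + A A' (lam: smallest eigenvalue of Sigma).
    Writing v = sqrt(lam) u + A w, integrate u out analytically and
    average the resulting product of c.d.f. differences over draws of w."""
    m = len(a)
    s = math.sqrt(lam)
    total = 0.0
    for _ in range(n_draws):
        w = [random.gauss(0, 1) for _ in range(m)]
        Aw = [sum(A[j][k] * w[k] for k in range(m)) for j in range(m)]
        p = 1.0
        for j in range(m):
            p *= nd.cdf((b[j] - Aw[j]) / s) - nd.cdf((a[j] - Aw[j]) / s)
        total += p
    return total / n_draws
```

With A = 0 (independent components) each draw already equals the exact product of univariate probabilities; with correlation the simulator stays smooth in the parameters, which is the point of the decomposition.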
Now let us consider a drawing u*_1 in the standard normal distribution restricted
(or conditional) to [α_1, β_1], and a drawing u*_2 in the standard normal distribution
restricted (or conditional) to [α_2(u*_1), β_2(u*_1)].
Indeed, we have:
Note that u_1 has not been used, but it has been introduced in order to prepare the
general case.
Finally, it is easy to check that a drawing u*_1 in the standard normal distribution
restricted to [α_1, β_1] is deduced from a drawing ũ_1 in the uniform distribution
U_(0,1) on (0, 1) by the formula:
since
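This inversion step can be sketched as follows (the function name is ours; the mapping u*_1 = Φ^{-1}[Φ(α_1) + ũ_1(Φ(β_1) − Φ(α_1))] is the standard inverse-c.d.f. construction for a truncated normal):

```python
import random
from statistics import NormalDist

nd = NormalDist()

def truncnorm_draw(alpha, beta, u=None):
    """Draw from N(0, 1) restricted to [alpha, beta]: push a uniform
    draw through the inverse c.d.f. of the truncated distribution."""
    if u is None:
        u = random.random()
    return nd.inv_cdf(nd.cdf(alpha) + u * (nd.cdf(beta) - nd.cdf(alpha)))
```

The map is smooth in (α, β), so simulators built on it (such as GHK below) are differentiable in the parameters, unlike frequency simulators.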
General case. Let us now consider the extension to a general dimension m. After
the lower triangular transformation, the domain D* has the following form:
in N(0, 1) conditional to
5.2
(say).
In the neighbourhood of the no correlation hypothesis, this function is also equivalent to:
(say).
This is a local correction for the correlation effects by the introduction of Mills
ratios.
Finally, if we consider that the unidimensional normal distribution is well approximated by a logistic distribution,
and
where S(y) = 1 − F(y) is the survival function associated with the logistic
distribution.
5.2.2
Let us consider the log likelihood function associated with a discrete choice model.
It is given by:
or
or
or
and ỹ^s(θ) is a set of n simulated vectors y^s_i(θ) drawn from the discrete choice
model.
Such an approach is consistent and has a good precision when the correlations are
small (since Ln is a good approximation of L), but also for much larger correlations.
(See Appendix 5A for some Monte Carlo studies.)
or:
(constrained moment),
where h is a given integrable function. For instance, conditional computations
naturally appear in limited dependent variable models with truncation effects,
when the variable is observed only if a given constraint v ∈ D is satisfied. But
conditional moments are also important if we perform a direct analysis of first
order conditions in more classical frameworks such as discrete choice models, in
particular if we apply simulated scores (see Example 2.3). To illustrate this point,
let us consider a Gaussian variable v ~ N[μ, Σ], and the probability:
where μ and Σ are unknown, whereas the a_j, b_j are known, and where ṽ = v − μ,
ã = a − μ, b̃ = b − μ.
The previous probability is a function of the parameters μ, Σ:
where D̃ = ∏_{j=1}^{m} [ã_j, b̃_j]. When we consider the first order conditions associated
with a maximum likelihood approach, we have to compute the derivatives:
and
Some direct computations give:
and similarly:
where u and w are independent, u ~ N(0, Id_m), w ~ N(0, Id_m), and λ is a scalar,
in practice the smallest eigenvalue of Σ. We have:
Such an integral may be easily computed for some specific functions h, in particular
when h(v) = ∏_{j=1}^{m} v_j^{n_j}. This class of h functions is interesting since we have seen
that it naturally appears with n_j = 0, 1, 2, when we look at the score vectors. For
such products of power functions, the previous multiple integral is a product of
one-dimensional integrals, which are easily derived as soon as we know how to
compute an expression of the form:
The distribution of u* = (u*_1, ..., u*_m)′ is the recursive truncated normal distribution with p.d.f.:
where ψ has already been defined in (5.16) and where by convention the denominator is Φ(β_1) − Φ(α_1) for j = 1.
Proposition 5.1 An unbiased
simulator of E[h(v) 1_D(v)]
is
h[Au*] P(u*_1, ..., u*_{m−1}), where u* follows the recursive
truncated normal distribution defined in (5.24).
Proof. We have:
QED
The result is similar to the one derived in Section 5.1.2. Just note that it is now
necessary to draw the last component u*_m as soon as it appears in the function
h(Au*).
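Putting the recursive truncated normal drawings and the product of conditional probabilities together gives the GHK (smooth recursive conditioning) simulator used repeatedly in this chapter. A minimal sketch for the rectangle probability (the function interface is ours):

```python
import random
from statistics import NormalDist

nd = NormalDist()

def ghk(A, a, b, n_draws=1000, seed=0):
    """GHK (smooth recursive conditioning) simulator of
    P(a_j <= v_j <= b_j for all j) with v = A u, u ~ N(0, Id_m),
    A lower triangular (e.g. a Cholesky factor of the covariance)."""
    rng = random.Random(seed)
    m = len(a)
    total = 0.0
    for _ in range(n_draws):
        ustar = [0.0] * m
        weight = 1.0
        for j in range(m):
            partial = sum(A[j][k] * ustar[k] for k in range(j))
            lo = (a[j] - partial) / A[j][j]
            hi = (b[j] - partial) / A[j][j]
            c_lo, c_hi = nd.cdf(lo), nd.cdf(hi)
            weight *= c_hi - c_lo
            if weight <= 0.0:
                break
            # draw u*_j from N(0, 1) truncated to [lo, hi], by inversion
            ustar[j] = nd.inv_cdf(c_lo + rng.random() * (c_hi - c_lo))
        total += weight
    return total / n_draws
```

Each replication accumulates the product of conditional interval probabilities, so the simulator is unbiased, strictly positive, and smooth in the parameters through A, a, b.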
5.3.3
The question clearly is: How can the conditional distribution of v given v ∈ D be
drawn? We propose several answers to this question.
Acceptance-rejection methods
A crude acceptance-rejection method consists in drawing simulated values v^s,
s = 1, 2, ..., in the distribution of v until one of these values satisfies the constraint
v^s ∈ D. This first value compatible with the domain is the first simulated value
in the distribution conditional on {v e D}. A drawback of such a practice is the
large number of underlying drawings that may be necessary in order to get one
effective simulated value in the conditional distribution, when P(v e D) is small.
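A sketch of the crude method (the example target, N(0, 1) conditioned on v > 2, is our illustrative choice; its acceptance probability is about 1/44, which is exactly the drawback just mentioned):

```python
import random

random.seed(7)

def crude_ar(draw, in_D, max_tries=100_000):
    """Crude acceptance-rejection: redraw from the unconditional
    distribution of v until the constraint v in D is satisfied."""
    for _ in range(max_tries):
        v = draw()
        if in_D(v):
            return v
    raise RuntimeError("no accepted draw")

# N(0, 1) conditional on v > 2: roughly 44 draws are needed per
# accepted value, since P(v > 2) is about 0.0228.
vs = [crude_ar(lambda: random.gauss(0, 1), lambda v: v > 2.0)
      for _ in range(200)]
```

The accepted values have the exact conditional distribution, but the expected cost per draw is 1/P(v ∈ D), which motivates the accelerated and Gibbs procedures that follow.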
Proof. We have:
QED
In this accelerated procedure the proportion of efficient drawings, i.e. the ones that
satisfy the constraint, is on average:
Therefore from this point of view the accelerated approach is preferable to the
crude one as soon as a < 1. To get a good performance, we have to choose the
auxiliary p.d.f. g such that a = sup_D f/g is as small as possible. Anyway we
have:
if
and by integration:
and we may choose as auxiliary p.d.f. with support D* the recursive truncated
normal distribution (see (5.24)):
We get:
Then we compute v^1 = Au^{s_1}, which follows the normal distribution N(0, Σ)
conditioned by v ∈ D.
The Gibbs sampling simulator
The basis of the Gibbs sampling is the characterization of a multivariate distribution by the set of all univariate conditional distributions.
Lemma 5.2
Proof. We give it for the bidimensional case. Let us introduce the marginal
distributions f_1(x_1), f_2(x_2) of x_1 and x_2. From the Bayes formula, we have:
Therefore the marginal distributions (and also the joint distribution) are known as
soon as we know the two conditional distributions.
QED
Remark 5.1
The condition of strict positivity of the multivariate p.d.f. is necessary for the
characterization, as shown by the following counterexample. The distributions
P_α = α U_{(0,1/2)²} + (1 − α) U_{(1/2,1)²}, where α ∈ [0, 1], have the same conditional
distributions. These conditional distributions are either U_{(0,1/2)} or U_{(1/2,1)} and are
independent of the scalar α.
Proposition 5.2
(where f_1(x_1) and f_2(x_2) are the marginal p.d.f.s of f(x_1, x_2)). Since the conditional distribution of x_2 given x_1 is f_2(x_2/x_1), the result follows.
QED
where
The p.d.f. is difficult to use directly since P[v ∈ D] has an intractable form. What
about the different univariate conditional distributions? The conditional p.d.f.s are
such that:
since the components v*_k, k ≠ j, already satisfy the constraints. Therefore we see
that the density function f_j(v_j/v_{−j}) is the p.d.f. of the conditional distribution of
v_j given v_{−j}, reconditioned by v_j ∈ [a_j, b_j]. The conditional distribution
of v_j given v_{−j} is the normal distribution:
where
Therefore the successive conditional drawings of the Gibbs sampling will be easily
performed.
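A sketch of the resulting Gibbs sampler for a bivariate example (our illustrative choice: Σ = [[1, ρ], [ρ, 1]], so the conditional of v_j given v_{−j} is N(ρ v_{−j}, 1 − ρ²), re-truncated to [a_j, b_j]; each truncated conditional is drawn by inversion):

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()

def trunc_gauss(mean, sd, lo, hi, rng):
    """Univariate N(mean, sd^2) restricted to [lo, hi], by inversion."""
    p_lo = nd.cdf((lo - mean) / sd)
    p_hi = nd.cdf((hi - mean) / sd)
    return mean + sd * nd.inv_cdf(p_lo + rng.random() * (p_hi - p_lo))

def gibbs_truncated(rho, a, b, n, burn=200):
    """Gibbs sampling for (v1, v2) ~ N(0, [[1, rho], [rho, 1]])
    conditional on a[j] <= v_j <= b[j]: cycle through the univariate
    truncated normal conditionals."""
    rng = random.Random(0)
    sd = math.sqrt(1 - rho ** 2)
    v = [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2]
    out = []
    for i in range(n + burn):
        v[0] = trunc_gauss(rho * v[1], sd, a[0], b[0], rng)
        v[1] = trunc_gauss(rho * v[0], sd, a[1], b[1], rng)
        if i >= burn:
            out.append(tuple(v))
    return out

draws = gibbs_truncated(0.5, [0.0, 0.0], [1.0, 1.0], 1000)
```

Unlike acceptance-rejection, every iteration produces a usable draw regardless of how small P(v ∈ D) is; the price is that successive draws are serially correlated.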
where w_it, w*_it are the offered and reservation wages respectively, m_it, n_it are
the conditional means, and u_it, v_it some error terms. These latent variables are
used to define the sequence of individual decisions: the worker i is unemployed
at date t when the offered wage is smaller than the reservation wage.
Let an indicator dummy variable be defined as d_it = 0 iff y*_it < 0, and d_it =
1 otherwise. Then the employment-unemployment history of the individual is
characterized by the vector of indicators d_i1, ..., d_iT.
For studying some questions such as the persistence of aggregate unemployment
or its evolution over the business cycle, it is necessary to specify carefully all the
dynamic aspects of the latent model. It is known that for duration models several
dependences have to be considered:
the state dependence, i.e. the effect of the current state occupied by an individual;
the duration dependence, i.e. the effect of the length of the current spell of unemployment; a spurious duration dependence may also be due to unobserved individual heterogeneity;
the history dependence, e.g. the effect of the cumulated length of unemployment spells before the current spell (lagged duration dependence), or the effect of the number of past spells (occurrence dependence);
the habit persistence, i.e. the effect of lagged values of w_it, w*_it.
In summary, the dynamic aspects may be captured either by introducing some
lagged endogenous variables among the explanatory variables, or by considering some particular structures of the error terms, such as a decomposition in an
unobserved individual effect (the omitted heterogeneity), or an additional time
individual term with an autoregressive structure; for instance,
where α_i, β_i, ε_it, η_it are independent Gaussian variables, with zero mean and
variances σ²_α, σ²_β, σ²_ε, σ²_η, respectively.
The estimation of the parameters may be based on different kinds of observation. In
practice, these observations may correspond either to the states d_it, i = 1, ..., N,
t = 1, ..., T (if it is a pure discrete time duration model; see Muhleisen 1993), or
to observations of the states d_it and of the wage w_it when the worker is employed,
d_it = 1 (see e.g. Bloemen and Kapteyn 1990, Keane 1993, Magnac, Robin and
Visser 1995). In such a nonlinear framework, even the estimations of the coefficients of the static explanatory variables may be very sensitive to misspecifications
of the dynamics.
To examine this point, we reproduce in Table 5.1 some results obtained in Muhleisen (1993). The parameters are estimated by optimizing a simulated log likelihood function. Since this likelihood is a function of some probabilities associated with the multivariate normal distribution, the GHK simulator has been used.
Because the performance of the GHK estimator depends on the accuracy of the
computations, owing to its recursive structure, a pseudo-random number generator
with 20-digit precision has been used (Kennedy and Gentle 1980).
5.4.2
The main assumption for rational expectations (RE) is that prediction errors are
uncorrelated with any variable in the information available at the previous date.
If y*_t is the variable to be predicted, and y*e_t is the expectation of this variable held
at date t − 1, the RE assumption implies for instance that the linear regression of
y*_t on y*e_t, y*_{t−1}, y*e_{t−1},
The data are taken from six waves of the Socio-Economic Panel for West Germany for the years 1984-1989, and concern 12,000 individuals. The estimations are performed using only the history of the states, and two schemes are considered for the error term: a pure autoregressive scheme (β_i = 0), and the complete scheme also including the random effect β_i. The different explanatory variables, including some lagged endogenous ones, are included in a linear way in the index μ_it.
is such that a_0 = a_2 = a_3 = 0, a_1 = 1.
In practice, it is possible to have some time-individual data on expectations and realizations from business surveys. Unfortunately, these data are qualitative, with alternatives such as:
In a first approach we may consider that such data are deduced from the underlying
quantitative variable by truncation. For instance, the qualitative variable associated
with y*t is:
where i is the individual index and t the date. Let us consider the simple case of two dates, T-1 and T. We may specify the distribution of the latent variables as multivariate normal:
where z_it are observable exogenous variables and where there is independence between individuals. The associated distribution of the qualitative observed variables (y_{i,T}, y^e_{i,T}, y_{i,T-1}, y^e_{i,T-1}), i = 1, ..., N, will contain four-dimensional integrals and will depend on the parameter θ = (c', d', b', a, σ_y)'.
Nerlove and Schuermann (1992) have estimated such a multinomial probit model (without explanatory variables), and considered the test of the RE hypothesis. It requires a preliminary transformation of the constraints (a_0 = a_2 = a_3 = 0, a_1 = 1) into constraints on the components of θ. Such constraints include:
The sample consisted of 1007 manufacturing firms for the fourth quarter of 1986 (T-1) and the first quarter of 1987 (T), and the estimation has been performed using the smooth recursive conditioning simulator (GHK). The number of replications was 20. The RE hypothesis has been rejected on this set of data.
Gaussian c.d.f. proposed in Section 4.2. Let us denote by Φ₂(x, y, ρ) the c.d.f. of the normal distribution:
where:
The criterion K is a Kullback-Leibler information criterion measuring a discrepancy between the two distributions Φ₂ and G, where Φ₂ corresponds to the initial model, and the observations are qualitative and dichotomous. The solution of the minimization problem gives the values of the binding function for the values of the parameters 0, 0 (for the means), 1, 1 (for the variances), and ρ (for the correlation). It is easily checked that the two components of the binding function associated with the means are equal, m_x(ρ) = m_y(ρ), and that the same is true for the components associated with the variances: σ_x(ρ) = σ_y(ρ). Figures 5A.1-5A.3 describe the three functions m_x(ρ), σ_x(ρ), and r(ρ). It is directly seen that the approximation has nice properties in the domain ρ ∈ [0, 0.5], since for these values of ρ we have:
where
It is known that (y_t^(δ)) tends in distribution to (y_t) when δ tends to zero at a sufficient rate (see e.g. Gard 1988). Therefore (6.3) will provide an accurate simulation of y as soon as δ is sufficiently small.
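A minimal sketch of such a simulation, assuming a scalar diffusion dy = b(y) dt + a(y) dW and the Euler scheme with step δ that (6.3) describes (the drift and volatility functions used below are purely illustrative):

```python
import numpy as np

def euler_simulate(b, a, y0, T, delta, seed=0):
    """Euler scheme of step delta for dy_t = b(y_t) dt + a(y_t) dW_t:
    y_{k+1} = y_k + b(y_k) delta + a(y_k) sqrt(delta) eps_{k+1}."""
    rng = np.random.default_rng(seed)
    n = int(round(T / delta))
    y = np.empty(n + 1)
    y[0] = y0
    for k in range(n):
        y[k + 1] = (y[k] + b(y[k]) * delta
                    + a(y[k]) * np.sqrt(delta) * rng.standard_normal())
    return y

# illustrative drift and volatility: dy = k(a - y) dt + sigma dW
k_, a_, sigma_ = 0.8, 0.1, 0.06
path = euler_simulate(lambda y: k_ * (a_ - y), lambda y: sigma_,
                      y0=0.1, T=250, delta=0.1)
```

As the text notes, the simulated path is only an approximation of the continuous time process; the discretization bias vanishes as δ → 0, which is exactly what indirect inference exploits below.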
This is a nonlinear autoregressive model, with an explicit form of the log likelihood
function:
where
The auxiliary parameter is β = [θ', (vech Σ)']'. System (6.8) defines a nonlinear state space model for which the likelihood function has an intractable form. However, the parameters β may be estimated by some Kalman filter type methods (see Section 6.3.2). These methods would give inconsistent estimators of β even if (6.8) were valid; however, first, (6.8) is an approximated model; second, we are interested in consistent estimators of θ, not of β; and third, indirect inference will correct for the inconsistency.
Some other examples of factor models are time deformed processes (see e.g. Clark 1973 and Ghysels et al. 1994a). In such models we introduce two underlying processes: (i) a price process, expressed in intrinsic time (market time) and following a stochastic differential equation:
and (ii) the changing time process Z, which gives the relation between calendar time and market time. This increasing process may also be assumed to follow a stochastic differential equation:
where the two processes (W*_t) and (C_t) are independent Brownian and gamma processes, respectively.
The observations are the prices at some discrete dates in calendar time, S_t, t = 1, ..., T, where S_t = S*_{Z_t}.
In the following subsections we describe several applications of this kind which
have recently appeared in the literature.
where W_t is a standard Brownian motion and μ and σ are the drift and volatility parameters respectively. By applying Ito's formula, we get the equivalent form:
We deduce from (6.10) the exact discretized version of (6.9), which corresponds to a random walk with drift in the log price:
and to a lognormal distribution for the price. Therefore the parameters μ, σ may be estimated by the full maximum likelihood method, i.e. by:
which gives an autoregressive form for (y_t) with some conditional heteroscedasticity. A naive estimator is derived by taking the (pseudo-)maximum likelihood estimator corresponding to (6.12):
It is inconsistent, since the discretization (6.12) is not the right one. The asymptotic bias is easily derived. For μ̂₂, we have:
The bias, i.e. Eμ̂₂ - μ = exp(μ) - (1 + μ), is always positive. Finally, we can correct for this bias by applying indirect inference on the basis of the auxiliary model (6.12). We get a third estimator, (μ̂₃, σ̂₃).
To compare the properties of these three methods (maximum likelihood, naive, and indirect inference), we reproduce in Figures 6.1 and 6.2 the results of a Monte Carlo study (with 200 replications). The true values of the parameters are μ = 0.2, σ = 0.5, and the number of observations is T = 150. Indirect inference is applied with S = 1 simulation and with a simulation step of 1/10.
The positive bias of the naive estimator, and the complete correction by indirect inference, are clearly seen in the two figures. The indirect inference estimator is even less biased in finite samples than the ML estimator for the volatility parameter (see Section 4.4). Of course, the distribution of the indirect inference estimator is less concentrated than the distribution of the ML estimator, a consequence of the asymptotic efficiency of the latter. But we have to remember that the indirect inference method has been performed with only one replication, and that the concentration might have been improved by increasing the number S. Table 6.1 summarizes the statistical properties of the three estimators.
Estimation of an Ornstein-Uhlenbeck process
An Ornstein-Uhlenbeck process is a solution of the differential equation:
[Figures 6.1 and 6.2: finite sample distributions of the ML, naive, and indirect inference estimators.]
TABLE 6.1: Properties of the three estimators (true values μ = 0.2, σ = 0.5; T = 150)

                        Mean    Bias    Standard   Root mean
                                        deviation  square error
ML                 μ    0.201   0.001   0.040      0.040
                   σ    0.503   0.003   0.030      0.030
Indirect inference μ    0.201   0.001   0.057      0.057
                   σ    0.499  -0.001   0.087      0.087
Naive              μ    0.220   0.020   0.049      0.053
                   σ    0.624   0.124   0.061      0.138
equilibrium' around which the path (y_t) is varying. Such a model also admits an exact discretization, which is:
and the indirect inference estimator, also based on (6.15). In this simple example it is easy to look for asymptotic properties of the naive estimator, since (6.14) and (6.15) have the same structure. The limits for the naive estimators are:
Therefore the bias correction by indirect inference is essentially useful for the two parameters k, σ; in fact, it turns out that the ML and the naive estimators of a are identical. Figures 6.3 and 6.4 give the distributions of the three estimators, where the Monte Carlo study has been performed with k = 0.8, a = 0.1, σ = 0.06, T = 250, S = 1, and a simulation time unit of 1/10.
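The source of the naive bias on k can be illustrated as follows. The sketch assumes the exact AR(1) discretization of the Ornstein-Uhlenbeck process and a naive estimator k̂ = 1 minus the OLS autoregressive coefficient, as implied by the Euler discretization y_t = ka + (1 - k)y_{t-1} + σε_t; with the parameter values above, the naive limit 1 - exp(-0.8) ≈ 0.55 is consistent with the naive mean reported in the table below:

```python
import numpy as np

def simulate_ou_exact(k, a, sigma, T, y0=0.1, seed=0):
    """Exact discretization of dy_t = k(a - y_t) dt + sigma dW_t at unit
    time steps: an AR(1) with autoregressive coefficient exp(-k)."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-k)
    s = sigma * np.sqrt((1.0 - np.exp(-2.0 * k)) / (2.0 * k))  # innovation std
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(T):
        y[t + 1] = a * (1.0 - rho) + rho * y[t] + s * rng.standard_normal()
    return y

def naive_k(y):
    """'Naive' estimator of k implied by the Euler discretization
    y_t = k a + (1 - k) y_{t-1} + sigma eps_t: one minus the OLS
    autoregressive coefficient."""
    x, z = y[:-1] - y[:-1].mean(), y[1:] - y[1:].mean()
    return 1.0 - (z @ x) / (x @ x)

y = simulate_ou_exact(k=0.8, a=0.1, sigma=0.06, T=100000)
k_hat = naive_k(y)   # converges to 1 - exp(-0.8), about 0.55, not 0.8
```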
6.1.3 Stochastic volatility models
The previous two examples are very specific since the associated continuous time
models have exact discretizations, which is not the case in more general frameworks. However, we have seen that indirect inference is a good way to correct the
[Figures 6.3 and 6.4: distributions of the three estimators of the Ornstein-Uhlenbeck parameters.]
TABLE 6.2: Properties of the three estimators (true values k = 0.8, a = 0.1, σ = 0.06; T = 250)

                        Mean    Bias    Standard   Root mean
                                        deviation  square error
ML                 k    0.859   0.059   0.122      0.135
                   a    0.100   0.000   0.005      0.005
                   σ    0.063   0.003   0.004      0.005
Indirect inference k    0.811   0.011   0.170      0.170
                   a    0.100   0.000   0.007      0.007
                   σ    0.060   0.000   0.005      0.005
Naive              k    0.574  -0.226   0.051      0.232
                   a    0.100   0.000   0.005      0.005
                   σ    0.043  -0.017   0.002      0.017
asymptotic biases on volatility parameters. In this section we consider other models with stochastic volatilities and with infeasible maximum likelihood estimators. (See also Monfardini 1996 for applications of indirect inference to discrete time stochastic volatility models.)
The Brennan-Schwartz model for the short-term interest rate¹
Among the models proposed in Chan et al. (1992), where the short-term interest rate is assumed to satisfy:
the Brennan-Schwartz model has first been estimated from the (misspecified) discretized version, taking into account the stationarity condition 0 < γ < 1. The results are given in Table 6.3.
If these estimators were consistent for the parameters of the underlying continuous time model (which is not the case, since they have been derived from the
¹ See Broze et al. (1993); see also De Winne (1994).
TABLE 6.3: Estimates from the discretized version

α       β + 1    σ₀      σ₁      γ
0.23    0.97     0.094   -1.73   1
TABLE 6.4: Indirect inference estimates (time unit 1/10)

α       β + 1    σ₀      σ₁      γ
0.03    0.98     0.102   -0.08   1
discretized version), we would have concluded that the Chan et al. model is misspecified. Indeed, the estimation of γ reaches the limit point between stationarity and nonstationarity, and the constant term σ₁ in the variance is significantly different from zero.
However, such a conclusion may be valid only after a bias correction of the previous estimators. This correction has been performed by indirect inference based on a discretized version with a time unit of 1/10 (see Table 6.4).
The γ estimator still reaches the limit point, which confirms the misspecification. In fact, it means that the unconstrained estimator of γ, i.e. without imposing the inequality constraint γ ≤ 1, would have taken a value strictly larger than one. The estimators of the parameters β, σ₀ are not strongly modified. On the contrary, the estimations of α and σ₁ are much more sensitive to the discretization of the model. Given the previous Monte Carlo results, this is not surprising for the volatility parameter σ₁. Concerning the α parameter, the effect is probably a consequence of the constraint on γ. The unconstrained estimator of γ (a volatility parameter) is strongly modified by the discretization of the model. Imposing the constraint γ ≤ 1 transfers this modification to another parameter measuring the nonstationarity phenomenon; α is such a parameter.
The informational content of option prices
Pastorello et al. (1994) were interested in comparing the informational content of option prices and stock prices for the parameters associated with the dynamics of the volatility. For this purpose, they first introduced a stochastic volatility model of the form:
where S_t, σ_t denote the stock price and the volatility respectively, and where (W^S_t), (W^σ_t) are two independent standard Brownian motions. Therefore the log volatility satisfies an Ornstein-Uhlenbeck model, and, conditional on the volatility process, the stock prices have a lognormal formulation with a time varying volatility.
The option prices are deduced from the arbitrage free conditions. Let us consider a European call option on the asset S, maturing at time T, with strike K. It delivers at date T the terminal value max(0, S_T - K). By arbitrage free arguments, its
price at date t is a discounted expected value of this terminal cash flow, where the expectation is taken with respect to a pricing (risk-neutral) probability Q, conditional on the information available at t:
where the expectations are taken with respect to the historical probability (the one associated with (6.16)), where:
In Pastorello et al. (1993), the observations are stock prices S_t, t = 1, ..., T, and prices of at-the-money options, i.e. options such that x_t = 0, corresponding to a sequence of maturities τ = T - t. The observations are denoted by:
where:
[One of Tables 6.5-6.7: mean and standard error of the estimators]

                 Indirect inference    Uncorrected method
                 based on (σ^I_t)      based on (σ^I_t)
Mean             [illegible]           0.142
Standard error   0.030                 0.026
μ = 4.6% per year, which is the average nominal return on Treasury bills for the period 1948-83.
k = 0.116, ā = 6.422, a = 0.192, which correspond to the estimates reported in Melino and Turnbull (1990).
The Monte Carlo experiments have been performed with a sample size T = 720, corresponding to daily returns. Some summary statistics are given in Tables 6.5-6.7.
Since the parameter a measuring the magnitude of the random character of the volatility is rather high, the implicit volatilities σ^I_t are bad proxies of the underlying volatilities, and the application of the uncorrected method based on the observations (σ^I_t) leads to strongly biased estimators.
The other important remark concerns the relative precision of the estimators based on the stock prices (S_t) and the option prices (σ^I_t). It clearly appears that the option prices are much more informative, especially for the parameters k and a, which measure the time correlation and the random character of the volatility. Of course,
Indirect inference based on (S_t): mean 0.245, standard error 0.134.
better results might have been obtained by a joint use of the two kinds of observation, but the improvement in precision would not be very important compared with the method based uniquely on the information summarized in the implicit volatility; moreover, all the methods using stock price data are sensitive to misspecification errors concerning the dynamics of (S_t). Indeed, we have to recall that the pricing formulas derived from the arbitrage free conditions do not depend on the form of the instantaneous expected return μ(t, S_t, σ_t) such that:
Definition 6.1
Since the limit lim_{t→0} (1/t)(G_t φ - φ) does not necessarily exist for all square integrable functions, the infinitesimal operator is in fact defined on a subset of the set of square integrable functions. This subset is called the domain of the infinitesimal operator and is denoted by D.
Proposition 6.1
of y_t.
Proof. Indeed, we have
Similarly, we have:
QED
Note that the scalar product is the one associated with the marginal probability distribution of y_t, which is time independent because of the stationarity assumption. The determination of the operator A* sometimes requires the computation of the stationary distribution of the process. However, for most stationary univariate diffusions the two operators are equal: A* = A (see Hansen and Scheinkman 1995).
Moment conditions
Proposition 6.3
(i) We have:
(ii) We have:
QED
We must add that the second set of moment conditions can be extended to functions without multiplicative forms. If φ(y_t, y_{t+1}) is such a function of the two arguments, we get, for univariate diffusion equations,
These moment conditions may be used directly as the basis of exact moment methods in some simple cases. For instance, let us consider the Chan et al. (1992) model introduced in Section 6.1. It corresponds to the stochastic differential equation:
We have, for a function φ possibly depending on the parameter,
In practice, we may introduce several functions φ and the associated moment conditions. For instance, we may introduce exponential functions φ_j(r) = exp(a_j r), j = 1, ..., p, and the moment conditions will be:
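These moment conditions can be made concrete. For the diffusion dr_t = (α + βr_t) dt + σ r_t^γ dW_t, the infinitesimal operator acts as Aφ(r) = (α + βr)φ'(r) + ½σ²r^{2γ}φ''(r), and stationarity implies E[Aφ_j(r_t)] = 0. The sketch below (parameter values are illustrative) evaluates the sample analogue of this condition and checks it on the γ = 0 case, whose stationary law is known:

```python
import numpy as np

def generator_moment(r, alpha, beta, sigma, gamma, a_j):
    """Sample analogue of E[A phi(r_t)] for phi(r) = exp(a_j r) under the
    diffusion dr_t = (alpha + beta r_t) dt + sigma r_t^gamma dW_t, where
    A phi(r) = [(alpha + beta r) a_j + 0.5 sigma^2 r^(2 gamma) a_j^2] exp(a_j r)."""
    drift = alpha + beta * r
    vol2 = sigma**2 * r**(2.0 * gamma)
    return np.mean((drift * a_j + 0.5 * vol2 * a_j**2) * np.exp(a_j * r))

# check on gamma = 0: the process is then Ornstein-Uhlenbeck, with stationary
# law N(-alpha/beta, -sigma^2 / (2 beta)), and E[A phi] should vanish
alpha, beta, sigma = 0.08, -0.8, 0.06
rng = np.random.default_rng(0)
r = rng.normal(-alpha / beta, sigma / np.sqrt(-2.0 * beta), size=200000)
m = generator_moment(r, alpha, beta, sigma, gamma=0.0, a_j=1.0)
```

Stacking such conditions for several values a_j, j = 1, ..., p, gives the GMM system described in the text.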
Let us now assume that the only available data are the stock prices S_t = y_{1t}. In such a case, the only moment conditions that can be used as a basis for GMM are the ones for which Aφ(y_t) depends only on the first coordinate y_{1t}, and this for all the admissible values of the parameters. This constraint is equivalent to:
6.2.2 The method of simulated moments
We have seen that exact methods of moments are difficult to implement, especially in the case of unobservable factors. An alternative method is the method of simulated moments. Since the analytical forms of the conditional distributions of y_t given y_{t-1} are in general intractable, the MSM has to be applied to static or cross moments (see Chapter 2). Such an approach has been followed by Duffie and Singleton (1993). As before, the use of static moments only is likely to introduce a loss of precision, especially for the parameters summarizing the nonlinear features of the dynamics. It seems at least useful to take into account the third and fourth order moments Ey_t³, Ey_t⁴ in the case of stochastic volatility models, since we know that the existence of a stochastic volatility fattens the tails of the marginal distribution, and in particular increases the kurtosis (see e.g. Engle 1982), together with cross moments such as E(y_t² y_{t-k}²) to capture the conditional heteroscedasticity and E(|y_t| y_{t-k}) to capture the leverage effect (see Andersen and Sorensen 1993).
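Computing these static and cross moments is straightforward. The sketch below pairs them with a simple discrete time stochastic volatility model — a Gaussian AR(1) for the log variance, an illustrative specification rather than the exact one used in the references cited above:

```python
import numpy as np

def svm_simulate(omega, phi, tau, T, seed=0):
    """Illustrative discrete time stochastic volatility model (an assumed
    specification): h_t = omega + phi h_{t-1} + tau v_t,
    y_t = exp(h_t / 2) eps_t, with (v_t), (eps_t) independent N(0, 1)."""
    rng = np.random.default_rng(seed)
    h = np.zeros(T)
    for t in range(1, T):
        h[t] = omega + phi * h[t - 1] + tau * rng.standard_normal()
    return np.exp(0.5 * h) * rng.standard_normal(T)

def static_moments(y, lags=(1, 2)):
    """MSM calibration targets: E y_t^2, E y_t^4, and the cross moments
    E(y_t^2 y_{t-k}^2) (conditional heteroscedasticity) and
    E(|y_t| y_{t-k}) (leverage effect) for the given lags k."""
    m = [np.mean(y**2), np.mean(y**4)]
    for k in lags:
        m.append(np.mean(y[k:]**2 * y[:-k]**2))
        m.append(np.mean(np.abs(y[k:]) * y[:-k]))
    return np.array(m)

y = svm_simulate(omega=0.0, phi=0.95, tau=0.2, T=200000)
mom = static_moments(y)
```

An MSM criterion then matches these sample moments, computed on the observations, with their analogues computed on paths simulated for each trial value of the parameter.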
where the functions r₁(y_{t-1}, y*_t, ε_{1t}; θ) and r₂(y_{t-1}, y*_{t-1}, ε_{2t}; θ) define a one to one relationship between (ε_{1t}, ε_{2t}) and (y_t, y*_t), and where (ε_{1t}), (ε_{2t}) are independent white noises with known distributions. The variables y_t, t = 1, ..., T, are observable, but the factors y*_t, t = 1, ..., T, are unobservable.
As mentioned in Section 1.3.4, conditional p.d.f.s are easily derived from system (6.3). These are the conditional p.d.f. of y_t given y_{t-1}, y*_t, denoted by f(y_t | y_{t-1}, y*_t; θ), and the conditional p.d.f. of y*_t given y_{t-1}, y*_{t-1}, denoted by f*(y*_t | y_{t-1}, y*_{t-1}; θ). We then deduce the p.d.f. of y_T = (y₁, ..., y_T), y*_T = (y*₁, ..., y*_T) given the information y₀, y*₀:
If the process (y_t, y*_t) is strongly stationary and T is large, the effect of the initial conditions y₀, y*₀ becomes negligible when studying the asymptotic properties of the estimators. Therefore we will not discuss this problem of initial values. The likelihood function (conditional on y₀, y*₀) has the form of a multivariate integral:
conditional on y_{t-1}, y*_t (with, for instance, the identification constraints a₀ > 0, a₁ > 0, a₀ + a₁ = 1).
In this example, we have:
6.3.2
The dynamic model (6.29) appears as a nonlinear state space system, where the first subsystem is the measurement equation and the second is the transition equation. In a linear state space system it is well known that the Kalman filter is an algorithm allowing for the exact computation of the conditional p.d.f. of y_t given y_{t-1} (and the initial conditions). In this subsection we will discuss the possibility of such an exact algorithm for nonlinear models, and show that, except for some specific cases, the exact computation of the likelihood function is not possible. Then it will be necessary to use either numerical or simulated methods.
³ See Kitagawa (1987).
where the first term of the RHS is directly deduced from (6.35).
Step 2: One step prediction
We deduce
where the first term of the RHS is known from (6.34) and the second is given by Step 1.
Then, integrating out y*_{t-p}, we get:
And, integrating out y_{t-p}, we obtain f(y*_{t-p+1} | y_t), which is the input of the next iteration.
In summary, Kitagawa's algorithm provides a recursive computation of the multiple integral defining the likelihood function.
Such an algorithm is not so simple to implement, since it requires the computation of integrals. These computations can be done explicitly in the case of
linear Gaussian state space models, where Kitagawa's algorithm coincides with the Kalman filter, and when the factor y*_t is discrete with a finite number of possible values b₁, ..., b_J, say. In such a case the integrals reduce to finite sums (see Hamilton 1989). In the general case, and if p is small, these integrals could be approximated by numerical methods (see Kitagawa 1987) or by simulation methods.
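The finite-sum case can be sketched directly: with a J-state factor, the prediction and filtering steps of Kitagawa's algorithm become vector operations, and the likelihood is obtained recursively. The two-state switching mean specification below is purely illustrative:

```python
import numpy as np

def discrete_state_loglik(y, P, dens):
    """Log likelihood when the unobserved factor is a J-state Markov chain
    with transition matrix P (P[i, j] = P(state j | state i)) and dens(y_t)
    returns the J conditional densities f(y_t | state j). The integrals of
    Kitagawa's algorithm reduce to finite sums (Hamilton 1989)."""
    # start from the stationary distribution of the chain
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pred = pi / pi.sum()
    loglik = 0.0
    for yt in y:
        joint = pred * dens(yt)      # f(y_t, state j | past observations)
        lt = joint.sum()             # f(y_t | past observations)
        loglik += np.log(lt)
        pred = (joint / lt) @ P      # filtering, then one step prediction
    return loglik

# two-state switching mean with Gaussian noise (illustrative specification)
means = np.array([0.0, 2.0])
dens = lambda yt: np.exp(-0.5 * (yt - means)**2) / np.sqrt(2.0 * np.pi)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
ll = discrete_state_loglik(np.array([0.1, 2.2, 1.9]), P, dens)
```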
6.3.3
Since exact computation of the likelihood function can be performed when the factor takes only a finite number of values, a natural idea is to apply indirect inference to an approximated version of the factor ARCH model in which the factor has been state discretized. Let us consider the factor ARCH model introduced in (6.32) and (6.33), and a partition of the range of y*_t into J given classes (a_j, a_{j+1}), j = 0, ..., J-1, where a₀ = -∞, a_J = +∞. The state discretized factor is defined by:
where the b_j are given real numbers, such as the centres of the classes, except for the two extreme ones.
The dynamics of the discretized factor may be defined in accordance with the dynamics of the initial one by:
(say).
Then the initial factor ARCH model is replaced by the proxy model:
where (ε_t), (ỹ*_t) are independent, (ε_t) ~ IIN(0, Σ), and (ỹ*_t) is a qualitative Markov process with transition probabilities P_jl(a₀, a₁). This auxiliary model can be estimated by the maximum likelihood method, using Kitagawa's algorithm. Then the correction for the state discretization of the factors is performed by indirect inference.
6.3.4
Stochastic volatility models (SVM) may also be directly defined in discrete time. Danielsson (1993) considered such a model with the structure:
where (ε_t), (v_t) are independent Gaussian white noises with zero mean and unit variance. Therefore there is a stochastic volatility, which is predetermined and generally not observable. This lack of observability implies a likelihood function with the form of a T-variate integral:
The expression of the likelihood function is simplified only in the static case: ρ = b₂ = 0.
The previous model has been estimated by the SML method⁵ using eight years of daily observations of the Standard and Poor's 500 index for the years 1980-87 (T = 2022), and an accelerated Gaussian importance sampler. Several estimation results are given in Table 6.8, depending on the parameters of the model which are a priori constrained to zero. In particular, the first two columns of the table correspond to static cases.
⁵ Danielsson (1994).
by using the stationarity assumption. A second order expansion around the value y_t = y provides:
since E[dW_t | y_t = y] = 0, and the other terms are negligible. Finally, we get:
since
QED
Applications to Switching Regime Models
7.1 Endogenously Switching Regime Models
7.1.1 Static disequilibrium models
The canonical static disequilibrium model is defined as:
where z_{1t}, z_{2t} are (row) vectors of observable exogenous variables, y*_{1t} and y*_{2t} are latent endogenous variables, y_t is observed, and (ε_{1t}, ε_{2t}) are independently N(0, I₂) distributed. The parameter vector is θ = (a'₁, a'₂, σ₁, σ₂)'.
In this simple canonical case the likelihood function is easily computed and is equal to:
where φ and Φ are the p.d.f. and the c.d.f. of N(0, 1) respectively.
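Assuming the usual min condition y_t = min(y*_{1t}, y*_{2t}) for the canonical model, this likelihood can be coded directly; scalar regressors are used for simplicity:

```python
from math import erf, exp, log, pi, sqrt

def phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def loglik_disequilibrium(y, z1, z2, a1, a2, s1, s2):
    """Log likelihood of the canonical static disequilibrium model under the
    min condition y_t = min(y1*_t, y2*_t), with yj*_t = zj_t aj + sj eps_jt
    (scalar regressors for simplicity):
      f(y) = (1/s1) phi(u1) [1 - Phi(u2)] + (1/s2) phi(u2) [1 - Phi(u1)],
    where uj = (y - zj aj) / sj."""
    ll = 0.0
    for yt, z1t, z2t in zip(y, z1, z2):
        u1 = (yt - z1t * a1) / s1
        u2 = (yt - z2t * a2) / s2
        ll += log(phi(u1) * (1.0 - Phi(u2)) / s1
                  + phi(u2) * (1.0 - Phi(u1)) / s2)
    return ll
```

Each term corresponds to one regime: the demand branch is binding and the supply latent variable exceeds the observation, or conversely.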
However, in multimarket disequilibrium models with nonlinear demand or supply functions, or in models with micro markets (see Laroque and Salanie 1989), the likelihood function becomes very complicated, and in some cases intractable. In order to solve this problem, Laroque and Salanie (1989) introduced various versions of the simulated pseudo-maximum likelihood (SPML) method. Moreover, in Laroque and Salanie (1994) an evaluation of these methods based on experiments is given. In these experiments the model retained is the previous canonical model, in which (z_{1t}, z_{2t}) is a bivariate vector following
PML2, and QGPML, and their simulated analogues, called SPML1, SPML2, and SQGPML. The three PML methods are obtained by maximizing the following pseudo-likelihood functions:
PML1:
PML2:
QGPML:
where θ̄ is a preliminary estimate of θ based on the PML2 method, and where m₁(z_t, θ) and v(z_t, θ) are, respectively, the mean and the variance of y_t derived from the model and given by:
with
and:
with:
In other words, these PML methods are based on normal pseudo-likelihood functions, although y_t is clearly not normal; note that in this case the PML1 and the QGPML reduce, respectively, to the nonlinear and the quasi-generalized nonlinear least squares methods.
The simulated analogues of these methods are obtained by replacing m₁(z_t, θ) and v(z_t, θ) by approximations m^S(z_t, θ) and v^S(z_t, θ) based on simulations, namely:
and:
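The simulated analogues can be sketched as crude frequency simulators, again under the min condition y = min(y₁*, y₂*) — an assumption consistent with the canonical model; the argument values below are illustrative:

```python
import numpy as np

def simulated_mean_var(z1t, z2t, a1, a2, s1, s2, S=1000, seed=0):
    """Simulated analogues m^S(z_t, theta) and v^S(z_t, theta) of the mean
    and variance of y_t = min(y1*_t, y2*_t): empirical moments over S draws
    of the latent variables (a crude frequency simulator)."""
    rng = np.random.default_rng(seed)
    y1 = z1t * a1 + s1 * rng.standard_normal(S)
    y2 = z2t * a2 + s2 * rng.standard_normal(S)
    y = np.minimum(y1, y2)
    return y.mean(), y.var()

mS, vS = simulated_mean_var(2.5, 5.0, a1=1.0, a2=1.0, s1=1.0, s2=1.0, S=20000)
```

Replacing m₁ and v by these simulated counterparts in the pseudo-likelihoods above gives the SPML1, SPML2, and SQGPML criteria.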
TABLE 7.1: Mean estimates on the retained samples (out of 200 samples); constrained estimator σ₁ = σ₂ = σ. (Cells marked [?] are illegible in the source.)

Coefficient      T    FIML   PML1          SPML1
(true value)                        S=5    S=10   S=20
a₁ (1.00)        20   1.04   1.06   0.98   1.01   1.07
                 50   1.03   1.07   0.95   0.96   1.00
                 80   1.03   1.09   0.95   0.97   0.98
a₂ (1.00)        20   1.04   1.26   1.00   1.03   1.04
                 50   1.02   1.29   0.96   0.96   0.99
                 80   1.02   1.26   0.94   0.95   0.96
σ (1.00)         20   0.88   1.84   0.64   0.79   1.01
                 50   0.94   1.98   0.37   0.44   0.64
                 80   0.95   1.98   0.30   0.40   0.51

Coefficient      T    FIML   PML2   QGPML         SPML2               SQGPML
(true value)                               S=5    S=10   S=20   S=10   S=20
a₁ (1.00)        20   1.02   1.04   0.99   1.06   1.04   1.05   1.02   1.07
                 50   1.02   1.06   0.96   1.09   1.09   1.05   0.98   0.98
                 80   1.02   1.08   0.96   1.10   1.07   1.05   0.98   0.99
a₂ (1.00)        20   1.03   1.27   1.01   1.08   1.07   1.04   1.02   1.06
                 50   1.03   1.30   0.97   1.09   1.05   1.05   0.97   0.98
                 80   1.02   1.25   0.96   1.09   1.05   1.04   0.95   0.97
σ (1.00)         20   0.88   [?]    [?]    1.23   1.02   0.94   [?]    [?]
                 50   0.94   1.94   0.45   1.32   1.06   0.99   0.54   0.61
                 80   0.95   1.89   0.42   1.36   1.08   1.00   0.43   0.54
with:
where z_{1t}, z_{2t} are (row) vectors of observable exogenous variables, y*_{1t} and y*_{2t} are latent endogenous variables, y_t is observed, and (ε_{1t}, ε_{2t}) are independently N(0, I₂) distributed. (The cases of more than one lag or of autocorrelated disturbances are straightforward extensions.)
The likelihood function of such a model is intractable. In order to evaluate the complexity of this function, let us first introduce the notations:
where m*_{1t}(θ) and m*_{2t}(θ) are the conditional expectations of y*_{1t} and y*_{2t} given I_{t-1}, i.e.
and θ is a notation for the parameter set (a'₁, a'₂, b₁, b₂, c₁, c₂, σ₁, σ₂)'. Note that the p.d.f. f_t(y_t, y*_t, r_t | I_{t-1}; θ) given in (7.4) is taken with respect to the measure λ₂ ⊗ (δ₀ + δ₁), where λ₂ is the Lebesgue measure on R² and δ₀, δ₁ are the unit masses on 0 and 1.
We can deduce the p.d.f. of y_T, y*_T, r_T (given some initial values):
The PML1 and PML2 methods, based on normal pseudo-likelihood functions and on static (or unconditional) moments, consist in minimizing, respectively,
and
Since M_t(θ) and V_t(θ) do not have closed forms, we consider their simulated analogues, in which M_t(θ) and V_t(θ) are replaced by approximations based on S path simulations of the model, Y_t^s(θ), s = 1, ..., S:
M_t^S(θ) = (1/S) Σ_{s=1}^S Y_t^s(θ),
where the z_{1t} are independently N(2.5, 2.5²) distributed, z_{2t} = 5, (ε_{1t}, ε_{2t}) follow independently N(0, I₂), a₁ = a₂ = σ₁ = σ₂ = 1, b = 0.5, T = 50, S = 10, 20, 50, k = 0, and 200 replications have been performed.
The results obtained for the SPML2 estimates are reproduced in Table 7.2. As can be seen, the estimation biases are rather small in spite of the fact that k has been taken equal to 0, i.e. Y_t is simply y_t.
TABLE 7.2: SPML2 estimates

              Mean estimate           Dispersion of estimate
       True   S=10   S=20   S=50      S=10   S=20   S=50
a₁     1.00   1.02   1.04   1.07      0.22   0.24   0.31
b      0.50   0.50   0.50   0.49      0.06   0.06   0.05
a₂     1.00   1.01   1.00   0.99      0.08   0.06   0.05
σ₁     1.00   1.03   0.99   0.98      0.28   0.26   0.32
σ₂     1.00   1.04   1.00   0.96      0.25   0.23   0.20
SML methods
Lee (1995) has proposed two SML methods based on the following decompositions of f_t(y_t, y*_t, r_t | I_{t-1}; θ):
and
Using decomposition (7.8), the likelihood function (7.6) appears as the expectation of the function ∏_{t=1}^T f_t(y_t | r_t, I_{t-1}; θ) with respect to the variables y*₁, ..., y*_T, r₁, ..., r_T, and using the probability distribution whose p.d.f. is:
The latter method gives better results than the former, and in this case the three required p.d.f.s are:
The last two p.d.f.s show how to draw r_t and y*_t at each date t: first r_t is drawn from the distribution on {0, 1} defined by f_t(1 | y_t, I_{t-1}; θ), then y*_t is drawn from the truncated normal distribution defined by f_t(y*_t | y_t, r_t, I_{t-1}; θ). (We have seen in Section 5.1.3 how to perform such a drawing efficiently.)
The results of a Monte Carlo study using this method are reported in Lee (1995) and seem encouraging; however, it seems difficult to extend this approach to more complicated models, where the likelihood function is intractable even in the static case (for instance, multimarket models).
7.2 Exogenously Switching Regime Models
where the u_t follow independently U_[0,1], the uniform distribution on [0, 1]; π₀ and π₁ are, respectively, the probabilities of staying in state 0 and state 1.
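This construction of the regime process from uniforms can be sketched directly; the staying probabilities π₀ = 0.89, π₁ = 0.84 are those of the Monte Carlo study later in this section:

```python
import numpy as np

def simulate_regimes(pi0, pi1, T, r0=0, seed=0):
    """Two state Markov chain driven by i.i.d. uniforms u_t on [0, 1];
    pi0 and pi1 are the probabilities of staying in state 0 and in state 1."""
    rng = np.random.default_rng(seed)
    r = np.empty(T, dtype=int)
    prev = r0
    for t in range(T):
        u = rng.uniform()
        prev = (0 if u <= pi0 else 1) if prev == 0 else (1 if u <= pi1 else 0)
        r[t] = prev
    return r

r = simulate_regimes(pi0=0.89, pi1=0.84, T=200000)
# stationary probability of state 1 is (1 - pi0) / ((1 - pi0) + (1 - pi1))
```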
Let us now assume that the observed endogenous variable y_t is given by:
where {v_t} is a white noise independent of {u_t}, with a known distribution, where y^{t-1}_{t-p} = (y_{t-1}, ..., y_{t-p}), r^t_{t-p} = (r_t, ..., r_{t-p}), and g is a known function.
The conditional distribution of (y_t, r_t) given (y_{t-1}, r_{t-1}) depends only on (y^{t-1}_{t-p}, r^{t-1}_{t-p}), and the process (y_t, r_t) is jointly Markovian of order p. (Note that this would remain true if r_t were Markovian of order p and if the transition probabilities were functions of y^{t-1}_{t-p}.)
In this case Kitagawa's algorithm, described in Section 6.3.2, can be used to compute the likelihood function (by taking y*_t = r_t). The integrals appearing in this algorithm are sums over the 2^{p+1} possible values of r^t_{t-p}, and the algorithm may be tractable if p is not too large. The algorithm thus obtained has been used by Hamilton in various contexts, in particular in a switching AR(p) model (Hamilton 1989), in which (7.12) is specified as:
7.2.2
where {ε_t}, {η_t} are independent standard Gaussian white noises, {u_t} is a white noise, independent of {ε_t}, {η_t}, whose marginal distribution is U_[0,1], the uniform distribution on [0, 1], and y₀, z₀, r₀ are nonrandom.
The dimensions of y_t and y*_t are denoted by n and k respectively. The regime indicator variable r_t is assumed to take two values, 0 and 1, but an extension to any finite number of regimes is straightforward. y_t is observable, r_t is unobservable, and y*_t is (at least partially) unobservable.
This subsection refers to Billio and Monfort (1995).
Note that, from (7.15), y*_t does not cause r_t in the Granger sense, but y_t may cause r_t; therefore, r_t may be not strongly exogenous but only predetermined.
The framework defined by (7.13)-(7.15) contains many interesting models as particular cases: switching ARMA models (which may have a nontrivial moving average part), switching factor models, dynamic switching regressions, deformed time models, models with endogenously missing data, and so on (see Billio and Monfort 1995).
The partial Kalman filter, denoted by KF(r_T), is defined as the Kalman filter mechanically applied to the linear state space model obtained from (7.13), (7.14) for any given sequence r_T.
It can be shown that, if the assumption of noncausality from y*_t to r_t (implied by (7.15)) holds, the conditional distributions of (y*_t | y_{t-1}, r_t), (y*_t | y_t, r_t), and (y_t | y_{t-1}, r_t) are normal distributions whose expectations and variance-covariance matrices are the outputs of the partial Kalman filter:
These outputs are very useful tools for the computation of the likelihood function of the model, or of the filters of y*_t and r_t. (Similarly, a partial Kalman smoother can be defined and used for the computation of the smoothers of y*_t and r_t.) In particular, the p.d.f. f(y_t | y_{t-1}, r_t) of N(y_{t|t-1}(r_t), M_{t|t-1}(r_t)) will be useful for the computation of the likelihood function.
and let us denote by P the probability distribution on {0, 1}^T defined by:
where p(i | y_{t-1}, r_{t-1}), i = 0, 1, are the probabilities of the Bernoulli distributions:
In theory, this formula provides an unbiased simulator of l_T and a way of approximating l_T from S independent simulated paths {r_T^s}, s = 1, ..., S, drawn in P; this approximation is:
However, this method provides very poor results and must be improved.
Sequentially optimal sampling (SOS) methods
Using arguments based on the sequential optimality of importance sampling methods, several improvements of the basic sequential sampling method have been proposed in Billio and Monfort (1995).
(i) The first order sequentially optimal sampling (SOS(1)) method is based on the following unbiased simulator of l_T:
where:
with
(ii) The second order sequentially optimal sampling (SOS(2)) method uses the unbiased simulator (assuming T even):
where
(iii) A strong second order sequentially optimal sampling (SOS*(2)) method has also been proposed; it is based on the unbiased simulator:
where
and where the S paths (r_t^s, t = 1, ..., T-2) have been sequentially drawn from:
TABLE 7.3: Performance of the likelihood simulators (T = 100)

          Simulated          Log simulated      Variance
          likelihood mean    likelihood mean    (×10⁵)
BASIC     3.52 × 10⁻²⁰       1.4314             1.2 × 10⁻³⁴
SOS(1)    0.9902             1.0001             288
SOS(2)    1.0096             0.9999             199
SOS*(2)   1.0078             0.9999             12
This procedure could be generalized to an SOS*(p) method; but, clearly, the
computational burden increases with p: in the limit case p = T we would get
the exact likelihood function.
Monte Carlo study
In order to evaluate the performance of the previous methods, we consider a simple
case in which the likelihood function is computable by Hamilton's algorithm,
namely the switching AR(1) model:
with:
where r_t is a two-state Markov chain defined by pi_0 = 0.89 (probability of staying
at 0) and pi_1 = 0.84 (probability of staying at 1), and {epsilon_t} is a standard Gaussian
white noise independent of {r_t}.
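Since the explicit AR(1) equation does not appear here, the following sketch assumes for illustration a regime-dependent intercept; the values of mu, rho, and sigma are placeholders, not the parameters of the study. Only the staying probabilities pi_0 = 0.89 and pi_1 = 0.84 are taken from the text. The second function computes the exact log-likelihood by Hamilton's filter, integrating the regime out recursively:

```python
import numpy as np

rng = np.random.default_rng(42)

P_STAY = (0.89, 0.84)                 # pi_0, pi_1 from the text
TRANS = np.array([[0.89, 0.11],       # row i: P(r_t = j | r_{t-1} = i)
                  [0.16, 0.84]])

def simulate(T, mu=(0.0, 2.0), rho=0.5, sigma=1.0):
    """Simulate a switching AR(1): y_t = mu[r_t] + rho*y_{t-1} + sigma*eps_t.
    mu, rho, sigma are illustrative placeholders."""
    r, y = np.empty(T, dtype=int), np.empty(T)
    r[0], y_prev = 0, 0.0
    for t in range(T):
        if t > 0:
            r[t] = r[t - 1] if rng.random() < P_STAY[r[t - 1]] else 1 - r[t - 1]
        y[t] = mu[r[t]] + rho * y_prev + sigma * rng.standard_normal()
        y_prev = y[t]
    return y, r

def hamilton_loglik(y, mu=(0.0, 2.0), rho=0.5, sigma=1.0):
    """Hamilton's filter: exact log-likelihood, one date at a time."""
    # stationary regime probabilities: proportional to (0.16, 0.11)
    p = np.array([0.16, 0.11]); p = p / p.sum()
    loglik, y_prev = 0.0, 0.0
    for t in range(len(y)):
        if t > 0:
            p = p @ TRANS                             # prediction step
        e = y[t] - (np.asarray(mu) + rho * y_prev)    # residual per regime
        dens = np.exp(-0.5 * (e / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        f = p @ dens                                  # f(y_t | y_{t-1})
        loglik += np.log(f)
        p = p * dens / f                              # update step
        y_prev = y[t]
    return loglik
```

Because the regime is integrated out exactly at each date, this log-likelihood is the benchmark against which the simulated values in Table 7.3 are normalized.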
We considered samples of various lengths T drawn from this process.
For each of the four previous methods (BASIC, SOS(1), SOS(2), SOS*(2)) we
computed S = 10,000 simulations of the value of the likelihood function at the true
parameter. Each simulation was divided by the true value of the likelihood function,
and Table 7.3 gives, for T = 100, the mean of these normalized simulations, the
log of this mean divided by the log of the likelihood function, and the estimated
variance of the mean of the normalized simulations.
From Table 7.3 it is clear that the basic method does not work at all, whereas
SOS(1), SOS(2), and SOS*(2) give satisfactory results, and the ordering of their
respective performances is as expected.
Figure 7.1 shows the variability of the various simulators. Note that we have
plotted the log simulators in order to be able to show them on the same graph (with
the same scale).
Figure 7.2 shows the convergence rate of the log simulated mean for the various
methods (except the basic method, which does not converge). From this figure it
appears that the SOS*(2) method is the best: for this method, the mean is close
to 1 as soon as S is larger than 50.
REFERENCES
and A. MONFORT (1989), Statistique et modèles économétriques, Economica, Paris; English translation, Cambridge University Press, 1995.
(1991), 'Simulation Based Econometrics in Models with Heterogeneity', Annales d'Economie et de Statistique, 20/1: 69-107.
(1993a), 'Simulation Based Inference: a Survey with Special Reference to Panel Data Models', Journal of Econometrics, 59: 5-33.
(1993b), 'Pseudo Likelihood Methods', in Handbook of Statistics, G. S. MADDALA, C. R. RAO, and H. VINOD (eds.), North-Holland, Amsterdam.
(1995), 'Testing, Encompassing, and Simulating Dynamic Econometric
Models', Econometric Theory, 11: 195-228.
and A. TROGNON (1984a), 'Estimation and Test in Probit Models with
Serial Correlation', in Alternative Approaches to Time Series Analysis, J.
P. FLORENS, et al. (eds.), University St Louis, Brussels.
(1984b), 'Pseudo-Maximum Likelihood Methods: Theory',
Econometrica, 52: 681-700.
(1984c), 'Pseudo-Maximum Likelihood Methods: Applications
to Poisson Models', Econometrica, 52: 701-20.
A. E. RENAULT, and A. TROGNON (1987), 'Simulated Residuals',
Journal of Econometrics, 34: 201-52.
(1991), 'Dynamic Factor Models', Discussion Paper, CREST.
(1993), 'Indirect Inference', Journal of Applied Econometrics,
8: 85-118.
E. RENAULT, and N. TOUZI (1994), 'Calibration by Simulation for Small
Sample Bias Correction', Discussion Paper, CREST.
HOTZ, J., and R. MILLER (1993), 'Conditional Choice Probabilities and the Estimation of Dynamic Programming Models', Review of Economic Studies,
60: 497-530.
and S. SANDERS (1990), 'The Estimation of Dynamic Discrete Choice Models by the Method of Simulated Moments', Discussion Paper, University of
Chicago.
R. MILLER, S. SANDERS, and J. SMITH (1992), 'A Simulation Estimator
for Dynamic Models of Discrete Choice', Discussion Paper, University of
Chicago.
HULL, J., and A. WHITE (1987), 'The Pricing of Options on Assets with Stochastic
Volatility', Journal of Finance, 3: 281-300.
ICHIMURA, H., and T. SCOTT-THOMPSON (1993), 'Maximum Likelihood Estimation of a Binary Choice Model with Random Coefficients of Unknown
Distribution', Discussion Paper 268, University of Minnesota.
INGRAM, B. F., and B. S. LEE (1991), 'Estimation by Simulation of Time Series
Models', Journal of Econometrics, 47: 197-207.
JENNRICH, R., (1969), 'Asymptotic Properties of Nonlinear Least Squares Estimators', Annals of Mathematical Statistics, 40: 633-43.
KEANE, M., and K. WOLPIN (1992), 'Solution and Estimation of Discrete Dynamic Programming Models by Simulation: Monte Carlo Evidence', Discussion Paper, University of Minnesota.
KEANE, M. P. (1990), 'Four Essays in Empirical Macro and Labor Economies',
Ph.D. dissertation, Brown University.
(1993), 'Simulation Estimation for Panel Data Models with Limited Dependent Variable Models', in Handbook of Statistics, ii, G. S. MADDALA,
C. R. RAO, and H. VINOD (eds.), North-Holland, Amsterdam, 545-70.
(1994), 'A Computationally Practical Simulation Estimator for Panel Data
with Applications to Estimating Temporal Dependence in Employment and
Wages', Econometrica, 62: 95-116.
KENNEDY, J., and J. GENTLE (1980), Statistical Computing, Marcel Dekker, New
York.
KIEFER, N., and G. NEUMANN (1979), 'An Empirical Job Search Model with
a Test of the Constant Reservation Wage Hypothesis', Journal of Political
Economy, 87: 89-107.
KIM, C. J. (1994), 'Dynamic Linear Models with Markov Switching', Journal of
Econometrics, 60: 1-22.
LIPPMAN, S., and J. McCALL (1976), 'The Economics of Job Search: a Survey',
Economic Inquiry, 14: 155-367.
LIPSTER, R. S., and A. N. SHIRYAYEV (1977), Statistics of Random Processes, I: General Theory, Springer-Verlag, Berlin.
(1978), Statistics of Random Processes, II: Applications, Springer-Verlag, Berlin.
LO, A. (1988), 'Maximum Likelihood Estimation of Generalized Ito Processes
with Discretely Sampled Data', Econometric Theory, 4: 231-47.
McCULLAGH, P., and J. A. NELDER (1989), Generalized Linear Models, Chapman
& Hall, London.
McFADDEN, D. (1976), 'Quantal Choice Analysis: a Survey', Annals of Economic and Social Measurement, 5: 363-90.
(1989), 'A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration', Econometrica, 57: 995-1026.
and P. RUUD (1987), 'Estimation of Limited Dependent Variable Models
from the Regular Exponential Family by the Method of Simulated Moment',
Discussion Paper, University of California at Berkeley.
(1990), 'Estimation by Simulation', Discussion Paper, MIT.
McGRATTAN, E. (1990), 'Solving the Stochastic Growth Model by Linear-Quadratic Approximation', Journal of Business and Economic Statistics,
8: 41-4.
MACKINNON, J. G., and A. A. SMITH (1995), 'Approximate Bias Correction in
Econometrics', Discussion Paper, Queen's University.
MAGNAC, T., J. M. ROBIN, and M. VISSER (1995), 'Analysing Incomplete Individual Employment Histories Using Indirect Inference', Journal of Applied
Econometrics, 10: 153-70.
MALINVAUD, E. (1970), 'The Consistency of Nonlinear Regressions', Annals of
Mathematical Statistics, 41: 956-69.
MARCET, A. (1993), 'Simulation Analysis of Dynamic Stochastic Models: Applications to Theory and Estimation', Discussion Paper 6, University of
Barcelona.
MARIANO, R. S., and B. W. BROWN (1985), 'Stochastic Prediction in Dynamic
Nonlinear Econometric Systems', Annales de l'INSEE, 59-60: 267-78.
PAKES, A., and D. POLLARD (1989), 'Simulation and the Asymptotics of Optimization Estimators', Econometrica, 57: 1027-57.
PARDOUX, E., and D. TALAY (1985), 'Discretization and Simulation of Stochastic
Equations', Acta Applicandae Mathematica, 3: 23-47.
PASTORELLO, S., E. RENAULT, and N. TOUZI (1994), 'Statistical Inference for
Random Variance Option Pricing', Discussion Paper, CREST.
PESARAN, H., and B. PESARAN (1993), 'A Simulation Approach to the Problem
of Computing Cox's Statistic for Testing Non-Nested Models', Journal of
Econometrics, 57: 377-92.
RAO, P. (1988), 'Statistical Inference from Sampled Data for Stochastic Processes', Contemporary Mathematics, 20, American Mathematical Society.
RICHARD, J.-F. (1973), Posterior and Predictive Densities for Simultaneous Equation Models, Springer-Verlag, Berlin.
(1991), 'Applications of Monte Carlo Simulation Techniques in Econometrics and Game Theory', Discussion Paper, Duke University.
ROBERT, C. P. (1996), Méthodes de Monte Carlo par Chaînes de Markov, CREST.
ROBINSON, P. (1982), 'On the Asymptotic Properties of Models Containing Limited Dependent Variables', Econometrica, 50: 27-41.
RUDEBUSCH, G. D. (1993), 'Uncertain Unit Root in Real GNP', The American
Economic Review, 83: 264-72.
RUUD, P. (1991), 'Extensions of Estimations Methods Using the EM Algorithm',
Journal of Econometrics, 49: 305-41.
SCOTT, L. (1987), 'Option Pricing when the Variance Changes Randomly: Theory, Estimation and Application', Journal of Financial and Quantitative
Analysis, 22: 419-38.
SHEPHARD, N. (1993), 'Fitting Nonlinear Time Series with Applications to Stochastic Variance Models', Journal of Applied Econometrics, 8: 563-84.
(1994), 'Partial non-Gaussian State Space', Biometrika, 81(1): 115-31.
SMITH, A. (1990), 'Three Essays on the Solution and Estimation of Dynamic
Macroeconometric Models', Ph.D. dissertation, Duke University.
(1993), 'Estimating Nonlinear Time Series Models Using Simulated Vector
Autoregressions', Journal of Applied Econometrics, 8: 63-84.
STEIN, E. M., and J. C. STEIN (1991), 'Stock Price Distribution with Stochastic
Volatility: an Analytic Approach', Review of Financial Studies, 4: 727-52.
STERN, S. (1992), 'A Method of Smoothing Simulated Moments of Probabilities
in the Multinomial Probit Models', Econometrica, 60: 943-52.
THELOT, C. (1993), 'Note sur la loi logistique et l'imitation', Annales de l'INSEE, 42: 111-25.
TIERNEY, L. (1994), 'Markov Chains for Exploring Posterior Distributions (with
discussion)', Annals of Statistics, 22: 1701-62.
TOUZI, N. (1994), 'A Note on Hansen-Scheinkman's Back to the Future: Generating Moment Implications for Continuous Time Markov Processes', Discussion Paper, CREST.
VAN DIJK, H. K. (1987), 'Some Advances in Bayesian Estimation Methods using
Monte Carlo Integration', in Advances in Econometrics, vi, T. B. FOMBY
and G. F. RHODES (eds.), JAI Press, Greenwich, CT.
VAN PRAAG, B. M. S., and J. P. HOP (1987), 'Estimation of Continuous Models on
the Basis of Set-Valued Observations', Paper presented at the Econometric
Society European Meeting, Copenhagen.
WHITE, H. (1982), 'Maximum Likelihood Estimation of Misspecified Models',
Econometrica, 50: 1-28.
WIGGINS, J. (1987), 'Option Values under Stochastic Volatility', Journal of Financial Economics, 19: 351-72.
ZEGER, S., and R. KARIM (1991), 'Generalized Linear Models with Random
Effects: a Gibbs Sampling Approach', Journal of the American Statistical
Association, 86: 79-86.
ZELLNER, A., L. BAUWENS, and H. VAN DIJK (1988), 'Bayesian Specification
Analysis and Estimation of Simultaneous Equation Models using Monte
Carlo Methods', Journal of Econometrics, 38: 39-72.
INDEX
accelerated acceptance-rejection method 109
accelerated Gaussian importance sampling 49
acceptance-rejection method 109
adjusted PML method 6
aggregation effect 10
ARCH model 13, 18, 141, 143
auction model 14
bias correction 44, 56
binding function 67, 85
Black-Scholes formula 122, 130
Brennan-Schwartz model 127
changing time process 121
conditional simulation 15, 17
conditioning 45
continuous time model 10
diffusion equation 10, 120
discrete choice model 93
disequilibrium model 10, 14, 17, 145, 148
duration model 9, 113
dynamic conditional moment 23, 54
efficient method of moments 76
Euler approximation 10, 119, 122
expectation maximization (EM) 50
factor ARCH model 13, 18, 141, 143
factor model 120, 138
frequency simulator 26, 96
finite sample bias 80
Gauss-Newton algorithm 29
generalized method of moments 5, 21
geometric Brownian motion 122
GHK simulator 98, 105
Gibbs sampling 60, 109
heterogeneity 11, 12, 113
implicit long run equilibrium 75
implicit volatility 131
panel probit model 46
partial Kalman filter 152
path calibration 19, 20
path simulation 15, 17, 18
probit model 17
pseudo-maximum likelihood 4, 50, 51, 145
pseudo-true value 6, 67
quadratic exponential family 53
quasi-maximum likelihood 53, 145
random parameter 11, 18
recursive truncated normal 104
simulated maximum likelihood 42, 142, 150
simulated nonlinear least squares 55, 56
simulated PML 54, 148
simulated score 35, 36
simulator 24
smooth simulator 26
static conditional moment 23, 54, 55
Stern simulator 97, 105
stochastic differential equation 119, 133
stochastic volatility 13, 120, 129, 137, 142
switching regime model 145, 151
switching state space model 13, 49, 152
true value of the parameter 3