Simulation-Based Econometric Methods PDF
CORE Lectures
SIMULATION-BASED
ECONOMETRIC METHODS
CHRISTIAN GOURIEROUX
and
ALAIN MONFORT
OXFORD UNIVERSITY PRESS
This book has been printed digitally and produced in a standard specification
in order to ensure its continuing availability
Contents

2.3.1 Simulators
2.3.2 Definition of the MSM estimators
2.3.3 Asymptotic properties of the MSM
2.3.4 Optimal MSM
2.3.5 An extension of the MSM
Appendix 2A: Proofs of the Asymptotic Properties of the MSM Estimator
2A.1 Consistency
2A.2 Asymptotic normality
4 Indirect Inference
4.1 The Principle
4.1.1 Instrumental model
4.1.2 Estimation based on the score
4.1.3 Extensions to other estimation methods
4.2 Properties of the Indirect Inference Estimators
4.2.1 The dimension of the auxiliary parameter
4.2.2
4.2.3 Asymptotic properties
4.2.4
4.3 Examples
4.3.1
4.3.2 Application to macroeconometrics
4.3.3
4.4
4.4.1
4.4.2
4A.1
4A.2 Asymptotic expansions
Computation of I(θ)
5.1
5.1.1
5.1.2 Simulated methods
5.1.3 Different simulators
5.2
5.3
5.3.2
5.3.3
5.4 Empirical Studies
5.4.1
5.4.2
6.1.1 The principle
6.1.2
6.1.3
6.2
6.3 Factor Models
6.3.1 Discrete time factor models
6.3.2
6.3.3
6.3.4
7.1.1
7.1.2
7.2
References
Index
In Section 1.3, we describe different problems for either individual data, time
series, or panel data, in which the usual criterion functions contain integrals.
Finally, in Section 1.4 we give general forms for the models we are interested in,
i.e. models for which it is possible to simulate the observations.
In the following sections we assume that there is no feedback between the y and
the z variables. More precisely, we impose:
f0(zt / zt−1, ..., z1, yt−1, ..., y1, y0) = f0(zt / zt−1, ..., z1).
Note that we have assumed that the conditional p.d.f. f0(yt/xt) does not depend
on t; more precisely, we assume that the process (yt, zt) is stationary. Also note
that, in cross sections or panel data models, the index t will be replaced by i and
the yi's will be independent conditionally on the zi's.
To summarize, we are essentially interested in a part (1.2) of the true unknown
distribution of all the observations, since the whole distribution consists
of f0(y1, ..., yT/z1, ..., zT, y0) and of the true unknown marginal distribution
f0(z1, ..., zT, y0) of z1, ..., zT, y0.
In order to make an inference about the conditional distribution, i.e. f0(yt/xt), we
introduce a conditional parametric model. This model M is a set of conditional
distributions indexed by a parameter θ, whose dimension is p:
(1.4)
and identifiable; i.e., there exists a unique (unknown) value θ0 such that:
In practice, the function ψT is generally differentiable and the estimator is deduced
from the p-dimensional system of first order conditions:
With few exceptions, this system does not admit an analytical solution, and the
estimate is obtained via a numerical algorithm. The most usual ones are based
on the Gauss-Newton approach. The initial system is replaced by an approximated
one, deduced from (1.7) by considering a first order expansion around some
value θq:
As soon as the sequence θq converges, the limit is a solution θT of the first order
conditions.
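As an illustration of this iterative scheme, here is a minimal sketch in Python. Only the update rule θq+1 = θq − [∂²ψ/∂θ∂θ′]⁻¹ ∂ψ/∂θ comes from the text; the quadratic toy criterion and all function names are hypothetical:

```python
import numpy as np

def newton_solve(grad, hess, theta0, tol=1e-10, max_iter=100):
    """Solve grad(theta) = 0 by the iteration
    theta_{q+1} = theta_q - hess(theta_q)^{-1} grad(theta_q)."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(max_iter):
        step = np.linalg.solve(np.atleast_2d(hess(theta)), np.atleast_1d(grad(theta)))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# toy criterion psi(theta) = -(theta - 2)^2, with maximum at theta = 2
grad = lambda th: np.array([-2.0 * (th[0] - 2.0)])
hess = lambda th: np.array([[-2.0]])
print(newton_solve(grad, hess, [0.0]))  # converges to [2.]
```

For a quadratic criterion the iteration converges in one step; for a general criterion the expansion is only local, so convergence depends on the starting value θq.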
The criterion function may be chosen in various ways depending on the properties
that are wanted for the estimator in terms of efficiency, robustness to some misspecifications and computability. We describe below some of the usual estimation
methods included in this framework.
Example 1.1 (Conditional) maximum likelihood
The criterion function is the (conditional) log likelihood function:
Such pseudo-models are often based on the normal family of distributions. Let us
consider a well-specified form of the first and second order conditional moments:
(say).
which implies:
for any function a. The idea of the GMM is to look for a value θT such that the
empirical counterparts of the constraints (1.14) are approximately satisfied. More
precisely, let us introduce r functions aj, j = 1, ..., r, and denote
A = (a1, ..., ar). Let us also introduce a nonnegative symmetric matrix Ω of size
(r x r). Then the estimator is:
The elements of matrix A(xt) are instrumental variables (with respect to the constraints).
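A minimal sketch of this estimation principle; the linear model, the instruments z and z³, the identity distance matrix, and the grid search are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 5000, 1.5
z = rng.normal(size=n)
y = theta0 * z + rng.normal(size=n)           # linear model, E[y|z] = theta0 * z

def gbar(theta):
    """Empirical counterpart of the orthogonality conditions
    E[a_j(z)(y - theta z)] = 0 with instruments a1 = z, a2 = z^3."""
    u = y - theta * z
    return np.array([np.mean(z * u), np.mean(z ** 3 * u)])

omega = np.eye(2)                              # distance matrix
grid = np.linspace(0.5, 2.5, 2001)
crit = [gbar(t) @ omega @ gbar(t) for t in grid]
theta_hat = grid[int(np.argmin(crit))]
print(theta_hat)                               # close to 1.5
```

The quadratic form in gbar is minimized over a grid only for transparency; in practice a numerical optimizer would be used, as described above.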
Example 1.5 Extended methods
It is possible to extend all the previous examples in the following way. Even if
we are interested only in the parameters θ, we may introduce in the criterion function
some additional (nuisance) parameters α. Then we can consider the solutions θT,
αT of a program of the form:
Equivalently, we have:
and if this limit function has a unique maximum θ0*, then the estimator θT =
arg maxθ ψT(θ) converges to this value. Therefore in practice three conditions
have to be fulfilled:
(i) the uniform convergence of the normalized criterion function;
(ii) the uniqueness of θ0* (identifiability condition of θ with respect to the criterion function);
(iii) the equality between this solution θ0* (often called the pseudo-true value) and
the true value θ0.
Asymptotic normality. Whenever the estimator is consistent, we may expand the
first order conditions around the true value θ0. We get:
The criterion function is often such that there exist normalizing factors HT, h*T,
with:
Therefore we get:
We deduce that:
Optimal choice of the criterion function. Some estimation approaches naturally
lead to a class of estimators. For instance, in the GMM approach we can choose in
different ways the instrumental variables A and the distance matrix Ω. Therefore
the estimators θT(A, Ω) and their asymptotic variance-covariance matrices
are naturally indexed by A and Ω. So, one may look for the existence of an optimal
choice of this couple, i.e. for a couple A*, Ω* such that:
where ≪ is the usual ordering on symmetric matrices. For the usual estimation
methods, such optimal estimators generally exist.
such as the Newton-Raphson algorithms, cannot be used directly, since they require
a closed form of the criterion function ψT. Generally this difficulty arises
because of the partial observability of some endogenous variables; this lack of
observability introduces some multidimensional integrals into the expression of
the different functions, and these integrals can be replaced by approximations
based on simulations. In such models it is usual to distinguish the underlying
endogenous variables (called the latent variables), which are not totally observable,
from the observable variables. The latent variables will be denoted with an asterisk.
As can be clearly seen from the examples below, this computational problem
appears for a large variety of applications concerning either individual data, time
series, or panel data models, in microeconomics, macroeconomics, insurance,
finance, etc.
1.3.1
Uij > Uil, ∀ l ≠ j.
The selection model is obtained in two steps. We first describe the latent variables
(i.e. the utility levels) as functions of explanatory variables z, using a linear
Gaussian specification:
Uij = zij bj + vij,  j = 1, ..., M, i = 1, ..., n,
or, in matrix form:
Ui = Zi b + vi,  i = 1, ..., n,
where Ui = (Ui1, ..., UiM)', vi = (vi1, ..., viM)', and vi ~ N(0, Σ).
Then in a second step we express the observable choices in term of the utility
levels:
In such a framework the endogenous observable variable is a set of dichotomous
qualitative variables. It admits a discrete distribution, whose probabilities are:
where (u1it, u2it) are i.i.d. normal N(0, Σ), it is directly seen that the distribution
of the duration has probabilities defined by the multidimensional integrals:
P[Di = d] = P[y*1i1 < y*2i1, ..., y*1i,d−1 < y*2i,d−1, y*1id > y*2id].
The maximal dimension of these integrals may be very large; since the probabilities
appearing in the likelihood function correspond to the observed values of the
durations, this dimension is equal to max_{i=1,...,n} di, where n is the number of
individuals.
The same remark applies for the other examples of sequential choices, in particular
for prepayment analysis (see Frachot and Gourieroux 1994). In this framework
the latent variables are an observed interest rate and a reservation interest rate.
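The two-step structure of this kind of latent utility model can be simulated directly; the parameter values, the covariance matrix, and the regressor design below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, M = 1000, 3
b = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
L = np.linalg.cholesky(Sigma)

z = rng.normal(size=(n, M, 2))                 # alternative-specific regressors
U = z @ b + rng.standard_normal((n, M)) @ L.T  # latent utilities Ui = Zi b + vi, vi ~ N(0, Sigma)
choice = U.argmax(axis=1)                      # individual i selects the alternative with highest utility
freq = np.bincount(choice, minlength=M) / n    # observed choice frequencies
print(freq)
```

Both the latent utilities U and the observed choices are produced in one pass, which is exactly the joint simulation of latent and observable variables discussed in Section 1.4.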
1.3.2
Aggregation effect
where zt are some exogenous factors, εdt, εst some macroeconomic error terms,
which are the same for the different markets, and εdn, εsn some error terms specific
to each market. We assume that (εdn, εsn) are i.i.d., independent of zt, εdt, εst.
We now consider some observations of the exchanged quantity at a macro level.
If N is large, we get:
where Wt is a Brownian motion, μ(yt, θ) is the drift term, and σ(yt, θ) is the
volatility term. However, the available observations correspond to discrete dates;
they are denoted by y1, y2, ..., yt, yt+1, .... The distribution of these observable
variables generally does not admit an explicit analytical form (see Chapter 6),
but appears as the solution of some integral equation. This difficulty may be
seen in another way. Let us approximate the continuous time model (1.19) by
its discrete time counterpart with a small time unit 1/n. We introduce the Euler
approximation y(n) of the process y.
This process is defined for dates t = k/n and satisfies the recursive equation:
where (εk) is a standard Gaussian white noise. The distribution of y1, y2, ..., yt,
yt+1, ... may be approximated by the distribution of y1(n), y2(n), ..., yt(n), yt+1(n), ....
We note that the process y(n) is Markovian of order one, and we deduce that it
is sufficient to determine the distribution of y(n) at date (k+1)/n conditional on its
value at date k/n to deduce the distribution we are looking for. Let us introduce
this conditional distribution: f(y(n)(k+1)/n / y(n)k/n), say. From (1.20), this distribution
is normal with mean y(n)k/n + (1/n) μ[y(n)k/n; θ] and variance (1/n) σ²(y(n)k/n; θ).
The conditional distribution of y(n)t+1 given y(n)t is given by:
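The Euler approximation (1.20) can be sketched as follows; the Ornstein-Uhlenbeck-type drift and constant volatility chosen here are only an illustration:

```python
import numpy as np

def euler_path(mu, sigma, y0, T, n, rng):
    """Euler approximation with time step 1/n:
    y_{(k+1)/n} = y_{k/n} + (1/n) mu(y_{k/n}) + (1/sqrt(n)) sigma(y_{k/n}) eps_k."""
    y = np.empty(T * n + 1)
    y[0] = y0
    eps = rng.standard_normal(T * n)
    h = 1.0 / n
    for k in range(T * n):
        y[k + 1] = y[k] + h * mu(y[k]) + np.sqrt(h) * sigma(y[k]) * eps[k]
    return y[::n]   # keep only the integer dates 0, 1, ..., T

rng = np.random.default_rng(2)
# mean-reverting drift and constant volatility, purely for illustration
path = euler_path(lambda y: 0.5 * (1.0 - y), lambda y: 0.2, 0.0, T=10, n=100, rng=rng)
print(path.shape)  # (11,)
```

Only the values at integer dates are kept, mimicking the discretely observed process; the intermediate dates k/n serve to make the Gaussian one-step transition an accurate approximation of the diffusion.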
1.3.3
Unobserved heterogeneity¹
where ui, i = 1, ..., n, are i.i.d. variables, for instance distributed as N(0, σ²), and
independent of the error terms wi. Then the conditional probability distribution
function (p.d.f.) of the endogenous variable yi, given the exogenous variables zi,
is:
where K is the number of explanatory variables and φ the p.d.f. of the standard
normal distribution.
¹See Gourieroux and Monfort (1991, 1993a).
where for instance the ui are i.i.d. variables, with given p.d.f. g. Then the conditional
distribution of y, given z, is:
Except for some very special and unrealistic choices of the couple of distributions
(f, g) (for instance Poisson-gamma for count data, exponential-gamma for
duration data), the integral cannot be computed analytically.
Two remarks are in order. First, the dimension of the integral is equal to the
number of underlying scoring functions and is not very large in practice. The
numerical problems arise from the large number of such integrals that have to be
evaluated, since this number, which is equal to the number of individuals, may
reach 100,000 to 1,000,000 in insurance problems, for instance.
Second, a model with unobserved heterogeneity may be considered a random
parameter model, in which only the constant term coefficient has been considered
random.
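A sketch of the simulation approach to such an integral, for a hypothetical probit with a Gaussian random intercept; this special case happens to have the closed form Φ(zb/√(1 + σ²)), which makes the Monte Carlo approximation checkable:

```python
import numpy as np
from math import erf, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal c.d.f.

rng = np.random.default_rng(3)
S, zb, sigma = 100000, 0.7, 0.8

# P(y = 1 | z) = integral of Phi(zb + sigma*u) phi(u) du, approximated over S draws of u
u = rng.standard_normal(S)
p_sim = np.mean([Phi(zb + sigma * ui) for ui in u])

# closed form available in this special case, used only as a check
p_exact = Phi(zb / sqrt(1.0 + sigma ** 2))
print(p_sim, p_exact)
```

One such average is cheap; the numerical burden discussed above comes from repeating it for every individual i at every trial value of the parameters.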
1.3.4
where (ε1t), (ε2t) are independent i.i.d. error terms the distributions of which are
known (see 1.4.2), and (zt) is an observable process of exogenous variables.
The process (zt) is assumed to be independent of the process (ε1t, ε2t). yt−1
denotes the set of past values yt−1, yt−2, ... of the process y. (yt) is the observable
endogenous process and (y*t) is the unobservable endogenous latent factor.
If the functions r1(zt, yt−1, y*t, ε1t; θ) and r2(zt, yt−1, y*t−1, ε2t; θ) are one to one,
the previous equations define the conditional p.d.f.s f(yt/zt, yt−1, y*t; θ) and
f(y*t/zt, yt−1, y*t−1; θ). Therefore the p.d.f. of yT, y*T given zT (and some initial
values) is Π_{t=1}^T f(yt/zt, yt−1, y*t; θ) f(y*t/zt, yt−1, y*t−1; θ), and the likelihood
function, i.e. the p.d.f. of yT, appears as the multivariate integral
where yt, λ are m-dimensional vectors and C is a lower triangular (m x m) matrix.
We introduce here a single factor y*t, which is assumed to satisfy an ARCH
evolution:
Example 1.13 Stochastic volatility model (Harvey et al. 1994, Danielsson and
Richard 1993, Danielsson 1994)
The simplest model of this kind is
where (ε1t, ε2t) is a standard bivariate Gaussian white noise. In this kind of model
the conditional distribution of yt given y*t is N(0, exp y*t) and the latent factor y*t
is an AR(1) process.
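A path simulation of such a stochastic volatility model might look as follows; the AR(1) parameter names (omega, phi, sigma), their values, and the stationary-mean initialization are illustrative assumptions:

```python
import numpy as np

def simulate_sv(T, omega, phi, sigma, rng):
    """Simulate y_t = exp(ystar_t / 2) * eps1_t with the latent AR(1) factor
    ystar_t = omega + phi * ystar_{t-1} + sigma * eps2_t,
    where (eps1_t, eps2_t) is standard bivariate Gaussian white noise."""
    eps1 = rng.standard_normal(T)
    eps2 = rng.standard_normal(T)
    ystar = np.empty(T)
    ystar[0] = omega / (1.0 - phi)          # start at the stationary mean of the factor
    for t in range(1, T):
        ystar[t] = omega + phi * ystar[t - 1] + sigma * eps2[t]
    y = np.exp(ystar / 2.0) * eps1          # conditional distribution of y_t given ystar_t is N(0, exp ystar_t)
    return y, ystar

rng = np.random.default_rng(4)
y, ystar = simulate_sv(1000, omega=-0.1, phi=0.95, sigma=0.2, rng=rng)
print(y.shape, ystar.shape)
```

Note that simulating paths is easy even though the likelihood of the observed (yt) alone involves a T-dimensional integral over the latent factor: this is the typical situation motivating the methods of this book.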
Example 1.14 Switching state space models (Shephard 1994, Kim 1994, Billio
and Monfort 1995)
These models are:
where {εt}, {ηt} are independent standard Gaussian white noises and {ut} is a
white noise, independent of {εt}, {ηt}, whose marginal distribution is U[0,1], the
uniform distribution on [0, 1].
In this model the first factor y*1t is a quantitative state variable, whereas y*2t is a
binary regime indicator. (The case of more than two regimes is a straightforward
generalization.) This general framework contains many particular cases:
switching ARMA models, switching factor models, dynamic switching regressions,
deformed time models, models with endogenously missing data, etc.
Example 1.15 Dynamic disequilibrium models (Laroque and Salanié 1993,
Lee 1995)
In this kind of model the latent factors are demand and supply, whereas the observed
variable yt is the minimum of the demand and the supply. Such a model is:
Note that no random error appears in r1; this implies that the model is, in some way,
degenerate and that the general formula given above for the likelihood function is
not valid. However, it is easily shown (see Chapter 7) that the likelihood function
appears as a sum of 2^T T-dimensional integrals.
1.4 Simulation
1.4.1 Two kinds of simulation
For a given parametric model, it is possible to define the distribution of y1, ..., yT
conditional on z1, ..., zT, y0: f(·/z1, ..., zT, y0; θ), say, and the distribution of
yt conditional on xt = (zt, yt−1): f(·/xt; θ), say. It is particularly important in
the sequel to distinguish two kinds of simulation.
Path simulations correspond to a set of artificial values (yst(θ), t = 1, ..., T) such
that the distribution of ys1(θ), ..., ysT(θ) conditional on z1, ..., zT, y0 is equal to
f(·/z1, ..., zT, y0; θ).
Conditional simulations correspond to a set of artificial values (yst(θ), t =
1, ..., T) such that the distribution of yst(θ) conditional on xt = (zt, yt−1) is
equal to f(·/xt; θ), and this for any t.
It is important to note that these simulations may be performed for different values
of the parameter, and that the conditional distributions of the simulations will
depend on these values.
Moreover, it is possible to perform several independent replications of such a set of
simulations. More precisely, for path simulations we can build several sets ys(θ) =
(yst(θ), t = 1, ..., T), s = 1, ..., S, such that the variables ys(θ) are independent
conditionally on z1, ..., zT, y0 and y1, ..., yT. This possibility is the basis of
simulated techniques using path simulations, since the empirical distribution of
the ys(θ), s = 1, ..., S, will provide for large S a good approximation of the
intractable conditional distribution f(·/z1, ..., zT, y0; θ).
Similarly, for conditional simulations we can build several sets ys(θ) = (yst(θ),
t = 1, ..., T), s = 1, ..., S, such that the variables yst(θ), t = 1, ..., T, s =
1, ..., S, are independent conditionally on z1, ..., zT, y0. This possibility is the
basis of simulated techniques using conditional simulations, since the empirical
distribution of yst(θ), s = 1, ..., S, will provide for large S a good approximation
of the intractable conditional distribution f(·/zt, yt−1; θ), and this for any t.
Example 1.16 Inversion technique
If e* is an error term with a unidimensional distribution whose cumulative distribution function (c.d.f.) Fg(-), parameterized by 9, is continuous and strictly
increasing, we know that the variable
Weibull distribution:
Cauchy distribution:
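A sketch of the inversion technique for these two families; the Weibull parameterization F(x) = 1 − exp(−(x/b)^a) is one common convention, not necessarily the one used in the text:

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.uniform(size=100000)                  # u uniform on [0, 1)

# Weibull with c.d.f. F(x) = 1 - exp(-(x/b)**a):  x = b * (-log(1 - u))**(1/a)
a, b = 2.0, 1.5
x_weibull = b * (-np.log(1.0 - u)) ** (1.0 / a)

# Cauchy with c.d.f. F(x) = 1/2 + arctan(x)/pi:  x = tan(pi * (u - 1/2))
x_cauchy = np.tan(np.pi * (u - 0.5))

# sanity check on the Weibull median: F^{-1}(1/2) = b * (log 2)**(1/a)
print(np.median(x_weibull), b * np.log(2.0) ** (1.0 / a))
```

Both draws use the same uniform variates, illustrating that the inversion x = F_θ^{-1}(u) turns one stream of uniforms into draws from any continuous, strictly increasing c.d.f.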
Path simulations
The models of the previous subsections are often defined in several steps from
some latent variables with structural interpretations. We may jointly simulate the
latent and the observable endogenous variables: for instance, demand, supply,
and exchanged quantity in the dynamic disequilibrium model (Example 1.15);
the underlying factor and the observed vector in the ARCH model (Example 1.12);
the utility levels and the observed alternatives in the multivariate probit model
(Example 1.6).
After the preliminary transformation of the error terms, the models have the following
structure:
Conditional simulations
For a general dynamic model with unobservable latent variables such as (1.21),
it is not in general possible to draw in the conditional distribution of yt given
z1, ..., zT, yt−1. However, this possibility exists if the model admits a reduced
form of the kind:
where (εt) is a white noise with a known distribution. The conditional simulations
are defined by:
which are computed conditionally on the simulated values and not on the observed
ones.
into two subvectors such that the p.d.f. conditional on a path (zt, ut) has a closed
form. This means that the integration with respect to the remaining errors wt is
simple. We have:
for any fixed path (yt). Therefore we can approximate the unknown conditional
p.d.f. by:
yi = r(ui; θ0),  i = 1, ..., n,    (2.1)
where θ0 is the unknown true value of a scalar parameter, (ui) are i.i.d. variables
with known p.d.f. g(u), and r is a given function.
We introduce the first order moment of the endogenous variable:
and assume that the function k does not have a closed form.
Now we can replace the unobservable errors by simulations drawn independently
from the distribution g. If usi, i = 1, ..., n, are such simulations, we deduce
simulated values of the endogenous variables associated with a value θ of the
parameter by computing
Path calibration
Under usual regularity conditions, this estimator tends asymptotically to the solution θ∞* of the limit problem:
2.1.2
Moment calibration
As soon as k is one to one, this limit problem has the unique solution θ∞* = θ0,
and the estimator is consistent.
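A sketch of moment calibration by simulation; the model y = exp(θu) and the grid search are illustrative assumptions (this particular k(θ) actually has a closed form, but it is treated here as if it did not):

```python
import numpy as np

rng = np.random.default_rng(6)
theta0, n, S = 0.8, 20000, 20000

r = lambda u, th: np.exp(th * u)        # y = r(u; theta), k(theta) = E[r(u; theta)]
y = r(rng.standard_normal(n), theta0)   # observed sample generated at the true value
ybar = y.mean()

u_sim = rng.standard_normal(S)          # fixed simulation draws, common across theta
def k_sim(th):
    """Simulated counterpart of the first order moment k(theta)."""
    return r(u_sim, th).mean()

# calibrate: pick the theta whose simulated moment matches the sample mean
grid = np.linspace(0.1, 1.5, 1401)
theta_hat = grid[int(np.argmin([(ybar - k_sim(t)) ** 2 for t in grid]))]
print(theta_hat)
```

Keeping the draws u_sim fixed while θ varies makes the simulated moment a smooth deterministic function of θ, so the calibration problem is well behaved.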
2.2
This estimation method has already been introduced in Chapter 1. Here we briefly
recall its main properties, making a distinction between the static and the dynamic
case. The GMM will be approximated by the method of simulated moments
(MSM). In particular, the expression of the asymptotic variance-covariance matrix
of the GMM estimator will serve as a benchmark for measuring the efficiency loss
arising from simulations.
2.2.1
where E0 is the expectation for the true distribution of (y, z), and θ0 is the true
value of the parameter, whose size is p.
Now let Zi be a matrix function of zi with size (K, q), where K ≥ p. The elements
of Zi may be seen as instrumental variables, since they satisfy the orthogonality
conditions:
The GMM estimators are based on the empirical counterpart of the above orthogonality
conditions. If Ω is a (K, K) symmetric positive semi-definite matrix, the
estimator is defined by:
Proposition 2.1
Proposition 2.2
Then
and:
where θn is any consistent estimator of θ0, for instance the GMM estimator with
Ω = Id.
2.2.2
When lagged endogenous variables are present in the model, we have to distinguish
two cases.
and has the same asymptotic properties as in the static case after replacing zt by xt.
GMM based on static conditional moments
Let us now introduce estimating constraints based on static moments, i.e. conditional
only on the current and lagged exogenous variables zt:
Then the instrumental variables also have to depend only on zt: Zt = Z(zt).
The estimator solution of
where
In practice, the choice between dynamic and static conditional moments will be
based on the type of models considered. It is clear that more information is contained in the dynamic conditional moments than in the static conditional moments,
but static conditional moments may be easier to compute, for instance in models
of type (M*) (see 1.21) where unobservable endogenous variables are present.
The distinction is particularly important for pure time series models, i.e. when
exogenous variables z are absent. Introducing as endogenous variable
where (εi) has a known distribution; then we deduce from the definition of the
conditional moment,
conditional moment,
that
We say that k[zi, εsi; θ] = K[r(zi, εsi, θ), zi], where εsi, drawn from the distribution
of εi, is a (conditionally) unbiased simulator of k(zi; θ). This natural
simulator may have drawbacks in terms of precision or in terms of discontinuity
¹McFadden 1989, Pakes and Pollard 1989.
with respect to θ (see Example 2.2), so it may be useful to look for other unbiased
simulators k(zi, ui; θ), where ui has a known distribution such that:
Therefore:
where ui, drawn from the distribution with p.d.f. φ, is a (conditionally) unbiased
simulator. As a matter of fact, we have exhibited a class of simulators, depending
on the choice of the function φ, called an importance function. The precision of the
MSM estimators defined below will depend on the choice of the simulator, i.e.,
in the previous framework, of the function φ.
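The importance-function idea can be sketched on a toy moment; the target E[max(ε, 0)] with ε ~ N(0, 1) and the N(1, 1) importance density are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
S = 200000

phi = lambda x, m: np.exp(-0.5 * (x - m) ** 2) / np.sqrt(2.0 * np.pi)  # N(m, 1) p.d.f.

# target moment: k = E[max(eps, 0)] with eps ~ N(0, 1); true value 1/sqrt(2*pi)
u = rng.standard_normal(S) + 1.0               # draws from the importance density N(1, 1)
weights = phi(u, 0.0) / phi(u, 1.0)            # density ratio g(u) / phi(u)
k_hat = np.mean(np.maximum(u, 0.0) * weights)  # unbiased importance-sampling simulator
print(k_hat, 1.0 / np.sqrt(2.0 * np.pi))
```

The simulator stays unbiased for any importance density φ that dominates g; what changes with φ is the variance, which is why the precision of the MSM estimator depends on this choice.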
Finally, we can note that such a choice may depend on the value of the conditioning
variable: we may introduce a conditional known distribution φ(ui/zi) and the
associated simulator:
yi = (yi1, ..., yiM)',
where:
However, this simulator, called the frequency simulator, is not differentiable (and
not even continuous) with respect to the parameters bj and the elements of Σ. It may
be replaced by another simulator based on importance functions. Indeed, let us
consider the variables
Vjl = Uij − Uil,
measuring the differences between the utility levels. The distribution of
Vj1, ..., Vj,j−1, Vj,j+1, ..., VjM is a normal distribution.
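For comparison, a frequency simulator of a choice probability in the two-alternative case, where the probability has a closed form that can be used as a check; the utility specification is an illustrative assumption:

```python
import numpy as np
from math import erf, sqrt

Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))  # standard normal c.d.f.

rng = np.random.default_rng(8)
S = 100000

# two alternatives: U1 = 0.4 + v1, U2 = v2, with (v1, v2) ~ N(0, I)
v = rng.standard_normal((S, 2))
U = np.column_stack([0.4 + v[:, 0], v[:, 1]])

# frequency simulator of P[alternative 1 is chosen] = P[U1 > U2]
p_freq = np.mean(U[:, 0] > U[:, 1])

# exact value: U1 - U2 ~ N(0.4, 2), so P = Phi(0.4 / sqrt(2))
print(p_freq, Phi(0.4 / sqrt(2.0)))
```

The estimate is an average of indicator functions, so as a function of the utility parameters it is a step function: this is the non-differentiability noted above that motivates smooth simulators based on importance functions.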
Let us denote by fj(vj/zi, θ) the associated p.d.f. We have:
where:
Such estimators depend on the moments K that are retained, on the instruments, on
the matrix Ω, on the choice of the simulator, and on the number S of replications.
When S tends to infinity, (1/S) Σ_{s=1}^S k(zi, usi; θ) tends to E[k(zi, u; θ)/zi] = k(zi; θ),
and the estimator coincides with the GMM estimator.
Dynamic case
As before, we distinguish the cases of dynamic and static conditional moments,
even if the objective functions are similar.
Dynamic conditional moment. Let us consider the dynamic moment condition:
and an unbiased simulator of k(xt; θ); this simulator k(xt, u; θ) is such that
E[k(xt, u; θ)/xt] = k(xt; θ), where the distribution of u given xt is known. Then
the simulated moment estimator is defined by:
where:
where:
2.3.3 Asymptotic properties of the MSM
Proposition 2.3
with:
where
Moreover, this additional effect of the simulation depends on the quality of the
simulator V(k/z). For instance, if we consider a simulator with two random
generators, k(z, u1, u2; θ), and the simulator obtained by integrating out u2,
k(z, u1; θ) = E[k(z, u1, u2; θ)/u1], we have:
Therefore the result directly follows by replacing the variance V0[Z(k(z, u; θ0) − k(z; θ0))]
by this decomposition in the second expression of QS(Ω) given in Proposition 2.3.
[Table 2.1: values of (1 + 1/S)^(-1) and (1 + 1/S)^(1/2) for different numbers of replications S]
In this case, the asymptotic relative efficiency of the MSM estimator, defined as
the smallest eigenvalue of
is, under the condition of Corollary 2.2, larger than (1 + 1/S)^(-1). It is interesting to
note that the efficiency loss is not large even with a small number of replications.
We give in Table 2.1 the values of (1 + 1/S)^(-1) (lower bound of the asymptotic relative
efficiency) and of (1 + 1/S)^(1/2), corresponding to the maximal relative increase in
the length of confidence intervals.
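The two quantities reported in Table 2.1 can be recomputed directly for a few values of S:

```python
# (1 + 1/S)^(-1): lower bound of the asymptotic relative efficiency;
# (1 + 1/S)^(1/2): maximal relative increase in confidence-interval length
rows = [(S, round((1 + 1 / S) ** -1, 3), round((1 + 1 / S) ** 0.5, 3))
        for S in (1, 2, 5, 10, 20)]
for S, eff, length in rows:
    print(S, eff, length)
```

Even S = 1 guarantees at least 50 per cent relative efficiency, and S = 10 already brings the maximal confidence-interval inflation below 5 per cent, which is the point made in the text.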
Dynamic case
For the MSM based on dynamic conditional moments, the results are identical to
the previous ones, and we do not repeat them.
For static conditional moments (or unconditional moments for pure time series),
the results are modified. More precisely, if (yst(θ)) is a simulated path of the
endogenous process, and if the simulator used for Eθ[K(yt, zt)/zt] is K[yst(θ), zt],
then the asymptotic variance-covariance matrix of the MSM estimator is (1 + 1/S)
times that of the GMM estimator (see Duffie and Singleton 1993).
From the Gauss-Markov theorem, we know that the nonnegative symmetric matrix
(D'ΩD)^(-1) D'Ω Σ0 Ω D (D'ΩD)^(-1) is minimized for Ω = Σ0^(-1). We deduce the
following result.
Proposition 2.4
where
As usual, the optimal matrix depends on the unknown distribution and has to be
consistently estimated. Let us consider the first term, for instance. We have:
where θn is a consistent estimator of θ0.
This approximation is consistent; since k does not have a closed form, it has to
be approximated using the simulator. To get a good approximation of k, it is
necessary to have a large number of replications S2. Let us denote by us2i, s =
1, ..., S2, some other simulated values of the random term with known distribution;
then we can form the matrix
Then we can use a classical result on optimal instruments. If A and C are random
matrices of suitable dimensions, are functions of z, and are such that C is square
and positive definite, then the matrix
E0(A'Z')[E0(ZCZ')]^(-1) E0(ZA)
is maximized for Z = A'C^(-1), and the maximum is E0(A'C^(-1)A).
Proposition 2.5: The optimal instruments are:
When S goes to infinity, we find the well-known optimal instruments for the GMM:
Also note that when k = K, i.e. when the frequency simulator is used, we have:
and, therefore, the optimal instruments are identical in the MSM and the GMM.
2.3.5 An extension of the MSM
In the usual presentation of the GMM (see Hansen 1982), the true value of the
parameter is defined by a set of estimating constraints of the form:
where g is a given function of size q; we consider the static case for notational
convenience. It is important to note that in the previous sections we have considered
a specific form of these estimating constraints, i.e.
What happens if we now consider the general form (2.22)? As before, we may
introduce an unbiased simulator of g(yi, zi; θ). This simulator g(yi, zi, ui; θ),
which depends on an auxiliary random term ui with a known and fixed distribution
conditional on yi, zi, is based on a function g with a tractable form and satisfies
the unbiasedness condition:
where usi, i = 1, ..., n, s = 1, ..., S, are independent drawings in the distribution
of u, and Zi are instrumental variable functions of zi.
Proposition 2.6
where
and is a GMM estimator based on the estimating constraints associated with the
score function:
Therefore it is natural to consider the previous equality as an unbiasedness condition
and to propose the unbiased simulator (∂ log f*/∂θ)(y*st/zt; θ), where y*st
is drawn in the conditional distribution of y*t given yt, zt. If this drawing can be
summarized by a relation (in distribution), i.e.
In practice such a simulator will be used only if the function b has a simple form.
This is the case for some limited dependent variable models (see Chapter 5).
Example 2.5 Derivation of the asymptotic variance-covariance matrix for a
simulated score estimator
In the special case of the simulated score, the size of g is equal to the size of the
parameter and D is a square matrix. Therefore the asymptotic variance-covariance
matrix given in Proposition 2.6 does not depend on Ω and is equal to:
and, since:
we get:
In particular, if the instruments are Z = Id (these are the optimal instruments for
the GMM based on the score function), and if the simulator is based on the latent
where I and I* are respectively the information matrices of the observable model
and of the latent model.
The price that must be paid for the simulations is (1/S) I^(-1)(I* − I) I^(-1); as usual, it
decreases as 1/S when S increases and, moreover, it is proportional to the information
difference I* − I between the latent model and the observable model.
When S is fixed and n goes to infinity, the simulated criterion ψSn(θ) converges almost surely to:
It follows that:
with D = E0[Z (∂g/∂θ')(y, z; θ0)], where E0 is the expectation with respect to the true
distribution of (y, z).
When n goes to infinity:
with
Particular case
In the particular case where
g(y, z, us; θ) = K(y, z) − k(z, us; θ)
and g(y, z; θ) = K(y, z) − k(z; θ),
we get the results of Proposition 2.3. Since the general form of QS(Ω) directly
reduces to the third form given in this proposition with D = E0[Z (∂k/∂θ')(z; θ0)], the
two other forms are obtained immediately.
where f(yT/zT; θ) is the conditional p.d.f. of yT = (y1, ..., yT), given zT =
(z1, ..., zT) and some initial conditions.
We are interested in problems where this p.d.f. has an intractable form, and it is
important to distinguish two cases. In the first case it is possible to find unbiased
simulators of each conditional p.d.f. f(yt/yt−1, zt; θ), also denoted f(yt/xt; θ),
appearing in the decomposition:
This first case often occurs if the model has a well-defined reduced form
(see (1.22)):
However, such simulators do not generally exist in dynamic models with unobservable factors (see Section 1.3.4), defined as
In this second case other approaches must be found (see Section 3.1.5).
and where the conditional distribution of u given xt, yt is known. In practice this
distribution is often independent of xt, yt.
Then we may draw independently, for each index t, S simulated values ust, s =
1 , . . . , S, of the auxiliary random term u.
Definition 3.1 A simulated maximum likelihood estimator of θ is:
It is obtained after replacement of the intractable conditional p.d.f. with an unbiased
approximation based on the simulator.
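A sketch of an SML estimator for a hypothetical model with an unobservable additive component; the Gaussian specification and the grid search are illustrative assumptions, and the mixture density is deliberately treated as intractable:

```python
import numpy as np

rng = np.random.default_rng(9)
n, S, theta0 = 2000, 50, 1.0

# model: y_i = theta0 + u_i + w_i, with u_i unobserved, so f(y; theta) is a
# mixture integral, simulated here by averaging the conditional p.d.f. over draws of u
u_true = rng.standard_normal(n)
y = theta0 + u_true + rng.standard_normal(n)

phi = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
u_sim = rng.standard_normal((n, S))          # S draws of u per observation, fixed in theta

def simulated_loglik(theta):
    # unbiased simulator of f(y_i; theta): (1/S) sum_s phi(y_i - theta - u_i^s)
    f_hat = phi(y[:, None] - theta - u_sim).mean(axis=1)
    return np.sum(np.log(f_hat))

grid = np.linspace(0.0, 2.0, 401)
theta_sml = grid[int(np.argmax([simulated_loglik(t) for t in grid]))]
print(theta_sml)
```

Note that the log is taken after averaging, so the simulated log likelihood is a biased estimator of the true log likelihood for fixed S, which is exactly the source of the inconsistency discussed next.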
where g is the p.d.f. of u. We know that the true value of the parameter θ0 is
the solution of max_θ E0 log f(y/x; θ), but it is not in general a solution of the
maximization of (3.3), since the log and the integral do not commute, and θST is
not consistent. This inconsistency comes from the choice of the simulator f(y, x, u; θ)
as an unbiased simulator of f. If log f(yt, xt, u; θ) were an unbiased simulator of
log f(yt/xt; θ), the limit function ψ∞ would have been
When we compare Proposition 3.1 with the property of consistency of the MSM,
valid even for fixed S, the SML approaches may appear uninteresting. However:
(a) In practice it is sufficient to retain a number S of replications such that
θST ≈ lim_{S→∞} θST, and such a number is often of moderate size.
(b) The MSM also requires a large number of simulations in order to compute
the standard errors of the estimator.
(c) In finite samples, what really matters is the relative magnitude of the square
of the bias and of the variance of the estimator. As discussed below, the
magnitude of the bias may be reduced by the choice of suitable simulators, or
by the introduction of some correction terms, while the variance of an SML
estimator (close to the variance of the efficient ML estimator) is generally
smaller than the variance of an MSM estimator (close to the variance of an
inefficient GMM estimator).
(d) Finally, whereas the GMM approach may be preferred to the ML approach
since it requires fewer distributional assumptions (GMM is a semi-parametric
approach), this argument fails for the MSM approach, which requires the
complete specification of the distribution for simulation purposes.
When the number of replications S tends to infinity, the simulation step may have
an effect on the asymptotic covariance matrix of the simulated ML estimator,
except if the speed of divergence of S is sufficiently large. The following result is
proved in Gourieroux and Monfort (1991).
Proposition 3.2
3.1.3
As expected, the bias depends on the choice of the simulator, and may be reduced
by a sensible selection of f(y, x, u; θ). Moreover, the square of the bias compared
with the variance may be measured by:
Therefore it is small when the underlying ML estimator is precise, i.e. when I^(-1)(θ0)
is small.
and:
3.1.4 Conditioning
In a number of examples described in Chapter 1, the conditional p.d.f. has an
integral form:
where u is a subvector of the error term ε appearing in the reduced form of the
model. In such a case it is possible to introduce the simulator:
f̃(y, x, u; θ) = f*(y/x, u; θ), where u has a distribution with p.d.f. g. (3.6)
where u_i, w_it are independent variables with standard normal distributions. The
observable endogenous variables are either
y_it = 1_(y*_it > 0), probit model (σ_w may be constrained to 1),
or y_it = y*_it 1_(y*_it > 0), Tobit model.
The simulators may be based on the conditional distribution of y given z and
the individual effect u, since it is easy to integrate out the other random terms w
because the w_it, t = 1, ..., T, are independent. We get:
(a) probit model:
Example 3.2 Panel probit (or Tobit) model with individual effect and serially
correlated disturbances (Hajivassiliou and McFadden 1990, Keane 1990a, b,
Stern 1992, Gourieroux and Monfort 1993a)
The latent variables satisfy:
where u_i, w_it are independent variables with standard normal distribution. The
error term (ε_it) satisfies an autoregressive formulation and the initial value ε_i1 is
assumed to follow the marginal distribution associated with this autoregressive
scheme. For a probit (or Tobit) model based on this latent variable, the p.d.f. of
y conditional on z and u is no longer tractable (since the ε_it, t = 1, ..., T, are
correlated), but another conditioning is available. Let us consider the global error
term for individual i. It is a T-dimensional vector:
where e is the T-dimensional vector whose components are equal to one. The variance-covariance matrix of this error term is:
where the entries of J_T are equal to one, and the generic entry of Ω is ω_ij = [σ²/(1 − ρ²)] ρ^|i−j|.
Lemma 3.1
It has a monotonic form with a minimum value σ²(1 + |ρ|)^{-2}, and the result
follows.
QED
and
where ε_t = (ε′_1t, ε′_2t)′ is a white noise whose distribution is known. If ε_1t and
ε_2t are contemporaneously independent, the function f(y_t/z_t, y_{t−1}, y*_t; θ) (resp.
f(y*_t/z_t, y_{t−1}, y*_{t−1}; θ)) appearing in the previous integral is the p.d.f. of the image distribution of the distribution of ε_1t (resp. ε_2t) by r_1(z_t, y_{t−1}, y*_t, ·; θ) (resp.
r_2(z_t, y_{t−1}, y*_{t−1}, ·; θ)).
For this kind of likelihood function the ML method is untractable; moreover,
the previous SML methods do not apply either. In this context three kinds of
solution have been proposed. The first one is based on numerical approximations and will not be described here (see Kitagawa 1987, and Chapter 6 below),
the second one is based on simulations of the whole likelihood function using
the importance sampling technique, and the third one is based on simulations of
E[log f(y_T, y*_T/z_T, y_0, y*_0; θ)/y_T] in the Expectation Maximization (EM) algorithm.
Importance sampling methods¹
As previously seen, the likelihood function naturally appears as the expectation of the function ∏_{t=1}^{T} f(y_t/z_t, y_{t−1}, y*_t; θ) with respect to the p.d.f.
∏_{t=1}^{T} f(y*_t/z_t, y_{t−1}, y*_{t−1}; θ), where z and y_{t−1} are observed values. It is important to note that this p.d.f. is neither f(y*_T/z_T, y_0, y*_0; θ) (except if y_{t−1}
does not appear in f(y*_t/z_t, y_{t−1}, y*_{t−1}; θ), i.e. if (y_t) does not cause (y*_t)), nor
f(y*_T/y_T, z_T, y_0, y*_0; θ). However, it may be easy to draw in this p.d.f.; for instance, in the M* model such a drawing is recursively obtained by using the
formula
where ε^s_2t, t = 1, ..., T, are independent drawings in the distribution of ε_2t. Therefore an unbiased simulator of the whole likelihood function f(y_T/z_T, y_0, y*_0; θ)
is:
where the y*^s_t(θ) are drawn in the auxiliary p.d.f. mentioned above. This method
is clearly an importance sampling method.
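A sketch of this path simulator for an illustrative linear state space model (our assumption, not the text's M* model): y*_t = ρ y*_{t−1} + ε_2t, y_t = y*_t + ε_1t, with standard normal errors. Latent paths are drawn from the transition density and each path is weighted by the product of measurement densities, giving an unbiased simulator of the likelihood.

```python
import math
import random

random.seed(0)

def phi(x, sd=1.0):
    """Univariate normal density with mean 0."""
    return math.exp(-(x / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))

# Illustrative state space model (an assumption for this sketch):
#   y*_t = rho * y*_{t-1} + eps_2t,   y_t = y*_t + eps_1t,   eps ~ N(0, 1).
def simulate_likelihood(y_obs, rho, S=500, ystar0=0.0):
    """Unbiased importance-sampling simulator of the likelihood: draw S
    latent paths from the transition density and weight each path by the
    product of the measurement densities along it."""
    total = 0.0
    for _ in range(S):
        ystar, w = ystar0, 1.0
        for y in y_obs:
            ystar = rho * ystar + random.gauss(0, 1)  # draw from f(y*_t / y*_{t-1})
            w *= phi(y - ystar)                       # weight by f(y_t / y*_t)
        total += w
    return total / S

like = simulate_likelihood([0.5, -0.2, 0.1], rho=0.8)
```

For T = 1 the simulator averages φ(y_1 − y*) over y* ~ N(0, 1), so it converges to the N(0, 2) density at y_1; for longer samples the variance of the path weights grows quickly, which is exactly the slowness the accelerated methods mentioned below address.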
This basic importance sampling method may be very slow, in the sense that the
simulator may have a large variance; therefore accelerated versions of this method
have been proposed, in particular the Accelerated Gaussian Importance Sampling
method in the Gaussian case (see Danielsson and Richard 1993), and the Sequentially Optimal Sampling methods of various orders in the switching state space
models (see Billio and Monfort 1995, and Chapter 7 below).
¹See Danielsson and Richard (1993); Billio and Monfort (1995).
Since the LHS of the equation does not depend on y*_T, we have, for any value θ^(i)
of the parameter,
Let us define θ^(i+1) as the value maximizing the first term of the RHS with respect
to θ. Using the Kullback inequality, it is easily seen that the θ^(i+1) thus obtained
is such that:
This is the principle of the EM algorithm, which is an increasing algorithm such that
the sequence θ^(i) converges to the ML estimator. The problem with this algorithm
is that, although log f(y_T, y*_T/z_T, y_0, y*_0; θ) has in general a closed form, the same
is not true for its conditional expectation:
and the PML1 method based on this normal family is the nonlinear least squares
method.
Example 3.4 Multivariate normal family
This kind of family is suitable for multivariate endogenous variables. The p.d.f.
associated with the normal distribution N[m, Σ], Σ fixed, is:
Example 3.5
Poisson family
We have:
and can be computed only if the conditional mean m(x_t; θ) is always strictly
positive.
Some other linear exponential families include the binomial, negative binomial,
multinomial, and gamma families.
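A minimal sketch of the PML1 idea with the Poisson family (the data generating process below is an illustrative assumption, deliberately not Poisson): maximizing the Poisson pseudo-likelihood consistently estimates θ as long as the conditional mean, here m(x; θ) = exp(θx), is correctly specified, whatever the true conditional distribution.

```python
import math
import random

random.seed(4)

# Illustrative data: y_t = chi2(1) * exp(theta0 * x_t), so the conditional
# mean is exp(theta0 * x_t) but the distribution is continuous and
# overdispersed -- not Poisson.
T, theta0 = 4000, 0.7
xs = [random.uniform(0, 1) for _ in range(T)]
ys = [random.gauss(0, 1) ** 2 * math.exp(theta0 * x) for x in xs]

def score(theta):
    """Derivative of the Poisson pseudo-log-likelihood
    sum_t [ y_t * theta * x_t - exp(theta * x_t) ] in theta."""
    return sum((y - math.exp(theta * x)) * x for x, y in zip(xs, ys))

# The score is decreasing in theta: solve score(theta) = 0 by bisection.
lo, hi = -3.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if score(mid) > 0 else (lo, mid)
theta_pml = (lo + hi) / 2
```

Despite the misspecified pseudo-family, theta_pml concentrates around θ_0 = 0.7; only its standard errors need the sandwich form given in Proposition 3.6.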
Pseudo-maximum likelihood of order 2
A similar approach may be followed if we specify both first and second order
dynamic conditional moments:
Proposition 3.6
The PML1 and PML2 approaches may be applied as before; the condition for
consistency, i.e. the choice of pseudo-family, remains the same, but the form of
the asymptotic covariance matrix of the PML estimator has to take into account
the serial correlation of the pseudo-score vector ∂ log f_t/∂θ. We have:
where:
3.2.2
We are now interested in a parametric model whose likelihood function is untractable. When the likelihood function has no closed form, the same is likely to
be true for dynamic or static conditional moments, and the exact PML methods
cannot be used. In such a case we may extend the PML approaches by introducing
approximations of first (and second) order conditional moments in the expression
of the pseudo-log-likelihood function. These approximations are based on simulations. The simulations will be conditional on y_{t−1}, z_t in the case of dynamic
conditional moments, and will be path simulations (conditional on z_1, ..., z_T, y_0)
in the case of static conditional moments.
Example 3.7 Simulated nonlinear least squares based on the dynamic conditional mean
We consider an unbiased simulator of the first order dynamic conditional moment,
m̃(x_t, u; θ), where u has a known distribution (conditionally on x_t), and such that:
and:
3.3.1
Asymptotically, i.e. when T tends to infinity, S fixed, the first order conditions are:
or:
We note that the solution θ_{S∞} is different from the true value θ_0; therefore the
SNLS estimator is inconsistent for fixed S. Moreover, we see that the asymptotic
bias is of order 1/S. This bias comes from the correlation introduced between the
'instruments' ∂r/∂θ(z_t, u^s_t; θ_{S∞}) and the residuals y_t − r(z_t, u^s_t; θ_{S∞}).
This correlation vanishes if different simulations are used for the instruments and
for the moments, i.e. if the estimator is defined as a solution of:
where u^s_t and ū^s_t are independent drawings in the distribution of u. Note that this
modified estimator is an MSM estimator, since we have the moment condition:
3.3.2
This method is analogous to the one presented in Section 3.1.3 for the SML
approach, but since the criterion function is quadratic it will provide an exact bias
correction. Let us consider the limit of the criterion function:
There are three terms in the previous decomposition. It is clear that θ = θ_0 gives
the minimum of the sum of the first two terms and that the asymptotic bias is created
by the third term. An idea proposed by Bierings and Sneek (1989) and Laffont
et al. (1991) consists in modifying the criterion function in order to eliminate this
term. Let us consider the following estimator:
with S ≥ 2. The limit objective function is now E_0 V_0(y_t/z_t) + E_0[m(z_t; θ_0) −
m(z_t; θ)]², and θ̂_ST is consistent for S fixed, T tending to infinity.
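A numerical sketch of this correction, with an illustrative simulator (our assumption): r(u; θ) = (θ + u)², u ~ N(0, 1), so that m(θ) = θ² + 1 and v(θ) = Var[r] = 2 + 4θ². With S = 2 the naive simulated NLS criterion is pulled away from θ_0 by the variance term v(θ)/S, while subtracting the unbiased within-simulation variance estimate restores consistency.

```python
import math
import random

random.seed(5)

# Illustrative model: r(u; theta) = (theta + u)^2, u ~ N(0, 1),
# so m(theta) = theta^2 + 1; observations y_t = m(theta0) + e_t.
theta0, T, S = 1.5, 2000, 2
ys = [theta0 ** 2 + 1 + random.gauss(0, 0.5) for _ in range(T)]
us = [[random.gauss(0, 1) for _ in range(S)] for _ in range(T)]

def criterion(theta, corrected):
    total = 0.0
    for y, u in zip(ys, us):
        r = [(theta + v) ** 2 for v in u]
        mbar = sum(r) / S
        total += (y - mbar) ** 2
        if corrected:
            # unbiased estimate of Var(mbar) = v(theta)/S, removed from the criterion
            total -= sum((rv - mbar) ** 2 for rv in r) / (S * (S - 1))
    return total / T

def argmin(corrected):
    grid = [k / 100 for k in range(80, 221)]      # theta in [0.8, 2.2]
    return min(grid, key=lambda th: criterion(th, corrected))

theta_naive, theta_corrected = argmin(False), argmin(True)
```

Here the naive minimizer concentrates near sqrt(θ_0² − 2/S) ≈ 1.12 rather than 1.5, while the corrected one recovers θ_0 up to sampling noise; as S grows the two coincide.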
_ ^(T) otherwise.
QED
Using this lemma we can show the basic property of the MH algorithm.
Proposition 3A.1 Consider an MH algorithm (in which the p.d.f.s g(·/x) have
the same support as f). The MH algorithm defines a
Markov chain admitting P as an invariant distribution.
Proof. The Markov chain defined by the MH algorithm is such that
q(y/x) = ρ(y, x) g(y/x),
where ρ(y, x) is the acceptance probability. Therefore, we have:
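As an illustration of the proposition, a minimal random-walk sketch (a special case of the MH algorithm in which the proposal g(·/x) is symmetric, so the acceptance probability reduces to min(1, f(y)/f(x)); the standard normal target below is our arbitrary choice):

```python
import math
import random

random.seed(2)

def mh_chain(logf, proposal, x0, n):
    """Metropolis-Hastings with a symmetric proposal: move from x to the
    proposed y with probability min(1, f(y)/f(x)), else stay at x."""
    x, out = x0, []
    for _ in range(n):
        y = proposal(x)
        if math.log(random.random()) < logf(y) - logf(x):
            x = y
        out.append(x)
    return out

# Target: N(0, 1) (unnormalized log-density suffices); random-walk proposal.
chain = mh_chain(lambda v: -v * v / 2,
                 lambda v: v + random.uniform(-1, 1), 0.0, 30_000)
```

The empirical mean and variance of the chain approach 0 and 1, the moments of the invariant distribution, which is the content of Proposition 3A.1.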
Indirect Inference
4.1 The Principle1
4.1.1 Instrumental model
When a model leads to a complicated structural or reduced form and to untractable
likelihood functions, a usual practice consists in replacing the initial model (M)
with an approximated one (Ma) which is easier to handle, and to replace the
infeasible ML estimator of the initial model,
In parallel, we simulate values of the endogenous variables y^s_t(θ) using model (M)
and a value θ of the parameter. As in previous chapters, we replicate S times such
simulations. Then we estimate the parameter β of the instrumental model from
these simulations. We get:
and is equal to zero for the PML estimator of β. An approach proposed by Gallant
and Tauchen (1996) selects a value of θ such that:
where:
either
with
k = 0, ..., S − 1, h = 1, ..., T.
However, the last equivalence requires an additive decomposition for the derivative
of the criterion function
(since for such a choice the criterion function takes the minimal possible
value 0 for T sufficiently large) and therefore it is independent of Ω.
(ii) Similarly, in the just identified case, θ̂_ST(Ω) is the solution of the system:
and is independent of Ω.
(iii) Finally, if
has a unique solution in β̂_T, we deduce that this solution is β̂_ST(θ̂_ST), and
from (ii) that it is equal to β̂_T. From (i) we know that β̂_T = β̂_ST(θ̃_ST), and
therefore the two estimators coincide: θ̂_ST = θ̃_ST.
QED
4.2.2 Which moments to match?
The title of this subsection refers to the paper by Gallant and Tauchen (1996). The
question is: What is the underlying parameter on which the estimation process is
based?
This value may be specified only if we consider the asymptotic optimization problem. Let us consider for instance the PML methods used in Section 4.1.1. The
criterion is asymptotically:
which gives the proximity between the two conditional distributions f(y_t/x_t; θ)
and f^a(y_t/x_t; β); f^a(y_t/x_t; b(θ)) corresponds to the distribution of (Ma) that is
the closest to f(y_t/x_t; θ).
In some specific cases the parameter b(9) may admit an interpretation in terms of
moments, but in general it has a much more complicated interpretation.
Moreover, as shown in the following example, MSM methods on static conditional
moments are particular cases of indirect inference methods.
and:
is an MSM estimator based on the static conditional moment E[k(y_t, z_t)/z_T] and
on the identity instrument matrix. In the pure time series case, i.e. when no
exogenous variables are present, we find the MSM method proposed by Duffie
and Singleton (1993).
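The mechanics of matching an auxiliary statistic can be made concrete with a one-parameter sketch. The structural model below, y = exp(θ + z) with z ~ N(0, 1), and the auxiliary statistic, the sample mean, are illustrative assumptions: the statistic is computed on the observed data and on data simulated under θ with frozen draws, and θ is adjusted until the two match, here by bisection, since the problem is just identified and one-dimensional.

```python
import math
import random

random.seed(3)

# Illustrative "structural" model: y = exp(theta + z), z ~ N(0, 1),
# so the binding function is b(theta) = E[y] = exp(theta + 1/2).
def simulate(theta, zs):
    return [math.exp(theta + z) for z in zs]

def beta_hat(ys):                        # auxiliary statistic: sample mean
    return sum(ys) / len(ys)

theta0, T, S = 0.5, 2000, 10
data = simulate(theta0, [random.gauss(0, 1) for _ in range(T)])
target = beta_hat(data)

# Common random numbers: S*T latent draws, kept fixed across candidate thetas.
zs = [random.gauss(0, 1) for _ in range(S * T)]

def gap(theta):
    """Auxiliary statistic on simulated data minus the observed one."""
    return beta_hat(simulate(theta, zs)) - target

lo, hi = -2.0, 3.0                       # gap is increasing in theta: bisect
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if gap(mid) < 0 else (lo, mid)
theta_ii = (lo + hi) / 2
```

Because the auxiliary statistic here is a static moment with identity instruments, this is exactly the MSM-as-indirect-inference case just discussed.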
4.2.3
Asymptotic properties
The asymptotic properties of the indirect inference estimators are given below for
a general criterion function such that
converges to a deterministic limit denoted by ψ∞(θ, β), and such that the binding
function b(θ) = arg min_β ψ∞(θ, β) is injective. We introduce the two matrices:
Proposition 4.2
Similar results may be derived for estimators based on the score. They are direct
consequences of the following result (see Appendix 4A).
Proposition 4.3
The indirect inference estimators θ̂_ST(Ω) form a class of estimators indexed by the
matrix Ω. The optimal choice of this matrix is Ω = Ω*.
Proposition 4.4
it implies:
A consistent estimator of this matrix can be obtained by replacing ψ∞ by ψ_T, b(θ_0)
by β̂_T, and I_0 by a consistent estimator based on simulations (see Gourieroux et
al. 1993: Appendix 2). Significance and specification tests can also be based on
the indirect inference method (see Gourieroux et al. 1993).
4.2.4
Some symmetrical calibration procedures might also have been introduced. For
instance, we might have considered:
However, it can be checked that these methods, while consistent, are generally less
efficient than the optimal ones we have described in the previous subsection.
4.3 Examples
The main step in the indirect inference approach is the determination of a good
instrumental model (Ma) or a good auxiliary criterion ψ_T (which may be an approximation of the log likelihood function). We now give some examples of the
determination of instrumental models in which the major part of the initial modelling has been kept. Some other examples, specific to limited dependent variable
models and to financial problems, are given in Chapters 5 and 6 respectively.
and for r = 2:
Mean    Standard deviation    Root mean square error
0.481   0.105                 0.106
0.491   0.065                 0.066
0.497   0.053                 0.053
0.504   0.061                 0.061
4.3.2 Application to macroeconometrics
or, with respect to u_t and y_{t−1}, around zero and a long-run equilibrium value ȳ
(often taken equal to the historical average (1/T) Σ_{t=1}^{T} y_t in practice):
We may apply the indirect inference approach using either (4.19) or (4.20) as
auxiliary model and β as auxiliary parameter. In such a case, the approach corrects
for the linearization bias.
We may also apply indirect inference from (4.20) with the auxiliary parameter
(β′, ȳ′)′. In this case we simultaneously estimate the 'implicit' long-run equilibrium associated with the linearized version.
Finally, there is no strong reason for expanding around u = 0 rather than around
another point ū. This means that another approximated model is:
Let us assume that the model can be put in the nonlinear state space form:
where y*_t is a state vector which is in general (partially) unobserved and (ε′_t, η′_t)′ is
a Gaussian white noise. The extended Kalman filter (Anderson and Moore 1979)
could be used to compute an approximate log likelihood function, but this estimator is inconsistent. It could also be used as a first step estimator in the indirect
estimation procedure, which in a second step provides a consistent and asymptotically normal estimator. In this example it is directly a criterion function (i.e. an
algorithm) that is used, without looking for a tractable instrumental model.
where:
where ȳ_T is the sample mean, σ̂_T the sample standard error, ℓ a logistic map, and
P is a polynomial in v_t whose coefficients are polynomials in y*_{t−1}, ..., y*_{t−p}.
4.4
In this section we have gathered some additional theoretical results concerning the
indirect inference approach. They concern the second order expansion of indirect
inference estimators and particularly their ability to reduce the finite sample bias,
and a definition of the indirect information on the parameter of interest contained
in the auxiliary model.
where A(ν; θ_0), B(ν; θ_0) are random vectors, depending on some asymptotic random term ν, and where the equality is in a distribution sense. We have previously
seen that the first order term A(ν; θ_0) follows a zero-mean normal distribution.
Considering, for the sake of simplicity, the pure time series framework, i.e. without
explanatory variables, similar expansions are also valid for the first step estimators
based on simulated data. We consider S replications and, for each index s, T
simulated values {y^s_t(θ), t = 1, ..., T}. The first step estimator associated with
this set of simulated values, β̂^s_T(θ) say, is such that:
Let us now consider an indirect inference estimator θ̂_ST defined as the solution of
the system:
The identification of the first and second order terms of the two members of the
equality provides the following terms in the expansion for
Proposition 4.5
From these expressions, we can deduce the second order bias of the indirect inference estimator defined by (4.25). We have:
It may be noted that, whereas the initial second order bias of the first step estimator
is:
the second order bias of the indirect inference estimator no longer depends on the
second order terms B(·; θ_0).
The case of a consistent first step estimator
Even if indirect inference is useful mainly when the initial model is untractable, it
might also be used when a consistent estimator of θ is easily available, i.e. when
there exists a consistent first step estimator θ̂_T(θ_0) such that b(θ_0) = θ_0. In such
a framework we will see (as noted by Gourieroux et al. (1994) and MacKinnon and
Smith (1995)) that indirect inference is a possible approach for correcting for finite
sample bias. In some sense the correction is similar to the one based on the median
proposed by Andrews (1993), or to a kind of bootstrap approach.
By applying the previous proposition to this particular case, we obtain some simplifications on the second order expansion of the indirect inference estimator.
Proposition 4.6
If b(θ) = θ, then
The indirect inference estimator is simply equivalent to the initial estimator corrected for the second order bias. When the number of replications S is finite, the
second order bias of the indirect inference estimator is smaller in absolute value
than the one associated with the first step estimator as soon as:
or:
In Figure 4.7 we give the p.d.f. of σ̂²_T/σ²_0 and of σ̂²_ST/σ²_0 for T = 20 and S = 10.
In the limit case, S = +∞, we get (T − 1)σ̂²_∞T/σ²_0 ~ χ²(T − 1), and σ̂²_∞T is
unbiased.
Of course, when S is small the gain in the bias is balanced by a loss in the variance.
In the example, the exact first and second order moments are:
Figure 4.7: The p.d.f. of the estimator with and without correction by indirect
inference, T = 20, S = 10: χ²(T − 1) and F[T − 1, S(T − 1)].
The mean square errors MSE_T and MSE_ST are:
4.4.2
Let us go back to the context of Sections 4.1.1 and 4.1.2, i.e. to the case where the
criterion 1/^7- is equal to:
where f^a(y_t/x_t; β) is the conditional p.d.f. of some instrumental model (Ma); in
other words, ψ_T is equal to (1/T) L_T(β), where L_T(β) is the
log likelihood function of (Ma). We still assume that the true conditional p.d.f.
belongs to the family f(y_t/x_t; θ) associated with model (M).
Let us introduce the usual Fisher information matrices of (M) and (Ma):
Note that in the expression of I^a(β) the variance-covariance matrix is taken with
respect to the distribution defined by the f^a(y_t/x_t; β), t = 1, ..., T.
We assume that both of them are invertible, which implies the local identifiability
of both models.
It is natural to say that (M) is locally indirectly identifiable from (Ma) if the binding
function b(0) is locally injective, and to introduce the indirect information matrix
of (M) based on (Ma) as the asymptotic variance-covariance matrix (under M)
of the vector obtained by the orthogonal projection of the components of:
is equal to zero because of the definition of b(θ), but that, in general, the conditional
expectation
Proposition 4.7
where:
(ii) If the matrix I_I(θ) is invertible, (M) is locally indirectly identifiable from (Ma).
Proof. See Appendix 4B.
The first part of this property shows that, when the criterion of the indirect inference method is the log likelihood of an auxiliary model, the asymptotic variance-covariance matrix of the optimal indirect inference estimators is (see Proposition 4.4):
(A4) The only solution of the asymptotic first order conditions is b(θ):
and
(as soon as
is positive definite)
This implies:
(say).
Asymptotic expansions of β̂_T and β̂_ST(θ_0)
These are directly deduced from the first order conditions. For instance, we have:
or
Finally, we get:
(with f_t(θ) = f(y_t/x_t; θ), f^a_t(β) = f^a(y_t/x_t; β)). This matrix is denoted by:
with:
and:
with:
is:
or:
where f_0(θ) is the conditional p.d.f. of y_0 given the exogenous variables. Differentiating with respect to θ, we get:
The last limit is zero under the usual mixing assumption, and therefore:
and
Finally, we get:
If I_I(θ) is invertible, ∂b/∂θ′ is of full column rank and the second part of Proposition 4.7 follows from the implicit function theorem.
Applications to Limited
Dependent Variable Models
5.1 MSM and SML Applied to Qualitative Models
5.1.1 Discrete choice model
The problem of discrete choices made by the agents is at the origin of the method
of simulated moments (McFadden 1989). In this subsection we first recall the form
of the model (see Section 1.3.1), the expression of the log likelihood function, and
the first order conditions. The model is defined from the underlying utility levels:
With the notations of Chapter 2, we have k(y_i, z_i) = y_i, k(z_i; θ) = P(z_i; θ). The
log likelihood function is given by:
or, since:
where θ̂ is a consistent estimator of θ_0, and this replacement has no effect on the
asymptotic distribution of the estimator.
5.1.2
Simulated methods
As seen in Chapter 2, we first have to select a set of instruments Z_ij(z_i); then the
MSM estimator is the solution of an optimization problem:
SML
The SML estimator is the solution of:
is not an instrument, since it depends on the simulated values u^s_ij, which introduce
a correlation between Z_ij(θ_0) and
Simulated instruments
It has been proposed to apply the MSM with instruments close to the Z appearing in
the likelihood equations. To destroy the correlation previously discussed between
Z and y − p̃^S, the idea is to consider:
where:
The v^s_ij are drawn independently of the u^s_ij and in the same distribution. Then we
can look for the estimator θ̂, solution of:
where θ̂_Sn is a consistent estimator of θ when n → ∞ and S is fixed. With such
a choice, and with S* and S sufficiently large, we may be close to the asymptotic
efficiency.
where v = (v_1, ..., v_m)′ ~ N(0, Σ). The values a_j, b_j and the variance-covariance
matrix Σ are known as soon as we know the values of the explanatory
variables and the values of the parameters. Therefore they depend on i and θ.
To build an unbiased simulator of the probability of the rectangle D =
∏_{j=1}^{m} [a_j, b_j] for the normal distribution N(0, Σ), Stern (1992) proposed to decompose the variance-covariance matrix Σ. Let us introduce the smallest eigenvalue λ
of the matrix Σ; we have Σ − λ Id_m ≥ 0, and therefore we can write:
Therefore:
in the u-space the domain D* has the form shown in Figure 5.1.
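Stern's decomposition above can be sketched directly (the interface, passing λ and a matrix A with A A′ = Σ − λ Id, is our illustrative choice): writing v = √λ u + A w with u, w independent N(0, Id_m), the rectangle probability becomes a smooth expectation over w of a product of univariate normal c.d.f. differences, estimated by a plain Monte Carlo average.

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()
random.seed(6)

def stern_simulator(A, lam, a, b, n_draws=5000):
    """Stern's unbiased simulator of P(a <= v <= b), v ~ N(0, Sigma),
    with Sigma = lam * Id + A A' (lam: smallest eigenvalue of Sigma).
    Writing v = sqrt(lam) u + A w, integrate u out analytically and
    average the resulting product of c.d.f. differences over draws of w."""
    m = len(a)
    s = math.sqrt(lam)
    total = 0.0
    for _ in range(n_draws):
        w = [random.gauss(0, 1) for _ in range(m)]
        Aw = [sum(A[j][k] * w[k] for k in range(m)) for j in range(m)]
        p = 1.0
        for j in range(m):
            p *= nd.cdf((b[j] - Aw[j]) / s) - nd.cdf((a[j] - Aw[j]) / s)
        total += p
    return total / n_draws
```

With A = 0 (independent components) each draw already equals the exact product of univariate probabilities; with correlation the simulator stays smooth in the parameters, which is the point of the decomposition.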
Now let us consider a drawing u*_1 in the standard normal distribution restricted
(or conditional) to [α_1, β_1], and a drawing u*_2 in the standard normal distribution
restricted (or conditional) to [α_2(u*_1), β_2(u*_1)].
Indeed, we have:
Note that u_1 has not been used, but it has been introduced in order to prepare the
general case.
Finally, it is easy to check that a drawing u*_1 in the standard normal distribution
restricted to [α_1, β_1] is deduced from a drawing ũ_1 in the uniform distribution
U_(0,1) on (0, 1) by the formula:
since
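This inversion step can be sketched as follows (the function name is ours; the mapping u*_1 = Φ^{-1}[Φ(α_1) + ũ_1(Φ(β_1) − Φ(α_1))] is the standard inverse-c.d.f. construction for a truncated normal):

```python
import random
from statistics import NormalDist

nd = NormalDist()

def truncnorm_draw(alpha, beta, u=None):
    """Draw from N(0, 1) restricted to [alpha, beta]: push a uniform
    draw through the inverse c.d.f. of the truncated distribution."""
    if u is None:
        u = random.random()
    return nd.inv_cdf(nd.cdf(alpha) + u * (nd.cdf(beta) - nd.cdf(alpha)))
```

The map is smooth in (α, β), so simulators built on it (such as GHK below) are differentiable in the parameters, unlike frequency simulators.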
General case. Let us now consider the extension to a general dimension m. After
the lower triangular transformation, the domain D* has the following form:
in N(0, 1) conditional to
5.2
(say).
In the neighbourhood of the no correlation hypothesis, this function is also equivalent to:
(say).
This is a local correction for the correlation effects by the introduction of Mills
ratios.
Finally, if we consider that the unidimensional normal distribution is well approximated by a logistic distribution,
and
where S(y) = 1 − F(y) is the survival function associated with the logistic
distribution.
5.2.2
Let us consider the log likelihood function associated with a discrete choice model.
It is given by:
or
or
or
and ỹ^s(θ) is a set of n simulated vectors y^s_i(θ) drawn from the discrete choice
model.
Such an approach is consistent and has a good precision when the correlations are
small (since Ln is a good approximation of L), but also for much larger correlations.
(See Appendix 5A for some Monte Carlo studies.)
or:
(constrained moment),
where h is a given integrable function. For instance, conditional computations
naturally appear in limited dependent variable models with truncation effects,
when the variable is observed only if a given constraint v ∈ D is satisfied. But
conditional moments are also important if we perform a direct analysis of first
order conditions in more classical frameworks such as discrete choice models, in
particular if we apply simulated scores (see Example 2.3). To illustrate this point,
let us consider a Gaussian variable v ~ N[μ, Σ], and the probability:
where μ and Σ are unknown, whereas the a_j, b_j are known, and where ṽ = v − μ,
ã = a − μ, b̃ = b − μ.
The previous probability is a function of the parameters μ, Σ:
where D̃ = ∏_{j=1}^{m} [ã_j, b̃_j]. When we consider the first order conditions associated
with a maximum likelihood approach, we have to compute the derivatives:
and
Some direct computations give:
and similarly:
where u and w are independent, u ~ N(0, Id_m), w ~ N(0, Id_m), and λ is a scalar,
in practice the smallest eigenvalue of Σ. We have:
Such an integral may be easily computed for some specific functions h, in particular
when h(v) = ∏_{j=1}^{m} v_j^{n_j}. This class of h functions is interesting since we have seen
that it naturally appears with n_j = 0, 1, 2, when we look at the score vectors. For
such products of power functions, the previous multiple integral is a product of
one-dimensional integrals, which are easily derived as soon as we know how to
compute an expression of the form:
The distribution of u* = (u*_1, ..., u*_m)′ is the recursive truncated normal distribution with p.d.f.:
where ψ has already been defined in (5.16) and where by convention the denominator is Φ(β_1) − Φ(α_1) for j = 1.
Proposition 5.1 An unbiased
simulator of E[h(v) 1_D(v)]
is
h[Au*] P(u*_1, ..., u*_{m−1}), where u* follows the recursive
truncated normal distribution defined in (5.24).
Proof. We have:
QED
The result is similar to the one derived in Section 5.1.2. Just note that it is now
necessary to draw the last component u*_m as soon as it appears in the function
h(Au*).
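Putting the recursive truncated normal drawings and the product of conditional probabilities together gives the GHK (smooth recursive conditioning) simulator used repeatedly in this chapter. A minimal sketch for the rectangle probability (the function interface is ours):

```python
import random
from statistics import NormalDist

nd = NormalDist()

def ghk(A, a, b, n_draws=1000, seed=0):
    """GHK (smooth recursive conditioning) simulator of
    P(a_j <= v_j <= b_j for all j) with v = A u, u ~ N(0, Id_m),
    A lower triangular (e.g. a Cholesky factor of the covariance)."""
    rng = random.Random(seed)
    m = len(a)
    total = 0.0
    for _ in range(n_draws):
        ustar = [0.0] * m
        weight = 1.0
        for j in range(m):
            partial = sum(A[j][k] * ustar[k] for k in range(j))
            lo = (a[j] - partial) / A[j][j]
            hi = (b[j] - partial) / A[j][j]
            c_lo, c_hi = nd.cdf(lo), nd.cdf(hi)
            weight *= c_hi - c_lo
            if weight <= 0.0:
                break
            # draw u*_j from N(0, 1) truncated to [lo, hi], by inversion
            ustar[j] = nd.inv_cdf(c_lo + rng.random() * (c_hi - c_lo))
        total += weight
    return total / n_draws
```

Each replication accumulates the product of conditional interval probabilities, so the simulator is unbiased, strictly positive, and smooth in the parameters through A, a, b.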
5.3.3
The question clearly is: How can the conditional distribution of v given v ∈ D be
drawn? We propose several answers to this question.
Acceptance-rejection methods
A crude acceptance-rejection method consists in drawing simulated values v^s,
s = 1, 2, ..., in the distribution of v until one of these values satisfies the constraint
v^s ∈ D. This first value compatible with the domain is the first simulated value
in the distribution conditional on {v e D}. A drawback of such a practice is the
large number of underlying drawings that may be necessary in order to get one
effective simulated value in the conditional distribution, when P(v e D) is small.
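A sketch of the crude method (the example target, N(0, 1) conditioned on v > 2, is our illustrative choice; its acceptance probability is about 1/44, which is exactly the drawback just mentioned):

```python
import random

random.seed(7)

def crude_ar(draw, in_D, max_tries=100_000):
    """Crude acceptance-rejection: redraw from the unconditional
    distribution of v until the constraint v in D is satisfied."""
    for _ in range(max_tries):
        v = draw()
        if in_D(v):
            return v
    raise RuntimeError("no accepted draw")

# N(0, 1) conditional on v > 2: roughly 44 draws are needed per
# accepted value, since P(v > 2) is about 0.0228.
vs = [crude_ar(lambda: random.gauss(0, 1), lambda v: v > 2.0)
      for _ in range(200)]
```

The accepted values have the exact conditional distribution, but the expected cost per draw is 1/P(v ∈ D), which motivates the accelerated and Gibbs procedures that follow.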
Proof. We have:
QED
In this accelerated procedure the proportion of efficient drawings, i.e. the ones that
satisfy the constraint, is on average:
Therefore from this point of view the accelerated approach is preferable to the
crude one as soon as a < 1. To get a good performance, we have to choose the
auxiliary p.d.f. g such that a = sup_D f/g is as small as possible. Anyway we
have:
if
and by integration:
and we may choose as auxiliary p.d.f. with support D* the recursive truncated
normal distribution (see (5.24)):
We get:
Then we compute v^1 = Au^{s_1}, which follows the normal distribution N(0, Σ)
conditioned by v ∈ D.
The Gibbs sampling simulator
The basis of the Gibbs sampling is the characterization of a multivariate distribution by the set of all univariate conditional distributions.
Lemma 5.2
Proof. We give it for the bidimensional case. Let us introduce the marginal
distributions f_1(x_1), f_2(x_2) of x_1 and x_2. From the Bayes formula, we have:
Therefore the marginal distributions (and also the joint distribution) are known as
soon as we know the two conditional distributions.
QED
Remark 5.1
The condition of strict positivity of the multivariate p.d.f. is necessary for the
characterization, as shown by the following counterexample. The distributions
P_α = α U_{(0,1/2)²} + (1 − α) U_{(1/2,1)²}, where α ∈ [0, 1], have the same conditional
distributions. These conditional distributions are either U_{(0,1/2)} or U_{(1/2,1)} and are
independent of the scalar α.
Proposition 5.2
(where f_1(x_1) and f_2(x_2) are the marginal p.d.f.s of f(x_1, x_2)). Since the conditional distribution of x_2 given x_1 is f_2(x_2/x_1), the result follows.
QED
where
The p.d.f. is difficult to use directly since P[v ∈ D] has an intractable form. What
about the different univariate conditional distributions? The conditional p.d.f.s are
such that:
since the components v*_k, k ≠ j, already satisfy the constraints. Therefore we see
that the density function f_j(v_j/v_{−j}) is the p.d.f. of the conditional distribution of
v_j given v_{−j}, reconditioned by v_j ∈ [a_j, b_j]. The conditional distribution
of v_j given v_{−j} is the normal distribution:
where
Therefore the successive conditional drawings of the Gibbs sampling will be easily
performed.
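A sketch of the resulting Gibbs sampler for a bivariate example (our illustrative choice: Σ = [[1, ρ], [ρ, 1]], so the conditional of v_j given v_{−j} is N(ρ v_{−j}, 1 − ρ²), re-truncated to [a_j, b_j]; each truncated conditional is drawn by inversion):

```python
import math
import random
from statistics import NormalDist

nd = NormalDist()

def trunc_gauss(mean, sd, lo, hi, rng):
    """Univariate N(mean, sd^2) restricted to [lo, hi], by inversion."""
    p_lo = nd.cdf((lo - mean) / sd)
    p_hi = nd.cdf((hi - mean) / sd)
    return mean + sd * nd.inv_cdf(p_lo + rng.random() * (p_hi - p_lo))

def gibbs_truncated(rho, a, b, n, burn=200):
    """Gibbs sampling for (v1, v2) ~ N(0, [[1, rho], [rho, 1]])
    conditional on a[j] <= v_j <= b[j]: cycle through the univariate
    truncated normal conditionals."""
    rng = random.Random(0)
    sd = math.sqrt(1 - rho ** 2)
    v = [(a[0] + b[0]) / 2, (a[1] + b[1]) / 2]
    out = []
    for i in range(n + burn):
        v[0] = trunc_gauss(rho * v[1], sd, a[0], b[0], rng)
        v[1] = trunc_gauss(rho * v[0], sd, a[1], b[1], rng)
        if i >= burn:
            out.append(tuple(v))
    return out

draws = gibbs_truncated(0.5, [0.0, 0.0], [1.0, 1.0], 1000)
```

Unlike acceptance-rejection, every iteration produces a usable draw regardless of how small P(v ∈ D) is; the price is that successive draws are serially correlated.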
where w_it, w*_it are the offered and reservation wages respectively, m_it, n_it are
the conditional means, and u_it, v_it some error terms. These latent variables are
used to define the sequence of individual decisions: the worker i is unemployed
at date t when the offered wage is smaller than the reservation wage.
Let an indicator dummy variable be defined as d_it = 0 iff y*_it < 0, and d_it =
1 otherwise. Then the employment-unemployment history of the individual is
characterized by the vector of indicators d_i1, ..., d_iT.
For studying some questions such as the persistence of aggregate unemployment
or its evolution over the business cycle, it is necessary to specify carefully all the
dynamic aspects of the latent model. It is known that for duration models several
dependences have to be considered:
the state dependence, i.e. the effect of the current state occupied by an individual;
the duration dependence, i.e. the effect of the length of the current spell of unemployment; a spurious duration dependence may also be due to unobserved individual heterogeneity;
the history dependence, e.g. the effect of the cumulated length of unemployment spells before the current spell (lagged duration dependence), or the effect of the number of past spells (occurrence dependence);
the habit persistence, i.e. the effect of lagged values of w_it, w*_it.
In summary, the dynamic aspects may be captured either by introducing some
lagged endogenous variables among the explanatory variables, or by considering some particular structures of the error terms, such as a decomposition in an
unobserved individual effect (the omitted heterogeneity), or an additional time
individual term with an autoregressive structure; for instance,
where α_i, β_i, ε_it, η_it are independent Gaussian variables, with zero mean and
variances σ²_α, σ²_β, σ²_ε, σ²_η, respectively.
The estimation of the parameters may be based on different kinds of observation. In
practice, these observations may correspond either to the states d_it, i = 1, ..., N,
t = 1, ..., T (if it is a pure discrete time duration model; see Muhleisen 1993), or
to observations of the states d_it and of the wage w_it when the worker is employed,
d_it = 1 (see e.g. Bloemen and Kapteyn 1990, Keane 1993, Magnac, Robin and
Visser 1995). In such a nonlinear framework, even the estimations of the coefficients of the static explanatory variables may be very sensitive to misspecifications
of the dynamics.
To examine this point, we reproduce in Table 5.1 some results obtained in Muhleisen (1993). The parameters are estimated by optimizing a simulated log likelihood function. Since this likelihood is a function of some probabilities associated with the multivariate normal distribution, the GHK simulator has been used.
Because the performance of the GHK estimator depends on the accuracy of the
computations, owing to its recursive structure, a pseudo-random number generator
with 20-digit precision has been used (Kennedy and Gentle 1980).
5.4.2
The main assumption for rational expectations (RE) is that prediction errors are
uncorrelated with any variable in the information available at the previous date.
If y*_t is the variable to be predicted, and y*e_t is the expectation of this variable held
at date t − 1, the RE assumption implies for instance that the linear regression of
y*_t on y*e_t, y*_{t−1}, y*e_{t−1},
The data are taken from six waves of the Socio-Economic Panel for West Germany for the years 1984-1989, and concern 12,000 individuals. The estimations are performed using only the history of the states, and two schemes are considered for the error term: a pure autoregressive scheme (β_i = 0), and the complete scheme also including the random effect β_i. The different explanatory variables, including some lagged endogenous ones, are included in a linear way in the index μ_it.
is such that a_0 = a_2 = a_3 = 0, a_1 = 1.
In practice, it is possible to have some time-individual data on expectations and realizations from business surveys. Unfortunately, these data are qualitative, with alternatives such as:
In a first approach we may consider that such data are deduced from the underlying
quantitative variable by truncation. For instance, the qualitative variable associated
with y*t is:
where i is the individual index and t the date. Let us consider the simple case of two dates, T-1 and T. We may specify the distribution of the latent variables as multivariate normal:
where z_it are observable exogenous variables and where there is independence between individuals. The associated distribution of the qualitative observed variables (y_{i,T}, y^e_{i,T}, y_{i,T-1}, y^e_{i,T-1}), i = 1, ..., N, will contain four-dimensional integrals and will depend on the parameter θ = (c', d', b', a, σ_y)'.
Nerlove and Schuermann (1992) have estimated such a multinomial probit model (without explanatory variables), and considered the test of the RE hypothesis. It requires a preliminary transformation of the constraints (a_0 = a_2 = a_3 = 0, a_1 = 1) into constraints on the components of θ. Such constraints include:
The sample consisted of 1007 manufacturing firms for the fourth quarter of 1986 (T-1) and the first quarter of 1987 (T), and the estimation has been performed using the smooth recursive conditioning simulator (GHK). The number of replications was 20. The RE hypothesis has been rejected on this set of data.
Gaussian c.d.f. proposed in Section 4.2. Let us denote by Φ₂(x, y, ρ) the c.d.f. of the normal distribution:
where:
The criterion K is a Kullback-Leibler information criterion measuring a discrepancy between the two distributions Φ₂ and G, where Φ₂ corresponds to the initial model, and the observations are qualitative and dichotomous. The solution of the minimization problem gives the values of the binding function for the values of the parameters 0, 0 (for the means), 1, 1 (for the variances), and ρ (for the correlation). It is easily checked that the two components of the binding function associated with the means are equal, m_x(ρ) = m_y(ρ), and that the same is true for the components associated with the variances: σ_x(ρ) = σ_y(ρ). Figures 5A.1-5A.3 describe the three functions m_x(ρ), σ_x(ρ), and r(ρ). It is directly seen that the approximation has nice properties in the domain ρ ∈ [0, 0.5], since for these values of ρ we have:
where
It is known that (y_t^(δ)) tends in distribution to (y_t) when δ tends to zero at a sufficient rate (see e.g. Gard 1988). Therefore (6.3) will provide an accurate simulation of y as soon as δ is sufficiently small.
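A minimal sketch of such a simulation, assuming a scalar diffusion dy = b(y) dt + a(y) dW and the Euler scheme with step δ that (6.3) describes (the drift and volatility functions used below are purely illustrative):

```python
import numpy as np

def euler_simulate(b, a, y0, T, delta, seed=0):
    """Euler scheme of step delta for dy_t = b(y_t) dt + a(y_t) dW_t:
    y_{k+1} = y_k + b(y_k) delta + a(y_k) sqrt(delta) eps_{k+1}."""
    rng = np.random.default_rng(seed)
    n = int(round(T / delta))
    y = np.empty(n + 1)
    y[0] = y0
    for k in range(n):
        y[k + 1] = (y[k] + b(y[k]) * delta
                    + a(y[k]) * np.sqrt(delta) * rng.standard_normal())
    return y

# illustrative drift and volatility: dy = k(a - y) dt + sigma dW
k_, a_, sigma_ = 0.8, 0.1, 0.06
path = euler_simulate(lambda y: k_ * (a_ - y), lambda y: sigma_,
                      y0=0.1, T=250, delta=0.1)
```

As the text notes, the simulated path is only an approximation of the continuous time process; the discretization bias vanishes as δ → 0, which is exactly what indirect inference exploits below.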
This is a nonlinear autoregressive model, with an explicit form of the log likelihood
function:
where
The auxiliary parameter is β = [θ', (vech Σ)']'. System (6.8) defines a nonlinear state space model for which the likelihood function has an intractable form. However, the parameters β may be estimated by some Kalman filter type methods (see Section 6.3.2). These methods would give inconsistent estimators of β even if (6.8) were valid; however, first, (6.8) is an approximated model; second, we are interested in consistent estimators of θ, not of β; and third, indirect inference will correct for the inconsistency.
Some other examples of factor models are time deformed processes (see e.g. Clark 1973 and Ghysels et al. 1994a). In such models we introduce two underlying processes: (i) a price process, expressed in intrinsic time (market time) and following a stochastic differential equation:
and (ii) the changing time process Z, which gives the relation between calendar time and market time. This increasing process may also be assumed to follow a stochastic differential equation:
where the two processes (W*_t) and (C_t) are independent Brownian and gamma processes, respectively.
The observations are the prices at some discrete dates in calendar time, S_t, t = 1, ..., T, where S_t = S*_{Z_t}.
In the following subsections we describe several applications of this kind which
have recently appeared in the literature.
where W_t is a standard Brownian motion and μ and σ are the drift and volatility parameters respectively. By applying Ito's formula, we get the equivalent form:
We deduce from (6.10) the exact discretized version of (6.9), which corresponds to a random walk with drift in the log price:
and to a lognormal distribution for the price. Therefore the parameters μ, σ may be estimated by the full maximum likelihood method, i.e. by:
which gives an autoregressive form for (y_t) with some conditional heteroscedasticity. A naive estimator is derived by taking the (pseudo-)maximum likelihood estimator corresponding to (6.12):
It is inconsistent, since the discretization (6.12) is not the right one. The asymptotic bias is easily derived. For μ̂₂, we have:
The bias, i.e. Eμ̂₂ - μ = exp(μ) - (1 + μ), is always positive. Finally, we can correct for this bias by applying indirect inference on the basis of the auxiliary model (6.12). We get a third estimator, (μ̂₃, σ̂₃).
To compare the properties of these three methods (maximum likelihood, naive, and indirect inference), we reproduce in Figures 6.1 and 6.2 the results of a Monte Carlo study (with 200 replications). The true values of the parameters are μ = 0.2, σ = 0.5, and the number of observations is T = 150. Indirect inference is applied with S = 1 simulation and with a simulation step of 1/10.
The positive bias of the naive estimator, and the complete correction by indirect inference, are clearly seen in the two figures. The indirect inference estimator is even less biased in finite samples than the ML estimator for the volatility parameter (see Section 4.4). Of course, the distribution of the indirect inference estimator is less concentrated than the distribution of the ML estimator, a consequence of the asymptotic efficiency of the latter. But we have to remember that the indirect inference method has been performed with only one replication, and that the concentration might have been improved by increasing the number S. Table 6.1 summarizes the statistical properties of the three estimators.
Estimation of an Ornstein-Uhlenbeck process
An Ornstein-Uhlenbeck process is a solution of the differential equation:
[Figures 6.1 and 6.2: finite sample distributions of the ML, naive, and indirect inference estimators.]
TABLE 6.1: Properties of the three estimators (true values μ = 0.2, σ = 0.5; T = 150)

                        Mean    Bias    Standard   Root mean
                                        deviation  square error
ML                 μ    0.201   0.001   0.040      0.040
                   σ    0.503   0.003   0.030      0.030
Indirect inference μ    0.201   0.001   0.057      0.057
                   σ    0.499  -0.001   0.087      0.087
Naive              μ    0.220   0.020   0.049      0.053
                   σ    0.624   0.124   0.061      0.138
equilibrium' around which the path (y_t) is varying. Such a model also admits an exact discretization, which is:
and the indirect inference estimator, also based on (6.15). In this simple example it is easy to look for asymptotic properties of the naive estimator, since (6.14) and (6.15) have the same structure. The limits for the naive estimators are:
Therefore the bias correction by indirect inference is essentially useful for the two parameters k, σ; in fact, it turns out that the ML and the naive estimators of a are identical. Figures 6.3 and 6.4 give the distributions of the three estimators, where the Monte Carlo study has been performed with k = 0.8, a = 0.1, σ = 0.06, T = 250, S = 1, and a simulation time unit of 1/10.
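The source of the naive bias on k can be illustrated as follows. The sketch assumes the exact AR(1) discretization of the Ornstein-Uhlenbeck process and a naive estimator k̂ = 1 minus the OLS autoregressive coefficient, as implied by the Euler discretization y_t = ka + (1 - k)y_{t-1} + σε_t; with the parameter values above, the naive limit 1 - exp(-0.8) ≈ 0.55 is consistent with the naive mean reported in the table below:

```python
import numpy as np

def simulate_ou_exact(k, a, sigma, T, y0=0.1, seed=0):
    """Exact discretization of dy_t = k(a - y_t) dt + sigma dW_t at unit
    time steps: an AR(1) with autoregressive coefficient exp(-k)."""
    rng = np.random.default_rng(seed)
    rho = np.exp(-k)
    s = sigma * np.sqrt((1.0 - np.exp(-2.0 * k)) / (2.0 * k))  # innovation std
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(T):
        y[t + 1] = a * (1.0 - rho) + rho * y[t] + s * rng.standard_normal()
    return y

def naive_k(y):
    """'Naive' estimator of k implied by the Euler discretization
    y_t = k a + (1 - k) y_{t-1} + sigma eps_t: one minus the OLS
    autoregressive coefficient."""
    x, z = y[:-1] - y[:-1].mean(), y[1:] - y[1:].mean()
    return 1.0 - (z @ x) / (x @ x)

y = simulate_ou_exact(k=0.8, a=0.1, sigma=0.06, T=100000)
k_hat = naive_k(y)   # converges to 1 - exp(-0.8), about 0.55, not 0.8
```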
6.1.3 Stochastic volatility models
The previous two examples are very specific since the associated continuous time
models have exact discretizations, which is not the case in more general frameworks. However, we have seen that indirect inference is a good way to correct the
[Figures 6.3 and 6.4: distributions of the three estimators of the Ornstein-Uhlenbeck parameters.]
TABLE 6.2: Properties of the three estimators (true values k = 0.8, a = 0.1, σ = 0.06; T = 250)

                        Mean    Bias    Standard   Root mean
                                        deviation  square error
ML                 k    0.859   0.059   0.122      0.135
                   a    0.100   0.000   0.005      0.005
                   σ    0.063   0.003   0.004      0.005
Indirect inference k    0.811   0.011   0.170      0.170
                   a    0.100   0.000   0.007      0.007
                   σ    0.060   0.000   0.005      0.005
Naive              k    0.574  -0.226   0.051      0.232
                   a    0.100   0.000   0.005      0.005
                   σ    0.043  -0.017   0.002      0.017
asymptotic biases on volatility parameters. In this section we consider other models with stochastic volatilities and with infeasible maximum likelihood estimators. (See also Monfardini 1996 for applications of indirect inference to discrete time stochastic volatility models.)
The Brennan-Schwartz model for the short-term interest rate¹
Among the models proposed in Chan et al. (1992), where the short-term interest rate is assumed to satisfy:
the Brennan-Schwartz model has first been estimated from the (misspecified) discretized version, taking into account the stationarity condition 0 < γ < 1. The results are given in Table 6.3.
If these estimators were consistent for the parameters of the underlying continuous time model (which is not the case, since they have been derived from the
¹ See Broze et al. (1993); see also De Winne (1994).
TABLE 6.3: Estimates from the discretized version

α       β + 1    σ₀      σ₁      γ
0.23    0.97     0.094   -1.73   1
TABLE 6.4: Indirect inference estimates (time unit 1/10)

α       β + 1    σ₀      σ₁      γ
0.03    0.98     0.102   -0.08   1
discretized version), we would have concluded that the Chan et al. model is misspecified. Indeed, the estimation of γ reaches the limit point between stationarity and nonstationarity, and the constant term σ₁ in the variance is significantly different from zero.
However, such a conclusion may be valid only after a bias correction of the previous estimators. This correction has been performed by indirect inference based on a discretized version with a time unit of 1/10 (see Table 6.4).
The γ estimator still reaches the limit point, which confirms the misspecification. In fact, it means that the unconstrained estimator of γ, i.e. without imposing the inequality constraint γ ≤ 1, would have taken a value strictly larger than one. The estimators of the parameters β, σ₀ are not strongly modified. On the contrary, the estimations of α and σ₁ are much more sensitive to the discretization of the model. Given the previous Monte Carlo results, this is not surprising for the volatility parameter σ₁. Concerning the α parameter, the effect is probably a consequence of the constraint on γ. The unconstrained estimator of γ (a volatility parameter) is strongly modified by the discretization of the model. Imposing the constraint γ ≤ 1 transfers this modification to another parameter measuring the nonstationarity phenomenon; α is such a parameter.
The informational content of option prices
Pastorello et al. (1994) were interested in comparing the informational content of option prices and stock prices for the parameters associated with the dynamics of the volatility. For this purpose, they first introduced a stochastic volatility model of the form:
where S_t, σ_t denote the stock price and the volatility respectively, and where (W^S_t), (W^σ_t) are two independent standard Brownian motions. Therefore the log volatility satisfies an Ornstein-Uhlenbeck model, and, conditional on the volatility process, the stock prices have a lognormal formulation with a time varying volatility.
The option prices are deduced from the arbitrage free conditions. Let us consider a European call option on the asset S, maturing at time T, with strike K. It delivers at date T the terminal value max(0, S_T - K). By arbitrage free arguments, its
price at date t is a discounted expected value of this terminal cash flow, where the expectation is taken with respect to a pricing (risk-neutral) probability Q, conditional on the information available at t:
where the expectations are taken with respect to the historical probability (the one associated with (6.16)), where:
In Pastorello et al. (1993), the observations are stock prices S_t, t = 1, ..., T, and prices of at-the-money options, i.e. options such that x_t = 0, corresponding to a sequence of maturities τ = T - t. The observations are denoted by:
where:
[One of Tables 6.5-6.7: mean and standard error of the estimators]

                 Indirect inference    Uncorrected method
                 based on (σ^I_t)      based on (σ^I_t)
Mean             [illegible]           0.142
Standard error   0.030                 0.026
μ = 4.6% per year, which is the average nominal return on Treasury bills for the period 1948-83.
k = 0.116, ā = 6.422, a = 0.192, which correspond to the estimates reported in Melino and Turnbull (1990).
The Monte Carlo experiments have been performed with a sample size T = 720, corresponding to daily returns. Some summary statistics are given in Tables 6.5-6.7.
Since the parameter a measuring the magnitude of the random character of the volatility is rather high, the implicit volatilities σ^I_t are bad proxies of the underlying volatilities, and the application of the uncorrected method based on the observations (σ^I_t) leads to strongly biased estimators.
The other important remark concerns the relative precision of the estimators based on the stock prices (S_t) and the option prices (σ^I_t). It clearly appears that the option prices are much more informative, especially for the parameters k and a, which measure the time correlation and the random character of the volatility. Of course,
Indirect inference based on (S_t): mean 0.245, standard error 0.134.
better results might have been obtained by a joint use of the two kinds of observation, but the improvement in precision would not be very important compared with the method based uniquely on the information summarized in the implicit volatility; moreover, all the methods using stock price data are sensitive to misspecification errors concerning the dynamics of (S_t). Indeed, we have to recall that the pricing formulas derived from the arbitrage free conditions do not depend on the form of the instantaneous expected return μ(t, S_t, σ_t) such that:
Definition 6.1
Since the limit lim_{t→0} (1/t)(G_t φ - φ) does not necessarily exist for all square integrable functions, the infinitesimal operator is in fact defined on a subset of the set of square integrable functions. This subset is called the domain of the infinitesimal operator and is denoted by D.
Proposition 6.1
of y_t.
Proof. Indeed, we have
Similarly, we have:
QED
Note that the scalar product is the one associated with the marginal probability distribution of y_t, which is time independent because of the stationarity assumption. The determination of the operator A* sometimes requires the computation of the stationary distribution of the process. However, for most stationary univariate diffusions the two operators are equal: A* = A (see Hansen and Scheinkman 1995).
Moment conditions
Proposition 6.3
(i) We have:
(ii) We have:
QED
We must add that the second set of moment conditions can be extended to functions without multiplicative forms. If φ(y_t, y_{t+1}) is such a function of the two arguments, we get, for univariate diffusion equations,
These moment conditions may be used directly as the basis of exact moment methods in some simple cases. For instance, let us consider the Chan et al. (1992) model introduced in Section 6.1. It corresponds to the stochastic differential equation:
We have, for a function φ possibly depending on the parameter,
In practice, we may introduce several functions φ and the associated moment conditions. For instance, we may introduce exponential functions φ_j(r) = exp(a_j r), j = 1, ..., p, and the moment conditions will be:
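These moment conditions can be made concrete. For the diffusion dr_t = (α + βr_t) dt + σ r_t^γ dW_t, the infinitesimal operator acts as Aφ(r) = (α + βr)φ'(r) + ½σ²r^{2γ}φ''(r), and stationarity implies E[Aφ_j(r_t)] = 0. The sketch below (parameter values are illustrative) evaluates the sample analogue of this condition and checks it on the γ = 0 case, whose stationary law is known:

```python
import numpy as np

def generator_moment(r, alpha, beta, sigma, gamma, a_j):
    """Sample analogue of E[A phi(r_t)] for phi(r) = exp(a_j r) under the
    diffusion dr_t = (alpha + beta r_t) dt + sigma r_t^gamma dW_t, where
    A phi(r) = [(alpha + beta r) a_j + 0.5 sigma^2 r^(2 gamma) a_j^2] exp(a_j r)."""
    drift = alpha + beta * r
    vol2 = sigma**2 * r**(2.0 * gamma)
    return np.mean((drift * a_j + 0.5 * vol2 * a_j**2) * np.exp(a_j * r))

# check on gamma = 0: the process is then Ornstein-Uhlenbeck, with stationary
# law N(-alpha/beta, -sigma^2 / (2 beta)), and E[A phi] should vanish
alpha, beta, sigma = 0.08, -0.8, 0.06
rng = np.random.default_rng(0)
r = rng.normal(-alpha / beta, sigma / np.sqrt(-2.0 * beta), size=200000)
m = generator_moment(r, alpha, beta, sigma, gamma=0.0, a_j=1.0)
```

Stacking such conditions for several values a_j, j = 1, ..., p, gives the GMM system described in the text.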
Let us now assume that the only available data are the stock prices S_t = y_{1t}. In such a case, the only moment conditions that can be used as a basis for GMM are the ones for which Aφ(y_t) depends only on the first coordinate y_{1t}, and this for all the admissible values of the parameters. This constraint is equivalent to:
6.2.2 The method of simulated moments
We have seen that exact methods of moments are difficult to implement, especially in the case of unobservable factors. An alternative method is the method of simulated moments. Since the analytical forms of the conditional distributions of y_t given y_{t-1} are in general intractable, the MSM has to be applied to static or cross moments (see Chapter 2). Such an approach has been followed by Duffie and Singleton (1993). As before, the use of static moments only is likely to introduce a loss of precision, especially for the parameters summarizing the nonlinear features of the dynamics. It seems at least useful to take into account the third and fourth order moments Ey_t³, Ey_t⁴ in the case of stochastic volatility models, since we know that the existence of a stochastic volatility fattens the tails of the marginal distribution, and in particular increases the kurtosis (see e.g. Engle 1982), together with cross moments such as E(y_t² y_{t-k}²) to capture the conditional heteroscedasticity and E(|y_t| y_{t-k}) to capture the leverage effect (see Andersen and Sorensen 1993).
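Computing these static and cross moments is straightforward. The sketch below pairs them with a simple discrete time stochastic volatility model — a Gaussian AR(1) for the log variance, an illustrative specification rather than the exact one used in the references cited above:

```python
import numpy as np

def svm_simulate(omega, phi, tau, T, seed=0):
    """Illustrative discrete time stochastic volatility model (an assumed
    specification): h_t = omega + phi h_{t-1} + tau v_t,
    y_t = exp(h_t / 2) eps_t, with (v_t), (eps_t) independent N(0, 1)."""
    rng = np.random.default_rng(seed)
    h = np.zeros(T)
    for t in range(1, T):
        h[t] = omega + phi * h[t - 1] + tau * rng.standard_normal()
    return np.exp(0.5 * h) * rng.standard_normal(T)

def static_moments(y, lags=(1, 2)):
    """MSM calibration targets: E y_t^2, E y_t^4, and the cross moments
    E(y_t^2 y_{t-k}^2) (conditional heteroscedasticity) and
    E(|y_t| y_{t-k}) (leverage effect) for the given lags k."""
    m = [np.mean(y**2), np.mean(y**4)]
    for k in lags:
        m.append(np.mean(y[k:]**2 * y[:-k]**2))
        m.append(np.mean(np.abs(y[k:]) * y[:-k]))
    return np.array(m)

y = svm_simulate(omega=0.0, phi=0.95, tau=0.2, T=200000)
mom = static_moments(y)
```

An MSM criterion then matches these sample moments, computed on the observations, with their analogues computed on paths simulated for each trial value of the parameter.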
where the functions r₁(y_{t-1}, y*_t, ε_{1t}; θ) and r₂(y_{t-1}, y*_{t-1}, ε_{2t}; θ) define a one to one relationship between (ε_{1t}, ε_{2t}) and (y_t, y*_t), and where (ε_{1t}), (ε_{2t}) are independent white noises with known distributions. The variables y_t, t = 1, ..., T, are observable, but the factors y*_t, t = 1, ..., T, are unobservable.
As mentioned in Section 1.3.4, conditional p.d.f.s are easily derived from system (6.3). These are the conditional p.d.f. of y_t given y_{t-1}, y*_t, denoted by f(y_t | y_{t-1}, y*_t; θ), and the conditional p.d.f. of y*_t given y_{t-1}, y*_{t-1}, denoted by f*(y*_t | y_{t-1}, y*_{t-1}; θ). We then deduce the p.d.f. of y_T = (y₁, ..., y_T), y*_T = (y*₁, ..., y*_T) given the information y₀, y*₀:
If the process (y_t, y*_t) is strongly stationary and T is large, the effect of the initial conditions y₀, y*₀ becomes negligible when studying the asymptotic properties of the estimators. Therefore we will not discuss this problem of initial values. The likelihood function (conditional on y₀, y*₀) has the form of a multivariate integral:
conditional on y_{t-1}, y*_t (with, for instance, the identification constraints a₀ > 0, a₁ > 0, a₀ + a₁ = 1).
In this example, we have:
6.3.2
The dynamic model (6.29) appears as a nonlinear state space system, where the first subsystem is the measurement equation and the second is the transition equation. In a linear state space system it is well known that the Kalman filter is an algorithm allowing for the exact computation of the conditional p.d.f. of y_t given y_{t-1} (and the initial conditions). In this subsection we will discuss the possibility of such an exact algorithm for nonlinear models, and show that, except for some specific cases, the exact computation of the likelihood function is not possible. Then it will be necessary to use either numerical or simulated methods.
³ See Kitagawa (1987).
where the first term of the RHS is directly deduced from (6.35).
Step 2: One step prediction
We deduce
where the first term of the RHS is known from (6.34) and the second is given by Step 1.
Then, integrating out y*_{t-p}, we get:
And, integrating out y_{t-p}, we obtain f(y*_{t-p+1} | y_t), which is the input of the next iteration.
In summary, Kitagawa's algorithm provides a recursive computation of the multiple integral defining the likelihood function.
Such an algorithm is not so simple to implement, since it requires the computation of integrals. These computations can be done explicitly in the case of
linear Gaussian state space models, where Kitagawa's algorithm coincides with the Kalman filter, and when the factor y*_t is discrete with a finite number of possible values b₁, ..., b_J, say. In such a case the integrals reduce to finite sums (see Hamilton 1989). In the general case, and if p is small, these integrals could be approximated by numerical methods (see Kitagawa 1987) or by simulation methods.
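The finite-sum case can be sketched directly: with a J-state factor, the prediction and filtering steps of Kitagawa's algorithm become vector operations, and the likelihood is obtained recursively. The two-state switching mean specification below is purely illustrative:

```python
import numpy as np

def discrete_state_loglik(y, P, dens):
    """Log likelihood when the unobserved factor is a J-state Markov chain
    with transition matrix P (P[i, j] = P(state j | state i)) and dens(y_t)
    returns the J conditional densities f(y_t | state j). The integrals of
    Kitagawa's algorithm reduce to finite sums (Hamilton 1989)."""
    # start from the stationary distribution of the chain
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pred = pi / pi.sum()
    loglik = 0.0
    for yt in y:
        joint = pred * dens(yt)      # f(y_t, state j | past observations)
        lt = joint.sum()             # f(y_t | past observations)
        loglik += np.log(lt)
        pred = (joint / lt) @ P      # filtering, then one step prediction
    return loglik

# two-state switching mean with Gaussian noise (illustrative specification)
means = np.array([0.0, 2.0])
dens = lambda yt: np.exp(-0.5 * (yt - means)**2) / np.sqrt(2.0 * np.pi)
P = np.array([[0.9, 0.1], [0.2, 0.8]])
ll = discrete_state_loglik(np.array([0.1, 2.2, 1.9]), P, dens)
```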
6.3.3
Since exact computation of the likelihood function can be performed when the factor takes only a finite number of values, a natural idea is to apply indirect inference to an approximated version of the factor ARCH model in which the factor has been state discretized. Let us consider the factor ARCH model introduced in (6.32) and (6.33), and a partition of the range of y*_t into J given classes (a_j, a_{j+1}), j = 0, ..., J-1, where a₀ = -∞, a_J = +∞. The state discretized factor is defined by:
where the b_j are given real numbers, such as the centres of the classes, except for the two extreme ones.
The dynamics of the discretized factor may be defined in accordance with the dynamics of the initial one by:
(say).
Then the initial factor ARCH model is replaced by the proxy model:
where (ε_t), (ỹ*_t) are independent, (ε_t) ~ IIN(0, Σ), and (ỹ*_t) is a qualitative Markov process with transition probabilities P_jl(a₀, a₁). This auxiliary model can be estimated by the maximum likelihood method, using Kitagawa's algorithm. Then the correction for the state discretization of the factors is performed by indirect inference.
6.3.4
Stochastic volatility models (SVM) may also be directly defined in discrete time. Danielsson (1993) considered such a model with the structure:
where (ε_t), (v_t) are independent Gaussian white noises with zero mean and unit variance. Therefore there is a stochastic volatility, which is predetermined and generally not observable. This lack of observability implies a likelihood function with the form of a T-variate integral:
The expression of the likelihood function is simplified only in the static case: ρ = b₂ = 0.
The previous model has been estimated by the SML method⁵ using eight years of daily observations of the Standard and Poor's 500 index for the years 1980-87 (T = 2022), and an accelerated Gaussian importance sampler. Several estimation results are given in Table 6.8, depending on the parameters of the model which are a priori constrained to zero. In particular, the first two columns of the table correspond to static cases.
⁵ Danielsson (1994).
by using the stationarity assumption. A second order expansion around the value y_t = y provides:
since E[dW_t | y_t = y] = 0, and the other terms are negligible. Finally, we get:
since
QED
Applications to Switching Regime Models
7.1 Endogenously Switching Regime Models
7.1.1 Static disequilibrium models
The canonical static disequilibrium model is defined as:
where z_{1t}, z_{2t} are (row) vectors of observable exogenous variables, y*_{1t} and y*_{2t} are latent endogenous variables, y_t is observed, and (ε_{1t}, ε_{2t}) are independently N(0, I₂) distributed. The parameter vector is θ = (a'₁, a'₂, σ₁, σ₂)'.
In this simple canonical case the likelihood function is easily computed and is equal to:
where φ and Φ are the p.d.f. and the c.d.f. of N(0, 1) respectively.
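Assuming the usual min condition y_t = min(y*_{1t}, y*_{2t}) for the canonical model, this likelihood can be coded directly; scalar regressors are used for simplicity:

```python
from math import erf, exp, log, pi, sqrt

def phi(x):
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def Phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def loglik_disequilibrium(y, z1, z2, a1, a2, s1, s2):
    """Log likelihood of the canonical static disequilibrium model under the
    min condition y_t = min(y1*_t, y2*_t), with yj*_t = zj_t aj + sj eps_jt
    (scalar regressors for simplicity):
      f(y) = (1/s1) phi(u1) [1 - Phi(u2)] + (1/s2) phi(u2) [1 - Phi(u1)],
    where uj = (y - zj aj) / sj."""
    ll = 0.0
    for yt, z1t, z2t in zip(y, z1, z2):
        u1 = (yt - z1t * a1) / s1
        u2 = (yt - z2t * a2) / s2
        ll += log(phi(u1) * (1.0 - Phi(u2)) / s1
                  + phi(u2) * (1.0 - Phi(u1)) / s2)
    return ll
```

Each term corresponds to one regime: the demand branch is binding and the supply latent variable exceeds the observation, or conversely.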
However, in multimarket disequilibrium models with nonlinear demand or supply functions, or in models with micro markets (see Laroque and Salanie 1989), the likelihood function becomes very complicated, and in some cases intractable. In order to solve this problem, Laroque and Salanie (1989) introduced various versions of the simulated pseudo-maximum likelihood (SPML) method. Moreover, in Laroque and Salanie (1994) an evaluation of these methods based on experiments is given. In these experiments the model retained is the previous canonical model, in which (z_{1t}, z_{2t}) is a bivariate vector following
PML2, and QGPML, and their simulated analogues, called SPML1, SPML2, and SQGPML. The three PML methods are obtained by maximizing the following pseudo-likelihood functions:
PML1:
PML2:
QGPML:
where θ̄ is a preliminary estimate of θ based on the PML2 method, and where m₁(z_t, θ) and v(z_t, θ) are, respectively, the mean and the variance of y_t derived from the model and given by:
with
and:
with:
In other words, these PML methods are based on normal pseudo-likelihood functions, although y_t is clearly not normal; note that in this case the PML1 and the QGPML reduce, respectively, to the nonlinear and the quasi-generalized nonlinear least squares methods.
The simulated analogues of these methods are obtained by replacing m₁(z_t, θ) and v(z_t, θ) by approximations m^S(z_t, θ) and v^S(z_t, θ) based on simulations, namely:
and:
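The simulated analogues can be sketched as crude frequency simulators, again under the min condition y = min(y₁*, y₂*) — an assumption consistent with the canonical model; the argument values below are illustrative:

```python
import numpy as np

def simulated_mean_var(z1t, z2t, a1, a2, s1, s2, S=1000, seed=0):
    """Simulated analogues m^S(z_t, theta) and v^S(z_t, theta) of the mean
    and variance of y_t = min(y1*_t, y2*_t): empirical moments over S draws
    of the latent variables (a crude frequency simulator)."""
    rng = np.random.default_rng(seed)
    y1 = z1t * a1 + s1 * rng.standard_normal(S)
    y2 = z2t * a2 + s2 * rng.standard_normal(S)
    y = np.minimum(y1, y2)
    return y.mean(), y.var()

mS, vS = simulated_mean_var(2.5, 5.0, a1=1.0, a2=1.0, s1=1.0, s2=1.0, S=20000)
```

Replacing m₁ and v by these simulated counterparts in the pseudo-likelihoods above gives the SPML1, SPML2, and SQGPML criteria.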
TABLE 7.1: Mean estimates on the retained samples (out of 200 samples); constrained estimator σ₁ = σ₂ = σ. (Cells marked [?] are illegible in the source.)

Coefficient      T    FIML   PML1          SPML1
(true value)                        S=5    S=10   S=20
a₁ (1.00)        20   1.04   1.06   0.98   1.01   1.07
                 50   1.03   1.07   0.95   0.96   1.00
                 80   1.03   1.09   0.95   0.97   0.98
a₂ (1.00)        20   1.04   1.26   1.00   1.03   1.04
                 50   1.02   1.29   0.96   0.96   0.99
                 80   1.02   1.26   0.94   0.95   0.96
σ (1.00)         20   0.88   1.84   0.64   0.79   1.01
                 50   0.94   1.98   0.37   0.44   0.64
                 80   0.95   1.98   0.30   0.40   0.51

Coefficient      T    FIML   PML2   QGPML         SPML2               SQGPML
(true value)                               S=5    S=10   S=20   S=10   S=20
a₁ (1.00)        20   1.02   1.04   0.99   1.06   1.04   1.05   1.02   1.07
                 50   1.02   1.06   0.96   1.09   1.09   1.05   0.98   0.98
                 80   1.02   1.08   0.96   1.10   1.07   1.05   0.98   0.99
a₂ (1.00)        20   1.03   1.27   1.01   1.08   1.07   1.04   1.02   1.06
                 50   1.03   1.30   0.97   1.09   1.05   1.05   0.97   0.98
                 80   1.02   1.25   0.96   1.09   1.05   1.04   0.95   0.97
σ (1.00)         20   0.88   [?]    [?]    1.23   1.02   0.94   [?]    [?]
                 50   0.94   1.94   0.45   1.32   1.06   0.99   0.54   0.61
                 80   0.95   1.89   0.42   1.36   1.08   1.00   0.43   0.54
with:
where z_{1t}, z_{2t} are (row) vectors of observable exogenous variables, y*_{1t} and y*_{2t} are latent endogenous variables, y_t is observed, and (ε_{1t}, ε_{2t}) are independently N(0, I₂) distributed. (The cases of more than one lag or of autocorrelated disturbances are straightforward extensions.)
The likelihood function of such a model is intractable. In order to evaluate the complexity of this function, let us first introduce the notations:
where m*_{1t}(θ) and m*_{2t}(θ) are the conditional expectations of y*_{1t} and y*_{2t} given I_{t-1}, i.e.
and θ is a notation for the parameter set (a'₁, a'₂, b₁, b₂, c₁, c₂, σ₁, σ₂)'. Note that the p.d.f. f_t(y_t, y*_t, r_t | I_{t-1}; θ) given in (7.4) is taken with respect to the measure λ₂ ⊗ (δ₀ + δ₁), where λ₂ is the Lebesgue measure on R² and δ₀, δ₁ are the unit masses on 0 and 1.
We can deduce the p.d.f. of y_T, y*_T, r_T (given some initial values):
The PML1 and PML2 methods, based on normal pseudo-likelihood functions and on static (or unconditional) moments, consist in minimizing, respectively,
and
Since M_t(θ) and V_t(θ) do not have closed forms, we consider their simulated analogues, in which M_t(θ) and V_t(θ) are replaced by approximations based on S path simulations of the model, Y_t^s(θ), s = 1, ..., S:
M_t^S(θ) = (1/S) Σ_{s=1}^S Y_t^s(θ),
where the z_{1t} are independently N(2.5, 2.5²) distributed, z_{2t} = 5, (ε_{1t}, ε_{2t}) follow independently N(0, I₂), a₁ = a₂ = σ₁ = σ₂ = 1, b = 0.5, T = 50, S = 10, 20, 50, k = 0, and 200 replications have been performed.
The results obtained for the SPML2 estimates are reproduced in Table 7.2. As can be seen, the estimation biases are rather small in spite of the fact that k has been taken equal to 0, i.e. Y_t is simply y_t.
TABLE 7.2: SPML2 estimates

              Mean estimate           Dispersion of estimate
       True   S=10   S=20   S=50      S=10   S=20   S=50
a₁     1.00   1.02   1.04   1.07      0.22   0.24   0.31
b      0.50   0.50   0.50   0.49      0.06   0.06   0.05
a₂     1.00   1.01   1.00   0.99      0.08   0.06   0.05
σ₁     1.00   1.03   0.99   0.98      0.28   0.26   0.32
σ₂     1.00   1.04   1.00   0.96      0.25   0.23   0.20
SML methods
Lee (1995) has proposed two SML methods based on the following decompositions of f_t(y_t, y*_t, r_t | I_{t-1}; θ):
and
Using decomposition (7.8), the likelihood function (7.6) appears as the expectation of the function ∏_{t=1}^T f_t(y_t | r_t, I_{t-1}; θ) with respect to the variables y*₁, ..., y*_T, r₁, ..., r_T, and using the probability distribution whose p.d.f. is:
The latter method gives better results than the former, and in this case the three required p.d.f.s are:
The last two p.d.f.s show how to draw r_t and y*_t at each date t: first r_t is drawn from the distribution on {0, 1} defined by f_t(1 | y_t, I_{t-1}; θ), then y*_t is drawn from the truncated normal distribution defined by f_t(y*_t | y_t, r_t, I_{t-1}; θ). (We have seen in Section 5.1.3 how to perform such a drawing efficiently.)
The results of a Monte Carlo study using this method are reported in Lee (1995) and seem encouraging; however, it seems difficult to extend this approach to more complicated models, where the likelihood function is intractable even in the static case (for instance, multimarket models).
7.2 Exogenously Switching Regime Models
where the u_t follow independently U_[0,1], the uniform distribution on [0, 1]; π₀ and π₁ are, respectively, the probabilities of staying in state 0 and state 1.
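This construction of the regime process from uniforms can be sketched directly; the staying probabilities π₀ = 0.89, π₁ = 0.84 are those of the Monte Carlo study later in this section:

```python
import numpy as np

def simulate_regimes(pi0, pi1, T, r0=0, seed=0):
    """Two state Markov chain driven by i.i.d. uniforms u_t on [0, 1];
    pi0 and pi1 are the probabilities of staying in state 0 and in state 1."""
    rng = np.random.default_rng(seed)
    r = np.empty(T, dtype=int)
    prev = r0
    for t in range(T):
        u = rng.uniform()
        prev = (0 if u <= pi0 else 1) if prev == 0 else (1 if u <= pi1 else 0)
        r[t] = prev
    return r

r = simulate_regimes(pi0=0.89, pi1=0.84, T=200000)
# stationary probability of state 1 is (1 - pi0) / ((1 - pi0) + (1 - pi1))
```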
Let us now assume that the observed endogenous variable y_t is given by:
where {v_t} is a white noise independent of {u_t}, with a known distribution, where y^{t-1}_{t-p} = (y_{t-1}, ..., y_{t-p}), r^t_{t-p} = (r_t, ..., r_{t-p}), and g is a known function.
The conditional distribution of (y_t, r_t) given (y_{t-1}, r_{t-1}) depends only on (y^{t-1}_{t-p}, r^{t-1}_{t-p}), and the process (y_t, r_t) is jointly Markovian of order p. (Note that this would remain true if r_t were Markovian of order p and if the transition probabilities were functions of y^{t-1}_{t-p}.)
In this case Kitagawa's algorithm, described in Section 6.3.2, can be used to compute the likelihood function (by taking y*_t = r_t). The integrals appearing in this algorithm are sums over the 2^{p+1} possible values of r^t_{t-p}, and the algorithm may be tractable if p is not too large. The algorithm thus obtained has been used by Hamilton in various contexts, in particular in a switching AR(p) model (Hamilton 1989), in which (7.12) is specified as:
7.2.2
where {ε_t}, {η_t} are independent standard Gaussian white noises, {u_t} is a white noise, independent of {ε_t}, {η_t}, whose marginal distribution is U_[0,1], the uniform distribution on [0, 1], and y₀, z₀, r₀ are nonrandom.
The dimensions of y_t and y*_t are denoted by n and k respectively. The regime indicator variable r_t is assumed to take two values, 0 and 1, but an extension to any finite number of regimes is straightforward. y_t is observable, r_t is unobservable, and y*_t is (at least partially) unobservable.
This subsection refers to Billio and Monfort (1995).
Note that, from (7.15), y*_t does not cause r_t in the Granger sense, but y_t may cause r_t; therefore, r_t may be not strongly exogenous but only predetermined.
The framework defined by (7.13)-(7.15) contains many interesting models as particular cases: switching ARMA models (which may have a nontrivial moving average part), switching factor models, dynamic switching regressions, deformed time models, models with endogenously missing data, and so on (see Billio and Monfort 1995).
The partial Kalman filter, denoted by KF(r_T), is defined as the Kalman filter mechanically applied to the linear state space model obtained from (7.13), (7.14) for any given sequence r_T.
It can be shown that, if the assumption of noncausality from y*_t to r_t (implied by (7.15)) holds, the conditional distributions of (y*_t | y_{t-1}, r_t), (y*_t | y_t, r_t), and (y_t | y_{t-1}, r_t) are normal distributions whose expectations and variance-covariance matrices are the outputs of the partial Kalman filter:
These outputs are very useful tools for the computation of the likelihood function of the model, or of the filters of y*_t and r_t. (Similarly, a partial Kalman smoother can be defined and used for the computation of the smoothers of y*_t and r_t.) In particular, the p.d.f. f(y_t | y_{t-1}, r_t) of N(y_{t|t-1}(r_t), M_{t|t-1}(r_t)) will be useful for the computation of the likelihood function.
and let us denote by P the probability distribution on {0, 1}^T defined by:
where p(i | y_{t-1}, r_{t-1}), i = 0, 1, are the probabilities of the Bernoulli distributions:
In theory, this formula provides an unbiased simulator of l_T and a way of approximating l_T from S independent simulated paths {r_T^s}, s = 1, ..., S, drawn in P; this approximation is:
However, this method provides very poor results and must be improved.
Sequentially optimal sampling (SOS) methods
Using arguments based on the sequential optimality of importance sampling methods, several improvements of the basic sequential sampling method have been proposed in Billio and Monfort (1995).
(i) The first order sequentially optimal sampling (SOS(1)) method is based on the following unbiased simulator of l_T:
where:
with
(ii) The second order sequentially optimal sampling (SOS(2)) method uses the unbiased simulator (assuming T even):
where
(iii) A strong second order sequentially optimal sampling (SOS*(2)) method has also been proposed; it is based on the unbiased simulator:
where
and where the S paths (r_t^s, t = 1, ..., T-2) have been sequentially drawn from:
TABLE 7.3: Performance of the likelihood simulators (T = 100)

          Simulated          Log simulated      Variance
          likelihood mean    likelihood mean    (×10⁵)
BASIC     3.52 × 10⁻²⁰       1.4314             1.2 × 10⁻³⁴
SOS(1)    0.9902             1.0001             288
SOS(2)    1.0096             0.9999             199
SOS*(2)   1.0078             0.9999             12
This procedure could be generalized to an SOS*(p) method; but, clearly, the
computational burden increases with p: in the limit case p = T we would get
the exact likelihood function.
Monte Carlo study
In order to evaluate the performance of the previous methods, we consider a simple
case in which the likelihood function is computable by Hamilton's algorithm,
namely the switching AR(1) model:
with:
where r_t is a two-state Markov chain defined by pi_0 = 0.89 (probability of staying
at 0) and pi_1 = 0.84 (probability of staying at 1), and {epsilon_t} is a standard Gaussian
white noise independent of {r_t}.
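Since the explicit AR(1) equation does not appear here, the following sketch assumes for illustration a regime-dependent intercept; the values of mu, rho, and sigma are placeholders, not the parameters of the study. Only the staying probabilities pi_0 = 0.89 and pi_1 = 0.84 are taken from the text. The second function computes the exact log-likelihood by Hamilton's filter, integrating the regime out recursively:

```python
import numpy as np

rng = np.random.default_rng(42)

P_STAY = (0.89, 0.84)                 # pi_0, pi_1 from the text
TRANS = np.array([[0.89, 0.11],       # row i: P(r_t = j | r_{t-1} = i)
                  [0.16, 0.84]])

def simulate(T, mu=(0.0, 2.0), rho=0.5, sigma=1.0):
    """Simulate a switching AR(1): y_t = mu[r_t] + rho*y_{t-1} + sigma*eps_t.
    mu, rho, sigma are illustrative placeholders."""
    r, y = np.empty(T, dtype=int), np.empty(T)
    r[0], y_prev = 0, 0.0
    for t in range(T):
        if t > 0:
            r[t] = r[t - 1] if rng.random() < P_STAY[r[t - 1]] else 1 - r[t - 1]
        y[t] = mu[r[t]] + rho * y_prev + sigma * rng.standard_normal()
        y_prev = y[t]
    return y, r

def hamilton_loglik(y, mu=(0.0, 2.0), rho=0.5, sigma=1.0):
    """Hamilton's filter: exact log-likelihood, one date at a time."""
    # stationary regime probabilities: proportional to (0.16, 0.11)
    p = np.array([0.16, 0.11]); p = p / p.sum()
    loglik, y_prev = 0.0, 0.0
    for t in range(len(y)):
        if t > 0:
            p = p @ TRANS                             # prediction step
        e = y[t] - (np.asarray(mu) + rho * y_prev)    # residual per regime
        dens = np.exp(-0.5 * (e / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        f = p @ dens                                  # f(y_t | y_{t-1})
        loglik += np.log(f)
        p = p * dens / f                              # update step
        y_prev = y[t]
    return loglik
```

Because the regime is integrated out exactly at each date, this log-likelihood is the benchmark against which the simulated values in Table 7.3 are normalized.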
We considered samples of various lengths T drawn from this process.
For each of the four previous methods (BASIC, SOS(1), SOS(2), SOS*(2)) we
computed S = 10,000 simulations of the value of the likelihood function at the true
parameter. Each simulation was divided by the true value of the likelihood function,
and Table 7.3 gives, for T = 100, the mean of these normalized simulations, the
log of this mean divided by the log of the likelihood function, and the estimated
variance of the mean of the normalized simulations.
From Table 7.3 it is clear that the basic method does not work at all, whereas
SOS(1), SOS(2), and SOS*(2) give satisfactory results, and the ordering of their
respective performances is as expected.
Figure 7.1 shows the variability of the various simulators. Note that we have
plotted the log simulators in order to be able to show them on the same graph (with
the same scale).
Figure 7.2 shows the convergence rate of the log simulated mean for the various
methods (except the basic method, which does not converge). From this figure it
appears that the SOS*(2) method is the best: for this method, the mean is close
to 1 as soon as S is larger than 50.
REFERENCES
and A. MONFORT (1989), Statistique et modèles économétriques, Economica, Paris; English translation, Cambridge University Press, 1995.
(1991), 'Simulation Based Econometrics in Models with Heterogeneity', Annales d'Economie et de Statistique, 20/1: 69-107.
(1993a), 'Simulation Based Inference: a Survey with Special Reference to Panel Data Models', Journal of Econometrics, 59: 5-33.
(1993b), 'Pseudo Likelihood Methods', in Handbook of Statistics, G. S. MADDALA, C. R. RAO, and H. VINOD (eds.), North-Holland, Amsterdam.
(1995), 'Testing, Encompassing, and Simulating Dynamic Econometric
Models', Econometric Theory, 11: 195-228.
and A. TROGNON (1984a), 'Estimation and Test in Probit Models with
Serial Correlation', in Alternative Approaches to Time Series Analysis, J.
P. FLORENS, et al. (eds.), University St Louis, Brussels.
(1984b), 'Pseudo-Maximum Likelihood Methods: Theory',
Econometrica, 52: 681-700.
(1984c), 'Pseudo-Maximum Likelihood Methods: Applications
to Poisson Models', Econometrica, 52: 701-20.
A. E. RENAULT, and A. TROGNON (1987), 'Simulated Residuals',
Journal of Econometrics, 34: 201-52.
(1991), 'Dynamic Factor Models', Discussion Paper, CREST.
(1993), 'Indirect Inference', Journal of Applied Econometrics,
8: 85-118.
E. RENAULT, and N. TOUZI (1994), 'Calibration by Simulation for Small
Sample Bias Correction', Discussion Paper, CREST.
HOTZ, J., and R. MILLER (1993), 'Conditional Choice Probabilities and the Estimation of Dynamic Programming Models', Review of Economic Studies,
60: 497-530.
and S. SANDERS (1990), 'The Estimation of Dynamic Discrete Choice Models by the Method of Simulated Moments', Discussion Paper, University of
Chicago.
R. MILLER, S. SANDERS, and J. SMITH (1992), 'A Simulation Estimator
for Dynamic Models of Discrete Choice', Discussion Paper, University of
Chicago.
HULL, J., and A. WHITE (1987), 'The Pricing of Options on Assets with Stochastic
Volatility', Journal of Finance, 3: 281-300.
ICHIMURA, H., and T. SCOTT-THOMPSON (1993), 'Maximum Likelihood Estimation of a Binary Choice Model with Random Coefficients of Unknown
Distribution', Discussion Paper 268, University of Minnesota.
INGRAM, B. F., and B. S. LEE (1991), 'Estimation by Simulation of Time Series
Models', Journal of Econometrics, 47: 197-207.
JENNRICH, R., (1969), 'Asymptotic Properties of Nonlinear Least Squares Estimators', Annals of Mathematical Statistics, 40: 633-43.
KEANE, M., and K. WOLPIN (1992), 'Solution and Estimation of Discrete Dynamic Programming Models by Simulation: Monte Carlo Evidence', Discussion Paper, University of Minnesota.
KEANE, M. P. (1990), 'Four Essays in Empirical Macro and Labor Economies',
Ph.D. dissertation, Brown University.
(1993), 'Simulation Estimation for Panel Data Models with Limited Dependent Variable Models', in Handbook of Statistics, ii, G. S. MADDALA,
C. R. RAO, and H. VINOD (eds.), North-Holland, Amsterdam, 545-70.
(1994), 'A Computationally Practical Simulation Estimator for Panel Data
with Applications to Estimating Temporal Dependence in Employment and
Wages', Econometrica, 62: 95-116.
KENNEDY, J., and J. GENTLE (1980), Statistical Computing, Marcel Dekker, New
York.
KIEFER, N., and G. NEUMANN (1979), 'An Empirical Job Search Model with
a Test of the Constant Reservation Wage Hypothesis', Journal of Political
Economy, 87: 89-107.
KIM, C. J. (1994), 'Dynamic Linear Models with Markov Switching', Journal of
Econometrics, 60: 1-22.
LIPPMAN, S., and J. McCALL (1976), 'The Economics of Job Search: a Survey',
Economic Inquiry, 14: 155-367.
LIPSTER, R. S., and A. N. SHIRYAYEV (1977), Statistics of Random Processes, I: General Theory, Springer-Verlag, Berlin.
(1978), Statistics of Random Processes, II: Applications, Springer-Verlag, Berlin.
LO, A. (1988), 'Maximum Likelihood Estimation of Generalized Ito Processes
with Discretely Sampled Data', Econometric Theory, 4: 231-47.
McCULLAGH, P., and J. A. NELDER (1989), Generalized Linear Models, Chapman
& Hall, London.
McFADDEN, D. (1976), 'Quantal Choice Analysis: a Survey', Annals of Economic and Social Measurement, 5: 363-90.
(1989), 'A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration', Econometrica, 57: 995-1026.
and P. RUUD (1987), 'Estimation of Limited Dependent Variable Models
from the Regular Exponential Family by the Method of Simulated Moment',
Discussion Paper, University of California at Berkeley.
(1990), 'Estimation by Simulation', Discussion Paper, MIT.
McGRATTAN, E. (1990), 'Solving the Stochastic Growth Model by Linear-Quadratic Approximation', Journal of Business and Economic Statistics,
8: 41-4.
MACKINNON, J. G., and A. A. SMITH (1995), 'Approximate Bias Correction in
Econometrics', Discussion Paper, Queen's University.
MAGNAC, T., J. M. ROBIN, and M. VISSER (1995), 'Analysing Incomplete Individual Employment Histories Using Indirect Inference', Journal of Applied
Econometrics, 10: 153-70.
MALINVAUD, E. (1970), 'The Consistency of Nonlinear Regressions', Annals of
Mathematical Statistics, 41: 956-69.
MARCET, A. (1993), 'Simulation Analysis of Dynamic Stochastic Models: Applications to Theory and Estimation', Discussion Paper 6, University of
Barcelona.
MARIANO, R. S., and B. W. BROWN (1985), 'Stochastic Prediction in Dynamic
Nonlinear Econometric Systems', Annales de l'INSEE, 59-60: 267-78.
PAKES, A., and D. POLLARD (1989), 'Simulation and the Asymptotics of Optimization Estimators', Econometrica, 57: 1027-57.
PARDOUX, E., and D. TALAY (1985), 'Discretization and Simulation of Stochastic
Equations', Acta Applicandae Mathematica, 3: 23-47.
PASTORELLO, S., E. RENAULT, and N. TOUZI (1994), 'Statistical Inference for
Random Variance Option Pricing', Discussion Paper, CREST.
PESARAN, H., and B. PESARAN (1993), 'A Simulation Approach to the Problem
of Computing Cox's Statistic for Testing Non-Nested Models', Journal of
Econometrics, 57: 377-92.
RAO, P. (1988), 'Statistical Inference from Sampled Data for Stochastic Processes', Contemporary Mathematics, 20, American Mathematical Society.
RICHARD, J.-F. (1973), Posterior and Predictive Densities for Simultaneous Equation Models, Springer-Verlag, Berlin.
(1991), 'Applications of Monte Carlo Simulation Techniques in Econometrics and Game Theory', Discussion Paper, Duke University.
ROBERT, C. P. (1996), Méthodes de Monte Carlo par Chaînes de Markov, CREST.
ROBINSON, P. (1982), 'On the Asymptotic Properties of Models Containing Limited Dependent Variables', Econometrica, 50: 27-41.
RUDEBUSCH, G. D. (1993), 'Uncertain Unit Root in Real GNP', The American
Economic Review, 83: 264-72.
RUUD, P. (1991), 'Extensions of Estimations Methods Using the EM Algorithm',
Journal of Econometrics, 49: 305-41.
SCOTT, L. (1987), 'Option Pricing when the Variance Changes Randomly: Theory, Estimation and Application', Journal of Financial and Quantitative
Analysis, 22: 419-38.
SHEPHARD, N. (1993), 'Fitting Nonlinear Time Series with Applications to Stochastic Variance Models', Journal of Applied Econometrics, 8: 563-84.
(1994), 'Partial non-Gaussian State Space', Biometrika, 81(1): 115-31.
SMITH, A. (1990), 'Three Essays on the Solution and Estimation of Dynamic
Macroeconometric Models', Ph.D. dissertation, Duke University.
(1993), 'Estimating Nonlinear Time Series Models Using Simulated Vector
Autoregressions', Journal of Applied Econometrics, 8: 63-84.
STEIN, E. M., and J. C. STEIN (1991), 'Stock Price Distribution with Stochastic
Volatility: an Analytic Approach', Review of Financial Studies, 4: 727-52.
STERN, S. (1992), 'A Method of Smoothing Simulated Moments of Probabilities
in the Multinomial Probit Models', Econometrica, 60: 943-52.
THELOT, C. (1993), 'Note sur la loi logistique et l'imitation', Annales de l'INSEE, 42: 111-25.
TIERNEY, L. (1994), 'Markov Chains for Exploring Posterior Distributions (with
discussion)', Annals of Statistics, 22: 1701-62.
TOUZI, N. (1994), 'A Note on Hansen-Scheinkman's Back to the Future: Generating Moment Implications for Continuous Time Markov Processes', Discussion Paper, CREST.
VAN DIJK, H. K. (1987), 'Some Advances in Bayesian Estimation Methods using
Monte Carlo Integration', in Advances in Econometrics, vi, T. B. FOMBY
and G. F. RHODES (eds.), JAI Press, Greenwich, CT.
VAN PRAAG, B. M. S., and J. P. HOP (1987), 'Estimation of Continuous Models on
the Basis of Set-Valued Observations', Paper presented at the Econometric
Society European Meeting, Copenhagen.
WHITE, H. (1982), 'Maximum Likelihood Estimation of Misspecified Models',
Econometrica, 50: 1-28.
WIGGINS, J. (1987), 'Option Values under Stochastic Volatility', Journal of Financial Economics, 19: 351-72.
ZEGER, S., and R. KARIM (1991), 'Generalized Linear Models with Random
Effects: a Gibbs Sampling Approach', Journal of the American Statistical
Association, 86: 79-86.
ZELLNER, A., L. BAUWENS, and H. VAN DIJK (1988), 'Bayesian Specification
Analysis and Estimation of Simultaneous Equation Models using Monte
Carlo Methods', Journal of Econometrics, 38: 39-72.
INDEX
accelerated acceptance-rejection method 109
accelerated Gaussian importance sampling 49
acceptance-rejection method 109
adjusted PML method 6
aggregation effect 10
ARCH model 13, 18, 141, 143
auction model 14
bias correction 44, 56
binding function 67, 85
Black-Scholes formula 122, 130
Brennan-Schwartz model 127
changing time process 121
conditional simulation 15, 17
conditioning 45
continuous time model 10
diffusion equation 10, 120
discrete choice model 93
disequilibrium model 10, 14, 17, 145, 148
duration model 9, 113
dynamic conditional moment 23, 54
efficient method of moments 76
Euler approximation 10, 119, 122
expectation maximization (EM) 50
factor ARCH model 13, 18, 141, 143
factor model 120, 138
frequency simulator 26, 96
finite sample bias 80
Gauss-Newton algorithm 29
generalized method of moments 5, 21
geometric Brownian motion 122
GHK simulator 98, 105
Gibbs sampling 60, 109
heterogeneity 11, 12, 113
implicit long run equilibrium 75
implicit volatility 131
panel probit model 46
partial Kalman filter 152
path calibration 19, 20
path simulation 15, 17, 18
probit model 17
pseudo-maximum likelihood 4, 50, 51, 145
pseudo-true value 6, 67
quadratic exponential family 53
quasi-maximum likelihood 53, 145
random parameter 11, 18
recursive truncated normal 104
simulated maximum likelihood 42, 142, 150
simulated nonlinear least squares 55, 56
simulated PML 54, 148
simulated score 35, 36
simulator 24
smooth simulator 26
static conditional moment 23, 54, 55
Stern simulator 97, 105
stochastic differential equation 119, 133
stochastic volatility 13, 120, 129, 137, 142
switching regime model 145, 151
switching state space model 13, 49, 152
true value of the parameter 3