$$\sigma_h^2 = E\{[g(Y) - \mu_h]^2\} = \int_X [g(y) - \mu_h]^2 f(y)\,\mathrm{d}y = \int_X g^2(y)\,f(y)\,\mathrm{d}y - \mu_h^2, \qquad (2)$$
where $X$ is a probability space, and $E$ represents statistical expectation. Note that both $\mu_h$ and $\sigma_h^2$ depend on the problem itself and are independent of the way one estimates them. To estimate the mean from Eq. (1) using the CMC method, one samples randomly a sequence $y_i$, $i = 1, 2, \ldots, N$, from the density function $f(y)$ and computes the sample mean
$$\hat{h}_N = \frac{1}{N}\sum_{i=1}^{N} g(y_i). \qquad (3)$$
Since $y_i$, $i = 1, 2, \ldots, N$, are independent identically distributed random variables, it can be shown that $E[\hat{h}_N] = \mu_h$, i.e., $\hat{h}_N$ is an unbiased estimator of $\mu_h$, and the variance of the estimator $\hat{h}_N$ is $\sigma^2_{\mathrm{MC}} = \sigma_h^2/N$. Because $\sigma_h^2$ is unknown, it is usually estimated using

$$S^2_{h,N} = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i) - \hat{h}_N^2. \qquad (4)$$
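The CMC recipe of Eqs. (3) and (4) can be sketched in a few lines. The score function $g(y) = y^2$ under a standard normal density is a hypothetical test case (not from this paper) whose exact moments are $\mu_h = 1$ and $\sigma_h^2 = 2$:

```python
import numpy as np

def cmc_estimate(g, sampler, n, rng):
    """Crude Monte Carlo: sample y_i ~ f and return (mean, variance) estimates.

    mean_hat implements Eq. (3): (1/N) * sum g(y_i)
    var_hat  implements Eq. (4): (1/N) * sum g(y_i)^2 - mean_hat^2
    """
    y = sampler(n, rng)
    scores = g(y)
    mean_hat = scores.mean()
    var_hat = (scores ** 2).mean() - mean_hat ** 2
    return mean_hat, var_hat

rng = np.random.default_rng(0)
# hypothetical example: g(y) = y^2 with f = N(0, 1), so mu_h = 1, sigma_h^2 = 2
mean_hat, var_hat = cmc_estimate(lambda y: y ** 2,
                                 lambda n, r: r.standard_normal(n),
                                 200_000, rng)
print(mean_hat, var_hat)  # close to 1.0 and 2.0
```

With 200,000 samples the statistical error of the mean estimate is about $\sqrt{2/N} \approx 0.003$, consistent with the $\sigma_h/\sqrt{N}$ scaling discussed next.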
Z. Lu, D. Zhang / Advances in Water Resources 26 (2003) 1177–1188
By the Central Limit Theorem, in the limit $N \to \infty$ the probability density of $(\hat{h}_N - \mu_h)/(\sigma_h/\sqrt{N})$ tends to the standard Gaussian distribution $N(0, 1)$. Thus we expect $\hat{h}_N$ to lie within $\sigma_h/\sqrt{N}$ of $\mu_h$ with a probability of 68%, if $N$ is sufficiently large. From this, we can estimate the minimum number of simulations required to obtain the mean estimate with a given accuracy.
Let $N_{\epsilon,\mathrm{MC}}$ denote the minimum number of simulations required to obtain a $100\epsilon\%$ precision; then

$$N_{\epsilon,\mathrm{MC}} = \left\lceil \frac{\sigma_h^2}{\mu_h^2\,\epsilon^2} \right\rceil, \qquad (5)$$
where $\lceil x \rceil$ is the least integer $\geq x$. For instance, if we need the estimate to be within 10% of the exact value, the minimum number of required simulations should be $N_{0.1,\mathrm{MC}} = \lceil \sigma_h^2/(\mu_h^2 (0.1)^2) \rceil = \lceil 100\,\sigma_h^2/\mu_h^2 \rceil$.
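Eq. (5) is a one-line computation; the sketch below evaluates it for hypothetical values of $\sigma_h^2$ and $\mu_h$ (a coefficient of variation of 1, not taken from this paper):

```python
import math

def min_runs_cmc(var, mean, eps):
    """Eq. (5): minimum number of CMC runs for 100*eps % precision."""
    return math.ceil(var / (mean ** 2 * eps ** 2))

# hypothetical values: sigma_h^2 = 4, mu_h = 2 (coefficient of variation = 1)
print(min_runs_cmc(4.0, 2.0, 0.10))  # 100 runs for 10% precision
print(min_runs_cmc(4.0, 2.0, 0.05))  # 400 runs for 5% precision (4x more)
```

Halving the target error quadruples the required sample size, which is the slow $1/\sqrt{N}$ convergence discussed next.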
Because the statistical error $\sigma_h/\sqrt{N}$ is inversely proportional to $\sqrt{N}$, if we wish to reduce it by a factor of two we have to increase the sample size by a factor of four. Clearly, the rate of convergence of the CMC method is rather slow. It is therefore desirable to design ways of reducing the estimation variance more rapidly as $N$ increases.
The importance sampling technique is one such way of reducing the estimation variance (such methods are called variance reduction techniques) and thus reducing the statistical error much faster than the CMC technique. The efficiency of importance sampling depends on the selected importance density function. The essence of importance sampling is to avoid taking samples $y$ in regions where the value of the function $g(y)$ has negligible impact on the quantity being estimated, and to concentrate instead on (important) regions where the impact is large. This operation inevitably introduces bias (or distortion), which is corrected by weighting the sample values appropriately.
Suppose we sample $y_i$, $i = 1, 2, \ldots, N$, from an importance density function $f_1(y)$ rather than the original density function $f(y)$, where $f_1(y)$ is zero only if $f(y)$ is zero. To preserve the mean (i.e., to correct the bias) one has to modify the original score function $g(y)$. A modified score function $H = g_1(Y)$ is defined as

$$g_1(y) = g(y)\,w_1(y), \qquad (6)$$

where $w_1(y) = f(y)/f_1(y)$ is called a weight function.
The expectation of $H$ under density $f_1(y)$ can be determined as

$$\mu_H = E[H] = E[g_1(Y)] = \int_X g_1(y)\,f_1(y)\,\mathrm{d}y = \int_X g(y)\,\frac{f(y)}{f_1(y)}\,f_1(y)\,\mathrm{d}y = \int_X g(y)\,f(y)\,\mathrm{d}y = \mu_h, \qquad (7)$$
which means that the mean remains the same even though the samples are taken with bias from the importance density function $f_1(y)$. The variance of $H$ under density $f_1(y)$ is

$$\sigma_H^2 = E\{[g_1(Y) - \mu_H]^2\} = \int_X [g_1(y) - \mu_H]^2 f_1(y)\,\mathrm{d}y = \int_X g_1^2(y)\,f_1(y)\,\mathrm{d}y - \mu_H^2 = \int_X \left[\frac{f(y)}{f_1(y)}\right] g^2(y)\,f(y)\,\mathrm{d}y - \mu_H^2. \qquad (8)$$
From (7) one can construct an estimator for $\mu_h$ based on samples $y_i$, $i = 1, 2, \ldots, N$:

$$\hat{H}_N = \frac{1}{N}\sum_{i=1}^{N} g_1(y_i) = \frac{1}{N}\sum_{i=1}^{N} g(y_i)\,w_1(y_i); \qquad (9)$$
i.e., the contribution of sample $y_i$ to $\hat{H}_N$ is weighted by $w_1(y_i)$, and the bias due to sampling from the biased importance density function has been corrected. The variance of the estimator $\hat{H}_N$ is $\sigma^2_{\mathrm{IS}} = \sigma_H^2/N$, where $\sigma_H^2$ can be estimated using
$$S^2_{H,N} = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_1^2(y_i) - \hat{H}_N^2. \qquad (10)$$
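A minimal sketch of the weighted estimator of Eqs. (9) and (10), for a hypothetical rare-event score $g(y) = 1\{y < -4\}$ under $f = N(0,1)$ (an illustrative test case, not from this paper); the importance density $f_1 = N(-4, 1)$ concentrates samples where $g$ is nonzero:

```python
import numpy as np
from math import erf, sqrt

def norm_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def is_estimate(g, f_pdf, f1_pdf, f1_sampler, n, rng):
    """Importance sampling: Eq. (9) for the mean, Eq. (10) for sigma_H^2."""
    y = f1_sampler(n, rng)
    w1 = f_pdf(y) / f1_pdf(y)                      # weight w_1(y) = f(y)/f_1(y)
    scores = g(y) * w1                             # modified score g_1(y) = g(y) w_1(y)
    mean_hat = scores.mean()                       # Eq. (9)
    var_hat = (scores ** 2).mean() - mean_hat ** 2 # Eq. (10)
    return mean_hat, var_hat

rng = np.random.default_rng(1)
g = lambda y: (y < -4.0).astype(float)             # indicator score function
mean_hat, var_hat = is_estimate(
    g,
    lambda y: norm_pdf(y, 0.0, 1.0),               # original density f
    lambda y: norm_pdf(y, -4.0, 1.0),              # importance density f_1 (mean shift)
    lambda n, r: r.normal(-4.0, 1.0, n),
    50_000, rng)

exact = 0.5 * (1.0 + erf(-4.0 / sqrt(2.0)))        # Phi(-4), about 3.2e-5
print(mean_hat, exact)
```

A CMC run of the same size would see only one or two samples with $y < -4$, while the weighted estimator recovers the probability to about 1% here.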
Similar to Eq. (5), the minimum number of simulations required to estimate $E[g]$ with a $100\epsilon\%$ precision is

$$N_{\epsilon,\mathrm{IS}} = \left\lceil \frac{\sigma_H^2}{\mu_H^2\,\epsilon^2} \right\rceil. \qquad (11)$$
The ratio of variances $\gamma = \sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}} = N_{\epsilon,\mathrm{MC}}/N_{\epsilon,\mathrm{IS}}$ is a measure of the efficiency (performance) of importance sampling, which depends on the choice of the importance density function. Here $\sigma^2_{\mathrm{MC}} = \sigma_h^2/N$ is the estimation variance of the CMC method. Comparison of (8) with (2) shows that if we choose the importance density function $f_1(y)$ properly, such that the weight function $w_1(y)$ is on average substantially less than unity, then $\sigma_H^2$, and thus the variance of the estimator $\sigma^2_{\mathrm{IS}} = \sigma_H^2/N$, will be reduced. This means that, compared to the CMC method, sampling from the importance density function may allow us to estimate the mean with a smaller sample size for a given accuracy, or with a much better accuracy for a given sample size. In fact, as will be shown later, the Monte Carlo approach based on importance sampling techniques makes it possible to solve some problems that cannot be solved by CMC simulation.
We should emphasize here that uncertainties associated with modeling applications can be classified as reducible and irreducible. Natural uncertainty is inherent, or irreducible, while data and model uncertainties contain both reducible and irreducible components. By the term variance reduction we refer to reduction of the estimation variance, i.e., the variance of the estimator or model uncertainty, not the variance inherited from uncertainties in system parameters. For instance, in evaluating an integral $\int_a^b f(x)\,\mathrm{d}x$ numerically, one may discretize the interval $[a, b]$ using $N$ points that can be either uniformly or randomly distributed in $[a, b]$ and then calculate $\sum f(x_i)\,\Delta x_i$. The accuracy of this estimate depends on the number of points, $N$, and the distribution of these points in $[a, b]$. A good estimator may require less computation time, converge faster to the true value, and thus have a smaller estimation variance. Note that here the variance caused by uncertainties is not involved, because the value of the integral is a deterministic quantity. In general, the variance due to parameter uncertainties cannot be reduced unless more information about the system parameters is added. For example, if we are interested in estimating the mean head for a given uncertainty of hydraulic conductivity, the variance introduced by the spatial variability of hydraulic conductivity cannot be reduced unless more information, such as additional measurements of hydraulic conductivity, is provided. Of course, we may still be able to design a good estimator (such as importance sampling) such that the estimator converges to the true value faster, i.e., reducing the variance of the estimator.
2.2. Variance estimation
Unlike in communication systems, where a mean, the bit error rate, is the main quantity to be estimated, in hydrology it is often necessary to estimate the variance associated with the mean prediction. Because importance sampling techniques do not preserve the variance (i.e., $\sigma_H^2$ is in general not equal to $\sigma_h^2$), if we want to estimate the variance $\sigma_h^2$ we have to re-formulate the expression for the variance. From Eq. (2) we have
$$\sigma_h^2 = E\{[g(Y) - \mu_h]^2\} = \int_X g^2(y)\,f(y)\,\mathrm{d}y - \mu_h^2 = \int_X \frac{f_1(y)}{f(y)}\,g_1^2(y)\,f_1(y)\,\mathrm{d}y - \mu_H^2, \qquad (12)$$
where $\mu_h$ has been replaced by $\mu_H$ because they are the same. In Eq. (12), for any $y$ such that $f(y) = 0$, from Eq. (6), $g_1(y)$ must be zero. From Eq. (12), if we take samples $y_i$, $i = 1, 2, \ldots, N$, from an importance density function $f_1(y)$, then the original variance $\sigma_h^2$ may be estimated using the following expression:
$$S^2_{h,N} = \frac{1}{N}\sum_{i=1}^{N} \frac{f_1(y_i)}{f(y_i)}\,g_1^2(y_i) - \hat{H}_N^2 = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_1(y_i) - \hat{H}_N^2. \qquad (13)$$
A comparison with Eq. (10) reveals that the reduction of variance $S^2_{H,N}$ in Eq. (10) is due to the fact that $g^2(y)$ is weighted by $w_1^2(y)$ in $S^2_{H,N}$, rather than by $w_1(y)$ as in $S^2_{h,N}$ in Eq. (13).
It should be noted that, though Eq. (13) gives a corrected estimate of the variance, the rate of convergence of the estimator $S^2_{h,N}$ in Eq. (13) to $\sigma_h^2$ as $N \to \infty$ is not necessarily optimal, because the importance density function $f_1(y)$ discussed above is selected for the purpose of preserving the mean quantity $\mu_H = \mu_h$ while reducing the computational effort by minimizing Eq. (8). In some special cases, as shown in case 2 in the next section, the score function $g(y)$ is an indicator function, thus $g^2(y) = g(y)$, and from Eq. (2) we have $\sigma_h^2 = \mu_H(1 - \mu_H)$. Therefore, in this case, once we have estimated $\mu_H$, the variance can be calculated easily without resorting to Eq. (13).
However, in general, if one wishes to estimate $\sigma_h^2$ from Eq. (2) using a smaller number of simulations, one may need to construct a new importance density function $f_2(y)$ to estimate the mean of $g^2(y)$, just as we constructed $f_1(y)$ to estimate the mean of $g(y)$. Suppose we sample $y_i$, $i = 1, 2, \ldots, N$, from an importance density function $f_2(y)$, a function to be determined. Again, $f_2(y)$ is zero at $y$ only if $f(y)$ is zero. By defining $g_2(y) = g(y)\,w_2(y)$, where $w_2(y) = \sqrt{f(y)/f_2(y)}$, the last integral in Eq. (2) becomes

$$E[h^2] = E[g^2(Y)] = \int_X g^2(y)\,f(y)\,\mathrm{d}y = \int_X g_2^2(y)\,f_2(y)\,\mathrm{d}y, \qquad (14)$$
which means that the mean of $g^2(y)$ under distribution $f(y)$ is preserved by the mean of $g_2^2(y)$ under distribution $f_2(y)$. Therefore, the variance $\sigma_h^2$ is preserved under distribution $f_2(y)$. From the last integral of Eq. (14), $E[h^2]$ can be estimated by

$$\widehat{h^2}_N = \frac{1}{N}\sum_{i=1}^{N} g_2^2(y_i) = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_2^2(y_i). \qquad (15)$$
Therefore, instead of using Eq. (13), one may use the following expression to estimate $\sigma_h^2$ in Eq. (2):

$$S^2_{h,N} = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_2^2(y_i) - \hat{H}_N^2, \qquad (16)$$
where $\hat{H}_N$ is estimated from Eq. (9) using the weighting function $w_1(y)$, and the samples $y_i$, $i = 1, 2, \ldots, N$, are taken from the importance density $f_2(y)$. The latter can be selected by minimizing the variance

$$\sigma^2_{h^2,\mathrm{IS}} = \int_X \left[g_2^2(y) - E[h^2]\right]^2 f_2(y)\,\mathrm{d}y = \int_X g^4(y)\,w_2^4(y)\,f_2(y)\,\mathrm{d}y - \left(E[h^2]\right)^2 = \int_X \frac{f(y)}{f_2(y)}\,g^4(y)\,f(y)\,\mathrm{d}y - \left(E[h^2]\right)^2. \qquad (17)$$
The variance $\sigma^2_{h^2,\mathrm{IS}}$ of the estimator $S^2_{h,N}$ can be estimated from the importance samples $y_i$:

$$S^2_{h^2,\mathrm{IS}} = \frac{1}{N}\sum_{i=1}^{N} g^4(y_i)\,w_2^4(y_i) - \left(\widehat{h^2}_N\right)^2. \qquad (18)$$
To the best of our knowledge, the importance sampling procedure for variance estimation in (16) and (17) is developed here for the first time. In the special case where $g(y)$ is an indicator function, $g^2(y) = g(y)$, and thus $E[h^2] = E[g^2(y)] = E[g(y)] = \mu_h$. A comparison of the last integrals of Eqs. (8) and (17) reveals that in this case the importance density that minimizes Eq. (8) also minimizes Eq. (17), i.e., $f_2(y) = f_1(y)$. In general, however, $f_2(y)$ is different from $f_1(y)$.
In summary, to estimate a mean quantity using importance sampling, one chooses an importance density function $f_1$ such that $\sigma_H^2$ in Eq. (8) or $S^2_{H,N}$ in Eq. (10) is minimized, samples $y$ from $f_1(y)$, and computes $\hat{H}_N$ in Eq. (9) as an estimate of $\mu_h$. Similarly, to efficiently estimate the variance $\sigma_h^2$, one finds $f_2$ such that $\sigma^2_{h^2,\mathrm{IS}}$ in (17) or $S^2_{h^2,\mathrm{IS}}$ in (18) is minimized, takes samples from $f_2$, and estimates $\sigma_h^2$ using $S^2_{h,N}$ in (16).
2.3. Selection of importance density functions
One important question that remains is how to choose the importance density function such that the IS estimator has a smaller variance than the CMC estimator. Here we limit our discussion to the selection of $f_1(y)$, which minimizes Eq. (8) and reduces the computational effort in estimating the mean quantity $\mu_h$. A similar procedure can also be applied to the selection of $f_2(y)$, which minimizes Eq. (17) and reduces the number of simulations required in estimating the variance $\sigma_h^2$. Ideally, if one chooses $f_1(y) = g(y)f(y)/\mu_h$, then from Eq. (8) it follows that $\sigma_H^2 = 0$. Though the $f_1(y)$ defined here contains $\mu_h$, the very quantity being estimated, and thus cannot be used in practice, it does give some clues about how to construct an importance density function.
It is seen from Eq. (8) that it is difficult to minimize $\sigma_H^2$ directly as a function of an unknown function $f_1(y)$. A practical alternative is to choose $f_1(y)$ from a family of candidates for which sampling $y_i$ and evaluating $w_1(y)$ are relatively easy. In addition, choosing $f_1(y)$ from a family of candidates makes it possible in some cases to derive an analytical expression for $\sigma_H^2$.
If the original distribution is Gaussian, then the family of Gaussian distributions is a natural choice. Two major approaches have been used in the literature, i.e., variance scaling (VS) [5] and mean translation (MT) [9], or their combination. The concepts of MT and VS are illustrated in Fig. 1, where $f(x)$ is the original density function. Suppose the problem of concern involves a small probability $\Pr(x < x_0)$; CMC simulation that takes samples from $f(x)$ will not be very efficient, because it is very hard to draw a sample from $f(x)$ such that $x < x_0$ when $x_0$ is very small. With MT, one simply shifts the mean of $f(x)$ to $x_0$, so the frequency of samples with $x < x_0$ from the new density function $f_1(x)$ is increased (Fig. 1(a)). Another way to enlarge the sampling frequency for $x < x_0$ is to increase the variance (Fig. 1(b)). Fig. 1(c) illustrates the combination of MT and VS. Discussion of the advantages and disadvantages of MT and VS can be found in the literature [1,2,14]. In some cases, as shown by Chen et al. [2], VS in combination with optimized MT can add a degree of robustness with respect to severe nonlinearity.
It is difficult in general to derive analytical expressions for the optimal parameters at which $\sigma_H^2$ in Eq. (8) is minimized, even when the form of $f(x)$ is relatively simple. In practice, the optimal parameter values can be obtained numerically. For example, suppose the original distribution is $N(\mu, \sigma^2)$. To find the optimal MT, one may choose a set of $\mu_i$, $i = 1, 2, \ldots, m$, that spans the range of $\mu$, take samples from $N(\mu_i, \sigma^2)$, and for each $\mu_i$ compute $S^2_{H,N}$ defined in Eq. (10). The $\mu_k$ corresponding to the minimum value of $S^2_{H,N}$ among all $m$ values is taken as the optimal mean $\mu_{\mathrm{opt}}$ for the optimal density function.
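The grid search just described can be sketched as follows, for a hypothetical rare-event score $g(y) = 1\{y < -3\}$ under $f = N(0,1)$ (an illustrative case, not an example from this paper); $S^2_{H,N}$ is evaluated from Eq. (10) for each candidate shift:

```python
import numpy as np

def norm_pdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2.0 * np.pi)

def s2_HN(mu_shift, n, rng, threshold=-3.0):
    """Sample from the mean-translated density N(mu_shift, 1) and return
    S^2_{H,N} of Eq. (10) for the indicator score g(y) = 1{y < threshold}."""
    y = rng.normal(mu_shift, 1.0, n)
    w1 = norm_pdf(y, 0.0) / norm_pdf(y, mu_shift)    # w_1 = f / f_1
    scores = (y < threshold) * w1                    # g_1(y) = g(y) w_1(y)
    return np.mean(scores ** 2) - np.mean(scores) ** 2

rng = np.random.default_rng(3)
candidates = np.arange(-5.0, 0.1, 0.5)               # mu_i spanning the range
s2 = [s2_HN(m, 100_000, rng) for m in candidates]
mu_opt = candidates[int(np.argmin(s2))]
print(mu_opt)    # typically near the threshold -3, as mean translation suggests
```

The same loop, run over a two-dimensional grid of means and variances, produces contour maps like those of Fig. 2 in the next section.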
An alternative is an iterative approach, Spanier's technique, proposed by Spanier [15] for optimizing a parameter through processing of the Monte Carlo results from relatively small samples. Expressing the last integral in Eq. (8) as

$$M_2(\mu_i) = \int_X \frac{g^2(y)\,f^2(y)}{f_1(y;\mu_i)\,f_1(y;\tilde{\mu})}\,f_1(y;\tilde{\mu})\,\mathrm{d}y, \quad i = 1, 2, \ldots, m, \qquad (19)$$

where $\tilde{\mu}$ is a guess value of the parameter $\mu$, and the $\mu_i$ are parameter values that span the most likely range of $\mu$.
Fig. 1. Schematic diagrams showing (a) MT, (b) VS, and (c) their combination in one dimension. The shaded area indicates the effective IS probability mass.

Now, starting from the initial guess $\tilde{\mu}$, take samples $y_j$, $j = 1, 2, \ldots, N$, from the importance density $f_1(y;\tilde{\mu})$ and compute
$$M_2(\mu_i) = \frac{1}{N}\sum_{j=1}^{N} \frac{g^2(y_j)\,f^2(y_j)}{f_1(y_j;\mu_i)\,f_1(y_j;\tilde{\mu})}, \quad i = 1, 2, \ldots, m. \qquad (20)$$
The new guess value for the next iteration is $\tilde{\mu} = \mu_k$, where $M_2(\mu_k)$ in Eq. (20) is the minimum among the $m$ values of $M_2(\mu_i)$, $i = 1, 2, \ldots, m$. This process is repeated until $\tilde{\mu}$ converges. A similar procedure can be applied to finding the optimal variance $\sigma^2_{\mathrm{IS}}$ of the importance density, or to obtaining both $\mu_{\mathrm{IS}}$ and $\sigma^2_{\mathrm{IS}}$.
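A compact sketch of this iteration, again for a hypothetical rare-event indicator score $g(y) = 1\{y < -3\}$ with $f = N(0,1)$ (illustrative, not from this paper); all names are assumptions of the sketch:

```python
import numpy as np

def norm_pdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2.0 * np.pi)

def spanier_mt(g, mu_candidates, mu0, n, rng, iters=10):
    """Iterate Eq. (20): sample from f_1(y; mu_tilde), score every candidate
    mu_i with the same sample set, and move mu_tilde to the minimizer."""
    mu_tilde = mu0
    for _ in range(iters):
        y = rng.normal(mu_tilde, 1.0, n)
        num = g(y) ** 2 * norm_pdf(y, 0.0) ** 2           # g^2(y_j) f^2(y_j)
        M2 = [np.mean(num / (norm_pdf(y, mi) * norm_pdf(y, mu_tilde)))
              for mi in mu_candidates]
        mu_next = mu_candidates[int(np.argmin(M2))]
        if mu_next == mu_tilde:
            break
        mu_tilde = mu_next
    return mu_tilde

rng = np.random.default_rng(4)
g = lambda y: (y < -3.0).astype(float)
mu_opt = spanier_mt(g, np.arange(-5.0, 0.1, 0.5), 0.0, 50_000, rng)
print(mu_opt)    # typically settles near the threshold -3
```

The attraction of the scheme is that one sample set, drawn from the current guess, prices every candidate $\mu_i$ simultaneously, so each iteration costs only $N$ model evaluations.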
3. Illustrative examples
In this section we attempt to illustrate the power of importance sampling techniques using two simple examples of flow and transport in one-dimensional porous media whose hydraulic properties are random constants rather than correlated random fields. One major reason for choosing such simple examples is that analytical solutions for them are available and can therefore serve as the basis for comparing the effectiveness of the conventional and IS-based Monte Carlo (ISMC) simulations. Another reason is that no clear strategy is yet available to optimally select importance density functions for correlated random fields, which shall be a topic of future research. Nevertheless, to the best of our knowledge these examples constitute the first applications of importance sampling Monte Carlo techniques to subsurface flow and transport problems.
3.1. Head moments of 1D flow
In our first example, denoted as case 1, we consider one-dimensional horizontal flow with a prescribed deterministic influx $q_0$ on the left and a prescribed deterministic constant head $H_1$ on the right. We assume that the log saturated hydraulic conductivity $Y = \ln K_s$ is a random constant, i.e., a constant in the physical space that varies in the probability space with a normal distribution $N(\langle Y \rangle, \sigma_Y^2)$. For this case, as shown in Appendix A, the mean hydraulic head $\langle h(x) \rangle$ can be expressed as

$$\langle h(x) \rangle = H_1 - q_0 (x - L)\,e^{-\langle Y \rangle + \sigma_Y^2/2}, \qquad (21)$$

where $x$ is the coordinate with $x = 0$ at the left boundary, and the head variance is

$$\sigma_h^2 = q_0^2 (x - L)^2\, e^{-2\langle Y \rangle + \sigma_Y^2} \left(e^{\sigma_Y^2} - 1\right). \qquad (22)$$
Note that Eqs. (21) and (22) are exact solutions, because no approximation is involved in deriving them; therefore they are taken as references for the comparison between the CMC and ISMC approaches. The parameters in our example are given as $q_0 = 0.1$ m/day, $H_1 = 8.0$ m, $L = 100.0$ m, $\langle Y \rangle = -3.0$ (i.e., $K_G = 0.04979$ m/day), and $\sigma_Y^2 = 2.0$.
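Assuming the point solution $h(x; Y) = H_1 - q_0(x - L)e^{-Y}$ that underlies Eqs. (21) and (22), the two moments can be checked by brute-force sampling of the random constant $Y$. The parameters follow the example above except $\sigma_Y^2$, which is reduced here (an illustrative choice) so that the variance check converges quickly:

```python
import numpy as np

# illustrative parameters; sigma_Y^2 is taken smaller than the example's 2.0
# so the brute-force variance check converges with a modest sample size
q0, H1, L, x = 0.1, 8.0, 100.0, 0.0
mean_Y, var_Y = -3.0, 0.5

rng = np.random.default_rng(5)
Y = rng.normal(mean_Y, np.sqrt(var_Y), 400_000)
h = H1 - q0 * (x - L) * np.exp(-Y)          # h(x; Y) for a random constant Y

mean_exact = H1 - q0 * (x - L) * np.exp(-mean_Y + var_Y / 2.0)    # Eq. (21)
var_exact = (q0 ** 2 * (x - L) ** 2 * np.exp(-2 * mean_Y + var_Y)
             * (np.exp(var_Y) - 1.0))                             # Eq. (22)

print(h.mean(), mean_exact)
print(h.var(), var_exact)
```

The agreement rests on the lognormal moment identities $E[e^{-Y}] = e^{-\langle Y\rangle + \sigma_Y^2/2}$ and $\mathrm{Var}[e^{-Y}] = e^{-2\langle Y\rangle + \sigma_Y^2}(e^{\sigma_Y^2} - 1)$.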
First, we need to find two pairs of optimal means and variances, one for the importance sampling function $f_1(y)$ and the other for $f_2(y)$. We take samples $Y_i$, $i = 1, 2, \ldots, N$, from a distribution $N(\langle Y_{\mathrm{IS}} \rangle, \sigma^2_{Y,\mathrm{IS}})$, calculate $g(x; Y_i)$ according to $g(x; Y_i) = h(x; Y_i) = H_1 - q_0 (x - L)\exp(-Y_i)$, and then compute the sample variances $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ using Eqs. (10) and (18), respectively. This procedure is repeated for different values of $\langle Y_{\mathrm{IS}} \rangle = \langle Y_j \rangle$, $j = 1, 2, \ldots, m$, and $\sigma^2_{Y,\mathrm{IS}} = \sigma^2_k$, $k = 1, 2, \ldots, n$. We then find the optimal parameters that minimize Eqs. (10) and (18), i.e., those corresponding to the minimum $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ among the $m \times n$ values. The dependence of $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ on $\langle Y_{\mathrm{IS}} \rangle$ and $\sigma^2_{Y,\mathrm{IS}}$ is illustrated in Fig. 2(a) and (b), showing contour maps for $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$, respectively. It is found that the optimized importance density $f_1(y)$ is $N(-3.8, 3.0)$ and $f_2(y)$ is $N(-5.9, 4.7)$, both of which deviate greatly from the original distribution $N(-3.0, 2.0)$. It is understandable that the large variance of the importance density functions of $Y$ reflects the need to enlarge the tails relative to the original distribution, i.e., the
$Y$ samples from the tails have a larger impact on the mean head than samples closer to the mean; likewise, the leftward shift of the mean, i.e., a smaller $\langle Y_{\mathrm{IS}} \rangle$, implies that small values of $Y$ contribute more importantly to the mean head.

Fig. 2. Contour maps showing the dependence of (a) $S^2_{H,N}$ and (b) $S^2_{h^2,\mathrm{IS}}$ on the parameters $\langle Y_{\mathrm{IS}} \rangle$ and $\sigma^2_{Y,\mathrm{IS}}$ of the importance density functions. The optimal $\langle Y_{\mathrm{IS}} \rangle$ and $\sigma^2_{Y,\mathrm{IS}}$ are determined from the minimum $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ in the maps.
Figs. 3 and 4 compare the mean head and head variance at the left boundary ($x = 0$) of the one-dimensional domain obtained from different approaches: the CMC simulation (CMC, $10^6$ simulations) and the IS-based Monte Carlo simulation (ISMC, $10^5$ simulations) for two sets of samples taken from $f_1(y)$ and $f_2(y)$, respectively. As a reference, the corresponding analytical solutions, i.e., Eqs. (21) and (22), are also shown in the figures. The figures show that the ISMC simulation is several orders of magnitude more efficient than the CMC simulation. In addition, it appears that a smaller number of (IS-based) simulations is required for estimating the mean head than for the head variance: a few hundred simulations are enough for estimating the mean, while thousands are needed for estimating the head variance. Furthermore, Fig. 4 indicates that, for estimating the head variance, the optimized importance density function $f_2(y)$ that minimizes $\sigma^2_{h^2,\mathrm{IS}}$ in Eq. (17) is more efficient than $f_1(y)$, which minimizes Eq. (8). Likewise, it is seen from Fig. 3 that $f_1(y)$ is more efficient than $f_2(y)$ in estimating the mean head. Figs. 5 and 6 convey similar information to Figs. 3 and 4, but for the profiles of the mean head and head variance over the 1D domain.
The performance of the CMC and ISMC approaches for estimating the mean head and the head variance at the left boundary for different levels of precision (here $\epsilon = 0.1$ means that the estimated value is within 10% of the exact value) is tabulated in Table 1. The superior performance of the ISMC over the CMC is clearly evident. For example, in estimating the mean head at the $\epsilon = 0.1$ and 0.05 levels, it takes just one run for the ISMC (using $f_1(y)$), while it takes 552 and 2342 runs, respectively, for the CMC to achieve the same levels of precision. Similarly, in estimating the head variance, the ISMC (using $f_2(y)$) is several orders of magnitude more efficient than the CMC for all levels of precision.

Fig. 3. Comparison of the rate of convergence for the mean head at the left boundary obtained from different methods.

Fig. 4. Comparison of the rate of convergence for the head variance at the left boundary obtained from different methods.

Fig. 5. Profiles of mean head obtained from the analytical solution, importance sampling methods, and the CMC simulation with various sample sizes.

Fig. 6. Profiles of head variance obtained from the analytical solution, importance sampling methods, and the CMC simulation with various sample sizes.

Table 1
Comparison of CMC and ISMC on the number of simulations required for different levels of precision

                   $\langle h \rangle$                      $\sigma_h^2$
                   $\epsilon = 0.1$   0.05      0.01        0.1       0.05      0.01
CMC                552                2342      21,200      61,301    506,600   $>10^7$
IS, $f_1(x)$       1                  1         79          2374      67,746    223,284
IS, $f_2(x)$       77                 391       9647        387       1361      104,714
Although the Monte Carlo approach based on importance sampling converges much faster than the CMC approach, computational effort is needed to accurately determine the optimal importance densities. Here we investigated the effect of possible deviations from the optimal density. Figs. 7 and 8 show the convergence of the mean head and head variance at the left boundary resulting from ISMC simulations with importance densities $f_1(y)$ and $f_2(y)$ that differ from the optimal ones. Though simulations with the optimal density functions work much better (relative to the analytical value) than those with density functions deviating from the optimum, all of them are several orders of magnitude more efficient than the CMC simulation. This implies that it may not be necessary to obtain a very accurate estimate of the parameters of the importance density function, and that the sampling intervals for constructing contour maps such as Fig. 2 can be relatively large, thus allowing great savings in the computational effort for determining suboptimal importance density functions.
3.2. Particle travel time
In this example, we consider a case with a boundary configuration similar to that of case 1, but here we assume that the porosity $\phi$ is a random constant following a normal distribution $N(\mu_\phi, \sigma_\phi^2)$, and we are interested in the probability that the particle travel time is less than a given value $T$. Since the porosity $\phi$ is a random variable, so is the travel time $t = L/(q/\phi) = L\phi/q$. In fact, the travel time $t \sim N(L\mu_\phi/q,\ L^2\sigma_\phi^2/q^2)$. The probability $P(t < T)$ can be written explicitly as

$$P_T = P(t < T) = P\!\left(\frac{L}{q}\,\phi < T\right) = P\!\left(\phi < \frac{Tq}{L}\right) = \Phi\!\left(\frac{Tq/L - \mu_\phi}{\sigma_\phi}\right), \qquad (23)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution. Eq. (23) is the basis for comparing the effectiveness of the CMC and ISMC simulations.
Now, if we want to estimate $P_T$ using Monte Carlo simulation, we rewrite $P_T$ as

$$P_T = P(t < T) = P\!\left(\phi < \frac{Tq}{L}\right) = \int_{-\infty}^{Tq/L} f(\phi)\,\mathrm{d}\phi = \int_{-\infty}^{\infty} g(\phi)\,f(\phi)\,\mathrm{d}\phi, \qquad (24)$$

where $f(\phi)$ is the pdf of the variable $\phi$, i.e., $N(\mu_\phi, \sigma_\phi^2)$, and $g(\phi)$ is a score function defined as $g(\phi) = 1$ if $\phi \leq Tq/L$, and 0 otherwise. Eq. (24) relates $P_T$ to the mean of the score function $g(\phi)$ under the density $f(\phi)$. The variance of $g(\phi)$ can be determined from
$$\sigma^2 = \int_{-\infty}^{\infty} [g(\phi) - P_T]^2 f(\phi)\,\mathrm{d}\phi = \int_{-\infty}^{\infty} g^2(\phi)\,f(\phi)\,\mathrm{d}\phi - P_T^2 = (1 - P_T)\,P_T. \qquad (25)$$

Fig. 7. Sensitivity of the importance sampling method to nonoptimal parameters for the mean head simulation.

Fig. 8. Sensitivity of the importance sampling method to nonoptimal parameters for the head variance simulation.

Table 2
Number of simulations required for different values of $T$

$T$ (days)   $P_T$                     $N_{\mathrm{MC}}$         $N_{\mathrm{MT}}$   $N_{\mathrm{opt}}$
300          0.500                     $1.00 \times 10^{2}$      100                 100
250          $4.78 \times 10^{-2}$     $1.99 \times 10^{3}$      202                 151
200          $4.23 \times 10^{-4}$     $2.33 \times 10^{5}$      376                 285
150          $2.87 \times 10^{-7}$     $3.49 \times 10^{8}$      567                 437
100          $1.31 \times 10^{-11}$    $7.64 \times 10^{12}$     767                 596
50           $3.93 \times 10^{-17}$    $2.54 \times 10^{18}$     970                 759
For the CMC simulation, one samples a sequence $\phi_i$, $i = 1, 2, \ldots, N$. If $\phi_i$ is less than or equal to $Tq/L$, then $g(\phi_i)$ is set equal to unity; otherwise $g(\phi_i)$ is set to zero, as illustrated in Fig. 9(a) for the case of $T = 250$ days. We collect the scores from all $N$ samples and calculate the sample mean:

$$P_{T,\mathrm{MC}} = \frac{1}{N}\sum_{i=1}^{N} g(\phi_i). \qquad (26)$$
For the given parameters $q = 0.1$ m/day, $\mu_\phi = 0.3$, $\sigma_\phi = 0.03$ (i.e., $\mathrm{CV}_\phi = 10\%$), and $L = 100.0$ m, Eq. (23) reduces to $P_T = \Phi(T/30 - 10.0)$. Table 2 gives the $P_T$ values obtained from the analytical solution, i.e., Eq. (23), for different $T$ values. The table also gives the number of simulations required to estimate $P_T$ using the CMC simulation ($N_{\mathrm{MC}}$) to within 10% of the exact values.
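Eq. (23) with these parameter values is a one-line computation; the sketch below shows how quickly $P_T$ collapses as $T$ decreases:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_t_less_than(T, q=0.1, L=100.0, mu_phi=0.3, sigma_phi=0.03):
    """Eq. (23): P(t < T) = Phi((T q / L - mu_phi) / sigma_phi)."""
    return Phi((T * q / L - mu_phi) / sigma_phi)

print(p_t_less_than(300.0))   # 0.5: T equals the mean travel time
print(p_t_less_than(250.0))   # about 4.8e-2
print(p_t_less_than(100.0))   # about 1.3e-11, hopeless for CMC
```

The last value matches the $T = 100$ days row of Table 2 and explains the trillions of CMC realizations quoted below.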
From Table 2 it is seen that, as the parameter $T$ decreases (for $T$ less than or equal to the mean travel time of 300 days), the probability $P_T = P(t < T)$ decreases dramatically, and the number of realizations required to predict $P_T$ using CMC increases significantly. For instance, predicting the probability $P(t < 100\ \mathrm{days})$ with 10% accuracy would require over seven trillion realizations, which is clearly beyond the capability of any computational scheme at the present time. The reason is that, for the given parameter values, the value of $F_t(t = 100\ \mathrm{days})$, or equivalently $F_\phi(0.10)$, is $1.31 \times 10^{-11}$, which is so small that it is almost impossible to draw a sample $\phi$ from $f(\phi)$ such that $\phi < 0.1$. Another problem encountered in this kind of simulation is that it is currently difficult to generate such a large number of random numbers without repetition. Although theoretically some algorithms, such as ran2 presented in [12], have a period longer than $10^{18}$, on a machine with 32-bit integers there are no more than $2^{31} = 2{,}147{,}483{,}647 < 10^{10}$ positive signed integers; thus in a sequence of $10^{10}$ such integers there must be repetitions. It should also be noted that the computational burden (the number of simulations required) becomes even larger if we need more accurate predictions. For example, if we reduce $\epsilon = 0.1$ to 0.05, i.e., 5% precision, we have to increase the sample size by a factor of four.
Now, instead of taking samples from $f(\phi)$, we use the importance sampling technique and take samples $\phi_i$, $i = 1, 2, \ldots, N$, from an importance pdf $f_1(\phi)$ of normal distribution $N(\mu_{\phi,\mathrm{IS}}, \sigma^2_{\phi,\mathrm{IS}})$. To preserve the mean, we have to define a new score function $g_1(\phi) = g(\phi)f(\phi)/f_1(\phi)$, which is shown in Fig. 9(b) for the case of $T = 250$ days. Certainly, taking samples from $f_1(\phi)$ such that $\phi < 0.25$ is much easier than taking them from the original density $f(\phi)$, and thus the sample size can be reduced.
After we sample $\phi_i$, $i = 1, 2, \ldots, N$, from $f_1(\phi)$, we can estimate $P_T$ by

$$P_{T,\mathrm{IS}} = \frac{1}{N}\sum_{i=1}^{N} g_1(\phi_i) = \frac{1}{N}\sum_{i=1}^{N} g(\phi_i)\,f(\phi_i)/f_1(\phi_i). \qquad (27)$$
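A sketch of Eq. (27) for the hardest case in Table 2, $T = 100$ days, using a mean-translated importance density $N(Tq/L, \sigma_\phi^2)$ (the choice motivated later in this section); parameter names are those of this example:

```python
import numpy as np
from math import erfc, sqrt

mu_phi, sigma_phi = 0.3, 0.03
q, L, T = 0.1, 100.0, 100.0
c = T * q / L                               # threshold on porosity: phi < c = 0.10

def norm_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

rng = np.random.default_rng(6)
# mean-translated importance density f_1 = N(c, sigma_phi^2)
phi = rng.normal(c, sigma_phi, 100_000)
w = norm_pdf(phi, mu_phi, sigma_phi) / norm_pdf(phi, c, sigma_phi)
P_T_IS = np.mean((phi < c) * w)             # Eq. (27)

P_T_exact = 0.5 * erfc((mu_phi - c) / (sigma_phi * sqrt(2.0)))   # Eq. (23)
print(P_T_IS, P_T_exact)                    # both about 1.3e-11
```

A probability of order $10^{-11}$, unreachable by CMC, is recovered here to within a few percent from $10^5$ weighted samples.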
The variance of the new score function $g_1(\phi)$ under distribution $f_1(\phi)$ can be derived as
$$\sigma^2_{\mathrm{IS}} = \int_{-\infty}^{\infty} [g_1(\phi) - P_T]^2 f_1(\phi)\,\mathrm{d}\phi = \frac{\sigma^2_{\phi,\mathrm{IS}}}{\sqrt{A}\,\sigma_\phi}\,\exp\!\left(\frac{B^2/A - C}{2\sigma_\phi^2\,\sigma^2_{\phi,\mathrm{IS}}}\right)\Phi\!\left(\frac{\sqrt{A}\,(Tq/L + B/A)}{\sigma_\phi\,\sigma_{\phi,\mathrm{IS}}}\right) - P_T^2, \qquad (28)$$

where $A = 2\sigma^2_{\phi,\mathrm{IS}} - \sigma_\phi^2$, $B = \sigma_\phi^2\,\mu_{\phi,\mathrm{IS}} - 2\sigma^2_{\phi,\mathrm{IS}}\,\mu_\phi$, and $C = 2\sigma^2_{\phi,\mathrm{IS}}\,\mu_\phi^2 - \sigma_\phi^2\,\mu^2_{\phi,\mathrm{IS}}$. The variance of the estimator $P_{T,\mathrm{IS}}$ is $\sigma^2_{\mathrm{IS}}/N$. In contrast to (25), the variance $\sigma^2_{\mathrm{IS}}$ can be reduced by properly choosing $\mu_{\phi,\mathrm{IS}}$ and $\sigma^2_{\phi,\mathrm{IS}}$ of the importance density function $f_1(\phi)$. Notice that the expression for $A$ implies $\sigma^2_{\phi,\mathrm{IS}} \geq \sigma_\phi^2/2$.

Fig. 9. Pdf and score function: (a) original, and (b) modified.
Theoretically, the optimal density function can be determined by finding $\mu_{\phi,\mathrm{IS}}$ and $\sigma^2_{\phi,\mathrm{IS}}$ such that $\sigma^2_{\mathrm{IS}}$ in Eq. (28) is minimized. However, it is not easy to minimize $\sigma^2_{\mathrm{IS}}$ in Eq. (28) directly. We utilize large deviation theory (LDT) and find that the optimized importance density is the Gaussian distribution $N(Tq/L, \sigma_\phi^2)$, i.e., an MT of the original density $N(\mu_\phi, \sigma_\phi^2)$, which is consistent with a previous result that, for the Gaussian distribution, the LDT approach is the same as MT [14].
To confirm this numerically, we computed a series of $\sigma^2_{\mathrm{IS}}$ values based on Eq. (28) with different values of $\mu_{\phi,\mathrm{IS}}$ and $\sigma^2_{\phi,\mathrm{IS}}$ for two particular values of $T$, 100 and 250 days. The probability of travel time $t < T$ for these two $T$ values corresponds to the probability of $\phi < Tq/L = 0.1$ and $0.25$, respectively. Fig. 10(a) and (b) show the performance of the importance density functions, where the position of the dot in the figure marks the mean and standard deviation of the original density $f(\phi)$.
The measure of performance in Fig. 10 is represented by contour maps of $\log\gamma = \log(\sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}}) = \log(N_{\mathrm{MC}}/N_{\mathrm{IS}}) = \log N_{\mathrm{MC}} - \log N_{\mathrm{IS}}$, showing the difference in the order of magnitude of the number of Monte Carlo simulations required for the CMC and ISMC simulations. For instance, the contour level $\log\gamma = 2$ means that the number of simulations required for the CMC simulation is two orders of magnitude larger than that required for the ISMC simulation.
Several observations can be made from Fig. 10. First of all, for any given T and any fixed variance σ²_φ,IS, sampling from the distribution N(Tq/L, σ²_φ,IS) is most efficient, i.e., μ_φ,IS = Tq/L, as predicted by the LDT. Notice that the optimal distribution is not N(Tq/L, σ²_φ) but a slightly scaled Gaussian distribution with a smaller variance σ²_φ,IS < σ²_φ. Intuitively this is understandable: after moving the mean of the sampling density to the dominating point, which has the highest probability density (in the original density function) among all points in the domain of interest D, the efficiency is further increased by concentrating the sampling distribution in the region around the dominating point. Secondly, when the sampling distribution is overbiased, the computational effort may increase rather than decrease, by many orders of magnitude in the worst cases. Thirdly, when the sampling mean deviates from the optimized mean, the loss of efficiency may be compensated by increasing the sampling variance σ²_φ,IS. Furthermore, the efficiency that the IS techniques can achieve depends on the difference between μ_φ and the optimized mean μ_φ,opt = Tq/L. For example, for T = 100 days, as illustrated in Fig. 10(a), the optimized importance sampling can be about 10 orders of magnitude more efficient than the CMC simulation, while for T = 250 days the efficiency gain that importance sampling can achieve is only about 1 order of magnitude. In the extreme case of T = 300 days, μ_φ,opt = Tq/L = 0.3 ≈ μ_φ, and the ISMC simulation does not have any efficiency gain over the CMC simulation. In this case, any density that deviates from f(φ) requires more computational effort.
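These observations can be illustrated on a toy problem. The sketch below estimates a standard Gaussian tail probability P[X > 3] (an illustrative stand-in for the paper's travel-time probability, not its actual flow problem) with several Gaussian importance densities and reports the per-sample variance of each estimator:

```python
import math
import random

random.seed(1)

def phi(x, mu=0.0, sig=1.0):
    # Gaussian probability density
    return math.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * math.sqrt(2.0 * math.pi))

def is_variance(mu_is, sig_is, a=3.0, n=200_000):
    """Mean and per-sample variance of the IS estimator of P[X > a], X ~ N(0,1),
    when sampling from the importance density N(mu_is, sig_is^2)."""
    s = s2 = 0.0
    for _ in range(n):
        x = random.gauss(mu_is, sig_is)
        # weighted indicator: likelihood ratio f/g corrects the biased sampling
        t = phi(x) / phi(x, mu_is, sig_is) if x > a else 0.0
        s += t
        s2 += t * t
    mean = s / n
    return mean, s2 / n - mean * mean

p_exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # exact tail probability, ~1.35e-3
for mu_is, sig_is in [(0.0, 1.0), (3.0, 1.0), (3.0, 0.8), (8.0, 1.0)]:
    est, var = is_variance(mu_is, sig_is)
    print(f"mu_IS={mu_is:3.1f}  sig_IS={sig_is:3.1f}  P~{est:.2e}  var/sample={var:.2e}")
```

Shifting the sampling mean to the dominating point x = 3 reduces the per-sample variance by roughly two orders of magnitude relative to plain sampling (μ_IS = 0), a moderately reduced σ_IS can help slightly further, and the overbiased choice μ_IS = 8 produces erratic weights and a far larger variance, mirroring the observations above.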
Although the importance density derived from MT is slightly different from the optimal density, the two densities achieve efficiencies of the same order of magnitude relative to the CMC simulation, as also shown in Table 2 for two ISMC simulations with density functions derived from MT (N_MT) and from optimization (N_opt).

Fig. 10. Contour maps showing the dependence of the efficiency of importance sampling techniques, represented by log(N_MC/N_IS), on the parameters μ_φ,IS and σ²_φ,IS of the importance density functions for (a) T = 100 days (Tq/L = 0.10) and (b) T = 250 days (Tq/L = 0.25). Optimal μ_φ,IS and σ²_φ,IS are determined from the maximum of log(N_MC/N_IS) in the maps.
Numerical results from the CMC and ISMC simulations are tabulated in Table 3, where the number in parentheses is the number of simulations used to compute P_T. As expected, for an accuracy of 10%, 1000 simulations are enough for the ISMC approach for T values as low as 50 days, while a significantly larger number of runs is required for the CMC simulation to achieve the same level of accuracy even for T = 200 days. For T < 150 days, we are not able to obtain a good estimate of P_T using CMC.
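Eq. (5) makes this cost contrast concrete. As a sketch (with a hypothetical exceedance probability, not a value taken from Table 3): for an indicator-type quantity with P_T ≈ 10⁻⁴, the variance is σ²_h = P_T(1 − P_T), and CMC needs on the order of 10⁶ runs for 10% precision:

```python
import math

def n_min(sigma2, mu, eps):
    """Minimum CMC sample size for 100*eps % precision, Eq. (5):
    ceil(sigma_h^2 / (mu_h * eps)^2)."""
    return math.ceil(sigma2 / (mu * eps) ** 2)

# Hypothetical rare-event probability estimated by CMC. The estimated quantity
# is an indicator, so sigma^2 = P(1 - P); a small P forces a huge sample size.
p = 1e-4
print(n_min(p * (1 - p), p, 0.1))  # on the order of 1e6 runs for 10% precision
```

The required N grows like 1/P_T, which is why CMC becomes impractical for the small exceedance probabilities at early times while ISMC still converges with about 1000 runs.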
4. Summary and conclusions

In this study we introduced a new approach, the importance sampling technique, and demonstrated its usefulness in simulating flow and transport in random porous media through two simple one-dimensional flow scenarios. The study leads to the following conclusions:
(1) In simulating flow and solute transport in random porous media, it is possible to save computational effort tremendously by employing importance sampling techniques. Because different random realizations may have a different impact on the quantities being estimated, realizations are taken from biased density functions (importance density functions) rather than from the underlying original density functions, such that those realizations with a more important impact on the estimated quantities are sampled more frequently. The effect of the biased density functions is then corrected by weighting the simulation outputs such that the means of the quantities being estimated are preserved. By choosing appropriate importance density functions, the ISMC simulation can be many orders of magnitude more efficient than the CMC simulation. In some cases, using the ISMC approach, one can solve problems that cannot be solved using the CMC simulation.
(2) The importance density for the mean estimator is chosen in such a way that the variance of the mean estimator is reduced while the mean is preserved. The density function that minimizes the variance of the mean estimator, however, does not necessarily minimize the variance of the variance estimator. In other words, in estimating both a mean quantity and its associated prediction variance, one may need to find different importance density functions.
(3) In practice, we may not need to find the optimal importance function; an importance function that is relatively close to the optimal one in general works very well.
(4) Finally, we emphasize that when the sampling distribution is overbiased, the computational effort may increase rather than decrease, by many orders of magnitude in the worst cases.
In the examples presented in this paper, we assume that the medium properties are random constants. To apply importance sampling techniques realistically to real-world problems, this approach needs to be extended to correlated random fields.
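The weighting step summarized in conclusion (1) can be sketched in a few lines. The sketch below estimates E[Y²] = 1 for Y ~ N(0,1) while deliberately sampling from a biased density N(1, 1); the densities and the quantity of interest are illustrative assumptions, not the paper's flow quantities:

```python
import math
import random

random.seed(7)

def pdf(x, mu, sig):
    # Gaussian probability density
    return math.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * math.sqrt(2.0 * math.pi))

# Quantity of interest: E[g(Y)] with Y ~ N(0,1) and g(y) = y^2 (exact value 1).
# Realizations are drawn from the biased density N(1,1); each output is weighted
# by the likelihood ratio w = f(y) / f_bias(y), which preserves the mean.
n = 200_000
acc = 0.0
for _ in range(n):
    y = random.gauss(1.0, 1.0)
    w = pdf(y, 0.0, 1.0) / pdf(y, 1.0, 1.0)
    acc += w * y * y
est = acc / n
print(est)  # close to the exact value 1
```

The weighting makes the estimator unbiased for any sampling density that covers the support of the original one; the choice of sampling density affects only the variance, not the mean.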
Appendix A
For one-dimensional horizontal saturated flow with prescribed flux on the left and prescribed constant head on the right, in a porous medium whose log hydraulic conductivity is a random constant, the governing equation for head h can be written as

∂²h/∂x² = 0,  0 ≤ x ≤ L,  (A.1)

with boundary conditions h = H₁ at x = L and −K_s ∂h/∂x = q₀ at x = 0. The exact solution of the above equation with these boundary conditions is

h(x) = H₁ − q₀(x − L)e^{−Y},  (A.2)

where Y = ⟨Y⟩ + Y′. Taking the ensemble mean of Eq. (A.2) yields the mean head

⟨h(x)⟩ = H₁ − q₀(x − L)e^{−⟨Y⟩ + σ²_Y/2}.  (A.3)

Subtracting Eq. (A.3) from Eq. (A.2) gives

h′(x) = −q₀(x − L)e^{−⟨Y⟩}[e^{−Y′} − e^{σ²_Y/2}],  (A.4)

which leads to the expression for the head variance

σ²_h(x) = q₀²(x − L)²e^{−2⟨Y⟩}e^{σ²_Y}[e^{σ²_Y} − 1].  (A.5)

Here we used the results ⟨exp(−Y′)⟩ = exp(σ²_Y/2) and ⟨exp(−2Y′)⟩ = exp(2σ²_Y), which are exact for normally distributed Y′.
Table 3
Comparison of P_T resulting from different approaches

T    P_T    P_T,MC (N_MC)    P_T,IS (N_IS)