$$\sigma_h^2 = E\{[g(Y) - \mu_h]^2\} = \int_X [g(y) - \mu_h]^2 f(y)\,\mathrm{d}y = \int_X g^2(y)\,f(y)\,\mathrm{d}y - \mu_h^2, \qquad (2)$$
where $X$ is a probability space, and $E$ represents statistical expectation. Note that both $\mu_h$ and $\sigma_h^2$ depend on the problem itself and are independent of the way one estimates them. To estimate the mean from Eq. (1) using the CMC method, one samples randomly a sequence $y_i$, $i = 1, 2, \ldots, N$, from the density function $f(y)$ and computes the sample mean
$$\hat{h}_N = \frac{1}{N}\sum_{i=1}^{N} g(y_i). \qquad (3)$$
Since $y_i$, $i = 1, 2, \ldots, N$, are independent identically distributed random variables, it can be shown that $E[\hat{h}_N] = \mu_h$, i.e., $\hat{h}_N$ is an unbiased estimator of $\mu_h$, and the variance of the estimator $\hat{h}_N$ is $\sigma^2_{\mathrm{MC}} = \sigma_h^2/N$. Because $\sigma_h^2$ is unknown, it is usually estimated using

$$S^2_{h,N} = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i) - \hat{h}_N^2. \qquad (4)$$
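The CMC recipe of Eqs. (3) and (4) can be sketched in a few lines. The score function $g(y) = y^2$ under a standard normal density is a hypothetical test case (not from this paper) whose exact moments are $\mu_h = 1$ and $\sigma_h^2 = 2$:

```python
import numpy as np

def cmc_estimate(g, sampler, n, rng):
    """Crude Monte Carlo: sample y_i ~ f and return (mean, variance) estimates.

    mean_hat implements Eq. (3): (1/N) * sum g(y_i)
    var_hat  implements Eq. (4): (1/N) * sum g(y_i)^2 - mean_hat^2
    """
    y = sampler(n, rng)
    scores = g(y)
    mean_hat = scores.mean()
    var_hat = (scores ** 2).mean() - mean_hat ** 2
    return mean_hat, var_hat

rng = np.random.default_rng(0)
# hypothetical example: g(y) = y^2 with f = N(0, 1), so mu_h = 1, sigma_h^2 = 2
mean_hat, var_hat = cmc_estimate(lambda y: y ** 2,
                                 lambda n, r: r.standard_normal(n),
                                 200_000, rng)
print(mean_hat, var_hat)  # close to 1.0 and 2.0
```

With 200,000 samples the statistical error of the mean estimate is about $\sqrt{2/N} \approx 0.003$, consistent with the $\sigma_h/\sqrt{N}$ scaling discussed next.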
Z. Lu, D. Zhang / Advances in Water Resources 26 (2003) 1177–1188
By the Central Limit Theorem, in the limit $N \to \infty$ the probability density of $(\hat{h}_N - \mu_h)/(\sigma_h/\sqrt{N})$ tends to the standard Gaussian distribution $N(0, 1)$. Thus we expect $\hat{h}_N$ to lie within $\sigma_h/\sqrt{N}$ of $\mu_h$ with a probability of 68%, if $N$ is sufficiently large. From this, we can estimate the minimum number of simulations required to obtain the mean estimate with a given accuracy.
Let $N_{\epsilon,\mathrm{MC}}$ denote the minimum number of simulations required to obtain a $100\epsilon\%$ precision; then

$$N_{\epsilon,\mathrm{MC}} = \left\lceil \frac{\sigma_h^2}{\mu_h^2\,\epsilon^2} \right\rceil, \qquad (5)$$
where $\lceil x \rceil$ is the least integer $\geq x$. For instance, if we need the estimate to be within 10% of the exact value, the minimum number of required simulations should be $N_{0.1,\mathrm{MC}} = \lceil \sigma_h^2/(\mu_h^2 (0.1)^2) \rceil = \lceil 100\,\sigma_h^2/\mu_h^2 \rceil$.
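Eq. (5) is a one-line computation; the sketch below evaluates it for hypothetical values of $\sigma_h^2$ and $\mu_h$ (a coefficient of variation of 1, not taken from this paper):

```python
import math

def min_runs_cmc(var, mean, eps):
    """Eq. (5): minimum number of CMC runs for 100*eps % precision."""
    return math.ceil(var / (mean ** 2 * eps ** 2))

# hypothetical values: sigma_h^2 = 4, mu_h = 2 (coefficient of variation = 1)
print(min_runs_cmc(4.0, 2.0, 0.10))  # 100 runs for 10% precision
print(min_runs_cmc(4.0, 2.0, 0.05))  # 400 runs for 5% precision (4x more)
```

Halving the target error quadruples the required sample size, which is the slow $1/\sqrt{N}$ convergence discussed next.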
Because the statistical error $\sigma_h/\sqrt{N}$ is inversely proportional to $\sqrt{N}$, if we wish to reduce it by a factor of two we have to increase the sample size by a factor of four. Clearly, the rate of convergence of the CMC method is rather slow. It is therefore desirable to design ways of reducing the estimation variance more rapidly as $N$ increases.
The importance sampling technique is one such way of reducing the estimation variance (such methods are called variance reduction techniques) and thus reducing the statistical error much faster than the CMC technique. The efficiency of importance sampling depends on the selected importance density function. The essence of importance sampling is to avoid taking samples $y$ in regions where the value of the function $g(y)$ has negligible impact on the quantity being estimated, and to concentrate instead on (important) regions where the impact is large. This operation inevitably introduces bias (or distortion), which is corrected by weighting the sample values appropriately.
Suppose we sample $y_i$, $i = 1, 2, \ldots, N$, from an importance density function $f_1(y)$ rather than the original density function $f(y)$, where $f_1(y)$ is zero only if $f(y)$ is zero. To preserve the mean (i.e., to correct the bias) one has to modify the original score function $g(y)$. A modified score function $H = g_1(Y)$ is defined as

$$g_1(y) = g(y)\,w_1(y), \qquad (6)$$

where $w_1(y) = f(y)/f_1(y)$ is called a weight function.
The expectation of $H$ under density $f_1(y)$ can be determined as

$$\mu_H = E[H] = E[g_1(Y)] = \int_X g_1(y)\,f_1(y)\,\mathrm{d}y = \int_X g(y)\,\frac{f(y)}{f_1(y)}\,f_1(y)\,\mathrm{d}y = \int_X g(y)\,f(y)\,\mathrm{d}y = \mu_h, \qquad (7)$$
which means that the mean remains the same even though the samples are taken with bias from the importance density function $f_1(y)$. The variance of $H$ under density $f_1(y)$ is

$$\sigma_H^2 = E\{[g_1(Y) - \mu_H]^2\} = \int_X [g_1(y) - \mu_H]^2 f_1(y)\,\mathrm{d}y = \int_X g_1^2(y)\,f_1(y)\,\mathrm{d}y - \mu_H^2 = \int_X \left[\frac{f(y)}{f_1(y)}\right] g^2(y)\,f(y)\,\mathrm{d}y - \mu_H^2. \qquad (8)$$
From (7) one can construct an estimator for $\mu_h$ based on samples $y_i$, $i = 1, 2, \ldots, N$:

$$\hat{H}_N = \frac{1}{N}\sum_{i=1}^{N} g_1(y_i) = \frac{1}{N}\sum_{i=1}^{N} g(y_i)\,w_1(y_i); \qquad (9)$$
i.e., the contribution of sample $y_i$ to $\hat{H}_N$ is weighted by $w_1(y_i)$, and the bias due to sampling from the biased importance density function has been corrected. The variance of the estimator $\hat{H}_N$ is $\sigma^2_{\mathrm{IS}} = \sigma_H^2/N$, where $\sigma_H^2$ can be estimated using
$$S^2_{H,N} = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_1^2(y_i) - \hat{H}_N^2. \qquad (10)$$
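A minimal sketch of the weighted estimator of Eqs. (9) and (10), for a hypothetical rare-event score $g(y) = 1\{y < -4\}$ under $f = N(0,1)$ (an illustrative test case, not from this paper); the importance density $f_1 = N(-4, 1)$ concentrates samples where $g$ is nonzero:

```python
import numpy as np
from math import erf, sqrt

def norm_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def is_estimate(g, f_pdf, f1_pdf, f1_sampler, n, rng):
    """Importance sampling: Eq. (9) for the mean, Eq. (10) for sigma_H^2."""
    y = f1_sampler(n, rng)
    w1 = f_pdf(y) / f1_pdf(y)                      # weight w_1(y) = f(y)/f_1(y)
    scores = g(y) * w1                             # modified score g_1(y) = g(y) w_1(y)
    mean_hat = scores.mean()                       # Eq. (9)
    var_hat = (scores ** 2).mean() - mean_hat ** 2 # Eq. (10)
    return mean_hat, var_hat

rng = np.random.default_rng(1)
g = lambda y: (y < -4.0).astype(float)             # indicator score function
mean_hat, var_hat = is_estimate(
    g,
    lambda y: norm_pdf(y, 0.0, 1.0),               # original density f
    lambda y: norm_pdf(y, -4.0, 1.0),              # importance density f_1 (mean shift)
    lambda n, r: r.normal(-4.0, 1.0, n),
    50_000, rng)

exact = 0.5 * (1.0 + erf(-4.0 / sqrt(2.0)))        # Phi(-4), about 3.2e-5
print(mean_hat, exact)
```

A CMC run of the same size would see only one or two samples with $y < -4$, while the weighted estimator recovers the probability to about 1% here.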
Similar to Eq. (5), the minimum number of simulations required to estimate $E[g]$ with a $100\epsilon\%$ precision is

$$N_{\epsilon,\mathrm{IS}} = \left\lceil \frac{\sigma_H^2}{\mu_H^2\,\epsilon^2} \right\rceil. \qquad (11)$$
The ratio of variances $\gamma = \sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}} = N_{\epsilon,\mathrm{MC}}/N_{\epsilon,\mathrm{IS}}$ is a measure of the efficiency (performance) of importance sampling, which depends on the choice of the importance density function. Here $\sigma^2_{\mathrm{MC}} = \sigma_h^2/N$ is the estimation variance of the CMC method. Comparison of (8) with (2) shows that if we choose the importance density function $f_1(y)$ properly, such that the weight function $w_1(y)$ is on average substantially less than unity, then $\sigma_H^2$, and thus the variance of the estimator $\sigma^2_{\mathrm{IS}} = \sigma_H^2/N$, will be reduced. This means that, compared to the CMC method, sampling from the importance density function may allow us to estimate the mean with a smaller sample size for a given accuracy, or with a much better accuracy for a given sample size. In fact, as will be shown later, the Monte Carlo approach based on importance sampling techniques makes it possible to solve some problems that cannot be solved by CMC simulation.
We should emphasize here that uncertainties associated with modeling applications can be classified as reducible and irreducible. Natural uncertainty is inherent, or irreducible, while data and model uncertainties contain both reducible and irreducible components. By the term variance reduction we refer to reduction of the estimation variance, i.e., the variance of the estimator or model uncertainty, not the variance inherited from uncertainties in system parameters. For instance, in evaluating an integral $\int_a^b f(x)\,\mathrm{d}x$ numerically, one may discretize the interval $[a, b]$ using $N$ points that can be either uniformly or randomly distributed in $[a, b]$ and then calculate $\sum f(x_i)\,\Delta x_i$. The accuracy of this estimate depends on the number of points, $N$, and the distribution of these points in $[a, b]$. A good estimator may require less computation time, converge faster to the true value, and thus have a smaller estimation variance. Note that here the variance caused by uncertainties is not involved, because the value of the integral is a deterministic quantity. In general, the variance due to parameter uncertainties cannot be reduced unless more information about the system parameters is added. For example, if we are interested in estimating the mean head for a given uncertainty of hydraulic conductivity, the variance introduced by the spatial variability of hydraulic conductivity cannot be reduced unless more information, such as additional measurements of hydraulic conductivity, is provided. Of course, we may still be able to design a good estimator (such as importance sampling) such that the estimator converges to the true value faster, i.e., reducing the variance of the estimator.
2.2. Variance estimation
Unlike in communication systems, where a mean, the bit error rate, is the main quantity to be estimated, in hydrology it is often necessary to estimate the variance associated with the mean prediction. Because importance sampling techniques do not preserve the variance (i.e., $\sigma_H^2$ is in general not equal to $\sigma_h^2$), if we want to estimate the variance $\sigma_h^2$ we have to re-formulate the expression for the variance. From Eq. (2) we have
$$\sigma_h^2 = E\{[g(Y) - \mu_h]^2\} = \int_X g^2(y)\,f(y)\,\mathrm{d}y - \mu_h^2 = \int_X \frac{f_1(y)}{f(y)}\,g_1^2(y)\,f_1(y)\,\mathrm{d}y - \mu_H^2, \qquad (12)$$
where $\mu_h$ has been replaced by $\mu_H$ because they are the same. In Eq. (12), for any $y$ such that $f(y) = 0$, from Eq. (6), $g_1(y)$ must be zero. From Eq. (12), if we take samples $y_i$, $i = 1, 2, \ldots, N$, from an importance density function $f_1(y)$, then the original variance $\sigma_h^2$ may be estimated using the following expression:
$$S^2_{h,N} = \frac{1}{N}\sum_{i=1}^{N} \frac{f_1(y_i)}{f(y_i)}\,g_1^2(y_i) - \hat{H}_N^2 = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_1(y_i) - \hat{H}_N^2. \qquad (13)$$
A comparison with Eq. (10) reveals that the reduction of variance $S^2_{H,N}$ in Eq. (10) is due to the fact that $g^2(y)$ is weighted by $w_1^2(y)$ in $S^2_{H,N}$, rather than by $w_1(y)$ as in $S^2_{h,N}$ in Eq. (13).
It should be noted that, though Eq. (13) gives a corrected estimate of the variance, the rate of convergence of the estimator $S^2_{h,N}$ in Eq. (13) to $\sigma_h^2$ as $N \to \infty$ is not necessarily optimal, because the importance density function $f_1(y)$ discussed above is selected for the purpose of preserving the mean quantity $\mu_H = \mu_h$ while reducing the computational effort by minimizing Eq. (8). In some special cases, as shown in case 2 in the next section, the score function $g(y)$ is an indicator function, thus $g^2(y) = g(y)$, and from Eq. (2) we have $\sigma_h^2 = \mu_H(1 - \mu_H)$. Therefore, in this case, once we have estimated $\mu_H$, the variance can be calculated easily without resorting to Eq. (13).
However, in general, if one wishes to estimate $\sigma_h^2$ from Eq. (2) using a smaller number of simulations, one may need to construct a new importance density function $f_2(y)$ to estimate the mean of $g^2(y)$, just as we constructed $f_1(y)$ to estimate the mean of $g(y)$. Suppose we sample $y_i$, $i = 1, 2, \ldots, N$, from an importance density function $f_2(y)$, a function to be determined. Again, $f_2(y)$ is zero at $y$ only if $f(y)$ is zero. By defining $g_2(y) = g(y)\,w_2(y)$, where $w_2(y) = \sqrt{f(y)/f_2(y)}$, the last integral in Eq. (2) becomes

$$E[h^2] = E[g^2(Y)] = \int_X g^2(y)\,f(y)\,\mathrm{d}y = \int_X g_2^2(y)\,f_2(y)\,\mathrm{d}y, \qquad (14)$$
which means that the mean of $g^2(y)$ under distribution $f(y)$ is preserved by the mean of $g_2^2(y)$ under distribution $f_2(y)$. Therefore, the variance $\sigma_h^2$ is preserved under distribution $f_2(y)$. From the last integral of Eq. (14), $E[h^2]$ can be estimated by

$$\widehat{h^2}_N = \frac{1}{N}\sum_{i=1}^{N} g_2^2(y_i) = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_2^2(y_i). \qquad (15)$$
Therefore, instead of using Eq. (13), one may use the following expression to estimate $\sigma_h^2$ in Eq. (2):

$$S^2_{h,N} = \frac{1}{N}\sum_{i=1}^{N} g^2(y_i)\,w_2^2(y_i) - \hat{H}_N^2, \qquad (16)$$
where $\hat{H}_N$ is estimated from Eq. (9) using the weighting function $w_1(y)$, and the samples $y_i$, $i = 1, 2, \ldots, N$, are taken from the importance density $f_2(y)$. The latter can be selected by minimizing the variance

$$\sigma^2_{h^2,\mathrm{IS}} = \int_X \left[g_2^2(y) - E[h^2]\right]^2 f_2(y)\,\mathrm{d}y = \int_X g^4(y)\,w_2^4(y)\,f_2(y)\,\mathrm{d}y - \left(E[h^2]\right)^2 = \int_X \frac{f(y)}{f_2(y)}\,g^4(y)\,f(y)\,\mathrm{d}y - \left(E[h^2]\right)^2. \qquad (17)$$
The variance $\sigma^2_{h^2,\mathrm{IS}}$ of the estimator $S^2_{h,N}$ can be estimated from the importance samples $y_i$:

$$S^2_{h^2,\mathrm{IS}} = \frac{1}{N}\sum_{i=1}^{N} g^4(y_i)\,w_2^4(y_i) - \left(\widehat{h^2}_N\right)^2. \qquad (18)$$
To the best of our knowledge, the importance sampling procedure for variance estimation in (16) and (17) is developed here for the first time. In the special case where $g(y)$ is an indicator function, $g^2(y) = g(y)$, and thus $E[h^2] = E[g^2(y)] = E[g(y)] = \mu_h$. A comparison of the last integrals of Eqs. (8) and (17) reveals that in this case the importance density that minimizes Eq. (8) also minimizes Eq. (17), i.e., $f_2(y) = f_1(y)$. In general, however, $f_2(y)$ is different from $f_1(y)$.
In summary, to estimate a mean quantity using importance sampling, one chooses an importance density function $f_1$ such that $\sigma_H^2$ in Eq. (8) or $S^2_{H,N}$ in Eq. (10) is minimized, samples $y$ from $f_1(y)$, and computes $\hat{H}_N$ in Eq. (9) as an estimate of $\mu_h$. Similarly, to efficiently estimate the variance $\sigma_h^2$, one finds $f_2$ such that $\sigma^2_{h^2,\mathrm{IS}}$ in (17) or $S^2_{h^2,\mathrm{IS}}$ in (18) is minimized, takes samples from $f_2$, and estimates $\sigma_h^2$ using $S^2_{h,N}$ in (16).
2.3. Selection of importance density functions
One important question that remains is how to choose the importance density function such that the IS estimator has a smaller variance than the CMC estimator. Here we limit our discussion to the selection of $f_1(y)$, which minimizes Eq. (8) and reduces the computational effort in estimating the mean quantity $\mu_h$. A similar procedure can also be applied to the selection of $f_2(y)$, which minimizes Eq. (17) and reduces the number of simulations required in estimating the variance $\sigma_h^2$. Ideally, if one chooses $f_1(y) = g(y)f(y)/\mu_h$, then from Eq. (8) it follows that $\sigma_H^2 = 0$. Though the $f_1(y)$ defined here contains $\mu_h$, the very quantity being estimated, and thus cannot be used in practice, it does give some clues about how to construct an importance density function.
It is seen from Eq. (8) that it is difficult to minimize $\sigma_H^2$ directly as a function of an unknown function $f_1(y)$. A practical alternative is to choose $f_1(y)$ from a family of candidates for which sampling $y_i$ and evaluating $w_1(y)$ are relatively easy. In addition, choosing $f_1(y)$ from a family of candidates makes it possible in some cases to derive an analytical expression for $\sigma_H^2$.
If the original distribution is Gaussian, then the family of Gaussian distributions is a natural choice. Two major approaches have been used in the literature, i.e., variance scaling (VS) [5] and mean translation (MT) [9], or their combination. The concepts of MT and VS are illustrated in Fig. 1, where $f(x)$ is the original density function. Suppose the problem of concern involves a small probability $\Pr(x < x_0)$; CMC simulation that takes samples from $f(x)$ will not be very efficient, because it is very hard to draw a sample from $f(x)$ such that $x < x_0$ when $x_0$ is very small. With MT, one simply shifts the mean of $f(x)$ to $x_0$, so the frequency of samples with $x < x_0$ from the new density function $f_1(x)$ is increased (Fig. 1(a)). Another way to enlarge the sampling frequency for $x < x_0$ is to increase the variance (Fig. 1(b)). Fig. 1(c) illustrates the combination of MT and VS. Discussion of the advantages and disadvantages of MT and VS can be found in the literature [1,2,14]. In some cases, as shown by Chen et al. [2], VS in combination with optimized MT can add a degree of robustness with respect to severe nonlinearity.
It is difficult in general to derive analytical expressions for the optimal parameters at which $\sigma_H^2$ in Eq. (8) is minimized, even when the form of $f(x)$ is relatively simple. In practice, the optimal parameter values can be obtained numerically. For example, suppose the original distribution is $N(\mu, \sigma^2)$. To find the optimal MT, one may choose a set of $\mu_i$, $i = 1, 2, \ldots, m$, that spans the range of $\mu$, take samples from $N(\mu_i, \sigma^2)$, and for each $\mu_i$ compute $S^2_{H,N}$ defined in Eq. (10). The $\mu_k$ corresponding to the minimum value of $S^2_{H,N}$ among all $m$ values is taken as the optimal mean $\mu_{\mathrm{opt}}$ for the optimal density function.
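The grid search just described can be sketched as follows, for a hypothetical rare-event score $g(y) = 1\{y < -3\}$ under $f = N(0,1)$ (an illustrative case, not an example from this paper); $S^2_{H,N}$ is evaluated from Eq. (10) for each candidate shift:

```python
import numpy as np

def norm_pdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2.0 * np.pi)

def s2_HN(mu_shift, n, rng, threshold=-3.0):
    """Sample from the mean-translated density N(mu_shift, 1) and return
    S^2_{H,N} of Eq. (10) for the indicator score g(y) = 1{y < threshold}."""
    y = rng.normal(mu_shift, 1.0, n)
    w1 = norm_pdf(y, 0.0) / norm_pdf(y, mu_shift)    # w_1 = f / f_1
    scores = (y < threshold) * w1                    # g_1(y) = g(y) w_1(y)
    return np.mean(scores ** 2) - np.mean(scores) ** 2

rng = np.random.default_rng(3)
candidates = np.arange(-5.0, 0.1, 0.5)               # mu_i spanning the range
s2 = [s2_HN(m, 100_000, rng) for m in candidates]
mu_opt = candidates[int(np.argmin(s2))]
print(mu_opt)    # typically near the threshold -3, as mean translation suggests
```

The same loop, run over a two-dimensional grid of means and variances, produces contour maps like those of Fig. 2 in the next section.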
An alternative is an iterative approach, Spanier's technique, proposed by Spanier [15] for optimizing a parameter through processing of the Monte Carlo results from relatively small samples. Expressing the last integral in Eq. (8) as

$$M_2(\mu_i) = \int_X \frac{g^2(y)\,f^2(y)}{f_1(y;\mu_i)\,f_1(y;\tilde{\mu})}\,f_1(y;\tilde{\mu})\,\mathrm{d}y, \quad i = 1, 2, \ldots, m, \qquad (19)$$

where $\tilde{\mu}$ is a guess value of the parameter $\mu$, and the $\mu_i$ are parameter values that span the most likely range of $\mu$.
Fig. 1. Schematic diagrams showing (a) MT, (b) VS, and (c) their combination in one dimension. The shaded area indicates the effective IS probability mass.

Now, starting from the initial guess $\tilde{\mu}$, take samples $y_j$, $j = 1, 2, \ldots, N$, from the importance density $f_1(y;\tilde{\mu})$ and compute
$$M_2(\mu_i) = \frac{1}{N}\sum_{j=1}^{N} \frac{g^2(y_j)\,f^2(y_j)}{f_1(y_j;\mu_i)\,f_1(y_j;\tilde{\mu})}, \quad i = 1, 2, \ldots, m. \qquad (20)$$
The new guess value for the next iteration is $\tilde{\mu} = \mu_k$, where $M_2(\mu_k)$ in Eq. (20) is the minimum among the $m$ values of $M_2(\mu_i)$, $i = 1, 2, \ldots, m$. This process is repeated until $\tilde{\mu}$ converges. A similar procedure can be applied to finding the optimal variance $\sigma^2_{\mathrm{IS}}$ of the importance density, or to obtaining both $\mu_{\mathrm{IS}}$ and $\sigma^2_{\mathrm{IS}}$.
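A compact sketch of this iteration, again for a hypothetical rare-event indicator score $g(y) = 1\{y < -3\}$ with $f = N(0,1)$ (illustrative, not from this paper); all names are assumptions of the sketch:

```python
import numpy as np

def norm_pdf(y, mu):
    return np.exp(-0.5 * (y - mu) ** 2) / np.sqrt(2.0 * np.pi)

def spanier_mt(g, mu_candidates, mu0, n, rng, iters=10):
    """Iterate Eq. (20): sample from f_1(y; mu_tilde), score every candidate
    mu_i with the same sample set, and move mu_tilde to the minimizer."""
    mu_tilde = mu0
    for _ in range(iters):
        y = rng.normal(mu_tilde, 1.0, n)
        num = g(y) ** 2 * norm_pdf(y, 0.0) ** 2           # g^2(y_j) f^2(y_j)
        M2 = [np.mean(num / (norm_pdf(y, mi) * norm_pdf(y, mu_tilde)))
              for mi in mu_candidates]
        mu_next = mu_candidates[int(np.argmin(M2))]
        if mu_next == mu_tilde:
            break
        mu_tilde = mu_next
    return mu_tilde

rng = np.random.default_rng(4)
g = lambda y: (y < -3.0).astype(float)
mu_opt = spanier_mt(g, np.arange(-5.0, 0.1, 0.5), 0.0, 50_000, rng)
print(mu_opt)    # typically settles near the threshold -3
```

The attraction of the scheme is that one sample set, drawn from the current guess, prices every candidate $\mu_i$ simultaneously, so each iteration costs only $N$ model evaluations.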
3. Illustrative examples
In this section we attempt to illustrate the power of importance sampling techniques using two simple examples of flow and transport in one-dimensional porous media whose hydraulic properties are random constants rather than correlated random fields. One major reason for choosing such simple examples is that analytical solutions for them are available and can therefore serve as the basis for comparing the effectiveness of the conventional and IS-based Monte Carlo (ISMC) simulations. Another reason is that no clear strategy is yet available to optimally select importance density functions for correlated random fields, which shall be a topic of future research. Nevertheless, to the best of our knowledge these examples constitute the first applications of importance sampling Monte Carlo techniques to subsurface flow and transport problems.
3.1. Head moments of 1D flow
In our first example, denoted as case 1, we consider one-dimensional horizontal flow with a prescribed deterministic influx $q_0$ on the left and a prescribed deterministic constant head $H_1$ on the right. We assume that the log saturated hydraulic conductivity $Y = \ln K_s$ is a random constant, i.e., a constant in the physical space that varies in the probability space with a normal distribution $N(\langle Y \rangle, \sigma_Y^2)$. For this case, as shown in Appendix A, the mean hydraulic head $\langle h(x) \rangle$ can be expressed as

$$\langle h(x) \rangle = H_1 - q_0 (x - L)\,e^{-\langle Y \rangle + \sigma_Y^2/2}, \qquad (21)$$

where $x$ is the coordinate with $x = 0$ at the left boundary, and the head variance is

$$\sigma_h^2 = q_0^2 (x - L)^2\, e^{-2\langle Y \rangle + \sigma_Y^2} \left(e^{\sigma_Y^2} - 1\right). \qquad (22)$$
Note that Eqs. (21) and (22) are exact solutions, because no approximation is involved in deriving them; therefore they are taken as references for the comparison between the CMC and ISMC approaches. The parameters in our example are given as $q_0 = 0.1$ m/day, $H_1 = 8.0$ m, $L = 100.0$ m, $\langle Y \rangle = -3.0$ (i.e., $K_G = 0.04979$ m/day), and $\sigma_Y^2 = 2.0$.
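Assuming the point solution $h(x; Y) = H_1 - q_0(x - L)e^{-Y}$ that underlies Eqs. (21) and (22), the two moments can be checked by brute-force sampling of the random constant $Y$. The parameters follow the example above except $\sigma_Y^2$, which is reduced here (an illustrative choice) so that the variance check converges quickly:

```python
import numpy as np

# illustrative parameters; sigma_Y^2 is taken smaller than the example's 2.0
# so the brute-force variance check converges with a modest sample size
q0, H1, L, x = 0.1, 8.0, 100.0, 0.0
mean_Y, var_Y = -3.0, 0.5

rng = np.random.default_rng(5)
Y = rng.normal(mean_Y, np.sqrt(var_Y), 400_000)
h = H1 - q0 * (x - L) * np.exp(-Y)          # h(x; Y) for a random constant Y

mean_exact = H1 - q0 * (x - L) * np.exp(-mean_Y + var_Y / 2.0)    # Eq. (21)
var_exact = (q0 ** 2 * (x - L) ** 2 * np.exp(-2 * mean_Y + var_Y)
             * (np.exp(var_Y) - 1.0))                             # Eq. (22)

print(h.mean(), mean_exact)
print(h.var(), var_exact)
```

The agreement rests on the lognormal moment identities $E[e^{-Y}] = e^{-\langle Y\rangle + \sigma_Y^2/2}$ and $\mathrm{Var}[e^{-Y}] = e^{-2\langle Y\rangle + \sigma_Y^2}(e^{\sigma_Y^2} - 1)$.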
First, we need to find two pairs of optimal means and variances, one for the importance sampling function $f_1(y)$ and the other for $f_2(y)$. We take samples $Y_i$, $i = 1, 2, \ldots, N$, from a distribution $N(\langle Y_{\mathrm{IS}} \rangle, \sigma^2_{Y,\mathrm{IS}})$, calculate $g(x; Y_i)$ according to $g(x; Y_i) = h(x; Y_i) = H_1 - q_0 (x - L)\exp(-Y_i)$, and then compute the sample variances $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ using Eqs. (10) and (18), respectively. This procedure is repeated for different values of $\langle Y_{\mathrm{IS}} \rangle = \langle Y_j \rangle$, $j = 1, 2, \ldots, m$, and $\sigma^2_{Y,\mathrm{IS}} = \sigma^2_k$, $k = 1, 2, \ldots, n$. We then find the optimal parameters that minimize Eqs. (10) and (18), i.e., those corresponding to the minimum $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ among the $m \times n$ values. The dependence of $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ on $\langle Y_{\mathrm{IS}} \rangle$ and $\sigma^2_{Y,\mathrm{IS}}$ is illustrated in Fig. 2(a) and (b), showing contour maps for $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$, respectively. It is found that the optimized importance density $f_1(y)$ is $N(-3.8, 3.0)$ and $f_2(y)$ is $N(-5.9, 4.7)$, both of which deviate greatly from the original distribution $N(-3.0, 2.0)$. It is understandable that the large variance of the importance density functions of $Y$ reflects the need to enlarge the tails relative to the original distribution, i.e., the
$Y$ samples from the tails have a larger impact on the mean head than samples closer to the mean; likewise, the leftward shift of the mean, i.e., a smaller $\langle Y_{\mathrm{IS}} \rangle$, implies that small values of $Y$ contribute more importantly to the mean head.

Fig. 2. Contour maps showing the dependence of (a) $S^2_{H,N}$ and (b) $S^2_{h^2,\mathrm{IS}}$ on the parameters $\langle Y_{\mathrm{IS}} \rangle$ and $\sigma^2_{Y,\mathrm{IS}}$ of the importance density functions. The optimal $\langle Y_{\mathrm{IS}} \rangle$ and $\sigma^2_{Y,\mathrm{IS}}$ are determined from the minimum $S^2_{H,N}$ and $S^2_{h^2,\mathrm{IS}}$ in the maps.
Figs. 3 and 4 compare the mean head and head variance at the left boundary ($x = 0$) of the one-dimensional domain obtained from different approaches: the CMC simulation (CMC, $10^6$ simulations) and the IS-based Monte Carlo simulation (ISMC, $10^5$ simulations) for two sets of samples taken from $f_1(y)$ and $f_2(y)$, respectively. As a reference, the corresponding analytical solutions, i.e., Eqs. (21) and (22), are also shown in the figures. The figures show that the ISMC simulation is several orders of magnitude more efficient than the CMC simulation. In addition, it appears that a smaller number of (IS-based) simulations is required for estimating the mean head than for the head variance: a few hundred simulations are enough for estimating the mean, while thousands are needed for estimating the head variance. Furthermore, Fig. 4 indicates that, for estimating the head variance, the optimized importance density function $f_2(y)$ that minimizes $\sigma^2_{h^2,\mathrm{IS}}$ in Eq. (17) is more efficient than $f_1(y)$, which minimizes Eq. (8). Likewise, it is seen from Fig. 3 that $f_1(y)$ is more efficient than $f_2(y)$ in estimating the mean head. Figs. 5 and 6 convey similar information to Figs. 3 and 4, but for the profiles of the mean head and head variance over the 1D domain.
The performance of the CMC and ISMC approaches for estimating the mean head and the head variance at the left boundary for different levels of precision (here $\epsilon = 0.1$ means that the estimated value is within 10% of the exact value) is tabulated in Table 1. The superior performance of the ISMC over the CMC is clearly evident. For example, in estimating the mean head at the $\epsilon = 0.1$ and 0.05 levels, it takes just one run for the ISMC (using $f_1(y)$), while it takes 552 and 2342 runs, respectively, for the CMC to achieve the same levels of precision. Similarly, in estimating the head variance, the ISMC (using $f_2(y)$) is several orders of magnitude more efficient than the CMC for all levels of precision.

Fig. 3. Comparison of the rate of convergence for the mean head at the left boundary obtained from different methods.

Fig. 4. Comparison of the rate of convergence for the head variance at the left boundary obtained from different methods.

Fig. 5. Profiles of mean head obtained from the analytical solution, importance sampling methods, and the CMC simulation with various sample sizes.

Fig. 6. Profiles of head variance obtained from the analytical solution, importance sampling methods, and the CMC simulation with various sample sizes.

Table 1
Comparison of CMC and ISMC on the number of simulations required for different levels of precision

                   $\langle h \rangle$                      $\sigma_h^2$
                   $\epsilon = 0.1$   0.05      0.01        0.1       0.05      0.01
CMC                552                2342      21,200      61,301    506,600   $>10^7$
IS, $f_1(x)$       1                  1         79          2374      67,746    223,284
IS, $f_2(x)$       77                 391       9647        387       1361      104,714
Although the Monte Carlo approach based on importance sampling converges much faster than the CMC approach, computational effort is needed to accurately determine the optimal importance densities. Here we investigated the effect of possible deviations from the optimal density. Figs. 7 and 8 show the convergence of the mean head and head variance at the left boundary resulting from ISMC simulations with importance densities $f_1(y)$ and $f_2(y)$ that differ from the optimal ones. Though simulations with the optimal density functions work much better (relative to the analytical value) than those with density functions deviating from the optimum, all of them are several orders of magnitude more efficient than the CMC simulation. This implies that it may not be necessary to obtain a very accurate estimate of the parameters of the importance density function, and that the sampling intervals for constructing contour maps such as Fig. 2 can be relatively large, thus allowing great savings in the computational effort for determining suboptimal importance density functions.
3.2. Particle travel time
In this example, we consider a case with a boundary configuration similar to that of case 1, but here we assume that the porosity $\phi$ is a random constant following a normal distribution $N(\mu_\phi, \sigma_\phi^2)$, and we are interested in the probability that the particle travel time is less than a given value $T$. Since the porosity $\phi$ is a random variable, so is the travel time $t = L/(q/\phi) = L\phi/q$. In fact, the travel time $t \sim N(L\mu_\phi/q,\ L^2\sigma_\phi^2/q^2)$. The probability $P(t < T)$ can be written explicitly as

$$P_T = P(t < T) = P\!\left(\frac{L}{q}\,\phi < T\right) = P\!\left(\phi < \frac{Tq}{L}\right) = \Phi\!\left(\frac{Tq/L - \mu_\phi}{\sigma_\phi}\right), \qquad (23)$$

where $\Phi(x)$ is the cumulative distribution function of the standard normal distribution. Eq. (23) is the basis for comparing the effectiveness of the CMC and ISMC simulations.
Now, if we want to estimate $P_T$ using Monte Carlo simulation, we rewrite $P_T$ as

$$P_T = P(t < T) = P\!\left(\phi < \frac{Tq}{L}\right) = \int_{-\infty}^{Tq/L} f(\phi)\,\mathrm{d}\phi = \int_{-\infty}^{\infty} g(\phi)\,f(\phi)\,\mathrm{d}\phi, \qquad (24)$$

where $f(\phi)$ is the pdf of the variable $\phi$, i.e., $N(\mu_\phi, \sigma_\phi^2)$, and $g(\phi)$ is a score function defined as $g(\phi) = 1$ if $\phi \leq Tq/L$, and 0 otherwise. Eq. (24) relates $P_T$ to the mean of the score function $g(\phi)$ under the density $f(\phi)$. The variance of $g(\phi)$ can be determined from
$$\sigma^2 = \int_{-\infty}^{\infty} [g(\phi) - P_T]^2 f(\phi)\,\mathrm{d}\phi = \int_{-\infty}^{\infty} g^2(\phi)\,f(\phi)\,\mathrm{d}\phi - P_T^2 = (1 - P_T)\,P_T. \qquad (25)$$

Fig. 7. Sensitivity of the importance sampling method to nonoptimal parameters for the mean head simulation.

Fig. 8. Sensitivity of the importance sampling method to nonoptimal parameters for the head variance simulation.

Table 2
Number of simulations required for different values of $T$

$T$ (days)   $P_T$                     $N_{\mathrm{MC}}$         $N_{\mathrm{MT}}$   $N_{\mathrm{opt}}$
300          0.500                     $1.00 \times 10^{2}$      100                 100
250          $4.78 \times 10^{-2}$     $1.99 \times 10^{3}$      202                 151
200          $4.23 \times 10^{-4}$     $2.33 \times 10^{5}$      376                 285
150          $2.87 \times 10^{-7}$     $3.49 \times 10^{8}$      567                 437
100          $1.31 \times 10^{-11}$    $7.64 \times 10^{12}$     767                 596
50           $3.93 \times 10^{-17}$    $2.54 \times 10^{18}$     970                 759
For the CMC simulation, one samples a sequence $\phi_i$, $i = 1, 2, \ldots, N$. If $\phi_i$ is less than or equal to $Tq/L$, then $g(\phi_i)$ is set equal to unity; otherwise $g(\phi_i)$ is set to zero, as illustrated in Fig. 9(a) for the case of $T = 250$ days. We collect the scores from all $N$ samples and calculate the sample mean:

$$P_{T,\mathrm{MC}} = \frac{1}{N}\sum_{i=1}^{N} g(\phi_i). \qquad (26)$$
For the given parameters $q = 0.1$ m/day, $\mu_\phi = 0.3$, $\sigma_\phi = 0.03$ (i.e., $\mathrm{CV}_\phi = 10\%$), and $L = 100.0$ m, Eq. (23) reduces to $P_T = \Phi(T/30 - 10.0)$. Table 2 gives the $P_T$ values obtained from the analytical solution, i.e., Eq. (23), for different $T$ values. The table also gives the number of simulations required to estimate $P_T$ using the CMC simulation ($N_{\mathrm{MC}}$) to within 10% of the exact values.
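Eq. (23) with these parameter values is a one-line computation; the sketch below shows how quickly $P_T$ collapses as $T$ decreases:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def p_t_less_than(T, q=0.1, L=100.0, mu_phi=0.3, sigma_phi=0.03):
    """Eq. (23): P(t < T) = Phi((T q / L - mu_phi) / sigma_phi)."""
    return Phi((T * q / L - mu_phi) / sigma_phi)

print(p_t_less_than(300.0))   # 0.5: T equals the mean travel time
print(p_t_less_than(250.0))   # about 4.8e-2
print(p_t_less_than(100.0))   # about 1.3e-11, hopeless for CMC
```

The last value matches the $T = 100$ days row of Table 2 and explains the trillions of CMC realizations quoted below.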
From Table 2 it is seen that, as the parameter $T$ decreases (for $T$ less than or equal to the mean travel time of 300 days), the probability $P_T = P(t < T)$ decreases dramatically, and the number of realizations required to predict $P_T$ using CMC increases significantly. For instance, predicting the probability $P(t < 100\ \mathrm{days})$ with 10% accuracy would require over seven trillion realizations, which is clearly beyond the capability of any computational scheme at the present time. The reason is that, for the given parameter values, the value of $F_t(t = 100\ \mathrm{days})$, or equivalently $F_\phi(0.10)$, is $1.31 \times 10^{-11}$, which is so small that it is almost impossible to draw a sample $\phi$ from $f(\phi)$ such that $\phi < 0.1$. Another problem encountered in this kind of simulation is that it is currently difficult to generate such a large number of random numbers without repetition. Although theoretically some algorithms, such as ran2 presented in [12], have a period longer than $10^{18}$, on a machine with 32-bit integers there are no more than $2^{31} = 2{,}147{,}483{,}647 < 10^{10}$ positive signed integers; thus in a sequence of $10^{10}$ such integers there must be repetitions. It should also be noted that the computational burden (the number of simulations required) becomes even larger if we need more accurate predictions. For example, if we reduce $\epsilon = 0.1$ to 0.05, i.e., 5% precision, we have to increase the sample size by a factor of four.
Now, instead of taking samples from $f(\phi)$, we use the importance sampling technique and take samples $\phi_i$, $i = 1, 2, \ldots, N$, from an importance pdf $f_1(\phi)$ of normal distribution $N(\mu_{\phi,\mathrm{IS}}, \sigma^2_{\phi,\mathrm{IS}})$. To preserve the mean, we have to define a new score function $g_1(\phi) = g(\phi)f(\phi)/f_1(\phi)$, which is shown in Fig. 9(b) for the case of $T = 250$ days. Certainly, taking samples from $f_1(\phi)$ such that $\phi < 0.25$ is much easier than taking them from the original density $f(\phi)$, and thus the sample size can be reduced.
After we sample $\phi_i$, $i = 1, 2, \ldots, N$, from $f_1(\phi)$, we can estimate $P_T$ by

$$P_{T,\mathrm{IS}} = \frac{1}{N}\sum_{i=1}^{N} g_1(\phi_i) = \frac{1}{N}\sum_{i=1}^{N} g(\phi_i)\,f(\phi_i)/f_1(\phi_i). \qquad (27)$$
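A sketch of Eq. (27) for the hardest case in Table 2, $T = 100$ days, using a mean-translated importance density $N(Tq/L, \sigma_\phi^2)$ (the choice motivated later in this section); parameter names are those of this example:

```python
import numpy as np
from math import erfc, sqrt

mu_phi, sigma_phi = 0.3, 0.03
q, L, T = 0.1, 100.0, 100.0
c = T * q / L                               # threshold on porosity: phi < c = 0.10

def norm_pdf(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

rng = np.random.default_rng(6)
# mean-translated importance density f_1 = N(c, sigma_phi^2)
phi = rng.normal(c, sigma_phi, 100_000)
w = norm_pdf(phi, mu_phi, sigma_phi) / norm_pdf(phi, c, sigma_phi)
P_T_IS = np.mean((phi < c) * w)             # Eq. (27)

P_T_exact = 0.5 * erfc((mu_phi - c) / (sigma_phi * sqrt(2.0)))   # Eq. (23)
print(P_T_IS, P_T_exact)                    # both about 1.3e-11
```

A probability of order $10^{-11}$, unreachable by CMC, is recovered here to within a few percent from $10^5$ weighted samples.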
The variance of the new score function $g_1(\phi)$ under distribution $f_1(\phi)$ can be derived as
$$\sigma^2_{\mathrm{IS}} = \int_{-\infty}^{\infty} [g_1(\phi) - P_T]^2 f_1(\phi)\,\mathrm{d}\phi = \frac{\sigma^2_{\phi,\mathrm{IS}}}{\sqrt{A}\,\sigma_\phi}\,\exp\!\left(\frac{B^2/A - C}{2\sigma_\phi^2\,\sigma^2_{\phi,\mathrm{IS}}}\right)\Phi\!\left(\frac{\sqrt{A}\,(Tq/L + B/A)}{\sigma_\phi\,\sigma_{\phi,\mathrm{IS}}}\right) - P_T^2, \qquad (28)$$

where $A = 2\sigma^2_{\phi,\mathrm{IS}} - \sigma_\phi^2$, $B = \sigma_\phi^2\,\mu_{\phi,\mathrm{IS}} - 2\sigma^2_{\phi,\mathrm{IS}}\,\mu_\phi$, and $C = 2\sigma^2_{\phi,\mathrm{IS}}\,\mu_\phi^2 - \sigma_\phi^2\,\mu^2_{\phi,\mathrm{IS}}$. The variance of the estimator $P_{T,\mathrm{IS}}$ is $\sigma^2_{\mathrm{IS}}/N$. In contrast to (25), the variance $\sigma^2_{\mathrm{IS}}$ can be reduced by properly choosing $\mu_{\phi,\mathrm{IS}}$ and $\sigma^2_{\phi,\mathrm{IS}}$ of the importance density function $f_1(\phi)$. Notice that the expression for $A$ implies $\sigma^2_{\phi,\mathrm{IS}} \geq \sigma_\phi^2/2$.

Fig. 9. Pdf and score function: (a) original, and (b) modified.
Theoretically, the optimal density function can be determined by finding $\mu_{\phi,\mathrm{IS}}$ and $\sigma^2_{\phi,\mathrm{IS}}$ such that $\sigma^2_{\mathrm{IS}}$ in Eq. (28) is minimized. However, it is not easy to minimize $\sigma^2_{\mathrm{IS}}$ in Eq. (28) directly. We utilize large deviation theory (LDT) and find that the optimized importance density is the Gaussian distribution $N(Tq/L, \sigma_\phi^2)$, i.e., an MT of the original density $N(\mu_\phi, \sigma_\phi^2)$, which is consistent with a previous result that, for the Gaussian distribution, the LDT approach is the same as MT [14].
To confirm this numerically, we computed a series of $\sigma^2_{\mathrm{IS}}$ values based on Eq. (28) with different values of $\mu_{\phi,\mathrm{IS}}$ and $\sigma^2_{\phi,\mathrm{IS}}$ for two particular values of $T$, 100 and 250 days. The probability of travel time $t < T$ for these two $T$ values corresponds to the probability of $\phi < Tq/L = 0.1$ and $0.25$, respectively. Fig. 10(a) and (b) show the performance of the importance density functions, where the position of the dot in the figure marks the mean and standard deviation of the original density $f(\phi)$.
The measure of performance in Fig. 10 is represented by contour maps of $\log\gamma = \log(\sigma^2_{\mathrm{MC}}/\sigma^2_{\mathrm{IS}}) = \log(N_{\mathrm{MC}}/N_{\mathrm{IS}}) = \log N_{\mathrm{MC}} - \log N_{\mathrm{IS}}$, showing the difference in the order of magnitude of the number of Monte Carlo simulations required for the CMC and ISMC simulations. For instance, the contour level $\log\gamma = 2$ means that the number of simulations required for the CMC simulation is two orders of magnitude larger than that required for the ISMC simulation.
Several observations can be made from Fig. 10. First of all, for any given T and any fixed variance σ²_φ,IS, sampling from the distribution N(Tq/L, σ²_φ,IS) is most efficient, i.e., μ_φ,IS = Tq/L, as predicted by the LDT. Notice that the optimal distribution is not N(Tq/L, σ²_φ) but a slightly scaled Gaussian distribution with a smaller variance σ²_φ,IS < σ²_φ. Intuitively this is understandable: after moving the mean of the sampling density to the dominating point, which has the highest probability density (in the original density function) among all points in the domain of interest D, the efficiency is further increased by concentrating the sampling distribution in the region around the dominating point. Secondly, when the sampling distribution is overbiased, the computational effort may increase rather than decrease, by many orders of magnitude in the worst cases. Thirdly, when the sampling mean deviates from the optimized mean, the loss of efficiency may be compensated by increasing the sampling variance σ²_φ,IS. Furthermore, the efficiency that the IS techniques can achieve depends on the difference between μ_φ and the optimized mean μ_φ,opt = Tq/L. For example, for T = 100 days, as illustrated in Fig. 10(a), the optimized importance sampling can be about 10 orders of magnitude more efficient than the CMC simulation, while for T = 250 days the efficiency gain that importance sampling can achieve is only about 1 order of magnitude. In the extreme case of T = 300 days, μ_φ,opt = Tq/L = 0.3 ≈ μ_φ, and the ISMC simulation does not have any efficiency gain over the CMC simulation. In this case, any density that deviates from f(φ) requires more computational effort.
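These observations can be illustrated on a toy problem. The sketch below estimates a standard Gaussian tail probability P[X > 3] (an illustrative stand-in for the paper's travel-time probability, not its actual flow problem) with several Gaussian importance densities and reports the per-sample variance of each estimator:

```python
import math
import random

random.seed(1)

def phi(x, mu=0.0, sig=1.0):
    # Gaussian probability density
    return math.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * math.sqrt(2.0 * math.pi))

def is_variance(mu_is, sig_is, a=3.0, n=200_000):
    """Mean and per-sample variance of the IS estimator of P[X > a], X ~ N(0,1),
    when sampling from the importance density N(mu_is, sig_is^2)."""
    s = s2 = 0.0
    for _ in range(n):
        x = random.gauss(mu_is, sig_is)
        # weighted indicator: likelihood ratio f/g corrects the biased sampling
        t = phi(x) / phi(x, mu_is, sig_is) if x > a else 0.0
        s += t
        s2 += t * t
    mean = s / n
    return mean, s2 / n - mean * mean

p_exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))  # exact tail probability, ~1.35e-3
for mu_is, sig_is in [(0.0, 1.0), (3.0, 1.0), (3.0, 0.8), (8.0, 1.0)]:
    est, var = is_variance(mu_is, sig_is)
    print(f"mu_IS={mu_is:3.1f}  sig_IS={sig_is:3.1f}  P~{est:.2e}  var/sample={var:.2e}")
```

Shifting the sampling mean to the dominating point x = 3 reduces the per-sample variance by roughly two orders of magnitude relative to plain sampling (μ_IS = 0), a moderately reduced σ_IS can help slightly further, and the overbiased choice μ_IS = 8 produces erratic weights and a far larger variance, mirroring the observations above.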
Although the importance density derived from MT is slightly different from the optimal density, the two densities achieve efficiencies of the same order of magnitude relative to the CMC simulation, as also shown in Table 2 for two ISMC simulations with density functions derived from MT (N_MT) and from optimization (N_opt).

Fig. 10. Contour maps showing the dependence of the efficiency of importance sampling techniques, represented by log(N_MC/N_IS), on the parameters μ_φ,IS and σ²_φ,IS of the importance density functions for (a) T = 100 days (Tq/L = 0.10) and (b) T = 250 days (Tq/L = 0.25). Optimal μ_φ,IS and σ²_φ,IS are determined from the maximum of log(N_MC/N_IS) in the maps.
Numerical results from the CMC and ISMC simulations are tabulated in Table 3, where the number in parentheses is the number of simulations used to compute P_T. As expected, for an accuracy of 10%, 1000 simulations are enough for the ISMC approach for T values as low as 50 days, while a significantly larger number of runs is required for the CMC simulation to achieve the same level of accuracy even for T = 200 days. For T < 150 days, we are not able to obtain a good estimate of P_T using CMC.
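Eq. (5) makes this cost contrast concrete. As a sketch (with a hypothetical exceedance probability, not a value taken from Table 3): for an indicator-type quantity with P_T ≈ 10⁻⁴, the variance is σ²_h = P_T(1 − P_T), and CMC needs on the order of 10⁶ runs for 10% precision:

```python
import math

def n_min(sigma2, mu, eps):
    """Minimum CMC sample size for 100*eps % precision, Eq. (5):
    ceil(sigma_h^2 / (mu_h * eps)^2)."""
    return math.ceil(sigma2 / (mu * eps) ** 2)

# Hypothetical rare-event probability estimated by CMC. The estimated quantity
# is an indicator, so sigma^2 = P(1 - P); a small P forces a huge sample size.
p = 1e-4
print(n_min(p * (1 - p), p, 0.1))  # on the order of 1e6 runs for 10% precision
```

The required N grows like 1/P_T, which is why CMC becomes impractical for the small exceedance probabilities at early times while ISMC still converges with about 1000 runs.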
4. Summary and conclusions

In this study we introduced a new approach, the importance sampling technique, and demonstrated its usefulness in simulating flow and transport in random porous media through two simple one-dimensional flow scenarios. The study leads to the following conclusions:
(1) In simulating flow and solute transport in random porous media, it is possible to save computational effort tremendously by employing importance sampling techniques. Because different random realizations may have a different impact on the quantities being estimated, realizations are taken from biased density functions (importance density functions) rather than from the underlying original density functions, such that those realizations with a more important impact on the estimated quantities are sampled more frequently. The effect of the biased density functions is then corrected by weighting the simulation outputs such that the means of the quantities being estimated are preserved. By choosing appropriate importance density functions, the ISMC simulation can be many orders of magnitude more efficient than the CMC simulation. In some cases, using the ISMC approach, one can solve problems that cannot be solved using the CMC simulation.
(2) The importance density for the mean estimator is chosen in such a way that the variance of the mean estimator is reduced while the mean is preserved. The density function that minimizes the variance of the mean estimator, however, does not necessarily minimize the variance of the variance estimator. In other words, in estimating both a mean quantity and its associated prediction variance, one may need to find different importance density functions.
(3) In practice, we may not need to find the optimal importance function; an importance function that is relatively close to the optimal one in general works very well.
(4) Finally, we emphasize that when the sampling distribution is overbiased, the computational effort may increase rather than decrease, by many orders of magnitude in the worst cases.
In the examples presented in this paper, we assume that the medium properties are random constants. To apply importance sampling techniques realistically to real-world problems, this approach needs to be extended to correlated random fields.
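The weighting step summarized in conclusion (1) can be sketched in a few lines. The sketch below estimates E[Y²] = 1 for Y ~ N(0,1) while deliberately sampling from a biased density N(1, 1); the densities and the quantity of interest are illustrative assumptions, not the paper's flow quantities:

```python
import math
import random

random.seed(7)

def pdf(x, mu, sig):
    # Gaussian probability density
    return math.exp(-0.5 * ((x - mu) / sig) ** 2) / (sig * math.sqrt(2.0 * math.pi))

# Quantity of interest: E[g(Y)] with Y ~ N(0,1) and g(y) = y^2 (exact value 1).
# Realizations are drawn from the biased density N(1,1); each output is weighted
# by the likelihood ratio w = f(y) / f_bias(y), which preserves the mean.
n = 200_000
acc = 0.0
for _ in range(n):
    y = random.gauss(1.0, 1.0)
    w = pdf(y, 0.0, 1.0) / pdf(y, 1.0, 1.0)
    acc += w * y * y
est = acc / n
print(est)  # close to the exact value 1
```

The weighting makes the estimator unbiased for any sampling density that covers the support of the original one; the choice of sampling density affects only the variance, not the mean.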
Appendix A
For one-dimensional horizontal saturated flow with prescribed flux on the left and prescribed constant head on the right, in a porous medium whose log hydraulic conductivity is a random constant, the governing equation for head h can be written as

∂²h/∂x² = 0,  0 ≤ x ≤ L,  (A.1)

with boundary conditions h = H₁ at x = L and −K_s ∂h/∂x = q₀ at x = 0. The exact solution of the above equation with these boundary conditions is

h(x) = H₁ − q₀(x − L)e^{−Y},  (A.2)

where Y = ⟨Y⟩ + Y′. Taking the ensemble mean of Eq. (A.2) yields the mean head

⟨h(x)⟩ = H₁ − q₀(x − L)e^{−⟨Y⟩ + σ²_Y/2}.  (A.3)

Subtracting Eq. (A.3) from Eq. (A.2) gives

h′(x) = −q₀(x − L)e^{−⟨Y⟩}[e^{−Y′} − e^{σ²_Y/2}],  (A.4)

which leads to the expression for the head variance

σ²_h(x) = q₀²(x − L)²e^{−2⟨Y⟩}e^{σ²_Y}[e^{σ²_Y} − 1].  (A.5)

Here we used the results ⟨exp(−Y′)⟩ = exp(σ²_Y/2) and ⟨exp(−2Y′)⟩ = exp(2σ²_Y), which are exact for normally distributed Y′.
Table 3
Comparison of P_T resulting from different approaches

T    P_T    P_T,MC (N_MC)    P_T,IS (N_IS)