Spear 1994
Spear 1994
Spear 1994
RESEARCH,
VOL.30,NO. 11,PAGES
3159-3169,
NOVEMBER
1994
Parameter
uncertainty
andinteraction
in complex
environmental models
Robert C. Spear
Environmental
Engineering
andHealthSciences
Laboratory,
Richmond
Research
Center
Universityof California, Berkeley
Thomas M. Grieb
Tetra
Tech,Inc.,Lafayette,
California,
andEnvironmental
Engineering
andHealth
Sciences
Laboratory,
Richmond
ResearchCenter,Universityof California,Berkeley
Nong Shang
Environmental
Engineering
andHealthSciences
Laboratory,Richmond
ResearchCenter
Universityof California, Berkeley
potential
oftheCARTapproach
ledustoinvestin exploring m
fi(x)
ni(•i)
= -- (1) structureare presentedin the following example.
Figure I showsa test casein whichf comprisesa weighted
sum of four bivariate normal distributions of different means
The accuracyof estimationis defined by an integrated and variances. A random sample of size 200 was drawn from
squareerror (ISE): this complexbivariatedistribution,and thesepairsof values
were input to the densityestimationprocedure.The task is
to divide the spaceinto regionsof relatively uniform density
ISE
=f.r{f(x)
-j'(x)}
Es 2dx (2)
within the constraintsof the precisionallowed by the sample
size. The scatterplotfor the 200 samplesis shownin Figure
wherethe smallerthe ISE, the betterthe fit. Note that 2. The irregularrectangularbox structure superimposedon it
representsthe boundariesof 13 subregionsdeterminedby
the estimationprocedure.Figure 3 showsthe corresponding
tree. The density values in the nodes are relative densities,
•S •S rl i= 1 which are calculatedby assumingthat the parameter space
(3) has a unit volume. From (!), it is easy to show that the real
densityisjust the relativedensitydividedby the real areaor
It canbeshownthatforanyf withstructure
(1),the!SEcan volume of the parameterspace. The first split is at x2 -<
'• estimated
to withina constant by a lossfunctionL, -9.7, As is notedon the tree, only 5 of the 200 points lie in
definedas that region, with 195beinggreater than that value, as can be
3162 SPEARET AL.: UNCERTAINTY AND INTERACTION IN COMPLEXMODELS
comparing
trees,adjusting
thesplittingcriterionto avoidend
cuts, and developingpruning strategiesfor generatingthe
best subtrees.To give some senseof the concept, we define
the roughnessparameter as
1 m I
Rs
=--
rn i•
1--
Si (7)
The definitionhastwo importantproperties'R -> rn for any
tree with rn terminalnodesand R = rn if and only if Si =
S/rn. Hence equal splittinggives the least roughness.Con-
versely,the more uneventhe splitting,the larger the rough.
nessparameter.In general,R is an increasingfunctionof m.
That means that the roughness of a tree comes from two
aspectsof a partition:finenessand evenness.Many criteria
in the tree-generating
processneedto be critically examined.
For example,the modifiedcriterionfor makinga split is to
Figure 1. Perspectiveplot of the real density. maximize
specialsmoothingof the data. The finer the partitioning,the 0.16 1.06 0.17
rougher the estimation. Both oversmoothing and under- ß
o -
2. How doesone verify the accuracyof a particulartree?
If the samesampleis used to constructa tree and to estimate '.. ':'
the ISE, the error is called a resubstitution error, and it is ,,
,
.
6(•Intermediate
Node 6.49 = Density
5
X2 < -9.7
195
0[• Terminal
Node 0.07 = Density
0.34 o.3.._•4
= Area X1 > -7.8
183
Xl • 5.4
137
X2• 1.1
51 11 34 41 11 i S0 4
/ \
tiveprocedurein which the complete data set is randomly includinglandfills, surfaceimpoundments,waste piles, un-
dividedinto v subsets(v = 10 was used). In each of the v dergroundinjection wells, and undergroundstoragetanks.
repetitions,v - 1 subsets are combined into the learning The physicalcharacteristicsof each type of waste manage-
samplethat is used to constructa tree, and the remaining ment unit are different, but in each case source characteris-
subsetis usedas a test sampleto estimatethe ISE. For each tics are defined and mass balance calculations are made on
valueof the roughnessparameter,the estimatedISEs are an annual basis to ensure that mass is conserved in waste
averaged over the v repetitions.The resultingquantityis a managementunits that have multiple release pathways.
reasonableestimation of the real ISE and is referred to as the Several mathematical models are used to represent fate
crossvalidationerror. The best roughnessparameteris that and transport processesin five pathways: atmospheric,
surfacewater, groundwater,soil erosion, and food chain
onewhich has the smallest cross validation error, or approx-
imatelythe smallestISE. The final tree is constructedby bioaccumulation.The uncertaintyanalysesconductedin this
usingthe best roughnessparameterand the completedata studyfocusedonthe simulationof contaminantmovementin
set. the groundwaterpathway.The simulatedgroundwaterpath-
way includesthefollowingprocesses:net recharge,leaching
of contaminantsfrom the soil, transportthroughthe partially
ModelDescription saturatedzone, and contaminantdispersionwithin an under-
MMSOILS was used to explorethe applicationof the lying aquifer.A seriesof equationsand simplemodelsare
tree-structured
densityestimationmethodology to investi- used to link the movement of water into the soil, fluid flow
gatingmodeluncertaintyandparameterinteraction.It is a and contaminanttransport through the partially saturated
multimediacontaminant fate, transport, and exposure zone, and mixingat the boundaryof the saturatedzone. The
model.It wasdevelopedas partof an assessmentmethod- fate and transport of a contaminantin the aquifer are
ology[EPA, 1992]to estimateexposures andhealthrisks calculatedby using a quasi-analyticalsolution to a three-
with the release and subsequentfate and trans- dimensional
associated transientequation.It incorporatesthe effectsof
longitudinal
port of chemicals from contaminated soils and hazardous advectionandthree-dimensional
dispersion,
re-
chemical tardation, and first-order decay.
wastesites.The modelis capableof simulating
•leasesfrom severaltypesof wastemanagement units, MMSOILS was designedto 'run on a personal computer
3164 SPEAR ET AL.: UNCERTAINTY AND INTERACTION IN COMPLEX MODELS
Range of
Parameter Definition Values
*Parametersthatexhibitedstatistically
significant
differences
in theirdistributionsbetweenpassand
nonpass groups.
system, and the model output consistsof the predicted The initial analysisinvolvedthe applicationof regional
concentrationin eachenvironmentalpathwayanda selected sensitivityanalysisto investigatewhich modelparameters
exposure point. For a previous study [Tetra Tech, Inc., make the largest contribution to the ability to meet the
1992],the model has also been incorporatedinto a Monte performance criterion.The methodinvolvesthe application
Carlosimulation packageto randomlyselectvaluesof input of univariatestatisticaltechniquesto examine differences
in
parameters from specified distributions. The simulation the marginaldistributionsof each parameterbetweenthe
packageallows up to !80 of the 300 MMSOILS input passand nonpassgroups.In the primary set of analyses the
parameters to be varied in each model run. tree-structureddensity estimation technique was usedto.
examinethe interactionsbetween parametersin the pass
Analyses region of the parameterspace.
The analysesdescribedin thispaperexaminethe relative
importanceof model parametersin predictingchemical Results
concentrations
in the groundwaterat a solidwastemanage- UnivariateAnalysis
ment site regulated under the Resource Conservationand
RecoveryAct of 1976.Modelparameter
valuesrequired
for The model simulationsproduced 1090 passesand 3910
the model simulationshad previouslybeen obtainedfrom nonpasses.The hydraulicconductivityand hydraulicgra•-
numerousdocumentspreparedduringthe courseof remedial ent showedthe largestdifference in the cumulative distribu-
investigationand feasibility studiesconductedat the site tionfunctionbetweenthe two groups.Nine additionalm•el
[Tetra Tech, Inc., 1992].Twentymodelparameters, de- parametersexhibitedstatisticallysignificantdifferences
in
scribedin Table 1, were randomlyvaried in 5000 model their distributions betweenthe passand nonpassgroups.
simulations.These20 parameterswere selectedon the basis Differenceswere exhibitedbetweenthe two groupsin t•½
of theirrelativeimportance
in preliminary sensitiv- typesof distributionbut not in the range of valuesthatmet
regional
ity analyses[TetraTech,Inc., 1992]using54parameters. the performance criteria. Correlation coefficientswere a!s0
Eachmodelrun wasseparated intotwo groups on the calculatedto evaluatethe relationshipbetweenmodelpa-
basisofthemodel performancecriterion: ground- rameters. The results, summarized in Figure 4, indicateaa
predicted
water concentration of benzeneat a specified receptor absenceof importantinterparameterrelationships.Thelarg-
well located505m fromthe center estcorrelationcoefficient(-0.41) occurredbetweenconduc,
location(a drinking-water
of the site)andtime(77 yearsafteroperation of the solid tivity and gradient.The absolutevalue of the majorityof
waste management site began). Model runs in which the correlation coefficients was less than 0.10.
predictedbenzeneconcentration
exceededthe U.S. Envi-
ronmental Protection
Agency's applicablePartitioningof the PassPoints
chemical-specific
or relevantandappropriate requirement for benzene(1 Thediagram producedin the evaluation
of thedensity
:of
/zg/L)wereclassifiedaspasses.
All othermodelrunswere thepasspointsis shownin Figure5. The relativedensity
classifiedas nonpasses. valueis shownfor eachnode,and the proportion of •
SPEARET AL.: UNCERTAINTYAND INTERACTIONIN COMPLEXMODELS 3165
n= !65
d)/N. This informationon the density in each node is also
summarized in Table 2. In terminal nodes 4, 10, 1, 3, and 14,
the relative density values were greater than 2.0, which
meansthat we expect that at least 2.0 x 1090/5000= 44% of
the pointsin the local regionsare passpoints. The expected
number of pass points in terminal node 4, for example, is
n =23 95%. Overall, 858 of the 1090passes,or 79% were found in
six of the terminal nodes,which representedapproximately
28% of the volumeof the parameterspace.Examinationof
the other terminal nodes shows that 51% of the parameter
space has a relative density of less than 0.5. The expected
n=l n=l proportion of passes in these regions is less than 10%.
m • m
0.1 0.2 0.3 0.4 0.5
Generally, these results show that the parameter spacecan
be partitioned into relatively small, densely populated re-
gionsand large, sparselypopulatedregions.
The model parametersthat are responsiblefor the parti-
Figure4. Correlationbetween model parametersin the tioning of the parameterspaceand the valueson which the
passregion(n = 1090). parameter space is partitioned are also shown in the tree
diagram(Figure 5). The five parametersthat definedthe high
density terminal nodes are COND (aquifer conductivity),
volume of the parameter space occupied by the terminal GRAD (aquifer hydraulic gradient), SOLB (chemical solu-
nodeis also indicated. The relative density value indicates bility), ICLOSE (waste site closure date), and DPLYR2
the density of pass points in each node. In general, if we (unsaturatedzone thickness). These parameters are a subset
haven passesout of N simulations,then a relative density, of those identified in the regional sensitivity analysis that
d, in a local region can be explained as the expectationof accounted for the partitioning of the pass and nonpass
gettingn x d passesif we took N samplesfrom the local regions of the total parameter space (Table 2).
region,or the proportion of passes in the region is (n x The parameterrangeswithin the high density nodes of the
776 3!4
472
SOLB > 9.75 0.0.._•.9 0.,035 88 154
I,CLOSE =• 5O
TN4
102 370
26 3
CONDUCT > 3150.2
CONDUCT • 3465.1
0.033 o 033
74 28 86 284
1
0.07_•5 2
I I 0.1,35, 11 3
iii i iiiiiiiim iii iiiiiiiiiiiiiii iiii ii iiiiii i iii i i ii
AREA1 > 101.5 ICLOSE < 55
0•o•,• 1 0.054
Q Intermediate
Node
TN3
1.26 = Density
Terminal Node 3
AQUIDP • 9 DPLYR2 • 586.5
TN13/ • 14 2,48 = Density
0.033 = VolUme
0'033(Highdensity
termtna!
nodes
are shaded)
0,041 o,o•_!
o_..•.• o,o9•
Figure5. Partitioning
ofpass
points
using
density
estimation
approach
(number
of cases
= 1090).
3166 SPEAR ET AL.: UNCERTAINTY AND INTERACTION IN COMPLEX MODELS
pass region provide an effective method for explainingthe The tree structure and these parameter ranges within the
tree diagram. Figure 6 depicts the numericalrangesfor each terminal nodes also provide information on the relative
of these five parameters within the six terminal nodesthat importanceof individualparameters. The initial split of the
exhibit high densities(i.e., relative densitiesgreaterthan 1). tree is made on the basis of values of COND, and the
This figure shows that the high-densityregions take on importanceof COND is indicated by the fact that several
valuesover the full range of each of thesefive parameters. subsequentsplitsof the tree refine the acceptablerangesof
The passregion is not a singlehigh-densityarea describedby COND within additional high-density terminal nodes. In
restrictedrangesof values for the five parameters.Instead, contrast,DPLYR2 appearsto be less important in defining
there are several distinct high-density areas defined by the high density nodes. Full ranges of values were observed
differentcombinationsof parameterranges. in all but one of the high density terminal nodes (terminal
Examination of these combinationsprovides a means of node 14).
comparingand contrastingthe effectof model parametersin Another view of the importance of these parametersin
differentportionsof the passregionaswell as informationon differentportionsof the pass region is provided in Figure7.
the interactionbetweenparametersthat was obscuredin the Terminal node 4 has the highest relative density, contains
initial correlationanalysis(Figure4). For example,very low 14% of the observedpasseswithin 3.5% of the parameter
values of COND and unrestricted values for GRAD charac-
space, and is defined by very low values of COND •d
terize terminal nodes 2, 3, and 4. Terminal node 1, on the relatively high values for SOLB and ICLOSE. Terminal
other hand, is characterizedby very low values of GRAD node 1 encompassesa relatively large portion of the pass
and all values for COND except thoseincludedin terminal region (approximately 9%) and is defined on the basisof
nodes2, 3, and 4. The importanceof these combinationsin restrictedvaluesofjust two parameters,GRAD and COND.
definingthesefour high-densitynodeshelpsto explainthe Terminal nodes 10, 13, and 14 have the same values for
relatively strong linear relationshipthat was observedbe- COND, GRAD, and SOLB but different values for ICLOSE
tween these parameters in the initial correlationanalysis and DPLYR2 at separatenodes, which results in different
(r = -0.41). The tree-structured density estimation tech- densities. Terminal nodes 10 and 14 are defined on the basis
nique, however, also explainswhy a strongerrelationship of differences in the values of ICLOSE and DEPLYR2. Both
does not emerge from the linear correlation analysis;while are parametersthat affect contaminantavailability, andit
terminal nodes 1, 2, 3, and 4 exhibit restrictedrangesof seemslikely that theseparametersfunction as surrogates.
eitherCOND or GRAD, the remaininghigh-density terminal Differencesin the densitybetweenterminal nodes13 and14
nodes (10 and 14) are characterized by wider ranges of showthe effectsof a singleparameterin defininga high-
moderate values for both COND and GRAD. densityarea within the passregion.
TN4
TN1
TN 10
TN 14
Terminal
Terminal Node Density
Node Densi_t• I 10 2.94
T" 4.35 '..•
::. !4 2.14
TN1 2.80 • 13 0.34
Parameter Parameter
Conductivity
Conductivity
(COND)
Gradient
(COND)
Gradient
(GRAD) (GRAD)
Solubility Solubility
(SOLB) (SOLe)
Figure 7. Parameterrangeswithinselectedterminalnodes.
3OO 767
252 48 66 701
! 93 59
2 46 0.0.._.•9 56 645
THICK3 > 0.938 DPLYR2• 385.1 TN5-S DPLYR2< 284.4
3 56
I 0.111 0.165
0i-;
0.165
353
RAIN1 • 1.08
292
AREA1 _<136.3
=•5-6
0.0495
TN5-7
,,,
0.0605
261
THICK3
92
> 0.881
117
TNS-11
1(•)Intermediate
Node
ms.?
1,64 = Density
0.111 0,091
0.0,,9 0.03
Figure8. Results
of directed
search
ofterminal
node5 (number
of cases= 1067).