NUREG/CR-6843
PNNL-14534

Combined Estimation of Hydrogeologic Conceptual Model and Parameter Uncertainty

Pacific Northwest National Laboratory
University of Arizona

U.S. Nuclear Regulatory Commission
Office of Nuclear Regulatory Research
Washington, DC 20555-0001





Legally binding regulatory requirements are stated only in laws; NRC regulations; licenses, including technical specifications; or orders, not in NUREG-series publications. The views expressed in contractor-prepared publications in this series are not necessarily those of the NRC.

The NUREG series comprises (1) technical and administrative reports and books prepared by the staff (NUREG-XXXX) or agency contractors (NUREG/CR-XXXX), (2) proceedings of conferences (NUREG/CP-XXXX), (3) reports resulting from international agreements (NUREG/IA-XXXX), (4) brochures (NUREG/BR-XXXX), and (5) compilations of legal decisions and orders of the Commission and Atomic and Safety Licensing Boards and of Directors' decisions under Section 2.206 of NRC's regulations (NUREG-0750).

DISCLAIMER: This report was prepared as an account of work sponsored by an agency of the U.S. Government. Neither the U.S. Government nor any agency thereof, nor any employee, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party's use, or the results of such use, of any information, apparatus, product, or process disclosed in this publication, or represents that its use by such third party would not infringe privately owned rights.


Manuscript Completed: October 2003
Date Published: March 2004

Prepared by
P.D. Meyer, M. Ye, S.P. Neuman (UA), K.J. Cantrell

Pacific Northwest National Laboratory
Richland, WA 99352

Subcontractor:
University of Arizona
Tucson, AZ 85721

T.J. Nicholson, NRC Project Manager

Prepared for
Division of Systems Analysis and Regulatory Effectiveness
Office of Nuclear Regulatory Research
U.S. Nuclear Regulatory Commission
Washington, DC 20555-0001
NRC Job Code Y6465


Abstract

The objective of the research described in this report is the development and application of a methodology for comprehensively assessing the hydrogeologic uncertainties involved in dose assessment, including uncertainties associated with conceptual models, parameters, and scenarios. This report describes and applies a statistical method, Maximum Likelihood Bayesian Model Averaging (MLBMA), to quantitatively estimate the combined uncertainty in model predictions arising from conceptual model and parameter uncertainties. The method relies on model averaging to combine the predictions of a set of alternative models. Implementation is driven by the available data. When there is minimal site-specific data, the method can be carried out with prior parameter estimates based on generic data and subjective prior model probabilities. For sites with observations of system behavior (and optionally data characterizing model parameters), the method uses model calibration to update the prior parameter estimates and model probabilities based on the correspondence between model predictions and site observations. The set of model alternatives can contain both simplified and complex models, with the requirement that all models be based on the same set of data.

MLBMA was applied to the geostatistical modeling of air permeability at a fractured rock site. Seven alternative variogram models of log air permeability were considered to represent data from single-hole pneumatic injection tests in six boreholes at the site. Unbiased maximum likelihood estimates of variogram and drift parameters were obtained for each model. Standard information criteria provided an ambiguous ranking of the models, which would not justify selecting one of them and discarding all others as is commonly done in practice. Instead, some of the models were eliminated based on their negligibly small updated probabilities and the rest were used to project the measured log permeabilities by kriging onto a rock volume containing the six boreholes. These four projections, and associated kriging variances, were averaged using the posterior model probabilities as weights. Finally, cross-validation was conducted by eliminating from consideration all data from one borehole at a time, repeating the above process, and comparing the predictive capability of the model-averaged result with that of each individual model. Using two quantitative measures of comparison, the model-averaged result was superior to any individual geostatistical model of log permeability considered.


Contents

Abstract
Executive Summary
Foreword
Acknowledgments

1 Introduction

2 Quantification of Parameter and Conceptual Model Uncertainty
  2.1 Parameter Uncertainty
    2.1.1 Sources of Parameter Uncertainty
    2.1.2 Analysis of Parameter Uncertainty
  2.2 Conceptual Model Uncertainty
    2.2.1 Analysis of Conceptual Model Uncertainty

3 Combining Parameter and Conceptual Model Uncertainty
  3.1 Bayesian Model Averaging
    3.1.1 Interpretation of Model Probability
    3.1.2 Specifying Prior Model Probability
  3.2 Maximum Likelihood Bayesian Model Averaging
    3.2.1 A Few Words About KIC
    3.2.2 Applicability of MLBMA
  3.3 Summary of MLBMA

4 Example Application
  4.1 Implementation of MLBMA
  4.2 ALRS Data and Previous Efforts
    4.2.1 Alternative Models and Maximum Likelihood Parameter Estimation
    4.2.2 Posterior Model Probabilities
      4.2.2.1 Sensitivity to Prior Model Probabilities
    4.2.3 Kriging Results
  4.3 Assessment of Predictive Performance
    4.3.1 Predictive Log Score
    4.3.2 Predictive Coverage

5 Conclusions

6 References

Appendix A. Distribution Coefficients, Kd, and Associated Uncertainty
  A.1 Introduction
  A.2 Background
    A.1.1 Contaminant Adsorption onto Natural Mineral Surfaces
    A.1.2 Empirical Approaches to Adsorption Modeling
    A.1.3 Surface Complexation Approach to Adsorption Modeling
    A.1.4 Non-Electrostatic Surface Complexation Models
  A.2 Sources of Kd Value Uncertainty
  A.3 Variability in Kd Values and the Impact on Transport Calculations
  A.4 Determination of Kd Values and Associated Uncertainty
    A.4.1 Systematic Approach for Determination of Kd Values and Associated Uncertainty
    A.4.2 Determination of Uranium Kd Values and Associated Uncertainty with Iterative Refinement
  A.5 References

Figures

2-1. Photograph of a trench face from an excavation in the 200 Area of the Hanford Site, Washington

2-2. Ratio of estimated to true parameter values for variance and correlation length of transmissivity for seven different inverse methods

2-3. Use of data/information in parameter estimation

2-4. Types and uses of data sources and information for characterizing hydrogeologic parameter uncertainty in dose assessments for license termination decisions

2-5. A schematic representation of the relationship between alternative conceptual-mathematical models

3-1. Maximum Likelihood Bayesian Model Averaging (MLBMA) approach to combined estimation of model and parameter uncertainty

4-1. Spatial locations of 184 1-m-scale log10 k data at ALRS

4-2. Omni-directional sample variogram of 1-m-scale log10 k data at the ALRS and numbers of data pairs

4-3. Negative log likelihood functions (NLL) as function of each variogram parameter and drift coefficient for exponential model with linear drift (Exp1)

4-4. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the power model (Pow0)

4-5. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the exponential model without drift

4-6. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the exponential model with first-order drift

4-7. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the spherical model with first-order drift

4-8. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using MLBMA

4-9. (a) Within- and (b) between-model variance of MLBMA log10 k estimates at y = 6.5 m

4-10. Cumulative distribution of kriged log10 k estimates obtained using various models and MLBMA

4-11. Omni-directional sample variograms of all data and all but data from borehole (a) V2, X2, Y2 and (b) Y3, Z2, W2A

4-12. Dependence of power variogram (Pow0) (a) parameters and (b) quality criteria on data

4-13. Posterior model probabilities based on (a) BIC and (b) KIC upon eliminating data from designated borehole

4-14. 5% and 95% limits of simulated prediction interval of log10 k along borehole X2

4-15. Cumulative distribution of simulated log10 k values at a measurement location in borehole (a) V2 and (b) Y3

4-16. Sample variances of log10 k values simulated using various models and MLBMA along borehole (a) V2 and (b) Y3 while eliminating the corresponding data

Tables

1-1. Summary of radiological criteria for license termination

4-1. Quality criteria, rankings and prior/posterior probabilities associated with alternative geostatistical models

4-2. Variance of kriged estimates across the grid obtained with alternative models and MLBMA

4-3. Number of log10 k data in DW of each cross validation case and their percentage of the entire data set

4-4. Average predictive log score and predictive coverage of individual models and MLBMA


Executive Summary

In its performance assessments of decommissioning sites and other nuclear facilities, the U.S. Nuclear Regulatory Commission (NRC) staff uses a risk-informed, performance-based approach in which evaluation of risk is an integral part of, but not the sole basis for, decision making. The risk is, in part, manifested as uncertainty in estimates of dose. The importance of assessing uncertainty in dose is made clear by considering the following:

* long regulatory time frames (e.g., 1000 years),
* complex exposure pathways involving multiple media,
* relatively small incremental doses, and
* potentially limited site-specific characterization data.

The objective of the research described in this report is the development and application of a methodology for comprehensively assessing the hydrogeologic uncertainties involved in dose assessment modeling. For methodological purposes, prediction uncertainty is classified as being associated with one of three basic components of dose assessment models:

* the conceptual-mathematical basis of the model,
* model parameters, or
* the scenario to which the model is applied.

This report describes and applies a method to estimate the combined uncertainty in model predictions arising from conceptual model and parameter uncertainties. A future report will include the analysis of scenario uncertainty.

The primary steps involved in addressing uncertainty in model parameters are

* characterization of parameter uncertainty,
* propagation of parameter uncertainty into model output uncertainty, and
* parameter sensitivity analysis.

Parameter estimation, including the characterization of parameter uncertainty, is driven by the available data and information. In the most data-limited case, prior parameter estimates are based on available information that does not include site-specific parameter measurements. These estimates represent the largest degree of uncertainty. Meyer and Gee (1999) discuss data sources for characterizing hydrogeologic parameter uncertainty in the context of dose assessment modeling for license termination decisions. They suggest the application of a hierarchy of data from national-scale databases (referred to as generic information) to site-specific measurements of parameter values. Site-specific parameter measurements, when available, can be used to update the prior estimates (Meyer et al., 1997), thereby decreasing parameter uncertainty. A similar methodology for the characterization of probability distributions for (adsorption) distribution coefficients is being developed as part of the research reported here (see Appendix A).

When observations of system state variables (e.g., hydraulic head, radionuclide concentration) are available at a site, formal calibration methods, using an inverse model, can be used to improve parameter estimates and characterize the uncertainty of these estimates. Calibrated parameter estimates represent the application of the maximum amount of data/information and yield parameters with the minimum uncertainty (Wang et al., 2003). Because they rely on an inverse model, calibrated parameter estimates are model-dependent. In fact, most calibration methods assume the model is correct. The resulting estimation errors thus represent the uncertainty in parameters given that the model is correct, which will underestimate parameter uncertainty when the model itself is uncertain.

Relying on a single conceptual representation of a system has two potential pitfalls: the rejection by omission of valid conceptual model alternatives, and reliance on an invalid representation by failing to adequately test it. The potential consequences are underestimation of uncertainty by under-sampling model space and biased results by relying on an invalid model. To obtain realistic risk estimates, effort should thus be made to evaluate multiple, alternative conceptualizations of the system being analyzed.

Any approach based on evaluation of a discrete set of alternative models will only be as good as the set of alternatives. That is, if the set of alternatives does not represent the full range of possibilities, conceptual model uncertainty will be underestimated. In Neuman and Wierenga's (2003) extensive discussion of conceptual model uncertainty, they provide some advice on the generation of alternatives, summarized as follows.

* From the assembled database of site-specific data and other relevant information, consider alternative representations of space-time scales, number and type of hydrogeologic units, flow and transport property characterization, system boundaries, initial conditions, fast flow paths, controlling transport phenomena, etc.

* Each conceptual model alternative should be supported by key data.

* Minimize inconsistencies, anomalies, and ambiguities.

* Apply the principle of Occam's window according to which one considers only a relatively small set of the most parsimonious models among those which, a priori, appear to be hydrologically most plausible in light of all knowledge and data relevant to the purpose of the model and, a posteriori, explain the data in an acceptable manner.

* Maximize the number of experts involved in the generation of alternative conceptualizations.

* Articulate uncertainties associated with each alternative conceptualization.

Having defined the set of alternatives, the options for addressing conceptual model uncertainty include the following.

* Evaluate each alternative and select the best model, either through an informal comparison or through evaluation of a formal model selection criterion.

* Evaluate each alternative and combine the results using some weighting scheme.

When multiple model conceptualizations are consistent with the available data, it may not be justifiable to rely on a single model structure. The method described here relies on model averaging to combine the predictions of alternative models. The weights applied to each model's predictions are estimated model probabilities.

The method uses a Maximum Likelihood implementation of Bayesian Model Averaging (MLBMA) described by Neuman (2003). If $\Delta$ is the predicted quantity (e.g., dose), its posterior distribution given a set of data $D$ is

$$p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D) \tag{E-1}$$

where $M = (M_1, \ldots, M_K)$ is the set of all models considered. The posterior probability for model $M_k$ is a function of the prior model probability and the model likelihood, as given by Bayes' rule,

$$p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)} \tag{E-2}$$

The solution of these equations is accomplished by maximum likelihood estimation of each model's parameters.

Prior model probabilities in Equation E-2 [$p(M_k)$ and $p(M_l)$] are subjective values reflecting a belief about the relative plausibility of each model based on its apparent consistency with available knowledge and data. Posterior model probabilities are modifications of these subjective values based on an objective evaluation of each model's consistency with available data. Hence, the posterior probabilities are valid only in a comparative, not in an absolute, sense.

The maximum likelihood method can be applied to complex and simplified models as long as each model in the set of alternatives is based on the same data (Ye et al., 2003). It can be applied to deterministic models and also to stochastic models based on moment equations (Hernandez et al., 2003). Application of maximum likelihood also yields parameter sensitivity information.

Including prior information in the maximum likelihood calibration is an option, which allows one to condition the parameter estimates not only on site monitoring (observational) data but also on site characterization data, potentially rendering the model a better predictor.

Maximum likelihood allows the statistical parameters characterizing the parameter and state variable errors to be estimated. When these statistical parameters are known (i.e., not estimated), maximum likelihood reduces to generalized least squares estimation. In this case, available codes such as PEST and UCODE can be applied.

Maximum likelihood estimation yields an approximate covariance matrix for the parameter estimation errors. Assuming these errors to be Gaussian or log Gaussian, the probability distribution of model output [$p(\Delta \mid M_k, D)$ in Equation E-1] can be determined by Monte Carlo simulation of $\Delta$ through random perturbation of the parameters. If the model is a geostatistical model or a stochastic moment model, it yields the expected value and variance of its output directly without Monte Carlo simulation.

In the most data-limited application, one in which there are no system observations with which to calibrate a model and the only available parameter information is that available from generic databases, Equation E-1 reduces to

$$p(\Delta) = \sum_{k=1}^{K} p(\Delta \mid M_k)\, p(M_k).$$

That is, model predictions can still be made using prior parameter estimates and model averaging can still be carried out, but only with prior model probabilities. Since the predictions and model probabilities are not conditioned on state variable observations, however, the results are expected to be more uncertain and potentially more biased.
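As a minimal sketch of Bayes' rule in Equation E-2, the snippet below turns per-model log likelihoods and prior model probabilities into posterior model probabilities. The function name and the numerical inputs are hypothetical and chosen only for illustration; the report obtains the model likelihoods through maximum likelihood calibration, which is not reproduced here. Working in log space is an implementation choice made here to avoid numerical underflow.

```python
import math

def posterior_model_probabilities(log_likelihoods, priors):
    """Evaluate Bayes' rule (Equation E-2) in log space.

    log_likelihoods[k] is ln p(D | M_k); priors[k] is p(M_k) and must be > 0.
    """
    if len(log_likelihoods) != len(priors):
        raise ValueError("one prior is needed per model")
    # Log of each numerator term p(D | M_k) p(M_k).
    log_terms = [ll + math.log(p) for ll, p in zip(log_likelihoods, priors)]
    # Subtract the largest term before exponentiating (log-sum-exp shift).
    shift = max(log_terms)
    weights = [math.exp(t - shift) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log likelihoods for three models with equal priors.
log_likelihoods = [-120.0, -121.0, -130.0]
priors = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]
posteriors = posterior_model_probabilities(log_likelihoods, priors)
```

With equal priors the posterior ranking follows the likelihoods alone; unequal priors shift the weights accordingly, which is why the posterior probabilities are meaningful only relative to the chosen set of models.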

To implement MLBMA the following steps are followed.

(1) Postulate alternative conceptual-mathematical models for a site using guidance provided in Neuman and Wierenga (2003).

(2) Assign a prior probability to each model.

(3) Optionally assign prior probabilities to the parameters of each model, using, for example, guidance provided in Meyer and Gee (1999).

(4) Obtain posterior maximum likelihood parameter estimates, and estimation covariance, for each model by inversion (model calibration). In many cases, available codes such as PEST and UCODE can be applied to this step.

(5) Calculate a posterior probability for each model using the model calibration results and the prior model probabilities.

(6) Predict quantities of interest using each model.

(7) Assess prediction uncertainty (distribution, variance) for each model using Monte Carlo or stochastic moment methods.

(8) Weight predictions and uncertainties by the corresponding posterior model probabilities.

(9) Sum the results over all models.
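Steps (8) and (9) can be illustrated with a short sketch that combines per-model predictions and variances using posterior model probabilities as weights. It assumes the standard mixture (law of total variance) decomposition into a within-model and a between-model term, which is not written out in this summary; the function name and inputs are hypothetical.

```python
def model_average(means, variances, posteriors):
    """Combine per-model predictions with posterior model probabilities.

    Returns (averaged mean, within-model variance, between-model variance,
    total variance), assuming the usual mixture variance decomposition.
    """
    # Steps 8 and 9: weight each model's prediction and sum over models.
    avg_mean = sum(p * m for p, m in zip(posteriors, means))
    # Weighted average of each model's own prediction variance.
    within = sum(p * v for p, v in zip(posteriors, variances))
    # Spread of the individual model predictions about the averaged mean.
    between = sum(p * (m - avg_mean) ** 2 for p, m in zip(posteriors, means))
    return avg_mean, within, between, within + between

# Hypothetical two-model case with equal posterior probabilities.
avg_mean, within, between, total = model_average(
    means=[1.0, 3.0], variances=[0.5, 0.5], posteriors=[0.5, 0.5]
)
```

In this toy case the between-model term is larger than the within-model term, showing how disagreement among models can dominate the combined uncertainty even when each model is individually confident.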

To evaluate MLBMA, it was applied to seven alternative variogram models of log air permeability data from single-hole pneumatic injection tests in six boreholes at the Apache Leap Research Site (ALRS) in central Arizona. Unbiased ML estimates of variogram and drift parameters were obtained using adjoint state maximum likelihood cross validation in conjunction with universal kriging and generalized least squares. Standard information criteria provided an ambiguous ranking of the models, which did not justify selecting one of them and discarding all others as is commonly done in practice. Instead, three of the models were eliminated based on their negligibly small posterior probabilities and the remaining four models were used to project the measured log permeabilities by kriging onto a rock volume containing the six boreholes. These four projections, and associated kriging variances, were averaged using the posterior probability of each model as weight.

Finally, the results were cross-validated by eliminating from consideration all data from one borehole at a time, repeating the above process, and comparing the predictive capability of MLBMA with that of each individual model. The predictive capabilities of the alternative models and the MLBMA result were compared through their log scores. The lower the predictive log score of a model, the smaller the amount of information lost upon eliminating a borehole's data from the original dataset (i.e., the higher the probability that the model based on the reduced dataset would reproduce the eliminated borehole's data).
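The following sketch shows one way a predictive log score of this kind could be computed. It assumes the score is the negative log of the predictive probability density of the eliminated data, and it further assumes independent Gaussian predictions at each measurement location; neither assumption is stated in this summary, and the function names are hypothetical.

```python
import math

def gaussian_log_score(observations, means, variances):
    """Negative log predictive density of held-out data for one model.

    Assumes independent Gaussian predictions at each location (an
    illustrative simplification, not the report's stated procedure).
    """
    score = 0.0
    for d, mu, var in zip(observations, means, variances):
        score += 0.5 * math.log(2.0 * math.pi * var) + (d - mu) ** 2 / (2.0 * var)
    return score

def averaged_log_score(model_scores, posteriors):
    """Log score of the model-averaged predictive density.

    Computes -ln sum_k exp(-score_k) * p(M_k | D), with a shift for stability.
    """
    shift = min(model_scores)
    total = sum(p * math.exp(-(s - shift)) for s, p in zip(model_scores, posteriors))
    return shift - math.log(total)

# Hypothetical single held-out observation predicted exactly by a unit-variance model.
single = gaussian_log_score([0.0], [0.0], [1.0])
# Two hypothetical models with identical scores and equal posterior weights.
combined = averaged_log_score([2.0, 2.0], [0.5, 0.5])
```

Under these assumptions a lower score means the held-out data were more probable under the prediction, consistent with the interpretation given above.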

Another measure of model performance is its predictive coverage. This is the percent of measurements from the eliminated borehole's data that fall within a given prediction interval generated by conducting Monte Carlo simulations of log air permeability conditioned on the data from the remaining boreholes.
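Predictive coverage as defined above can be sketched as follows. The example assumes the prediction interval is bounded by the 5th and 95th percentiles of the simulated values at each measurement location, and it uses a simple linear-interpolation percentile; the function names and the synthetic inputs are hypothetical.

```python
import math

def percentile(sorted_values, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def predictive_coverage(simulations, measurements, lower_q=5.0, upper_q=95.0):
    """Percent of held-out measurements inside the simulated prediction interval.

    simulations[i] holds the Monte Carlo values at the location of measurements[i].
    """
    inside = 0
    for sims, obs in zip(simulations, measurements):
        ordered = sorted(sims)
        if percentile(ordered, lower_q) <= obs <= percentile(ordered, upper_q):
            inside += 1
    return 100.0 * inside / len(measurements)

# Synthetic check: simulated values 0..100 at each of four locations.
simulations = [list(range(101)) for _ in range(4)]
measurements = [50.0, 5.0, 96.0, -1.0]
coverage = predictive_coverage(simulations, measurements)
```

A coverage well below the nominal width of the interval would suggest that a model's prediction intervals are too narrow, that is, that it understates uncertainty.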

The table below lists the average log score for the three model alternatives with the highest posterior probability, as well as the average of corresponding MLBMA scores. The average predictive log score of MLBMA is seen to be smaller than that of any individual model, indicating that MLBMA is a better predictor than any of the single model alternatives. The table also shows the predictive coverage of MLBMA, which is larger than that of any individual model, attesting once again to its superior performance.

Table E-1. Comparison of MLBMA with individual model alternatives

Model                     Pow0   Exp0   Exp1   MLBMA
Predictive log score      34.1   35.2   34.0   31.4
Predictive coverage (%)   86.5   80.8   83.7   87.5


Foreword

This technical contractor report was prepared by Pacific Northwest National Laboratory¹ (PNNL) under their DOE Interagency Work Order (JCN Y6465) with the U.S. Nuclear Regulatory Commission. This research report describes an approach for integrating two methodologies developed to assess uncertainties: one for evaluating hydrologic conceptual model uncertainty as documented in NUREG/CR-6805, and the second for estimating hydrologic parameter uncertainty as documented in NUREG/CR-6767. This report provides both the logic developed and examples demonstrating the approach using field data. The detailed input and analyses for the real-world examples are presented in the report's appendix and may be useful in decommissioning reviews of complex sites. This report is consistent with the NRC strategic performance goal of making NRC activities and decisions more effective, efficient, and realistic by identifying and estimating uncertainties.

The report demonstrates, using examples relevant to decommissioning analyses, that sources of uncertainty can be identified, quantified, and integrated using a comparative model analysis approach. The report illustrates the effectiveness of the integrated methodology to estimate uncertainty in model predictions arising from both conceptual and parameter uncertainties. This information will assist NRC licensing staff, Agreement State regulators, and licensees in their decision making by identifying and quantifying overall uncertainties in performance assessment models.

This report, as with the previous reports on individual sources of uncertainty, is not a substitute for NRC regulations, and compliance is not required. The approaches and/or methods described in this NUREG/CR are provided for information only. Publication of this report does not necessarily constitute NRC approval or agreement with the information contained herein. Use of product or trade names is for identification purposes only and does not constitute endorsement by the NRC or Pacific Northwest National Laboratory.

Cheryl A. Trottier, Chief
Radiation Protection, Environmental Risk and Waste Management Branch
Division of System Analysis and Regulatory Effectiveness
Office of Nuclear Regulatory Research

¹ Pacific Northwest National Laboratory is operated for the U.S. Department of Energy by Battelle Memorial Institute under contract DE-AC06-76RLO 1830.


Acknowledgments

The authors gratefully acknowledge the financialassistance of the Office of Nuclear RegulatoryResearch of the U.S. Nuclear Regulatory Commission(NRC) and the guidance of the NRC Project Manager,Thomas J. Nicholson.

Comments provided by Mary C. Hill, U.S. GeologicalSurvey and Mark L. Rockhold, PNNL, were helpful inimproving the report. We particularly appreciate themany detailed and challenging comments of Dr. Hill aswell as the generous commitment of her time. We alsoappreciate the comments of various NRC staff andtheir frank discussion of issues raised in the report.

Portions of Chapters 1 and 2 were previously published as P.D. Meyer and T.J. Nicholson, "Analysis of Hydrogeologic Conceptual Model and Parameter Uncertainty," in Groundwater Quality Modeling and Management Under Uncertainty, S. Mishra (ed.), American Society of Civil Engineers, Reston, VA, 2003. Portions of Chapters 3 and 4 are in review as M. Ye, S.P. Neuman, and P.D. Meyer, "Maximum Likelihood Bayesian Averaging of Spatial Variability Models in Unsaturated Fractured Tuff," Water Resources Research, 2003.


1 Introduction

In its performance assessments of decommissioning sites and other nuclear facilities, the U.S. Nuclear Regulatory Commission (NRC) staff uses a risk-informed, performance-based approach in which evaluation of risk is an integral part of, but not the sole basis for, decision making. NRC regulatory criteria are often written in terms of dose. For example, the primary regulatory criterion for license termination is a maximum dose for the period up to 1000 years from the time of decommissioning (see Table 1-1). One might argue that risk (such as the risk of premature death) could be derived from knowledge of exposure to a particular dose. When, however, estimating that dose involves predictions of contaminant transport and exposure via complex contaminant exposure pathways over a 1000-year period, then there is an obvious additional component of uncertainty contributing to risk. That component is the uncertainty in the estimate of dose. The importance of assessing uncertainty in dose is made clear by considering

* the long regulatory time frame,
* complex exposure pathways involving multiple media,
* the relatively small incremental dose specified in the regulations, and
* potentially limited site-specific characterization data.

In the license termination case, the result of a quantitative assessment of this uncertainty will be an estimate of the probability distribution of dose to the average member of the critical group for the 1000-year period following decommissioning.

There are numerous sources of uncertainty that are potentially significant contributors to an estimate of the probability of dose. This is a consequence of the multiple potential exposure pathways. The analysis presented in this report only addresses the pathways involving transport of radionuclides in water. For license termination, that includes a residential farmer scenario in which exposure comes from the use of contaminated groundwater for home, garden, and farm. Thus, the uncertainties considered are those related to transport from a source (typically near the ground surface) through unsaturated soils and groundwater to an exposure point via a pumped well or surface water body. The methods described here are general, however, and could be applied to other exposure scenarios.

Although the analysis described here is limited to hydrogeologic uncertainty, it is comprehensive in the sense that all types of hydrogeologic uncertainty are considered. Uncertainty is defined, for the purposes of this study, as a lack of certainty due to

* incomplete knowledge of the system being analyzed;
* measurement or sampling error in characterizing the system's features, events, and processes;
* variability in the system's properties;
* disparity among the sampling, simulation, and actual scales of the system's features, events, and processes; and
* randomness in the system's stresses, particularly transient external stresses, often in a short-time context.

Table 1-1. Summary of radiological criteria for license termination (10 CFR Part 20 Subpart E) (from Meyer and Gee, 1999). (TEDE - Total Effective Dose Equivalent; ALARA - As Low As Reasonably Achievable)

| | Unrestricted Release | Restricted Release |
|---|---|---|
| Dose Criterion | 25 mrem TEDE per year peak annual dose to the average member of the critical group | 25 mrem TEDE per year peak annual dose to the average member of the critical group while controls are in place; 100 or 500 mrem TEDE per year peak annual dose to the average member of the critical group upon failure of the controls |
| Time Frame | 1000 years | 1000 years |
| Other Requirements | ALARA | ALARA, financial assurance, public participation |


Note that this definition includes uncertainty that can be reduced with sufficient data (sometimes referred to as subjective or epistemic uncertainty; see Helton, 1996) and uncertainty that is an irreducible characteristic of the system (sometimes referred to as stochastic or aleatory uncertainty). An example of the former is uncertainty about the continuity (thickness) of a low permeability hydrostratigraphic unit. Examples of the latter are the annual recharge rate over the next 1000 years, or the stage of a river hydraulically connected to a groundwater system. It is often argued that these two broad types of uncertainty should be kept separate in the application of uncertainty analysis methods (Helton, 1994; Ayyub and McCuen, 2003); this may improve the ability to draw correct conclusions about the important factors leading to system success/failure and the value of additional data. Winkler (1996) suggests that uncertainties that appear irreducible may, in fact, often be a function of the available knowledge (i.e., subjective). For example, river stage may be inherently variable, but that variability could, in principle, be entirely accounted for if a sufficiently detailed hydrologic model and the associated data were available. Winkler (1996) argues that distinctions between types of uncertainty are largely related to sources of information and that it is more useful to think in terms of what is needed to accomplish the modeling task: adequate decomposition of the problem, combining various sources of information, assessing the value of additional data, and effectively utilizing sensitivity analysis. This is the viewpoint adopted in this report.

Models are generally used to make consistent, quantitative assessments of future dose required by criteria such as that given in Table 1-1. Although we do not strictly distinguish between subjective and stochastic types of uncertainty, from a methodological perspective we classify uncertainty as being associated with one of three basic components of dose assessment models:

* the conceptual-mathematical basis of the model,
* model parameters, or
* the scenario to which the model is applied.

The model conceptual basis can be thought of as a hypothesis about the behavior of the system being modeled and the relationships between the components of the system. This conceptualization is typically represented mathematically to render quantitative predictions; thus it is appropriate to talk about a conceptual-mathematical model (sometimes referred to as model structure). The model parameters are the quantities required to obtain a solution from the model (and thus are model-specific). A scenario is defined here as a future state or condition assumed for a system that is the result of an event, process, or feature that was not assumed in the initial base case definition of the system and diverges significantly from the initial base case. A scenario may be imposed by humans (e.g., irrigation schemes and ground-water extraction) but may also be natural (e.g., glaciation and flooding). Scenarios are often considered in a long-time context. Only hydrologically related aspects of scenario uncertainty are included in this analysis.

The objective of the research described in this report is the development and application of a methodology for comprehensively assessing the uncertainties involved in dose assessment, including uncertainties associated with conceptual models, parameters, and scenarios. In addressing this problem we have generally adopted a Bayesian viewpoint. The merits of a Bayesian (subjectivist) approach to probability relative to a classical (frequentist) approach have been discussed in many publications (e.g., Martz and Waller, 1988; Abramson, 1988). Our approach is Bayesian primarily for practical reasons. Quantification of hydrogeologic uncertainty for dose assessments must often deal with very limited observations of site characteristics. Generic and indirect data can be and generally are used to infer site properties. For example, geologic characteristics may be inferred from analysis of outcrops, hydraulic characteristics may be estimated from soil-textural information, and radionuclide adsorption characteristics may be assigned from a database of values measured at other sites under a variety of conditions. In addition, the assessment of conceptual model and scenario probabilities seems inherently subjective. The Bayesian approach provides a means to incorporate different types of data and subjective judgments into the assessment of uncertainty.

This report describes and applies a method to estimate the combined uncertainty in model predictions arising from conceptual model and parameter uncertainties. The inclusion of scenario uncertainty will be described in a future report. Chapter 2 provides some background on the quantification of parameter and conceptual model uncertainty. A related discussion of an approach being developed as part of this project for evaluating uncertainty in the distribution coefficient parameter is included in Appendix A. Chapter 3 describes the maximum likelihood Bayesian model averaging method, a general method for combining quantitative estimates of conceptual model and parameter uncertainty. Chapter 4 is an application of this method to the geostatistical modeling of air permeability at a fractured rock site. This application was chosen because the site is a relatively well-controlled experimental research site with good characterization data. In addition, the results of past studies at the site were available to us. Applications that are more reflective of actual NRC-regulated sites will be the focus of future efforts.

The developments described here are being coordinated with other Federal agencies cooperating under the Interagency Steering Committee on Multimedia Environmental Models Memorandum of Understanding (ISCMEM MOU) (see http://ISCMEM.org). Results reported here have been discussed with members of the Working Group on Uncertainty and Parameter Estimation organized under the steering committee and have been presented at the International Workshop on Uncertainty, Sensitivity, and Parameter Estimation for Multimedia Environmental Modeling, held August 19-21, 2003, at NRC Headquarters and organized by the Working Group.


2 Quantification of Parameter and Conceptual Model Uncertainty

2.1 Parameter Uncertainty

2.1.1 Sources of Parameter Uncertainty

The sources of uncertainty outlined in the previous chapter that contribute to hydrogeologic parameter uncertainty can be clearly illustrated with the aid of Figure 2-1, a photo of a trench face from an excavation in the 200 Area of the Hanford Site. A large variation in soil particle size can be seen, ranging from fine silts to very coarse gravels. The profile shows a layered structure with evidence of cross-bedding; the scale of the structures is on the order of a few centimeters. This variation results in hydraulic and transport properties that may vary over several orders of magnitude on this same small scale. Measurements are likely to be made on a somewhat larger scale, perhaps 10 cm or more. Exhaustive sampling to determine the exact nature of the subsurface at this scale will be impossible, thus requiring interpolation between measurements and other indirect methods to estimate properties at unmeasured locations. In addition, the simulation scale for most practical applications (and thus the scale of the parameters) is likely to be significantly larger than the measurement scale, from a few tens of centimeters to many meters.

Figure 2-1. Photograph of a trench face from an excavation in the 200 Area of the Hanford Site, Washington (photograph by John Selker, Oregon State University).

The impact of measurement errors on parameter uncertainty is often felt to be small relative to other sources of uncertainty and easily quantified. Holt et al. (2002) provide some evidence that relatively simple measurement errors can introduce significant parameter uncertainties. They simulated tension infiltrometer measurements with added pressure transducer error (observation error) and contact error (inversion error). They used the simulated measurements to estimate the variance and correlation length of the parameters of the Gardner hydraulic conductivity model over a range of true values representing poorly-sorted to well-sorted silt to coarse sand. The ratio of estimated to true parameter values (for the variances and correlation lengths) ranged from less than 0.5 to more than 2.5. These are significant errors for parameters representing the average characteristics of a site. Holt et al. (2002) also observed that the modeled errors produced spurious parameter correlations, an effect that has likely been poorly appreciated in most applications.

An additional source of parameter uncertainty that has likely not been fully appreciated can be illustrated using results presented in Zimmerman et al. (1998). They compared results from seven models calibrated on the same set of data by different participant groups using different inverse methods. The ratios of estimated to true parameter values for the variance and correlation length of the transmissivity are shown in Figure 2-2 for each of the inverse methods used. The true transmissivity field was synthetically generated. An exponential model was fit to the average empirical variogram for a set of realizations obtained from each inverse method. The results shown are for Test Problem 1, the simplest transmissivity model used (an isotropic, exponential variogram). Nonetheless, the parameter errors resulting simply from the use of different inverse methods (and participants) were significant.
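The variogram-fitting step can be illustrated with a minimal sketch. It is not the procedure used by Zimmerman et al. (1998); it assumes the exponential variogram form gamma(h) = variance * (1 - exp(-h / corr_length)), a brute-force least-squares search, and hypothetical lag spacings and "true" values chosen only for illustration.

```python
import math

def exponential_variogram(h, variance, corr_length):
    """Assumed exponential model: gamma(h) = variance * (1 - exp(-h / corr_length))."""
    return variance * (1.0 - math.exp(-h / corr_length))

def fit_exponential_variogram(lags, gammas, variances, corr_lengths):
    """Brute-force least-squares fit over candidate (variance, corr_length) pairs."""
    best = None
    for v in variances:
        for c in corr_lengths:
            sse = sum((exponential_variogram(h, v, c) - g) ** 2
                      for h, g in zip(lags, gammas))
            if best is None or sse < best[0]:
                best = (sse, v, c)
    return best[1], best[2]

# Hypothetical "true" statistics and a noise-free empirical variogram.
true_variance, true_corr_length = 2.0, 5.0
lags = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
gammas = [exponential_variogram(h, true_variance, true_corr_length) for h in lags]

variance_grid = [0.5 + 0.1 * i for i in range(31)]     # candidates 0.5 to 3.5
corr_length_grid = [1.0 + 0.5 * i for i in range(19)]  # candidates 1.0 to 10.0
est_variance, est_corr_length = fit_exponential_variogram(
    lags, gammas, variance_grid, corr_length_grid)

# Ratios of estimated to true values, the quantity plotted in Figure 2-2.
variance_ratio = est_variance / true_variance
corr_length_ratio = est_corr_length / true_corr_length
```

With error-free input both ratios are one; replacing `gammas` with an empirical variogram computed from calibrated realizations would produce ratios of the kind compared across inverse methods.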

2.1.2 Analysis of Parameter Uncertainty

The analysis of parameter uncertainty has received much attention in the literature. Helton (1993) and McKay (1995) provide discussions of parameter uncertainty that are particularly relevant to dose assessment modeling. The primary steps involved in addressing uncertainty in model parameters are

* characterization of parameter uncertainty,
* propagation of parameter uncertainty into model output uncertainty, and
* parameter sensitivity analysis.

[Figure 2-2: bar chart titled "Transmissivity Variogram"; horizontal axis: Inverse Method (FF, FS, LC, LS, ML, PP, SS); vertical axis from 0 to >2.5; plotted series: Variance and Correlation Length.]

Figure 2-2. Ratio of estimated to true parameter values for variance and correlation length of transmissivity for seven different inverse methods. Results from Test Problem 1 of Zimmerman et al. (1998). (FF=Fast Fourier Transform, FS=Fractal Simulation, LC=Linearized Cokriging, LS=Linearized Semianalytical, ML=Maximum Likelihood, PP=Pilot Point, SS=Sequential Self-Calibration)

Parameter estimation, including the characterization of parameter uncertainty, is driven by the available data and information. Figure 2-3 is a simple representation of the parameter estimation process, where it is assumed that the process provides not only parameter estimates, but also some measure of the parameter uncertainty. This could take the form of bounding values, variances, or specific distributional forms. At the lower left are prior parameter estimates based on available information that does not include site-specific parameter measurements. This information may include a compilation of parameter values from numerous sites, or data from analogous sites. The prior parameter estimates represent the largest degree of uncertainty and the least amount of site-specific data. In the center of Figure 2-3 are updated (posterior) parameter estimates that are based on the prior estimates but include the effect of site-specific parameter measurements. They represent a decrease in parameter uncertainty from the prior estimates.

Figure 2-3. Use of data/information in parameter estimation

Meyer and Gee (1999) discuss data sources for characterizing hydrogeologic parameter uncertainty in the context of dose assessment modeling for license termination decisions. They suggest the application of a hierarchy of data from national-scale databases (referred to as generic information) to site-specific measurements of parameter values. Their methodology is represented schematically in Figure 2-4. Information from the national-scale databases is used by Meyer and Gee (1999) to specify prior parameter distributions that can be updated subsequently in a Bayesian approach using site-specific parameter data (Meyer et al., 1997), which is expected to be sparse or non-existent at many of the decommissioning sites. In data-limited applications parameter probability distributions can also be based on the subjective opinions of one or more experts. Formal procedures are available to provide consistency in the elicitation of expert opinions regarding probabilities (Morgan and Henrion, 1990). A methodology relying on generic databases, however, has the advantage of being less expensive and more easily applied to a wide variety of sites. The methodology is currently being extended to include the characterization of probability distributions for (adsorption) distribution coefficients of selected radionuclides (see Appendix A).
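The idea of updating a generic prior with sparse site-specific data can be sketched with the simplest conjugate case. This is an illustration only, not the procedure of Meyer et al. (1997): it assumes a normally distributed prior on a site-mean parameter (for example, the log of saturated hydraulic conductivity), a known measurement variance, and hypothetical numbers throughout.

```python
def update_normal_mean(prior_mean, prior_var, measurements, meas_var):
    """Normal-normal conjugate update of a site-mean parameter.

    prior_mean, prior_var: prior from generic (e.g., national-scale) data.
    measurements: site-specific values; meas_var: their assumed known variance.
    Returns the posterior mean and variance of the site mean.
    """
    n = len(measurements)
    if n == 0:
        # No site-specific data: the prior is carried forward unchanged.
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / meas_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(measurements) / meas_var)
    return post_mean, post_var

# Hypothetical prior for log10 of saturated conductivity and three site values.
prior_mean, prior_var = -4.0, 1.0
site_data = [-3.2, -3.6, -3.4]
post_mean, post_var = update_normal_mean(prior_mean, prior_var, site_data, 0.25)
```

Even three measurements shrink the variance substantially in this example, while with no measurements the generic prior is simply retained, which mirrors the situation expected at many decommissioning sites.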

When observations of state variables (e.g., hydraulic head, radionuclide concentration) are available at a site, formal calibration methods can be used to improve parameter estimates and characterize the uncertainty of these estimates (Hill, 1998). As shown in the upper right of Figure 2-3, this involves the application of an inverse model. These calibrated parameter values may include the effect of the site-specific parameter measurements. In this case the updated parameter estimates shown in Figure 2-3 are referred to as the prior parameter estimates for the calibration. Calibrated parameter estimates represent the application of the maximum amount of data/information and yield parameters with the minimum uncertainty. An application to unsaturated flow presented in Wang et al. (2003) illustrates the relationships between the data used in parameter estimation and the resulting prediction uncertainty.

[Figure 2-4 content, listed as type of information -> application of information:
Generic information from regional/national sources (e.g., UNSODA, NUREG/CR-6565) -> parameter uncertainty distributions, bounding values, best estimates in the absence of site-specific data;
Local information from regional/national sources (e.g., Natl. Soil Char. Dbase, STATSGO, SSURGO, NCDC, NWIS, GWIS) -> modify uncertainty distributions and bounding values, best estimates in the absence of site-specific data;
Local information from local sources (e.g., Extension Service, State agencies, university/industry experts) -> modify uncertainty distributions and bounding values, best estimates in the absence of site-specific data;
Site-specific direct measurements -> best estimates, modify uncertainty distributions.]

Figure 2-4. Types and uses of data sources and information for characterizing hydrogeologic parameter uncertainty in dose assessments for license termination decisions (from Meyer and Gee, 1999). Acronyms refer to various databases.

Note that prior and updated parameter estimates may be independent of a model. As discussed in Meyer and Gee (1999), however, there must be a correspondence between the estimates and the parameters assigned those estimates, e.g., a model that has a single value of a parameter representing a site must be assigned a value that represents a mean. Similarly, the uncertainty in that parameter value must represent uncertainty in the mean. Because they rely on an inverse model, calibrated parameter estimates are model-dependent. In fact, most calibration methods assume the model is correct. Estimated errors thus represent the uncertainty in parameters given that the model is correct. This will underestimate parameter uncertainty.

Zimmerman et al. (1998) evaluated a variety of calibration methods using a set of hypothetical (generated) data based on the Waste Isolation Pilot Plant site. Transmissivity fields for two-dimensional groundwater flow models were calibrated on four test problems. One of their conclusions was that the calibrated models consistently underestimated the "true" variability in transport. The maximum likelihood (Carrera and Neuman, 1986a) and sequential self-calibration (Gomez-Hernandez et al., 1997) methods were consistently ranked higher than the other methods. The sequential self-calibration method offers the advantage of producing spatially variable transmissivity fields that honor the spatial statistics of the transmissivity field. A calibrated, stochastic groundwater simulation can be carried out using a set of these fields in a Monte Carlo simulation. The maximum likelihood method is more general, however, and can be applied to the calibration of a wide variety of parameters, including statistical parameters. Maximum likelihood is used in the method and application described in Chapters 3 and 4.

Computer codes that can be adapted to the calibration of any simulation model have recently become available (Poeter and Hill, 1998; Doherty, 2002). One of these codes, PEST (Doherty, 2002), was used in the application presented in Chapter 4. A method for calibrating geostatistically-simulated parameter fields (similar to the sequential self-calibration method) has recently been demonstrated using PEST (Doherty, 2003).

A variety of methods for propagating parameter uncertainty are available, including Monte Carlo simulation, the first-order, second-moment method (Kunstmann et al., 2002), the stochastic response surface method (Isukapalli et al., 1998), and stochastic moment methods (Dagan and Neuman, 1997; Zhang, 2001). Monte Carlo simulation is the most generally applicable method and was used in the application presented in Chapter 4. The stochastic moment methods are appealing because of their potential computational advantage over Monte Carlo simulation. Recent progress in handling conditions that introduce nonstationarities (Zhang, 2001) has made these methods more generally applicable.
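The difference between two of these propagation methods can be seen in a small sketch. It compares Monte Carlo simulation with a first-order, second-moment (FOSM) approximation for a deliberately simple, hypothetical model: advective travel time through a homogeneous medium with a normally distributed log-conductivity. The model, parameter values, and sample size are illustrative assumptions, not taken from the application in Chapter 4.

```python
import math
import random
import statistics

def travel_time(log_k, length=100.0, porosity=0.3, gradient=0.01):
    """Advective travel time L*n/(K*i) for conductivity K = exp(log_k)."""
    return length * porosity / (math.exp(log_k) * gradient)

# Hypothetical mean and standard deviation of ln K (K in m/yr).
mu, sigma = math.log(10.0), 0.5

# Monte Carlo: sample the uncertain parameter and run the model each time.
rng = random.Random(12345)
samples = [travel_time(rng.gauss(mu, sigma)) for _ in range(20000)]
mc_mean = statistics.mean(samples)
mc_std = statistics.stdev(samples)

# FOSM: linearize the model about the parameter mean.
eps = 1e-4
slope = (travel_time(mu + eps) - travel_time(mu - eps)) / (2.0 * eps)
fosm_mean = travel_time(mu)       # first-order estimate of the output mean
fosm_std = abs(slope) * sigma     # first-order estimate of the output std
```

Because the model is nonlinear in ln K, the FOSM estimates fall below the Monte Carlo estimates here; FOSM needs only a few model runs, whereas Monte Carlo makes no linearity assumption at the cost of many runs.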

Uncertainties must be defined on a site-specific basis and the importance of individual sources may vary site by site or even with different objectives at the same site. Determination of the parameters that are most important to the prediction uncertainty is the final element of an assessment of parameter uncertainty. This is generally carried out through the implementation of sensitivity analysis (Saltelli et al., 2000a; Helton, 1993). Meyer and Taira (2001) applied differential, graphical, and sampling-based methods of sensitivity analysis to decommissioning problems. Sensitivity measures may also be obtained during the calibration procedure (Hill, 1998; Tiedeman et al., 2003). Global sensitivity methods (Borgonovo et al., 2003; Saltelli et al., 2000b; McKay, 1995) partition the total prediction variance according to the contribution of each parameter and also determine the contribution to prediction variance due to interactions between parameters. A sensitivity analysis was not conducted for the application described in Chapter 4.
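The variance partitioning performed by global sensitivity methods can be sketched for a toy problem. The example below is an assumption-laden illustration, not a method from the cited references: it uses a hypothetical two-parameter model with an interaction term, independent standard normal parameters, and a simple "pick-freeze" sampling estimator of the first-order variance contribution of each parameter.

```python
import random
import statistics

def model(x1, x2):
    # Hypothetical response with an interaction term (x1 * x2).
    return x1 + 2.0 * x2 + x1 * x2

def first_order_index(param, n=40000, seed=2003):
    """Estimate Var(E[Y | X_param]) / Var(Y) by freezing one parameter
    and resampling the other (pick-freeze estimator)."""
    rng = random.Random(seed)
    y, y_frozen = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1_new, x2_new = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y.append(model(x1, x2))
        if param == 1:
            y_frozen.append(model(x1, x2_new))   # x1 shared, x2 resampled
        else:
            y_frozen.append(model(x1_new, x2))   # x2 shared, x1 resampled
    mean_y = sum(y) / n
    mean_f = sum(y_frozen) / n
    cov = sum((a - mean_y) * (b - mean_f) for a, b in zip(y, y_frozen)) / (n - 1)
    return cov / statistics.variance(y)

s1 = first_order_index(1)        # share of variance due to x1 alone
s2 = first_order_index(2)        # share of variance due to x2 alone
interaction = 1.0 - s1 - s2      # remainder attributed to the x1-x2 interaction
```

For this toy model the analytical shares are 1/6, 2/3, and 1/6, so the sampling estimates can be checked directly; for a real dose assessment model no such check is available and the sample size governs the reliability of the partition.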

2.2 Conceptual Model Uncertainty

The sources of uncertainty described in the previous sections result in multiple valid representations of parameter values. That is, for a given model structure, there will be multiple sets of parameter values that provide valid representations of observed system behavior. In a similar manner, the same sources of uncertainty may result in valid alternative model structures or conceptualizations. When multiple model conceptualizations are consistent with the available data, it may not be justifiable to rely on a single model structure. Relying on a single conceptual representation of a system has two potential pitfalls: the rejection by omission of valid alternatives, and reliance on an invalid representation by failing to adequately test it. The potential consequences are underestimation of uncertainty by under-sampling model space and biased results by relying on an invalid model.

When discussing model uncertainty, it is instructive to view model structure as the combination of a conceptual model and a mathematical model: a conceptual-mathematical model (Neuman and Wierenga, 2003). The conceptual model can be thought of as a hypothesis about the system behavior and the relationship between system components. It is primarily qualitative and comprehensive. The mathematical model can be thought of as a process to test the conceptual model hypothesis. It is a quantitative, possibly simplified implementation of the conceptual model.

Figure 2-5 illustrates the relationship between alternative conceptual-mathematical models. Each conceptual model is based on the available site data and other relevant information and represents a distinct conceptualization of system characterization or behavior. For example, alternative conceptual models might be represented by the presence and absence of leakage from an underlying aquifer, or the presence and absence of matrix-fracture interaction in a fractured rock. In addition, a single conceptual model may be implemented in more than one way: for example, a fractured rock may be represented as an equivalent porous medium or as a discrete network of fractures. The process of conceptual-mathematical model development may be iterative as additional site data become available and conceptual models are updated.

In this report, "conceptual model uncertainty" should be interpreted as "conceptual-mathematical model uncertainty," representing uncertainty in either the conceptual model or its mathematical implementation.

2.2.1 Analysis of Conceptual Model Uncertainty

Methods for the quantification of conceptual model uncertainty are much less well established than those addressing parameter uncertainty. Mosleh et al. (1994) provide a good introduction to the issues involved. Neuman and Wierenga (2003) discuss a wide variety of issues related to hydrogeologic conceptual model uncertainty, including many instances of its practical importance.

Figure 2-5. A schematic representation of the relationship between alternative conceptual-mathematical models

While it is generally possible to specify a reasonable probability distribution representing the complete set of possibilities for the value of a parameter, it is not generally possible to specify the complete set of possible conceptual model alternatives. As a result, conceptual model uncertainty has generally been represented as a discrete distribution, with a small number of model alternatives taken as the complete set of possibilities. In the generic example of Figure 2-5, the complete set of possibilities consists of three conceptual-mathematical model alternatives. Having defined the set of alternatives, the options for addressing conceptual model uncertainty include the following.

* Evaluate each alternative and select the best model. This may be carried out through an informal comparison (James and Oldenburg, 1997; Cole et al., 2001) or through evaluation of a formal model selection criterion (Burnham and Anderson, 2002). As discussed previously, selection of a single model may not always be justifiable.

* Evaluate each alternative and combine the results using some weighting scheme, such as the likelihood-based weighting of Beven and Freer (2001), the multimodel ensemble approach of Krishnamurti et al. (2000), the model likelihood weighting of Burnham and Anderson (2002), and the model probability weighting of Draper (1995).
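As one illustration of a weighting scheme, the sketch below converts information criterion values into normalized model weights of the form exp(-delta_k / 2) / sum(exp(-delta_l / 2)), where delta_k is the difference between the criterion for model k and the smallest criterion value. This is the form of the Akaike weights described by Burnham and Anderson (2002); the criterion values and model predictions used here are hypothetical.

```python
import math

def information_criterion_weights(criteria):
    """Normalized weights exp(-delta/2), with delta measured from the best
    (smallest) criterion value."""
    best = min(criteria)
    raw = [math.exp(-0.5 * (c - best)) for c in criteria]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical criterion values and predictions for three alternative models.
aic_values = [102.3, 100.0, 108.9]
predictions = [12.0, 15.0, 30.0]

weights = information_criterion_weights(aic_values)

# Weighted combination of the alternative model predictions.
averaged_prediction = sum(w * p for w, p in zip(weights, predictions))
```

Selecting the single best model corresponds to placing all weight on the model with the smallest criterion value; the weighted combination instead retains a contribution from each alternative in proportion to its support.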

Neuman (2003) reviews a number of approaches that have been used to address conceptual model uncertainty. The method he proposes, a version of the model averaging method described in Draper (1995), was used here and is discussed in detail in the following chapter.

Any approach based on evaluation of a discrete set of alternative models will only be as good as the set of alternatives. That is, if the set of alternatives does not represent the full range of possibilities, conceptual model uncertainty will be underestimated. In Neuman and Wierenga's (2003) extensive discussion of conceptual model uncertainty, they provide some advice on the generation of alternatives, summarized as follows.

* From the assembled database of site-specific data and other relevant information, consider alternative representations of space-time scales, number and type of hydrogeologic units, flow and transport property characterization, system boundaries, initial conditions, fast flow paths, controlling transport phenomena, etc.

* Each conceptual model alternative should be supported by key data.

* Minimize inconsistencies, anomalies, and ambiguities.

* Apply the principle of Occam's window (Jefferys and Berger, 1992; Madigan and Raftery, 1994) according to which one considers only a relatively small set of the most parsimonious models among those which, a priori, appear to be hydrologically most plausible in light of all knowledge and data relevant to the purpose of the model and, a posteriori, explain the data in an acceptable manner.

* Maximize the number of experts involved in the generation of alternative conceptualizations.

* Articulate uncertainties associated with each alternative conceptualization.
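The a posteriori part of Occam's window can be sketched numerically. One common formalization, following Madigan and Raftery (1994), discards any model whose posterior probability is smaller than that of the best model by more than a chosen factor. The sketch below assumes that rule; the threshold of 20 and the model probabilities are hypothetical, and the qualitative guidance summarized above does not prescribe a specific value.

```python
def occams_window(posterior_probs, max_ratio=20.0):
    """Keep models whose posterior probability is within a factor of
    max_ratio of the best model, then renormalize the retained weights."""
    best = max(posterior_probs.values())
    kept = {name: p for name, p in posterior_probs.items()
            if p > 0.0 and best / p <= max_ratio}
    total = sum(kept.values())
    return {name: p / total for name, p in kept.items()}

# Hypothetical posterior probabilities for four alternative models.
posterior = {"M1": 0.62, "M2": 0.30, "M3": 0.07, "M4": 0.01}
retained = occams_window(posterior)
```

In this example the least supported model is dropped and the remaining probabilities are rescaled to sum to one, which keeps the set of alternatives small without discarding models that explain the data acceptably.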

Because the set of alternative conceptual models is unlikely to represent the full range of possibilities, evaluations of model uncertainty should be viewed as relative comparisons. That is, they may be used to conclude that one model is better than another for the intended purpose, but they cannot necessarily be used to conclude that any model is a good model. In addition, as stated above, the contribution of model uncertainty to overall prediction uncertainty will be underestimated.

Gaganis and Smith (2001) presented a unique analysis based on Bayes Factors for calculating an absolute measure of conceptual model uncertainty for a single model (that is, without comparison to alternative models). We evaluated this method using two synthetic examples of groundwater flow in which model structural errors were introduced through a boundary flux and a source term error. Parameter uncertainty was represented by a random field of transmissivity. Although head and parameter measurements were error-free and all driving forces other than the specified model errors were known, the method of Gaganis and Smith (2001) provided inconsistent estimates of the (known) model uncertainty. Based on these results, we feel the method is, at best, not generally applicable.


3 Combining Parameter and Conceptual Model Uncertainty

This chapter discusses a method to provide an optimal way of combining the predictions of several alternative models and assessing their joint predictive uncertainty, with consideration of parameter and conceptual model uncertainty. This method relies on the specification of a set of alternative models (with the consequent limitations discussed in the previous chapter and below) and weights the alternative model results by a measure of the model probabilities. The method was originally proposed by Neuman (2002).

3.1 Bayesian Model Averaging

A formal method of evaluating prediction uncertainty with full consideration of model uncertainty is Bayesian Model Averaging (BMA) (Draper, 1995; Hoeting et al., 1999). Using the notation of Hoeting et al. (1999), if $\Delta$ is the predicted quantity, its posterior distribution given a set of data $D$ is

$$p(\Delta \mid D) = \sum_{k=1}^{K} p(\Delta \mid M_k, D)\, p(M_k \mid D) \qquad (1)$$

where $\mathbf{M} = (M_1, \ldots, M_K)$ is the set of all models considered, at least one of which must be correct. As discussed in the previous chapter, Neuman and Wierenga (2003) provide guidance on selecting a set of models that is small enough to be computationally feasible yet large enough to represent the breadth of significant possibilities.

In (1), $p(\Delta \mid D)$ is the average of the posterior distributions $p(\Delta \mid M_k, D)$ under each model, weighted by their posterior model probabilities $p(M_k \mid D)$. The posterior probability for model $M_k$ is given by Bayes' rule,

$$p(M_k \mid D) = \frac{p(D \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(D \mid M_l)\, p(M_l)} \qquad (2)$$

where

$$p(D \mid M_k) = \int p(D \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k \qquad (3)$$

is the integrated likelihood of model $M_k$, $\theta_k$ is the vector of parameters associated with model $M_k$, $p(\theta_k \mid M_k)$ is the prior density of $\theta_k$ under model $M_k$, $p(D \mid \theta_k, M_k)$ is the joint likelihood of model $M_k$ and its parameters $\theta_k$, and $p(M_k)$ is the prior probability that $M_k$ is the correct model. All probabilities are implicitly conditional on $\mathbf{M}$.

The posterior mean and variance of $\Delta$ are (Draper, 1995)

$$E[\Delta \mid D] = \sum_{k=1}^{K} E[\Delta \mid D, M_k]\, p(M_k \mid D) \qquad (4)$$

$$\mathrm{Var}[\Delta \mid D] = \sum_{k=1}^{K} \mathrm{Var}[\Delta \mid D, M_k]\, p(M_k \mid D) + \sum_{k=1}^{K} \left( E[\Delta \mid D, M_k] - E[\Delta \mid D] \right)^2 p(M_k \mid D) \qquad (5)$$

In (5), the first term on the right-hand side represents within-model variance; the second term represents between-model variance. Note that the predictive probabilities (1) and leading moments (4) - (5) are weighted by the posterior probabilities of the individual models.

3.1.1 Interpretation of Model Probability

Philosophical difficulties with the BMA approach have been discussed by Winkler (1993) and center on the interpretation of $p(M_k)$ as the probability that $M_k$ is the correct model and the method's requirement that one of the $M_k$ is in fact the correct model. Winkler (1993) argues that, although this interpretation is intuitively appealing, the existence of a "correct" model is questionable since all models are approximations of reality.

One approach to these philosophical difficulties is to interpret model probability in relative terms (e.g., Zio and Apostolakis, 1996), where the model with the greatest probability is the "best" model (and all model probabilities sum to one). Winkler (1993) suggests that this means $p(\Delta \mid M_k, D)$ must be interpreted as being conditional to the "best" model, and asks whether there is utility in that interpretation if the "best" model is not very good. As discussed in the previous chapter, basing the analysis on a set of model alternatives that do not encompass all possibilities implies a relative comparison between models. We thus interpret prior model probabilities to be subjective values reflecting the analyst's belief about the relative plausibility of each model based on its apparent (qualitative, a priori) consistency with available knowledge and data.

Whereas prior model probabilities must in our view remain subjective, the posterior model probabilities are modifications of these subjective values based on an objective evaluation of each model's consistency with available data. Hence, the posterior probabilities are valid only in a comparative, not in an absolute, sense. They are conditional on the choice of models (in addition to being conditional on the data) and may be sensitive to the choice of prior model probabilities (as demonstrated later by example). This sensitivity is expected to diminish with increased level of conditioning on data.

3.1.2 Specifying Prior Model Probability

Given a set of alternative models, $\mathbf{M}$, one formally assumes that their prior probabilities sum up to one,

$$\sum_{k=1}^{K} p(M_k) = 1. \qquad (6)$$

This implies that all possible models of relevance are included in $\mathbf{M}$ (the set is collectively exhaustive), and that all models in $\mathbf{M}$ differ from each other sufficiently to be considered mutually exclusive (the joint probability of any two models is zero), at the outset. Mutually exclusive models are not redundant; they produce different results for the same set of inputs. In practice, it may be impossible to demonstrate that the set of models is collectively exhaustive. In this case, model uncertainty may be underestimated, a condition implied by the fact that all probabilities computed using BMA are conditional on $\mathbf{M}$, as stated previously.

With regard to prior model probability, when there is insufficient prior reason to prefer one model over another, a "reasonable 'neutral' choice" (Hoeting et al., 1999) is to assume that all models are a priori equally likely. Draper (1995) and George (1999) express concern that if two models are near equivalent as regards predictions (i.e., redundant), treating them as separate equally likely models amounts to giving double weight to a single model of which there are two slightly different versions, thereby "diluting" the predictive power of BMA. One way to minimize this effect is to eliminate at the outset models that are deemed potentially inferior. Another is to retain only models that are structurally distinct and non-collinear. Otherwise, one should consider reducing (diluting) the prior probabilities assigned to models that are deemed closely related. We explore this idea through an example in the following chapter.

3.2 Maximum Likelihood Bayesian Model Averaging (MLBMA)

Computational difficulties in the BMA approach include the calculation of $p(\Delta \mid M_k, D)$ in (1) and $p(D \mid M_k)$ in (3), which may require exhaustive Monte Carlo simulations of the prior parameter space $\theta_k$ for each model. This may be computationally and hydrologically very demanding. Approximating $p(\Delta \mid M_k, D)$ by $p(\Delta \mid M_k, \hat{\theta}_k, D)$, where $\hat{\theta}_k$ is the maximum likelihood (ML) estimate of $\theta_k$ based on the likelihood $p(D \mid \theta_k, M_k)$, was suggested by Taplin (1993) and was shown to be useful in the BMA context by Draper (1995), Raftery et al. (1996) and Volinsky et al. (1997).

Neuman (2002, 2003) proposed evaluating the posterior model probability, $p(M_k \mid D)$, based on a result due to Kashyap (1982) and referred to the resulting method as Maximum Likelihood BMA (MLBMA). Kashyap derived an expression for $p(M_k \mid D)$ by expanding the terms in the integrand of (3) in a Taylor series about $\hat{\theta}_k$. A related approach based on Laplace approximations has been used in the BMA context by Draper (1995) and Kass and Raftery (1995). Kashyap's expression can be written (Ye et al., 2003) as

$$p(M_k \mid D) = \frac{\exp\left(-\tfrac{1}{2}\Delta\mathrm{KIC}_k\right) p(M_k)}{\sum_{l=1}^{K} \exp\left(-\tfrac{1}{2}\Delta\mathrm{KIC}_l\right) p(M_l)} \qquad (7)$$

where

$$\Delta\mathrm{KIC}_k = \mathrm{KIC}_k - \mathrm{KIC}_{\min}, \qquad (8)$$

$$\mathrm{KIC}_k = \mathrm{NLL}_k + N_k \ln\left(\frac{N}{2\pi}\right) + \ln\left|F_k(D \mid \hat{\theta}_k, M_k)\right|. \qquad (9)$$

$\mathrm{KIC}_k$ is the so-called Kashyap information criterion for model $M_k$, $\mathrm{KIC}_{\min}$ is its minimum value over all candidate models, and $\mathrm{NLL}_k = -2\ln p(D \mid \hat{\theta}_k, M_k) - 2\ln p(\hat{\theta}_k \mid M_k)$ is the negative log likelihood of $M_k$ evaluated at $\hat{\theta}_k$. Here $N_k$ is the dimension of $\theta_k$ (number of parameters associated with model $M_k$), $N$ is the dimension of $D$ (number of discrete data points), and $F_k$ is the normalized (by $N$) observed (as opposed to ensemble mean) Fisher information matrix having components

$$F_{k,ij} = -\frac{1}{N} \left. \frac{\partial^2 \ln p(D \mid \theta_k, M_k)}{\partial\theta_{ki}\,\partial\theta_{kj}} \right|_{\theta_k = \hat{\theta}_k}. \qquad (10)$$

In the absence of prior information about the parameters, one simply drops the term $-2\ln p(\hat{\theta}_k \mid M_k)$ from $\mathrm{NLL}_k$. This reflects common practice in model calibration.

Increasing the number of parameters $N_k$ allows $-\ln p(D \mid \hat{\theta}_k, M_k)$ to decrease and $N_k \ln N$ to increase. When $N_k$ is large, the rate of decrease does not compensate for the rate of increase and $\mathrm{KIC}_k$ grows while $p(M_k \mid D)$ diminishes. This means that a more parsimonious model with fewer parameters is ranked higher and assigned a higher posterior probability. On the other hand, $-\ln p(D \mid \hat{\theta}_k, M_k)$ diminishes with $N$ at a rate higher than linear so that as the latter grows, there may be an advantage to a more complex model with larger $N_k$.

The last term in (9) reflects the information content of the available data. It thus enables consideration of models of growing complexity as the data base improves in quantity and quality. As illustrated by Carrera and Neuman (1986b), $\mathrm{KIC}_k$ recognizes that when the data base is limited and/or of poor quality, one has little justification for selecting an elaborate model with numerous parameters. Instead, one should prefer a simpler model with fewer parameters, which nevertheless reflects adequately the underlying hydrologic structure and regime of the system. Stated otherwise, $\mathrm{KIC}_k$ may cause one to prefer a simpler model that leads to a poorer fit with the data over a more complex model that fits the data better.
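As a minimal sketch of how the three terms of (9) might be combined once a model has been calibrated, consider the following Python functions. The function names, the nested-list representation of the normalized Fisher information matrix and the numerical values of that matrix are assumptions made purely for illustration; the NLL and N values are borrowed from the power model in Table 4-1. This is not the code used in this report.

```python
import math


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    # det(matrix) equals the squared product of the Cholesky diagonal.
    return 2.0 * sum(math.log(low[i][i]) for i in range(n))


def kic(nll, n_params, n_data, fisher):
    """Equation (9): NLL_k + N_k ln(N / 2 pi) + ln|F_k|."""
    penalty = n_params * math.log(n_data / (2.0 * math.pi))
    return nll + penalty + log_det_spd(fisher)


# NLL = 352.2 and N = 184 are the Pow0 values of Table 4-1; the 2 x 2
# Fisher matrix below is made up for illustration only.
fisher_example = [[4.0, 2.0], [2.0, 3.0]]
kic_example = kic(nll=352.2, n_params=2, n_data=184, fisher=fisher_example)
```

Working with the log-determinant directly, rather than forming the determinant and then taking its logarithm, is a design choice that avoids overflow or underflow when the number of parameters is large.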

As shown in Ye et al. (2003), alternative models can have different types and numbers of parameters, but the latter must be estimated and the models compared considering a single data set $D$. For a comparison of two- and three-dimensional models, data distributed in three-dimensional space may need to be projected onto a two-dimensional plane as done by Ando et al. (2003) or averaged in the third dimension as suggested by Neuman and Wierenga (2003, Appendix B).

3.2.1 A Few Words About KIC

Previously, $\mathrm{KIC}_k$ has been used (e.g., Carrera and Neuman, 1986a,b; Samper and Neuman, 1989a,b) as an optimum decision rule for the ranking of competing models. The highest-ranking model is that corresponding to $\mathrm{KIC}_{\min}$. Note that KIC has no intrinsic meaning; it is only the differences between KIC values that have meaning. Thus the use of $\Delta\mathrm{KIC}_k$ in (7) reflects the interpretation of $p(M_k \mid D)$ as a relative probability suitable for comparing the models within the set $\mathbf{M}$.

The Fisher information matrix term in (9) tends to a constant as $N$ becomes large, so that $\mathrm{KIC}_k$ becomes asymptotically equivalent to the Bayes information criterion

$$\mathrm{BIC}_k = \mathrm{NLL}_k + N_k \ln N \qquad (11)$$

derived on the basis of other considerations by Akaike (1977), Rissanen (1978) and Schwarz (1978). Raftery (1993) proposed adopting the asymptotic BIC approximation, without the prior information term $-2\ln p(\hat{\theta}_k \mid M_k)$, for BMA (see also Raftery et al. 1996; Volinsky et al. 1997; Hoeting et al. 1999). From (11) it follows that (7) tends asymptotically to

$$p(M_k \mid D) = \frac{\exp\left(-\tfrac{1}{2}\Delta\mathrm{BIC}_k\right) p(M_k)}{\sum_{l=1}^{K} \exp\left(-\tfrac{1}{2}\Delta\mathrm{BIC}_l\right) p(M_l)} \qquad (12)$$

where

$$\Delta\mathrm{BIC}_k = \mathrm{BIC}_k - \mathrm{BIC}_{\min} \qquad (13)$$

and $\mathrm{BIC}_{\min}$ is the smallest value of $\mathrm{BIC}_k$ over all candidate models (see also Burnham and Anderson, 2002, p. 297).

Since hydrologic models are often data limited, the asymptotic expression (12) is less general than the nonasymptotic expression (7) that is at the heart of MLBMA. Indeed, Carrera and Neuman (1986a,b) and Samper and Neuman (1989a,b) found $\mathrm{KIC}_k$ to provide more reliable rankings of alternative groundwater flow and geostatistical models than do $\mathrm{BIC}_k$ or two other commonly used information criteria: $\mathrm{AIC}_k = \mathrm{NLL}_k + 2N_k$ (Akaike, 1974) and $\mathrm{HIC}_k = \mathrm{NLL}_k + 2N_k \ln(\ln N)$ (Hannan, 1980).


For a recent overview of various information criteria the reader is referred to Burnham and Anderson (2002, p. 284).
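The three simpler criteria defined in this section differ only in their penalty terms, which the following Python sketch makes explicit. The function name and dictionary layout are illustrative assumptions; the inputs are the NLL, parameter count and data count reported for the power model (Pow0) in Table 4-1.

```python
import math


def information_criteria(nll, n_params, n_data):
    """AIC, BIC and HIC as defined in this section.

    `nll` is NLL_k, with or without the prior-information term.
    """
    return {
        "AIC": nll + 2.0 * n_params,
        "BIC": nll + n_params * math.log(n_data),
        "HIC": nll + 2.0 * n_params * math.log(math.log(n_data)),
    }


# Pow0 in Table 4-1: NLL = 352.2, 2 parameters, N = 184 data points.
pow0 = information_criteria(nll=352.2, n_params=2, n_data=184)
```

With these inputs the AIC and BIC values round to the 356.2 and 362.6 listed for Pow0 in Table 4-1; HIC is not tabulated in this report.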

3.2.2 Applicability of MLBMA

Using the maximum likelihood method has several advantages. It can be applied to both complex and simplified models. It can be applied to deterministic models as described by Carrera and Neuman (1986a,b) and Carrera et al. (1997) and also to stochastic models based on moment equations as demonstrated by Hernandez et al. (2002, 2003). Application of maximum likelihood also yields parameter sensitivity information.

Including prior information in the maximum likelihood calibration is an option that allows one to condition the parameter estimates not only on site monitoring (observational) data but also on site characterization data, from which prior parameter estimates are usually derived. When both sets of data are considered to be statistically meaningful, the posterior parameter estimates are compatible with a wider array of measurements than they would be otherwise and are therefore better constrained (potentially rendering the model a better predictor).

Maximum likelihood yields a negative log likelihood criterion $\mathrm{NLL}_k$ that includes two weighted square residual terms: a generalized sum of squared differences between simulated and observed state variables arising from $-2\ln p(D \mid \hat{\theta}_k, M_k)$, and a generalized sum of squared differences between posterior and prior parameter estimates arising from $-2\ln p(\hat{\theta}_k \mid M_k)$. The first is weighted by a matrix proportional to the inverse covariance matrix of state observation errors. The second is weighted by a matrix proportional to the inverse covariance matrix of prior parameter estimation errors. Maximum likelihood allows the statistical parameters of the errors to be estimated. When these statistical parameters are known (i.e., not estimated), maximum likelihood reduces to generalized least squares estimation. In this case, available codes such as PEST (Doherty, 2002) and UCODE (Poeter and Hill, 1998) can be applied.

Maximum likelihood estimation yields an approximate covariance matrix for the estimation errors of $\hat{\theta}_k$. Upon considering the parameter estimation errors of a calibrated deterministic model $M_k$ to be Gaussian or log Gaussian, one easily determines $p(\Delta \mid M_k, \hat{\theta}_k, D)$ by Monte Carlo simulation of $\Delta$ through random perturbation of the parameters. The simulation also yields corresponding approximations $E[\Delta \mid M_k, \hat{\theta}_k, D]$ of $E[\Delta \mid M_k, D]$, and $\mathrm{Var}[\Delta \mid M_k, \hat{\theta}_k, D]$ of $\mathrm{Var}[\Delta \mid M_k, D]$, in (4) and (5). If $M_k$ is a geostatistical model (as in the example below) or a stochastic moment model (of the kind considered by Hernandez et al., 2002, 2003), it yields $E[\Delta \mid M_k, \hat{\theta}_k, D]$ and $\mathrm{Var}[\Delta \mid M_k, \hat{\theta}_k, D]$ directly without Monte Carlo simulation.
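The Monte Carlo step described above can be sketched as follows, assuming Gaussian estimation errors with a known covariance matrix. The function names, the linear `toy_model` standing in for a calibrated model, and all numerical values are hypothetical; a real application would call the calibrated simulator in place of `toy_model`.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular factor L with L L^T = cov (cov positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = cov[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def monte_carlo_moments(predict, theta_hat, cov, n_draws=20000, seed=0):
    """Sample mean and variance of predict(theta), theta ~ N(theta_hat, cov)."""
    rng = random.Random(seed)
    low = cholesky(cov)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in theta_hat]
        # Correlated perturbation: theta = theta_hat + L z
        theta = [t + sum(low[i][j] * z[j] for j in range(i + 1))
                 for i, t in enumerate(theta_hat)]
        draws.append(predict(theta))
    mean = sum(draws) / n_draws
    var = sum((d - mean) ** 2 for d in draws) / (n_draws - 1)
    return mean, var


def toy_model(theta):
    """Hypothetical stand-in for the prediction of a calibrated model."""
    return 2.0 * theta[0] - theta[1]


mc_mean, mc_var = monte_carlo_moments(
    toy_model, theta_hat=[1.0, 0.5], cov=[[0.04, 0.01], [0.01, 0.09]])
```

For log Gaussian errors the same sketch could be applied to log-transformed parameters, with the back-transformation performed inside the prediction function.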

One final point concerns the applicability of MLBMA. In the most data-limited application, one in which there are no system observations with which to calibrate a model and the only available parameter information is that available from generic databases, Equation 1 reduces to

$$p(\Delta) = \sum_{k=1}^{K} p(\Delta \mid M_k)\, p(M_k).$$

That is, model predictions can still be made using prior (or updated) parameter estimates (see Figure 2-3) and model averaging can still be carried out, but only with prior model probabilities. Since the predictions and model probabilities are not conditioned on state variable observations, however, the results are expected to be more uncertain and potentially more biased.

3.3 Summary of MLBMA

To implement MLBMA the following steps are followed.

(1) Postulate alternative conceptual-mathematical models for a site using guidance provided in Neuman and Wierenga (2003).

(2) Assign a prior probability to each model.

(3) Optionally assign prior probabilities to the parameters of each model, using, for example, guidance provided in Meyer and Gee (1999).

(4) Obtain posterior maximum likelihood parameter estimates, and estimation covariance, for each model by inversion (model calibration). In many cases, available codes such as PEST (Doherty, 2002) and UCODE (Poeter and Hill, 1998) can be applied to this step.

(5) Calculate a posterior probability for each model using the model calibration results and the prior model probabilities as expressed in Equations 7 to 9.

(6) Predict quantities of interest using each model.

(7) Assess prediction uncertainty (distribution, variance) for each model using Monte Carlo or stochastic moment methods.

(8) Weight predictions and uncertainties by the corresponding posterior model probabilities.

(9) Sum the results over all models.
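Steps (8) and (9) amount to forming a probability-weighted mixture of the per-model predictive distributions, as in Equation 1. The Python sketch below illustrates this for the cumulative distribution of a predicted quantity, under the simplifying assumption (not made in this report) that each model's predictive distribution is Gaussian. All names and numbers are hypothetical.

```python
import math


def normal_cdf(x, mean, variance):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def model_averaged_cdf(x, means, variances, posterior_probs):
    """Steps (8)-(9): weight each model's CDF by p(M_k|D), then sum."""
    return sum(p * normal_cdf(x, m, v)
               for m, v, p in zip(means, variances, posterior_probs))


# Hypothetical predictive means and variances for three models, and
# hypothetical posterior model probabilities that sum to one.
means = [1.0, 1.4, 0.8]
variances = [0.20, 0.10, 0.30]
posterior_probs = [0.5, 0.3, 0.2]

# Model-averaged probability that the predicted quantity is below 1.2.
prob_below = model_averaged_cdf(1.2, means, variances, posterior_probs)
```

Averaging the distributions themselves, rather than only their means, preserves any multimodality that arises when the models disagree.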

A flowchart illustrating the MLBMA approach to combined estimation of conceptual model and parameter uncertainty is shown in Figure 3-1. Numbers in parentheses above the boxes refer to the numbered steps above.

The following chapter provides an example application of MLBMA and an evaluation of its performance and suitability.


[Figure 3-1: flowchart not reproduced. Boxes, keyed to the numbered steps: (1) postulate multiple alternative conceptual-mathematical models; (2) assign a (subjective) prior probability to each model; (3) assign prior parameter probability distributions; (4) calibrate each model by maximum likelihood methods; (5) calculate posterior model probabilities; (6), (7) run each model to estimate the probability distribution of the predicted quantity; (8), (9) find the model-averaged prediction weighted by model probabilities.]

Figure 3-1. Maximum Likelihood Bayesian Model Averaging (MLBMA) approach to combined estimation of model and parameter uncertainty


4 Example Application

4.1 Implementation of MLBMA

To demonstrate the application of MLBMA and to evaluate the results, we apply it to alternative geostatistical models of log air permeability variations in unsaturated fractured tuff at the Apache Leap Research Site (ALRS) in central Arizona. This site was chosen for an initial application of MLBMA because it is a relatively well-controlled experimental research site with good characterization data. The results of past studies at the site were available to us as well, which facilitated the application of MLBMA. In addition, the models considered (geostatistical models of permeability) are relatively simple, thus reducing the computational effort required to complete the application. We recognize that an example considering groundwater flow and transport would better reflect NRC-regulated sites. However, we see no fundamental barrier in applying MLBMA to the more complex models required in such applications. Any difficulties in applying MLBMA to groundwater flow and transport applications will be explored in a case study using actual field data that is the focus of future efforts.

4.2 ALRS Data and Previous Efforts

Spatially distributed log air permeability data were obtained by Guzman et al. (1994, 1996) based on a steady state interpretation of 184 pneumatic injection tests in 1-m-length intervals along 6 vertical and inclined (at 45°) boreholes at the site (Figure 4-1). Five of the boreholes (V2, W2A, X2, Y2, Z2) are 30-m long and one (Y3) has a length of 45 m; five (W2A, X2, Y2, Y3, Z2) are inclined at 45° and one (V2) is vertical.

Figure 4-2 shows an omni-directional sample variogram of corresponding $\log_{10} k$ data. Chen et al. (2000) fitted three variogram models to these and some 3-m-scale data using an adjoint state maximum likelihood cross-validation (ASMLCV) method developed for this purpose by Samper and Neuman (1989a,b), coupled with a generalized least squares (GLS) drift removal approach due to Neuman and Jacobson (1984). The three models included (1) power (characteristic of a random fractal), (2) exponential with a linear drift, and (3) exponential with a quadratic drift. The data did not support accounting for directional effects by considering the variograms to be anisotropic.
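For readers unfamiliar with the variogram families named above, the following Python sketch evaluates a power and an exponential variogram at a few separation distances. The functional forms are the common textbook parameterizations and are an assumption here; the report does not spell out the exact forms used in the ASMLCV code. The parameter values are borrowed from the Pow0 and Exp0 columns of Table 4-1.

```python
import math


def power_variogram(h, coefficient, power):
    """gamma(h) = coefficient * h**power, valid for 0 < power < 2."""
    return coefficient * h ** power


def exponential_variogram(h, sill, integral_scale):
    """gamma(h) = sill * (1 - exp(-h / integral_scale))."""
    return sill * (1.0 - math.exp(-h / integral_scale))


lags = [1.0, 5.0, 10.0, 20.0]  # separation distances in metres

# Parameter values taken from Table 4-1 (Pow0 and Exp0 columns).
gamma_pow = [power_variogram(h, 0.286, 0.460) for h in lags]
gamma_exp = [exponential_variogram(h, 0.718, 1.840) for h in lags]
```

The power variogram keeps growing with separation distance, whereas the exponential variogram levels off at its sill; this is the qualitative difference between a random fractal and a stationary field.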

The authors found that whereas the exponential variogram model with a quadratic drift provided a best fit to the data (as measured and implied by the smallest negative log-likelihood model fit criterion, NLL), four model discrimination criteria (AIC, BIC, HIC, KIC) consistently ranked the power model as best, and the former model as least acceptable. The reason was that

Figure 4-1. Spatial locations of 184 1-m-scale $\log_{10} k$ data at ALRS


[Figure 4-2: plot not reproduced. Horizontal axis: separation distance (m); numbers of data pairs are marked on the plot.]

Figure 4-2. Omni-directional sample variogram of 1-m-scale $\log_{10} k$ data at the ALRS and numbers of data pairs

whereas all three models provided an almost equally good fit to the data, the power model was most parsimonious with only two parameters, and the exponential variogram model with second-order drift was least parsimonious with twelve parameters. They therefore adopted the power model and discarded all other variogram models from further consideration.

4.2.1 Alternative Models and Maximum Likelihood Parameter Estimation

For purposes of MLBMA we expand the range of variogram models postulated for 1-m-scale $\log_{10} k$ at the ALRS to seven: (1) power (Pow0), (2) exponential without a drift (Exp0), (3) exponential with a linear drift (Exp1), (4) exponential with a quadratic drift (Exp2), (5) spherical without a drift (Sph0), (6) spherical with a linear drift (Sph1), and (7) spherical with a quadratic drift (Sph2).

To estimate the parameter vector $\beta$ of drift-free variogram models (Pow0, Exp0, Sph0) we use ASMLCV as described in Ye et al. (2003), implemented in a computer code slightly modified after Samper (1998, personal communication). To do the same for models with drift (Exp1, Exp2, Sph1, Sph2), we decompose the $N$-dimensional data vector $D$ of $\log_{10} k$ measurements into a deterministic drift vector $\mu$ and a random residual vector $R$,

$$D = \mu + R \qquad (14)$$

$$\mu(x) = \sum_{i=0}^{p} g_i(x)\, a_i = G a \qquad (15)$$

where $a = (a_0, a_1, \ldots, a_p)$ is a vector of $p + 1$ drift coefficients and $G$ is an $N \times (p + 1)$ matrix of linearly independent monomial functions $g_i(x)$ evaluated at the data points $x_n$, $n = 1, 2, \ldots, N$.
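The drift matrix $G$ can be illustrated with a short Python sketch that evaluates all monomials of the three coordinates up to a given order. That the drift uses exactly this full monomial basis is an inference from the parameter counts in Table 4-1 (2 variogram parameters plus 4 or 10 drift coefficients gives the 6 and 12 parameters listed for linear- and quadratic-drift models); it is not stated explicitly in the text. Function names and data points are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def monomial_exponents(order, dim=3):
    """Exponent tuples of all monomials in `dim` coordinates up to `order`."""
    exponents = []
    for degree in range(order + 1):
        for axes in combinations_with_replacement(range(dim), degree):
            exps = [0] * dim
            for axis in axes:
                exps[axis] += 1
            exponents.append(tuple(exps))
    return exponents


def drift_matrix(points, order):
    """N x (p + 1) matrix G; each row holds the monomials at one data point."""
    exponents = monomial_exponents(order, dim=len(points[0]))
    return [[math.prod(c ** e for c, e in zip(point, exps))
             for exps in exponents]
            for point in points]


points = [(1.0, 2.0, 3.0), (0.5, -1.0, 4.0)]  # made-up (x, y, z) locations
g_linear = drift_matrix(points, order=1)       # columns: 1, x, y, z
g_quadratic = drift_matrix(points, order=2)    # adds x^2, xy, xz, y^2, yz, z^2
```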

Assuming that $D$ is multivariate Gaussian with mean $\mu$ and covariance matrix $C_R$ (Vesselinov, 2000, has shown that the data pass the Kolmogorov-Smirnov test of univariate Gaussianity at a significance level of 0.05), the joint negative log likelihood function of drift and variogram parameters takes the form

$$\mathrm{NLL}(a, \beta \mid D) = -2\ln p(D \mid a, \beta) = N \ln 2\pi + \ln\left|C_R(\beta)\right| + (D - Ga)^T C_R^{-1}(\beta)\,(D - Ga) \qquad (16)$$

Minimizing (16) jointly with respect to $a$ and $\beta$ yields biased estimates of the variogram parameters, a problem that can be remedied through the use of a restricted ML (RML) approach (Hoeksema and Kitanidis, 1985; Kitanidis and Lane, 1985; Cressie, 1991, p. 92). We solve the problem differently by formally decoupling the ML estimations of $a$ and $\beta$. First, we obtain unbiased ML estimates $\hat{\beta}$ of the variogram parameters using ASMLCV in conjunction with universal kriging (ASMLCV-UK, Samper 1998, personal comm.), which does not require knowledge of the drift coefficients (Ye et al., 2003). Next, we compute corresponding unbiased ML estimates $\hat{a}$ of the drift coefficients through minimization of

$$\mathrm{NLL}(a, \hat{\beta} \mid D) = N \ln 2\pi + \ln\left|C_R(\hat{\beta})\right| + (D - Ga)^T C_R^{-1}(\hat{\beta})\,(D - Ga) \qquad (17)$$

with respect to $a$ by generalized least squares, a task we accomplish using PEST-ASP (Doherty, 2002). Our optimum NLL is then given by

$$\mathrm{NLL}(\hat{a}, \hat{\beta} \mid D) = N \ln 2\pi + \ln\left|C_R(\hat{\beta})\right| + (D - G\hat{a})^T C_R^{-1}(\hat{\beta})\,(D - G\hat{a}). \qquad (18)$$
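The generalized least squares step in (17) and (18) can be sketched in a few lines of Python once the residual covariance matrix has been fixed at its ML estimate. This is an illustrative stand-in for what PEST-ASP does in this report, not a reproduction of it; the function names, the use of Cholesky whitening and the small data set are all assumptions.

```python
import math


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_solve(low, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, row in enumerate(low):
        y.append((b[i] - sum(row[m] * y[m] for m in range(i))) / row[i])
    return y


def backward_solve(low, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(low)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[m][i] * x[m] for m in range(i + 1, n))) / low[i][i]
    return x


def gls_drift_and_nll(data, g, cov):
    """Return (a_hat, NLL) for D = G a + R with R ~ N(0, cov), as in (17)-(18)."""
    n, p1 = len(data), len(g[0])
    low = cholesky(cov)
    y = forward_solve(low, data)                        # whitened data, L^-1 D
    cols = [forward_solve(low, [row[j] for row in g])   # columns of L^-1 G
            for j in range(p1)]
    # Normal equations of the whitened problem: (G^T C^-1 G) a = G^T C^-1 D
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p1)]
           for i in range(p1)]
    xty = [sum(u * v for u, v in zip(cols[i], y)) for i in range(p1)]
    low2 = cholesky(xtx)
    a_hat = backward_solve(low2, forward_solve(low2, xty))
    resid = [y[i] - sum(a_hat[j] * cols[j][i] for j in range(p1))
             for i in range(n)]
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    nll = n * math.log(2.0 * math.pi) + log_det + sum(r * r for r in resid)
    return a_hat, nll


# Made-up example: three data points lying exactly on the drift 1 + 2 x.
g_example = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
cov_example = [[1.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 1.0]]
a_hat, nll = gls_drift_and_nll([1.0, 3.0, 5.0], g_example, cov_example)
```

Whitening the data and the drift matrix with the Cholesky factor of the covariance reduces GLS to ordinary least squares and yields the log-determinant in (18) as a by-product.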

Figure 4-3 depicts profiles of $\mathrm{NLL}(a, \beta \mid D)$ in (16) versus each parameter of model Exp1 when the remaining parameters are fixed. It clearly demonstrates that $\hat{\beta}$ (the marked values of sill and integral scale [m]) does not correspond to the minimum of $\mathrm{NLL}(a, \beta \mid D)$, which would therefore yield biased estimates of variogram parameters.

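The estimation covariance matrix of $\hat{\theta} = (\hat{a}, \hat{\beta})$ is generally represented by its asymptotic lower or Cramer-Rao bound, given by the inverse Fisher information matrix (e.g., Carrera et al., 1997). Components of the observed Fisher information matrix (10) are proportional to those of the Hessian matrix $H$ which, in turn, can be approximated as (Kitanidis and Lane, 1985)

$$H_{k,ij} = -\left. \frac{\partial^2 \ln p(D \mid \theta_k, M_k)}{\partial\theta_{ki}\,\partial\theta_{kj}} \right|_{\theta_k = \hat{\theta}_k} \approx \left. \left[ \frac{1}{2} \mathrm{Tr}\left( C_R^{-1} \frac{\partial C_R}{\partial\theta_i} C_R^{-1} \frac{\partial C_R}{\partial\theta_j} \right) + \frac{\partial R^T}{\partial\theta_i} C_R^{-1} \frac{\partial R}{\partial\theta_j} \right] \right|_{\theta = \hat{\theta}} \qquad (19)$$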

This approximation obviates the need to calculate second-order derivatives of the log likelihood function, which would be computationally more demanding than computing first-order derivatives of $C_R$ and $R$. In our case, the latter are easy to obtain analytically as done for exponential and spherical variogram models with drift (Ye et al., 2003). An alternative, which in our case yields very similar results, is to compute the observed Fisher information matrix numerically using methods such as the Ridder algorithm (Press et al., 1992, p. 180).

4.2.2 Posterior Model Probabilities

Table 4-1 confirms that increasing the number of parameters associated with a given class of variogram model (exponential or spherical) brings about an improvement in model fit, as indicated by a reduction in the negative log likelihood criterion NLL. Whereas the exponential variogram model with a quadratic drift (Exp2) fits the data best (ranks first in terms of fit due to its smallest NLL value), it is ranked second by AIC and sixth by BIC and KIC. Whereas the power model (Pow0) shows a relatively poor fit with the data (ranking fifth), it is ranked highly (first through third) by all three information criteria. The reason is that the difference in fit between the two models is not enough to compensate for the much more parsimonious nature of Pow0 (with 2 parameters) than that of Exp2 (with 12 parameters).

The rankings of the seven models by AIC, BIC and KIC are not entirely consistent. None of these information criteria provide justification for retaining one model while discarding all other models as is commonly done in practice. Nor do they provide clear justification for retaining some models while discarding the rest. We therefore consider all seven models to be valid initial candidates for MLBMA.

Upon assigning an equal prior probability of 1/7 to each model, we find on the basis of KIC via (7) that the first three models (Pow0, Exp0, Exp1) have much higher posterior probabilities than do the rest. Three of the models (Exp2, Sph0, Sph2) have zero probabilities (to three significant figures) and can therefore be eliminated (considering the low posterior probability of Sph1, there is almost equal justification for eliminating it too, but we retain it at this stage for the sake of illustration). Doing so and assigning an equal prior probability of 1/4 to each of the retained models is seen to have no impact on their posterior probabilities. In both cases the posterior probabilities are markedly different from their prior values, reflecting the strong impact of conditioning on data.
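The calculation behind these posterior probabilities can be sketched in Python directly from (7) and the KIC values listed in Table 4-1. The function and variable names are illustrative assumptions, not part of any code used in this report.

```python
import math


def posterior_model_probabilities(kic, priors):
    """Equation (7): weights exp(-dKIC_k / 2) p(M_k), normalized over models."""
    kic_min = min(kic.values())
    weights = {name: math.exp(-0.5 * (value - kic_min)) * priors[name]
               for name, value in kic.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# KIC values as listed (rounded to one decimal place) in Table 4-1.
kic_table = {"Pow0": 369.6, "Exp0": 370.1, "Exp1": 369.5, "Exp2": 416.7,
             "Sph0": 390.5, "Sph1": 378.1, "Sph2": 424.6}

# Equal prior probability of 1/7 for each of the seven models.
equal_priors = {name: 1.0 / len(kic_table) for name in kic_table}
posterior = posterior_model_probabilities(kic_table, equal_priors)
```

Because the tabulated KIC values are rounded, probabilities computed this way agree with the percentages in Table 4-1 only to within about one percentage point. Subtracting the minimum KIC before exponentiating leaves the result unchanged and keeps the exponentials well scaled.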

4.2.2.1 Sensitivity to Prior Model Probabilities

To investigate the influence of prior probability selection on the outcome, consider assigning an equal probability of 1/3 to each of the three classes of models (power, exponential and spherical) and also assigning equal probability to models within each class. This results in a prior probability of 1/3 for Pow0 and of 1/9 for each of the other six models. Though this brings about a marked increase in the posterior probability of Pow0 and a decrease in those of Exp0 and Exp1, once again the posterior probabilities of Exp2, Sph0 and Sph2 are zero while that of Sph1 is very close to zero. Eliminating the three models with zero posterior probability and redistributing the prior probabilities among the remaining models as shown in the next-to-last row of Table 4-1 brings about a decrease in the posterior probability of Pow0 and an increase in the posterior probabilities of Exp0 and Exp1. We conclude that posterior model probabilities exhibit some degree of sensitivity to the choice of prior probabilities but expect this sensitivity to diminish with improved conditioning.
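The class-based prior assignment described above can be sketched as follows. The helper names are hypothetical, and the KIC values are the rounded entries of Table 4-1, so the resulting posterior probabilities match the tabulated percentages only approximately.

```python
import math
from fractions import Fraction


def class_based_priors(classes):
    """Split probability equally among classes, then equally within each."""
    class_share = Fraction(1, len(classes))
    return {model: class_share / len(models)
            for models in classes.values() for model in models}


def posterior_from_kic(kic, priors):
    """Equation (7), restricted to the models that appear in `priors`."""
    kic_min = min(kic[m] for m in priors)
    weights = {m: math.exp(-0.5 * (kic[m] - kic_min)) * float(p)
               for m, p in priors.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


kic_table = {"Pow0": 369.6, "Exp0": 370.1, "Exp1": 369.5, "Exp2": 416.7,
             "Sph0": 390.5, "Sph1": 378.1, "Sph2": 424.6}

# All seven models: 1/3 for Pow0 and 1/9 for each of the other six.
all_seven = class_based_priors({
    "power": ["Pow0"],
    "exponential": ["Exp0", "Exp1", "Exp2"],
    "spherical": ["Sph0", "Sph1", "Sph2"],
})
# Four retained models: 1/3, 1/6, 1/6 and 1/3.
retained = class_based_priors({
    "power": ["Pow0"],
    "exponential": ["Exp0", "Exp1"],
    "spherical": ["Sph1"],
})

post_seven = posterior_from_kic(kic_table, all_seven)
post_retained = posterior_from_kic(kic_table, retained)
```

Using exact fractions for the priors makes it easy to confirm that each assignment sums to one before it is converted to floating point.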

4.2.3 Kriging Results

[Figure 4-3: plots not reproduced. Panels show NLL versus sill, integral scale and each of the drift coefficients of Exp1.]

Figure 4-3. Negative log likelihood (NLL) as a function of each variogram parameter and drift coefficient for the exponential model with linear drift (Exp1). Vertical lines indicate unbiased ML estimates.

Table 4-1. Quality criteria, rankings and prior/posterior probabilities associated with alternative geostatistical models of $\log_{10} k$ at the ALRS

| Model | Pow0 | Exp0 | Exp1 | Exp2 | Sph0 | Sph1 | Sph2 |
|---|---|---|---|---|---|---|---|
| Number of parameters | 2 | 2 | 6 | 12 | 2 | 6 | 12 |
| Sill/Coefficient | 0.286 | 0.718 | 0.514 | 0.501 | 0.749 | 0.664 | 0.662 |
| Correlation/Power | 0.460 | 1.840 | 1.240 | 1.198 | 3.184 | 2.849 | 2.835 |
| NLL | 352.2 | 361.0 | 341.6 | 330.4 | 379.1 | 349.6 | 338.8 |
| Rank (NLL) | 5 | 6 | 3 | 1 | 7 | 4 | 2 |
| AIC | 356.2 | 365.0 | 353.6 | 354.4 | 383.1 | 361.6 | 362.8 |
| Rank (AIC) | 3 | 6 | 1 | 2 | 7 | 5 | 4 |
| BIC | 362.6 | 371.4 | 372.9 | 392.9 | 389.5 | 380.9 | 401.4 |
| Rank (BIC) | 1 | 2 | 3 | 6 | 5 | 4 | 7 |
| KIC | 369.6 | 370.1 | 369.5 | 416.7 | 390.5 | 378.1 | 424.6 |
| Rank (KIC) | 2 | 3 | 1 | 6 | 5 | 4 | 7 |
| $p(M_k)$ | 1/7 | 1/7 | 1/7 | 1/7 | 1/7 | 1/7 | 1/7 |
| $p(M_k \mid D)$ (%) | 35.3 | 26.6 | 37.6 | 0 | 0 | 0.5 | 0 |
| $p(M_k)$ | 1/4 | 1/4 | 1/4 | - | - | 1/4 | - |
| $p(M_k \mid D)$ (%) | 35.3 | 26.6 | 37.6 | - | - | 0.5 | - |
| $p(M_k)$ | 1/3 | 1/9 | 1/9 | 1/9 | 1/9 | 1/9 | 1/9 |
| $p(M_k \mid D)$ (%) | 62.1 | 15.6 | 22.0 | 0 | 0 | 0.3 | 0 |
| $p(M_k)$ | 1/3 | 1/6 | 1/6 | - | - | 1/3 | - |
| $p(M_k \mid D)$ (%) | 52.0 | 19.6 | 27.7 | - | - | 0.7 | - |

We continue our analysis by retaining four (Pow0, Exp0, Exp1, Sph1) of the seven models (with the corresponding ML parameter estimates) and assigning to each of them an equal prior probability of 1/4. Using each of these models, we project the available $\log_{10} k$ data by ordinary (in the case of drift-free models) or universal (otherwise) kriging onto a grid of 50 x 40 x 30 1-m³ cubes contained within the coordinate ranges $-10 \le x \le 40$ m, $-10 \le y \le 30$ m and $-30 \le z \le 0$ m in Figure 4-1.

If one thinks of $\Delta$ as a random value of $\log_{10} k$ in a given grid block then our kriging estimates represent $E[\Delta \mid M_k, \hat{\theta}_k, D]$ and their variances stand for $\mathrm{Var}[\Delta \mid M_k, \hat{\theta}_k, D]$, the ML approximations of $E[\Delta \mid M_k, D]$ and $\mathrm{Var}[\Delta \mid M_k, D]$ in (4) and (5), respectively.

Figure 4-4 to Figure 4-7 show the kriged estimates and variances of log10 k on a vertical plane y = 6.5 m for the four models. Conditioning on borehole data is evident to a lesser degree in the images of log10 k estimates than in those of their variances. Averaging the kriging results across all models using an ML approximation of (4) and (5) yields corresponding MLBMA estimates and variances of the kind depicted for y = 6.5 m in Figure 4-8. Figure 4-9 shows a decomposition of the MLBMA estimation variance in Figure 4-8b into its within- and between-model components. The largest values of these two components throughout the three-dimensional grid are 1.1 and 0.38, respectively. Whereas the within-model MLBMA variance in Figure 4-9a reflects conditioning on borehole measurements, it is difficult to discern such conditioning in the image of between-model variance (Figure 4-9b) due to the faint reflection of such conditioning in the underlying images of log10 k estimates.
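The averaging and variance decomposition described above can be illustrated with a minimal sketch. It assumes that (4) and (5) take the usual BMA form, in which the averaged estimate is the probability-weighted mean of the individual kriged estimates and the averaged variance is the sum of a within-model and a between-model term. The function name `mlbma_moments` and its arguments are hypothetical and are not taken from any code used in this report.

```python
import numpy as np

def mlbma_moments(means, variances, post_probs):
    """Average per-model kriged estimates and variances over K models.

    means, variances: arrays of shape (K, ...) holding, for each model,
        the kriged estimate and kriging variance in every grid block.
    post_probs: length-K posterior model probabilities (renormalized here).
    Returns (averaged estimate, within-model variance,
             between-model variance, total variance).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    p = np.asarray(post_probs, dtype=float)
    p = p / p.sum()
    # Reshape the weights so they broadcast over any grid shape.
    w = p.reshape((-1,) + (1,) * (means.ndim - 1))
    avg_mean = (w * means).sum(axis=0)
    within = (w * variances).sum(axis=0)                # weighted kriging variances
    between = (w * (means - avg_mean) ** 2).sum(axis=0) # spread of model estimates
    return avg_mean, within, between, within + between
```

The between-model term vanishes wherever all models give the same estimate, which is consistent with its faint reflection of borehole conditioning noted above.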

Figure 4-10 shows univariate cumulative distributions of kriging estimates corresponding to each of the four models and MLBMA. The distributions are seen to be sensitive to the choice of model, with MLBMA providing a weighted compromise. The same is reflected in the variances of these kriged estimates, listed in Table 4-2.

Table 4-2. Variance of kriged estimates across the grid obtained with alternative models and MLBMA

Model    Variance
Pow0     0.334
Exp0     0.134
Exp1     0.467
Sph1     0.404
MLBMA    0.405

Figure 4-4. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the power model (Pow0)
Panels: (a) kriged estimate, (b) kriged variance; horizontal axis X (m).

Figure 4-5. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the exponential model without drift (Exp0)
Panels: (a) kriged estimate, (b) kriged variance; horizontal axis X (m).

Figure 4-6. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the exponential model with first-order drift (Exp1)
Panels: (a) kriged estimate, (b) kriged variance; horizontal axis X (m).

Figure 4-7. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using the spherical model with first-order drift (Sph1)
Panels: (a) kriged estimate, (b) kriged variance; horizontal axis X (m).

Figure 4-8. Kriged (a) estimate and (b) variance of log10 k at y = 6.5 m obtained using MLBMA
Panels: (a) kriged estimate, (b) kriged variance; horizontal axis X (m).

Figure 4-9. (a) Within- and (b) between-model variance of MLBMA log10 k estimates at y = 6.5 m
Panels: (a) within-model variance, (b) between-model variance; horizontal axis X (m).

Figure 4-10. Cumulative distribution of kriged log10 k estimates obtained using various models and MLBMA

4.3 Assessment of Predictive Performance

To assess the predictive performance of MLBMA, we cross-validate the above results by (1) splitting the data $D$ into two parts, $D^A$ and $D^B$; (2) obtaining ML estimates of model parameters and posterior probabilities conditional on $D^A$; (3) using these to render MLBMA predictions $\hat{D}^B$ of $D^B$; and (4) assessing the quality of the predictions. We do so by eliminating from consideration all log10 k data from one borehole at a time and predicting them with models conditioned on the remaining data. The number and corresponding percentage of data in $D^A$ for each cross-validation case are listed in Table 4-3. As Sph1 has a very small posterior probability in comparison to Pow0, Exp0, and Exp1 (Table 4-1), we limit the cross-validation to the latter three geostatistical models and recalculate their posterior probabilities by assigning to each of them a prior probability of 1/3.

Table 4-3. Number of log10 k data in $D^A$ of each cross-validation case and their percentage of the entire data set.

Well    Number    Percentage (%)
V2      163       89.1
X2      154       83.7
Y2      156       84.8
Y3      144       78.3
Z2      156       84.8
W2A     147       79.9

Figure 4-11 shows that eliminating data from one borehole at a time may, but need not, have a significant impact on the omni-directional sample variogram of log10 k. The impact that such elimination has on parameter estimates and model quality criteria associated with Pow0 is indicated in Figure 4-12. Figure 4-13 demonstrates that posterior model probability is sensitive to the choice of conditioning data. This sensitivity is greater when posterior probability is computed using KIC in (7) than BIC in (12). This illustrates that the non-asymptotic criterion KIC is more informative than the asymptotic criterion BIC, supporting the choice of the former as the basis for MLBMA (Neuman, 2002, 2003).
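The way an information criterion translates into posterior model probabilities can be sketched as follows. The sketch assumes that (7) has the usual form in which each model is weighted by exp(-ΔKIC/2) times its prior probability, with ΔKIC measured from the smallest KIC in the set; the same function applies to BIC. The function name `posterior_model_probs` is hypothetical. The KIC values in the usage lines are the rounded entries of Table 4-1.

```python
import numpy as np

def posterior_model_probs(ic_values, prior_probs):
    """Posterior model probabilities from information-criterion values.

    Assumed form: p(M_k | D) proportional to exp(-0.5 * (IC_k - IC_min)) * p(M_k).
    Subtracting IC_min keeps the exponentials well scaled.
    """
    ic = np.asarray(ic_values, dtype=float)
    prior = np.asarray(prior_probs, dtype=float)
    weights = np.exp(-0.5 * (ic - ic.min())) * prior
    return weights / weights.sum()

# Rounded KIC values from Table 4-1 with equal priors of 1/7.
kic = [369.6, 370.1, 369.5, 416.7, 390.5, 378.1, 424.6]
post = posterior_model_probs(kic, np.full(7, 1.0 / 7.0))
```

With these rounded inputs the three leading models receive probabilities within roughly one percentage point of those tabulated in Table 4-1. Because the weights depend exponentially on KIC differences, differences of a few units are enough to shift the probabilities noticeably, which is one way to read the sensitivity to conditioning data reported above.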

4.3.1 Predictive Log Score

One way to compare the predictive capabilities of alternative models is through their log scores, $-\ln p(D^B \mid M_k, D^A)$ (Good, 1952; Volinsky et al., 1997). The lower the predictive log score of model $M_k$ based on data $D^A$, the smaller the amount of information lost upon eliminating $D^B$ from the original dataset $D$ (i.e., the higher the probability that $M_k$ based on $D^A$ would reproduce the lost data, $D^B$). The predictive log score associated with BMA is

$$-\ln p(D^B \mid D^A) = -\ln \sum_{k=1}^{K} p(D^B \mid M_k, D^A)\, p(M_k \mid D^A) \tag{20}$$

Approximating $p(D^B \mid M_k, D^A)$ by $p(D^B \mid M_k, \hat{\theta}_k, D^A)$, and computing $p(M_k \mid D^A)$ via (7) after replacing $D$ by $D^A$, yields a corresponding log score for MLBMA.

Let $\hat{D}^B$ be kriged estimates of log10 k data $D^B$ along a borehole obtained using variogram model $M_k$ with ML parameters $\hat{\theta}_k$ based on log10 k data $D^A$ in other boreholes. Then the ML log score for drift-free models Pow0 and Exp0 is (Ye et al., 2003)

$$-\ln p(D^B \mid M_k, \hat{\theta}_k, D^A) = \frac{N_B}{2}\ln(2\pi) + \frac{1}{2}\sum_{i=1}^{N_B} \ln \sigma_i^2 + \frac{1}{2}\sum_{i=1}^{N_B} \frac{\left(D_i^B - \hat{D}_i^B\right)^2}{\sigma_i^2} \tag{21}$$

where $N_B$ is the dimension of $D^B$, $D_i^B$ are its components, and $\sigma_i^2$ is given in Ye et al. (2003, Equation B5). In analogy to (17), the ML log score for Exp1 is

$$-\ln p(D^B \mid M_k, \hat{\theta}_k, D^A) = \frac{N_B}{2}\ln(2\pi) + \frac{1}{2}\ln\left|C_B(\hat{\theta}_k)\right| + \frac{1}{2}\left(D^B - G_B \hat{a}_B\right)^T C_B^{-1}(\hat{\theta}_k)\left(D^B - G_B \hat{a}_B\right) \tag{22}$$
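The log scores above can be evaluated along the following lines. This is a minimal sketch that assumes independent Gaussian prediction errors with variances $\sigma_i^2$ for the drift-free case and combines individual scores as in (20). The function names are hypothetical, and the posterior probabilities passed to `averaged_log_score` are assumed to be strictly positive.

```python
import numpy as np

def gaussian_log_score(observed, predicted, variances):
    """Negative log predictive density for independent Gaussian errors.

    Evaluates 0.5 * [N ln(2 pi) + sum ln(var_i) + sum (obs_i - pred_i)^2 / var_i].
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    var = np.asarray(variances, dtype=float)
    n = obs.size
    return 0.5 * (n * np.log(2.0 * np.pi)
                  + np.sum(np.log(var))
                  + np.sum((obs - pred) ** 2 / var))

def averaged_log_score(model_scores, post_probs):
    """Model-averaged log score, -ln sum_k exp(-score_k) * p_k.

    A log-sum-exp shift avoids underflow when the scores are large.
    """
    s = np.asarray(model_scores, dtype=float)
    p = np.asarray(post_probs, dtype=float)
    a = -s + np.log(p)
    m = a.max()
    return -(m + np.log(np.sum(np.exp(a - m))))
```

For a single cross-validation case the averaged score necessarily lies between the best and worst individual scores, since it is the negative log of a weighted mean of the individual likelihoods. An average taken over several cases can nevertheless fall below every individual average, because the posterior weights change from case to case.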

Predictive log scores were obtained for each model upon eliminating data from one of six boreholes at a time. Table 4-4 lists the average of these six scores for each model, as well as the average of corresponding MLBMA scores (20). The average predictive log score of MLBMA is seen to be lower than that of any individual model, indicating that MLBMA is a better predictor than any of these models.

Table 4-4. Average predictive log score and predictive coverage of individual models and MLBMA

Model    Predictive Log Score    Predictive Coverage (%)
Pow0     34.1                    86.5
Exp0     35.2                    80.8
Exp1     34.0                    83.7
MLBMA    31.4                    87.5

4.3.2 Predictive Coverage

Another measure of model performance is its predictive coverage (Hoeting et al., 1999). This is the percent of measurements $D_i^B$ that fall within a given prediction interval about $\hat{D}_i^B$. In our case, this interval was generated by conducting Monte Carlo simulations of log10 k conditioned on $D^A$. We used a simulated annealing code (Deutsch and Journel, 1998, p. 183) to allow generation of statistically nonhomogeneous random fields characterized by a power variogram. Figure 4-14a-c show 90% prediction intervals (dashed) defining the 5% and 95% limits of 500 simulations along borehole X2 using individual models with ML parameter estimates conditioned on measurements in the remaining five boreholes. Figure 4-14d shows averages of these intervals over the three models, weighted by their posterior probabilities. The percent of measurements (triangles) lying within these and similar intervals, associated with all six boreholes, defines predictive coverage as listed in Table 4-4. The predictive coverage of MLBMA is larger than that of any individual model, attesting once again to its superior performance.

Figure 4-15 depicts the cumulative distributions of simulated values at two measurement locations in boreholes V2 and Y3 obtained using individual models and MLBMA, while eliminating data from the corresponding boreholes. The measured values are indicated by vertical lines. In both cases the MLBMA distribution is strongly influenced by that of Pow0 and weakly affected by Exp1. Figure 4-16 shows sample predictive variances obtained using individual models and MLBMA at measurement points along each of the two boreholes. Along V2, Pow0 with a posterior probability of about 83% exerts an overwhelming influence on the predictive variance of MLBMA, which is however lower (closer to those of Exp0 and Exp1). Along Y3, individual models tend to be associated with a somewhat lower predictive variance than MLBMA.

Overall, MLBMA is a more reliable predictor than any individual model, as indicated by its relatively small predictive log score and large predictive coverage.
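The predictive coverage calculation described in this section can be sketched as follows, assuming that the conditional simulations for each model are already available as an array with one row per realization and one column per measurement location. The simulated annealing step itself is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def prediction_interval(simulations, lower=5.0, upper=95.0):
    """Percentile limits of simulated values at each measurement location.

    simulations: array of shape (n_realizations, n_locations).
    Returns (lower limits, upper limits), each of length n_locations.
    """
    sims = np.asarray(simulations, dtype=float)
    lo, hi = np.percentile(sims, [lower, upper], axis=0)
    return lo, hi

def averaged_interval(intervals, post_probs):
    """Average per-model interval limits, weighted by posterior probability."""
    p = np.asarray(post_probs, dtype=float)
    p = p / p.sum()
    lo = sum(w * np.asarray(iv[0], dtype=float) for w, iv in zip(p, intervals))
    hi = sum(w * np.asarray(iv[1], dtype=float) for w, iv in zip(p, intervals))
    return lo, hi

def predictive_coverage(measured, lo, hi):
    """Percent of measured values lying within [lo, hi]."""
    m = np.asarray(measured, dtype=float)
    inside = (m >= lo) & (m <= hi)
    return 100.0 * inside.mean()
```

Averaging the interval limits follows the description of Figure 4-14d. An alternative, not used here, would be to take percentiles of a pooled, probability-weighted sample of realizations.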

Figure 4-11. Omni-directional sample variograms of all data and all but data from borehole (a) V2, X2, Y2 and (b) Y3, Z2, W2A
Horizontal axis: separation distance (m).

Figure 4-12. Dependence of power variogram (Pow0) (a) parameters and (b) quality criteria on data. In (a), symbols designate parameter estimates obtained without data from designated borehole; broken and dashed lines indicate parameters obtained with all data.
Horizontal axis: cross-validation borehole (V2, X2, Y2, Y3, Z2, W2A).

Figure 4-13. Posterior model probabilities based on (a) BIC and (b) KIC upon eliminating data from designated borehole
Horizontal axis: cross-validation borehole.

Figure 4-14. 5% (bottom dashed) and 95% (top dashed) limits of simulated prediction interval of log10 k along borehole X2. Triangles designate measured values.
Panels: (a) Pow0, (b) Exp0, (c) Exp1, (d) MLBMA; horizontal axis: test interval.

Figure 4-15. Cumulative distribution of simulated log10 k values at a measurement location in borehole (a) V2 and (b) Y3. Vertical line indicates measured value.
Curves: Pow0, Exp0, Exp1, MLBMA; horizontal axis: log10 k.

Figure 4-16. Sample variances of log10 k values simulated using various models and MLBMA along borehole (a) V2 and (b) Y3 while eliminating the corresponding data
Curves: Pow0, Exp0, Exp1, MLBMA.

5 Conclusions

The objective of the research described in this report is the development and application of a methodology for comprehensively assessing the hydrogeologic uncertainties involved in dose assessment modeling. For methodological purposes, uncertainty is classified as being associated with the conceptual-mathematical basis of the model, model parameters, or the scenario to which the model is applied.

This report describes and applies a method to estimate the joint uncertainty in model predictions arising from conceptual model and parameter uncertainties. Analyses of model uncertainty based on a single hydrologic concept are prone to statistical bias (by potential reliance on an invalid model) and underestimation of uncertainty (by under-sampling of the relevant model space). Bias and uncertainty resulting from an inadequate model structure (conceptualization) are often more detrimental to a model's predictive reliability than are suboptimal model parameters.

Bayesian Model Averaging (BMA) provides an optimal but computationally demanding way of combining the predictions of several competing models and assessing their joint predictive uncertainty. The Maximum Likelihood version (MLBMA) of BMA proposed by Neuman (2002, 2003), and described and applied in this report, renders the approach computationally feasible and applicable to real-world hydrologic problems. It applies to deterministic and stochastic models, to complex and simplified models.

Whereas BMA requires specifying a prior distribution for model parameters, MLBMA accepts but does not require such prior information. This is so because, contrary to BMA, MLBMA relies on maximum likelihood model calibration against observational data.

In the most data-limited application, one in which there are no system observations with which to calibrate a model and the only available parameter information is that available from generic databases, model predictions can still be made using prior parameter estimates and model averaging can still be carried out, but only with prior model probabilities. Since the predictions and model probabilities are not conditioned on state variable observations, however, the results are expected to be more uncertain and potentially more biased.

A further benefit of the use of maximum likelihood is that the optimization can yield parameter sensitivity information. In addition, when the statistical parameters characterizing the parameter and state variable errors are known (i.e., not estimated), maximum likelihood reduces to generalized least squares estimation. In this case, available codes such as PEST and UCODE can be applied.
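A brief sketch of why this reduction holds, assuming Gaussian errors with a residual vector $r(\theta)$ of length $N$ and a known error covariance matrix $C$ (symbols introduced here for illustration only, not taken from the report's numbered equations):

```latex
\begin{aligned}
-2 \ln L(\theta \mid D) &= N \ln(2\pi) + \ln |C| + r(\theta)^{T} C^{-1} r(\theta), \\
\hat{\theta} &= \arg\min_{\theta} \; r(\theta)^{T} C^{-1} r(\theta)
\qquad \text{when } C \text{ does not depend on } \theta .
\end{aligned}
```

With $C$ fixed, the first two terms are constants, so maximizing the likelihood is equivalent to minimizing the generalized least squares objective. When parameters of $C$ must themselves be estimated, the $\ln |C|$ term no longer drops out and the two estimators differ.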

Prior model probabilities are subjective values reflecting a belief about the relative plausibility of each model based on its apparent consistency with available knowledge and data. Posterior model probabilities are modifications of these subjective values based on an objective evaluation of each model's consistency with available data. Hence, the posterior probabilities are valid only in a comparative, not in an absolute, sense.

MLBMA is based on Kashyap's (1982) information criterion, KIC, more commonly used as an optimum decision rule for the ranking of competing models. Like KIC, MLBMA favors models which, among a given set of alternatives, are least likely to be incorrect. It honors the principle of parsimony by favoring the least complex among models which, otherwise, fit observational data equally well. Among models of equal complexity, MLBMA favors those exhibiting the best fit. It additionally contains an information term which allows one to consider models of growing complexity as the dataset improves in quantity and quality. Stated otherwise, MLBMA recognizes that when the dataset is limited and/or of poor quality, one should assign relatively low weights to elaborate models with numerous parameters. One should weigh more heavily simpler models with fewer parameters that nevertheless reflect adequately the underlying hydrologic structure and phenomena.

The example application confirms that the non-asymptotic criterion KIC is more informative than its asymptotic limit BIC, supporting the choice of the former as the basis for MLBMA.

Models considered in MLBMA may have different types and numbers of parameters, but the latter must be estimated and the models weighted based on a single dataset. As an example, to analyze jointly two- and three-dimensional models via MLBMA, a given set of three-dimensional data must be used and either projected onto a two-dimensional plane or averaged in the third dimension for inclusion in the two-dimensional model(s).

Application of MLBMA to alternative geostatistical models of log air permeability variations in unsaturated fractured tuff has shown it to be a better predictor of spatial variability than any individual model.

To implement MLBMA the following steps are followed.

(1) Postulate alternative conceptual-mathematical models for a site using guidance provided in Neuman and Wierenga (2003).

(2) Assign a prior probability to each model.

(3) Optionally assign prior probabilities to the parameters of each model, using, for example, guidance provided in Meyer and Gee (1999).

(4) Obtain posterior maximum likelihood parameter estimates, and estimation covariance, for each model by inversion (model calibration). In many cases, available codes such as PEST and UCODE can be applied to this step.

(5) Calculate a posterior probability for each model using the model calibration results and the prior model probabilities.

(6) Predict quantities of interest using each model.

(7) Assess prediction uncertainty (distribution, variance) for each model using Monte Carlo or stochastic moment methods.

(8) Weight predictions and uncertainties by the corresponding posterior model probabilities.

(9) Sum the results over all models.
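When step (7) produces Monte Carlo samples rather than moments, steps (8) and (9) amount to forming a probability-weighted mixture of the individual predictive distributions. The following minimal sketch builds the mixture cumulative distribution from per-model samples; the function name `mixture_cdf` and its arguments are hypothetical and are not part of PEST, UCODE, or any other code cited here.

```python
import numpy as np

def mixture_cdf(samples_by_model, post_probs, grid):
    """Model-averaged empirical CDF evaluated at the points in grid.

    samples_by_model: sequence of 1-D sample arrays, one per model.
    post_probs: posterior model probabilities (renormalized here).
    """
    p = np.asarray(post_probs, dtype=float)
    p = p / p.sum()
    grid = np.asarray(grid, dtype=float)
    cdf = np.zeros_like(grid)
    for weight, samples in zip(p, samples_by_model):
        s = np.sort(np.asarray(samples, dtype=float))
        # Fraction of this model's samples <= each grid value.
        cdf += weight * np.searchsorted(s, grid, side="right") / s.size
    return cdf
```

A model with a dominant posterior probability dominates the mixture, which is the behavior seen for Pow0 in Figure 4-15.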

38

Page 50: Combined Estimation of Hydrogeologic Conceptual Model and ... · PNNL-14534 Combined Estimation of Hydrogeologic Conceptual Model and Parameter Uncertainty Pacific Northwest National

6 References

Abramson, L.R., The philosophical basis for the use of probabilities in safety assessments, Reliability Engineering and System Safety, 23:253-257, 1988.

Akaike, H., A new look at statistical model identification, IEEE Trans. Automat. Contr., AC-19, 716-722, 1974.

Akaike, H., On entropy maximization principle, in: Krishnaiah P.R. (ed.), Applications of Statistics, North Holland, Amsterdam, pp. 27-41, 1977.

Ando, K., Kostner, A., and S.P. Neuman, Stochastic continuum modeling of flow and transport in a crystalline rock mass: Fanay-Augeres, France, revisited, Hydrogeology Journal, 11(5), 521-535, 2003.

Ayyub, B.M. and R.H. McCuen, Probability, Statistics, and Reliability for Engineers and Scientists, Chapman & Hall/CRC Press LLC, Boca Raton, Florida, 2003.

Beven, K.J. and J. Freer, Equifinality, data assimilation, and uncertainty estimation in mechanistic modelling of complex environmental systems using the GLUE methodology, J. Hydrology, 249:11-29, 2001.

Borgonovo, E., G.E. Apostolakis, S. Tarantola and A. Saltelli, Comparison of global sensitivity analysis techniques and importance measures in PSA, Reliability Engineering and System Safety, 79:175-185, 2003.

Burnham, K.P. and D.R. Anderson, Model selection and multiple model inference: a practical information-theoretical approach, 2nd edition, Springer, New York, 2002.

Carrera, J. and S.P. Neuman, Estimation of aquifer parameters under transient and steady state conditions: 1. Maximum likelihood method incorporating prior information, Water Resour. Res., 22(2):199-210, 1986a.

Carrera, J. and S.P. Neuman, Estimation of aquifer parameters under transient and steady state conditions: 3. Application to synthetic and field data, Water Resour. Res., 22(2):228-242, 1986b.

Carrera, J., Medina, A., Axness, C. and Zimmerman, T., Formulations and computational issues of the inversion of random fields, in: Subsurface Flow and Transport: A Stochastic Approach (ed. by G. Dagan and S.P. Neuman), 62-79, Cambridge University Press, Cambridge, United Kingdom, 1997.

Chen, G., W.A. Illman, D.L. Thompson, V.V. Vesselinov, and S.P. Neuman, Geostatistical, type curve and inverse analyses of pneumatic injection tests in unsaturated fractured tuffs at the Apache Leap Research Site near Superior, Arizona, pp. 73-98 in Dynamics of Flow and Transport in Fractured Rocks, edited by B. Faybishenko et al., AGU monograph series, 2000.

Cole, C.R., M.P. Bergeron, C.J. Murray, P.D. Thorne, S.K. Wurstner, and P.M. Rogers, Uncertainty Analysis Framework - Hanford Site-Wide Groundwater Flow and Transport Model, PNNL-13641, Pacific Northwest National Laboratory, Richland, Washington, 2001.

Cressie, N., Statistics for Spatial Data, John Wiley and Sons, Inc., New York, 1991.

Dagan, G. and S.P. Neuman (eds.), Subsurface Flow and Transport: A Stochastic Approach, Cambridge University Press, Cambridge, United Kingdom, 1997.

Deutsch, C.V. and A.G. Journel, GSLIB: Geostatistical Software Library and User's Guide (second edition), Oxford University Press, New York, 1998.

Doherty, J., Manual for PEST, Fifth Edition, Watermark Numerical Computing, Australia, 2002.

Doherty, J., Ground water model calibration using pilot points and regularization, Ground Water, 41(2):170-177, 2003.

Draper, D., Assessment and propagation of model uncertainty, J. Roy. Statist. Soc. Ser. B, 57(1):45-97, 1995.

Gaganis, P. and L. Smith, A Bayesian approach to the quantification of the effect of model error on the predictions of groundwater models, Water Resour. Res., 37(9):2309-2322, 2001.

George, E.I., Comment, Statist. Sci., 14(4), 409-412, 1999.

Gomez-Hernandez, J.J., A. Sahuquillo, and J.E. Capilla, Stochastic simulation of transmissivity fields conditional to both transmissivity and piezometric data, 1. Theory, J. Hydrology, 203:162-174, 1997.


Good, I.J., Rational Decisions, J. R. Statist. Soc. B, 57(1), 107-114, 1952.

Guzman, A.G., S.P. Neuman, C. Lohrstorfer, and R. Bassett, Chapter 4, in Validation Studies for Assessing Unsaturated Flow and Transport Through Fractured Rock, edited by R.L. Bassett, S.P. Neuman, T.C. Rasmussen, A.G. Guzman, G.R. Davidson, and C.L. Lohrstorfer, pp. 4-1 to 4-58, NUREG/CR-6203, U.S. Nuclear Regulatory Commission, Washington, D.C., 1994.

Guzman, A.G., A.M. Geddis, M.J. Henrich, C. Lohrstorfer, and S.P. Neuman, Summary of Air Permeability Data From Single-Hole Injection Tests in Unsaturated Fractured Tuffs at the Apache Leap Research Site: Results of Steady-State Test Interpretation, NUREG/CR-6360, U.S. Nuclear Regulatory Commission, Washington, D.C., 1996.

Hannan, E.S., The estimation of the order of ARMA process, Ann. Stat., 8, 1071-1081, 1980.

Helton, J.C., Uncertainty and sensitivity techniques for use in performance assessment for radioactive waste disposal, Reliability Engineering and System Safety, 42:327-367, 1993.

Helton, J.C., Treatment of uncertainty in performance assessments for complex systems, Risk Analysis, 14:483-511, 1994.

Helton, J.C., Guest editorial: treatment of aleatory and epistemic uncertainty in performance assessments for complex systems, Reliability Engineering and System Safety, 54:91-94, 1996.

Hernandez, A.F., S.P. Neuman, A. Guadagnini, and J. Carrera-Ramirez, Conditioning steady state mean stochastic flow equations on head and hydraulic conductivity measurements, 158-162, Proc. 4th Intern. Conf. on Calibration and Reliability in Groundwater Modelling (ModelCARE 2002), edited by K. Kovar and Z. Hrkal, Charles University, Prague, Czech Republic, 2002.

Hernandez, A.F., S.P. Neuman, A. Guadagnini, and J. Carrera, Conditioning mean steady state flow on hydraulic head and conductivity through geostatistical inversion, Stochastic Environmental Research and Risk Assessment, 17, DOI 10.1007/s00477-003-0154-4, 2003.

Hill, M.C., Methods and Guidelines for Effective Model Calibration, U.S. Geological Survey Water-Resources Investigations Report 98-4005, U.S. Geological Survey, Denver, Colorado, 1998.

Hoeksema, R.J. and P.K. Kitanidis, Analysis of the spatial structure of properties of selected aquifers, Water Resour. Res., 21(4), 563-572, 1985.

Hoeting, J.A., D. Madigan, A.E. Raftery, and C.T. Volinsky, Bayesian model averaging: A tutorial, Statist. Sci., 14(4):382-417, 1999.

Holt, R.M., J.L. Wilson, and R.J. Glass, Spatial bias in field-estimated unsaturated hydraulic properties, Water Resour. Res., 38(12), 1311, doi:10.1029/2002WR001336, 2002.

Isukapalli, S.S., A. Roy and P.G. Georgopoulos, Stochastic response surface methods (SRSMs) for uncertainty propagation: application to environmental and biological systems, Risk Analysis, Vol. 18, No. 3, 1998.

James, A.L. and C.M. Oldenburg, Linear and Monte Carlo uncertainty analysis for subsurface contaminant transport simulation, Water Resour. Res., 33(11):2495-2508, 1997.

Jefferys, W.H. and J.O. Berger, Ockham's razor and Bayesian analysis, American Scientist, 80(1):64-72, 1992.

Kashyap, R.L., Optimal choice of AR and MA parts in autoregressive moving average models, IEEE Trans. Pattern Anal. Mach. Intell., PAMI-4(2):99-104, 1982.

Kass, R.E. and A.E. Raftery, Bayes factors, J. Amer. Statist. Assoc., 90(430):773-795, 1995.

Kitanidis, P.K. and R.W. Lane, Maximum likelihood parameter estimation of hydrologic spatial processes by the Gauss-Newton method, J. Hydrol., 79, 53-71, 1985.

Krishnamurti, T.N., C.M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, Multimodel ensemble forecasts for weather and seasonal climate, J. Climate, 13(23):4196-4216, 2000.

Kunstmann, H., W. Kinzelbach, and T. Siegfried, Conditional first-order second-moment method and its application to the quantification of uncertainty in groundwater modeling, Water Resources Research, 38(4), doi:10.1029/2000WR000022, 2002.

Madigan, D. and A.E. Raftery, Model selection and accounting for model uncertainty in graphical models using Occam's window, J. Amer. Statist. Assoc., 89(428):1535-1546, 1994.


Martz, H.F. and R.A. Waller, On the meaning of probability, Reliability Engineering and System Safety, 23:253-257, 1988.

McKay, M.D., Evaluating Prediction Uncertainty, NUREG/CR-6311, U.S. Nuclear Regulatory Commission, Washington, D.C., 1995.

Meyer, P.D., M.L. Rockhold, and G.W. Gee, Uncertainty Analyses of Infiltration and Subsurface Flow and Transport for SDMP Sites, NUREG/CR-6565, U.S. Nuclear Regulatory Commission, Washington, D.C., 1997. (http://nrc-hydro-uncert.pnl.gov/)

Meyer, P.D. and G.W. Gee, Information on Hydrologic Conceptual Models, Parameters, Uncertainty Analysis, and Data Sources for Dose Assessments at Decommissioning Sites, NUREG/CR-6656, U.S. Nuclear Regulatory Commission, Washington, D.C., 1999. (http://nrc-hydro-uncert.pnl.gov/)

Meyer, P.D. and R.W. Taira, Hydrologic Uncertainty Assessment for Decommissioning Sites: Hypothetical Test Case Applications, NUREG/CR-6695, U.S. Nuclear Regulatory Commission, Washington, D.C., 2001. (http://nrc-hydro-uncert.pnl.gov/)

Morgan, M.G. and M. Henrion, Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge University Press, Cambridge, United Kingdom, 1990.

Mosleh, A., N. Siu, C. Smidts, and C. Lui (eds.), Model Uncertainty: Its Characterization and Quantification, Proceedings of Workshop I in Advanced Topics in Risk and Reliability Analysis, NUREG/CP-0138, U.S. Nuclear Regulatory Commission, Washington, D.C., 1994.

Neuman, S.P., Accounting for conceptual model uncertainty via maximum likelihood model averaging, 529-534, Proc. 4th Intern. Conf. on Calibration and Reliability in Groundwater Modelling (ModelCARE 2002), edited by K. Kovar and Z. Hrkal, Charles University, Prague, Czech Republic, 2002.

Neuman, S.P., Maximum likelihood Bayesian averaging of alternative conceptual-mathematical models, Stochastic Environmental Research and Risk Assessment, 17, DOI 10.1007/s00477-003-0151-7, 2003.

Neuman, S.P. and E.A. Jacobson, Analysis of nonintrinsic spatial variability by residual Kriging with application to regional groundwater levels, Math. Geology, 16, 491-521, 1984.

Neuman, S.P. and P.J. Wierenga, A Comprehensive Strategy of Hydrogeologic Modeling and Uncertainty Analysis for Nuclear Facilities and Sites, NUREG/CR-6805, U.S. Nuclear Regulatory Commission, Washington, D.C., 2003.

Poeter, E.P. and M.C. Hill, Documentation of UCODE, A Computer Code for Universal Inverse Modeling, U.S. Geological Survey Water-Resources Investigations Report 98-4080, 116 pp., U.S. Geological Survey, Denver, Colorado, 1998.

Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in Fortran 77 (2nd edition), Cambridge University Press, 1992.

Raftery, A.E., Bayesian model selection in structural equation models, in: Testing Structural Equation Models, K. Bollen and J. Long (eds.), Sage, Newbury Park, California, pp. 163-180, 1993.

Raftery, A.E., D. Madigan, and C.T. Volinsky, Accounting for model uncertainty in survival analysis improves predictive performance, in: Bayesian Statistics, J. Bernardo, J. Berger, A. Dawid, A. Smith (eds.), Oxford Univ. Press, pp. 323-349, 1996.

Rissanen, J., Modeling by shortest data description, Automatica, 14, 465-471, 1978.

Saltelli, A., K. Chan, and E.M. Scott (eds.), Sensitivity Analysis, John Wiley & Sons Ltd, Chichester, England, 475 pp., 2000a.

Saltelli, A., S. Tarantola and F. Campolongo, Sensitivity analysis as an ingredient of modeling, Statistical Science, 15(4):377-395, 2000b.

Samper, F.J. and S.P. Neuman, Estimation of spatial covariance structures by adjoint state maximum likelihood cross-validation: 1. Theory, Water Resour. Res., 25(3), 351-362, 1989a.

Samper, F.J. and S.P. Neuman, Estimation of spatial covariance structures by adjoint state maximum likelihood cross-validation: 2. Synthetic experiments, Water Resour. Res., 25(3), 363-371, 1989b.

Schwarz, G., Estimating the dimension of a model, Ann. Stat., 6(2), 461-464, 1978.

Taplin, R.H., Robust likelihood calculation for time series, J. Roy. Statist. Soc. Ser. B, 55:829-836, 1993.

Tiedeman, C.R., M.C. Hill, F.A. D'Agnese, C.C. Faunt, Methods for using groundwater model predictions to guide hydrogeologic data collection, with application to the Death Valley regional groundwater flow system, Water Resour. Res., 39(1), 1010, doi:10.1029/2001WR001255, 2003.

Volinsky, C.T., D. Madigan, A.E. Raftery, R.A. Kronmal, Bayesian model averaging in proportional hazard models: assessing the risk of a stroke, J. Roy. Statist. Soc. Ser. C, 46, 433-448, 1997.

Vesselinov, V.V., Numerical Inverse Interpretation of pneumatic tests in unsaturated fractured tuffs at the Apache Leap Research Site, Ph.D. Dissertation, the University of Arizona, Tucson, Arizona, 2000.

Wang, W., S.P. Neuman, T. Yao, and P.J. Wierenga, Simulation of large-scale field infiltration experiments using a hierarchy of models based on public, generic, and site data, Vadose Zone Journal, 2:297-312, 2003.

Winkler, R.L., Model uncertainty: probabilities for models?, in Mosleh, A., N. Siu, C. Smidts, and C. Lui (eds.), Model Uncertainty: Its Characterization and Quantification, Proceedings of Workshop I in Advanced Topics in Risk and Reliability Analysis, NUREG/CP-0138, U.S. Nuclear Regulatory Commission, Washington, D.C., 1993.

Winkler, R.L., Uncertainty in probabilistic risk assessment, Reliability Engineering and System Safety, 54:127-132, 1996.

Ye, M., S.P. Neuman, and P.D. Meyer, Maximum likelihood Bayesian averaging of spatial variability models in unsaturated fractured tuff, Water Resour. Res. (in review), 2004.

Zhang, D., Stochastic Methods for Flow in Porous Media, Academic Press, 2001.

Zimmerman, D.A., G. de Marsily, C.A. Gotway, M.G. Marietta, C.L. Axness, R.L. Beauheim, R.L. Bras, J. Carrera, G. Dagan, P.B. Davies, D.P. Gallegos, A. Galli, J. Gomez-Hernandez, P. Grindrod, A.L. Gutjahr, P.K. Kitanidis, A.M. Lavenue, D. McLaughlin, S.P. Neuman, B.S. RamaRao, C. Ravenne, and Y. Rubin, A comparison of seven geostatistically-based inverse approaches to estimate transmissivities for modeling advective transport by groundwater flow, Water Resour. Res., 34(6):1373-1413, 1998.

Zio, E. and G.E. Apostolakis, Two methods for the structured assessment of model uncertainty by experts in performance assessments of radioactive waste repositories, Reliability Engineering and System Safety, 54:225-241, 1996.


Appendix A. Distribution Coefficients, Kd, and Associated Uncertainty in Dose Assessment Modeling for Decommissioning Analyses

A.1 Introduction

Preliminary or screening dose assessments conducted as part of decommissioning analyses are typically conducted using generic input parameter values. Three examples of codes that are used for this purpose are DandD, RESRAD and MEPAS (Meyer and Gee, 1999). In a recent study, a hypothetical decommissioning test case was used to conduct an uncertainty analysis for two of these codes (DandD v. 1.0 and RESRAD v. 6.0) (Meyer and Taira, 2001). Uranium was used as one of the contaminants of interest. In this case, it was determined that the distribution coefficient was one of the most critical parameters for determining dose.

Because the distribution coefficient is an important source of uncertainty in dose assessment modeling, it is important to have a good understanding of what contributes to uncertainty in the distribution coefficient itself. The distribution coefficient or Kd is an empirical model for the description of partitioning of a contaminant between the soil/sediment and the solution in contact with the soil/sediment and is defined as follows:

$K_d = C_{ads}/C_{aq}$ (A-1)

where $C_{ads}$ is the concentration of the contaminant of interest adsorbed to the solid phase (moles/g) and $C_{aq}$ is the concentration of the contaminant in the aqueous phase (moles/mL). This model assumes that the partitioning of the contaminant between the two phases is in equilibrium and is linear. A significant advantage of the Kd model is its simplicity both for its numerical application in transport codes as well as the relative ease of its experimental measurement. For these reasons, the Kd model is the most widely used adsorption model in hydrologic transport codes for risk assessment calculations. This simplicity and ease of use also make this approach one of the most widely misused models for describing contaminant adsorption. This is particularly true for systems that have highly variable geochemical conditions. Some of the primary factors that can lead to large variation in Kd values include non-linear adsorption, solid/aqueous partitioning conditions that are controlled or influenced by solubility and/or redox conditions, slow reaction kinetics, spatial variability in the solution chemistry or solid phase mineralogy, temporal changes in solution chemistry, and heterogeneities in the physical properties of the aquifer materials.

A.2 Background

A.1.1 Contaminant Adsorption onto Natural Mineral Surfaces

Adsorption, accumulation at the solid-water interface, is one of the primary processes controlling the transport of dissolved contaminants in the vadose zone and groundwater. Adsorption occurs as atoms, molecules, and ions exert forces on each other at this solid-water interface. Adsorption reactions are discussed primarily in terms of intermolecular interactions that occur between the solutes and solid phases (Stumm and Morgan 1996). These interactions include:

1) Surface complexation reactions (surface hydrolysis and the formation of coordinative bonds at the surface between metal cations, anions, and surface binding sites).

2) Electrostatic interactions at the surfaces, extending over longer distances than chemical forces.

3) Hydrophobic expulsion of hydrophobic substances (this includes nonpolar organic solutes), which are usually only sparingly soluble in water and tend to reduce their contact with water and seek relatively nonpolar environments, thus accumulating on solid surfaces and becoming adsorbed on organic sorbents.

4) Adsorption of surfactants (molecules that contain both a hydrophobic and a hydrophilic moiety). Interfacial tension and adsorption are intimately related through the Gibbs adsorption law. In simple terms, this law indicates that substances that reduce surface tension will tend to adsorb at interfaces.

5) Adsorption of polymers and of polyelectrolytes (humic substances and proteins in particular). This is a rather general phenomenon in natural waters and soil systems that has far-reaching consequences for the interaction of particles with each other and on the attachments of colloids (and bacteria) to surfaces.

The process in which chemicals become associated with solid phases is often referred to as sorption, especially when one is not sure whether one is dealing with adsorption (onto a two-dimensional surface) or absorption into a three-dimensional matrix.

In addition to the nature of the solid phase, the chemical properties of the solution in contact with the solid phase will have a substantial effect on its adsorption characteristics. For example, pH will have a major influence on the degree of surface hydrolysis, which in turn affects the nature and extent of surface charge. Ionic strength will affect the electrostatic nature of the surface and therefore the electrostatic interactions that can occur. In addition to these effects, the adsorption process itself will change the nature of the surfaces of the solid phase and will influence further adsorption.

The chemical properties of the solution in contact with the solid phase will also affect adsorption as a result of interactions between dissolved species. For example, many metal ions form complexes with major anions in solution. The formation of these complex species can have a major influence on the charge and geometry of the original ion and, as a result, significantly alter the sorptive properties of the species of interest. A special case of complex formation is hydrolysis. Hydrolysis is the formation of complexes with hydroxide ion and is a strong function of pH. Ionic strength can be an important factor that affects the activity of all dissolved ions and, as a result, the extent of complex formation. Eh can also have a large influence on adsorption by altering the oxidation state of the contaminant and/or the adsorbent.

A.1.2 Empirical Approaches to Adsorption Modeling

As indicated previously, the linear equilibrium adsorption isotherm or Kd model is an empirical approach that assumes the adsorption of a solute increases linearly with increasing concentration of that solute in solution. As a result of the empirical nature of the Kd model, it cannot represent the individual contributions of different uptake mechanisms. In addition, the Kd model cannot recognize a maximum sorption limit. In actuality, there are a finite number of sorption sites and, as a result, sorption will reach a practical upper limit.

Despite the shortcomings of the Kd model, it can provide an accurate description of adsorption under certain conditions. The Kd model generally works well for trace concentrations of un-ionized hydrophobic organic compounds; however, application to ionic inorganic contaminants is more limited. Appropriate use of the Kd approach for modeling adsorption of ionic species is generally limited to species that have very simple chemistry and site conditions where the groundwater solution chemistry and mineralogy of the aquifer material are quite constant and homogeneous. This is generally an unusual occurrence, particularly at contaminated waste sites.
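As a minimal sketch of the linear model in Eq. (A-1), the Python snippet below estimates Kd from a single batch measurement and then applies it at other aqueous concentrations. The function names and the numerical values are hypothetical and chosen only for illustration. The predicted adsorbed concentration grows in direct proportion to the aqueous concentration, which is why the model cannot represent a sorption maximum.

```python
def kd_from_batch(c_ads, c_aq):
    """Distribution coefficient Kd = Cads / Caq (Eq. A-1), in mL/g.

    c_ads: adsorbed concentration (moles/g)
    c_aq:  aqueous concentration (moles/mL)
    """
    if c_aq <= 0:
        raise ValueError("aqueous concentration must be positive")
    return c_ads / c_aq


def linear_adsorbed(kd, c_aq):
    """Adsorbed concentration (moles/g) predicted by the linear Kd model."""
    return kd * c_aq


# Hypothetical batch result: 3.0e-8 moles/g adsorbed at 2.0e-9 moles/mL.
kd = kd_from_batch(3.0e-8, 2.0e-9)  # mL/g

# The prediction scales with Caq and never levels off.
for c_aq in (2.0e-9, 2.0e-7, 2.0e-5):
    c_ads = linear_adsorbed(kd, c_aq)
    print(f"Caq = {c_aq:.1e} moles/mL -> Cads = {c_ads:.1e} moles/g")
```

A real application would compare such predictions against measured isotherm data before trusting the linear assumption outside the measured concentration range.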

In addition to the linear equilibrium adsorption isotherm, several other more complex empirical adsorption models are available. The Freundlich isotherm (Freundlich, 1926) is a nonlinear equilibrium adsorption model defined by the relationship:

$C_{ads} = K_{Fr}\,(C_{aq})^{n}$ (A-2)

where $C_{ads}$ and $C_{aq}$ are defined as in Eq. (A-1) and $K_{Fr}$ and $n$ are empirical coefficients. For the special case where $n = 1$, Eqs. (A-1) and (A-2) are identical. A plot of $\log C_{ads}$ versus $\log C_{aq}$ should result in a straight line with a slope of $n$ and an intercept of $\log K_{Fr}$. As with the linear adsorption isotherm model, an adsorption maximum cannot be represented with the Freundlich isotherm.
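The log-log relationship described above can be sketched in a few lines of Python. The example fits a least-squares line to (log Caq, log Cads) and reads the Freundlich coefficients off the slope and intercept. The function name and the synthetic, noise-free data are assumptions made for illustration and are not measured values.

```python
import math


def fit_freundlich(c_aq, c_ads):
    """Fit log10(Cads) = log10(K_Fr) + n * log10(Caq) by least squares.

    Returns (K_Fr, n): the slope of the line is n and the
    intercept is log10(K_Fr), as stated for Eq. (A-2).
    """
    x = [math.log10(v) for v in c_aq]
    y = [math.log10(v) for v in c_ads]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    n = sxy / sxx
    log_k = y_mean - n * x_mean
    return 10.0 ** log_k, n


# Synthetic data generated from assumed values K_Fr = 2.0 and n = 0.8.
c_aq = [1e-3, 1e-2, 1e-1, 1.0]
c_ads = [2.0 * c ** 0.8 for c in c_aq]

k_fr, n = fit_freundlich(c_aq, c_ads)
print(f"K_Fr = {k_fr:.3f}, n = {n:.3f}")
```

With real data, scatter about the fitted line gives a first indication of whether the Freundlich form is adequate over the measured concentration range.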

An empirical adsorption model that accounts for an upper limit to adsorption is the Langmuir isotherm (Langmuir, 1918). This model was developed for adsorption of gases onto solid surfaces and assumes that all sorption sites are energetically equal. The general form of the Langmuir isotherm (as adapted for adsorption from solution) is:

$C_{ads} = \dfrac{K_L\, b\, C_{aq}}{1 + K_L\, C_{aq}}$ (A-3)

where $b$ is the maximum adsorption capacity of the substrate (g solute/g adsorbent), and $K_L$ is a constant that represents the strength of adsorption of the solute onto the solid (mL/moles). Values for $b$ can be determined for a given data set by plotting $C_{aq}/C_{ads}$ versus $C_{aq}$. This should yield a straight line with a slope of $1/b$ and an intercept of $1/(K_L b)$.
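The linearization described above follows from rearranging Eq. (A-3) to Caq/Cads = 1/(K_L b) + Caq/b, where K_L is the notation used here for the Langmuir adsorption constant. The Python sketch below fits that line and recovers both parameters. The function names and the synthetic data are hypothetical and serve only to show the arithmetic.

```python
def langmuir(c_aq, k_l, b):
    """Langmuir isotherm, Eq. (A-3): Cads = K_L*b*Caq / (1 + K_L*Caq)."""
    return k_l * b * c_aq / (1.0 + k_l * c_aq)


def fit_langmuir(c_aq, c_ads):
    """Fit Caq/Cads versus Caq; slope = 1/b, intercept = 1/(K_L*b)."""
    x = list(c_aq)
    y = [ca / cs for ca, cs in zip(c_aq, c_ads)]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    b = 1.0 / slope
    k_l = slope / intercept
    return k_l, b


# Synthetic, noise-free data from assumed values K_L = 5.0 and b = 0.02.
c_aq = [0.05, 0.1, 0.5, 1.0, 2.0]
c_ads = [langmuir(c, 5.0, 0.02) for c in c_aq]

k_l, b = fit_langmuir(c_aq, c_ads)
print(f"K_L = {k_l:.3f}, b = {b:.4f}")

# At very high Caq the isotherm approaches the capacity b.
print(f"Cads at Caq = 1e6: {langmuir(1e6, k_l, b):.4f}")
```

Unlike the linear and Freundlich models, the fitted curve levels off at b, which is the behavior expected when the number of sorption sites is finite.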

A.1.3 Surface Complexation Approach to Adsorption Modeling

Surface complexation models (SCMs) are chemical models that provide a molecular level mechanistic description of adsorption. Analogous to solution complexation, surface complexation models define surface species, chemical reactions, equilibrium constants, mass balances and charge balances that are based on an equilibrium thermodynamic approach. Surface complexation models constitute a family of models that have many common characteristics and adjustable parameters. The models differ in the structural representation of the solid-solution interface (location of the adsorbing ions and resulting charge). The primary advantage of surface complexation models over empirical models is the ability to account for variable physical-chemical conditions. This is in stark contrast to empirical models, which generally ignore the chemical complexity of the sorption processes and aqueous complexation.

Although surface complexation models are often incorporated directly into complex reactive transport codes, the advantages of the surface complexation models can be exploited using simpler hydrologic dose assessment codes as well. This has important implications because it is these simpler codes that are most frequently used for regulatory decision-making purposes. In most hydrologic dose assessment codes the complex geologic conceptual model is simplified to a relatively simple geologic conceptual representation (Meyer and Gee, 1999). These simplified conceptual models are typically composed of layers or zones of materials that have distinct and homogeneous physical (hydrologic), mineralogical, and chemical properties. By making certain assumptions regarding the average or typical chemical and mineralogical characteristics within these different layers or zones, surface complexation models can be used to calculate individual Kd values appropriate for each layer or zone within the conceptual model.

As indicated above, surface complexation models constitute a family of models that have many common characteristics and adjustable parameters. The most frequently used surface complexation models include the Diffuse Layer Model (DLM), the Constant Capacitance Model (CCM), the Triple Layer Model (TLM), and non-electrostatic SCMs. The three electrostatic surface complexation models (DLM, CCM, and TLM) will be discussed briefly below and the non-electrostatic SCMs will be discussed in the next section.

The DLM is the simplest of the electrostatic SCMs. In the DLM, protonation/deprotonation and adsorption occur in one plane at the surface/solution interface and only those ions specifically adsorbed in this inner "o-plane" contribute to the total surface charge (that is, the total surface charge equals the o-plane charge, $\sigma_o$). Dzombak and Morel (1990) have provided a detailed evaluation of the DLM, including the development of a strong site/weak site conceptual model for the mineral surface. The analysis of Dzombak and Morel (1990) also provides parameters for its application to the sorption of a number of cationic and anionic species on ferrihydrite.

The CCM (Schindler et al., 1976) is conceptually similar to the DLM. In contrast to the DLM, the CCM assumes that the charged surface is isolated from the bulk solution by a plane with a constant capacitance C1 (Farads/m2), resulting in a linear potential gradient from the charged substrate to the bulk solution. The CCM approach is generally limited to a specific ionic strength because changes in ionic strength require recalculation of C1. The constant capacitance term is not measurable and, as a result, is typically applied as an empirical parameter and fit to the data. This has the advantage of providing a better fit to the experimental data, but at the expense of theoretical rigor.

The TLM (Davis et al., 1978; Davis and Leckie, 1978; 1980) is conceptually similar to both the DLM and the CCM. In the TLM, however, the charge/potential relationships of the mineral-water interface are divided into three layers. The TLM approach provides more flexibility to simulate ionic strength effects by representing sorption of background electrolytes and permitting the formation of both inner- and outer-sphere complexes. As a result of its construction, the TLM requires additional parameters beyond those needed for the DLM and CCM. Additional parameters include equilibrium constants $K_{Cat}$ and $K_{An}$ for background electrolyte sorption, and capacitances C1 and C2 associated with the areas between the o- and β-planes and the β- and d-planes, respectively.

A.1.4 Non-Electrostatic Surface Complexation Models

Although SCM is the most theoretically rigorous approach to modeling contaminant adsorption onto mineral surfaces, application to natural materials remains problematic. SCM adsorption data are generally determined using well-characterized single-phase minerals whose surface properties, such as surface area, site density, and electrostatic correction terms, are readily measured. For most natural soils and sediments, measurement of the site density and electrostatic correction terms of the individual contributing minerals is impractical if not impossible. Natural mineral surfaces in sediments/soils are typically coated with poorly crystalline secondary mineral coatings (Penn et al., 2001; Coston et al., 1995). In general, these coatings make it extremely difficult to quantitatively assess the electrostatic contribution to the free energy of adsorption.


Davis et al. (1998, 2002) recently demonstrated two approaches for modeling adsorption onto natural heterogeneous materials. The two approaches are the Component Additivity approach and the Generalized Composite approach. The Component Additivity approach is based on summing the adsorption of the individual mineral components of the soil or sediment to get the total adsorption of the mixture. Because this modeling approach is based on summing the results from models already calibrated with pure mineral phases, the Component Additivity approach is predictive and does not involve fitting the adsorption data of the natural materials.
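The summation at the heart of the Component Additivity idea can be illustrated with a deliberately simplified Python sketch. Here each mineral's contribution is represented by a single Kd value, assumed to come from an already calibrated pure-phase model, and the contributions are weighted by mass fraction. In an actual application each contribution would be computed with a surface complexation model at the site chemistry; the mineral names, fractions, and Kd values below are hypothetical.

```python
def component_additivity_kd(mass_fractions, component_kd):
    """Sum pure-phase contributions, weighted by mass fraction.

    mass_fractions: mineral -> mass fraction of the sediment (sums to 1)
    component_kd:   mineral -> Kd of the pure phase (mL/g), assumed to be
                    predicted by a model calibrated on that phase alone
    """
    total_fraction = sum(mass_fractions.values())
    if abs(total_fraction - 1.0) > 1e-6:
        raise ValueError("mass fractions must sum to 1")
    return sum(f * component_kd[m] for m, f in mass_fractions.items())


# Hypothetical sediment composition and pure-phase Kd values.
fractions = {"quartz": 0.90, "ferrihydrite": 0.02, "clay": 0.08}
pure_kd = {"quartz": 1.0, "ferrihydrite": 5000.0, "clay": 200.0}

kd_mix = component_additivity_kd(fractions, pure_kd)
print(f"Composite Kd = {kd_mix:.1f} mL/g")
```

In this made-up case a minor phase dominates the total, which shows why the characterization of reactive trace phases matters when adsorption is summed over components.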

In the Generalized Composite modeling approach, the surface of the mineral assemblage is considered too complex to be quantified in terms of the contributions of individual phases to adsorption. Instead, the electrostatic terms are omitted and the mass action expressions are described in terms of "generic" surface functional groups. The stoichiometry and formation constants for each reaction are evaluated based on their simplicity and goodness of fit to the experimental adsorption data (Davis et al., 2002; Davis et al., 1998). The generic surface sites represent average properties of the sediment/soil rather than specific minerals. Experimental data for site-specific natural materials must be collected over the range of chemical conditions that can be expected in the field. Because of the semi-empirical nature of this approach, the resulting model parameters are not likely to be transferable to other field sites.

These two modeling approaches were compared for U(VI) adsorption by sediments from the Koongarra natural analog site in northwest Australia (Davis et al., 2002; Waite et al., 2000). The Component Additivity approach required eight reactions and used a diffuse double layer electrostatic model. The Generalized Composite approach only needed four surface reactions and did not include an electrostatic model. The model fit to the experimental adsorption data for both approaches was nearly the same, even though the Generalized Composite model had seven model parameters and the Component Additivity model had eleven.

A.2 Sources of Kd Value Uncertainty

The uncertainty associated with any particular Kd value used in a risk assessment can be placed into three major categories:

1) Experimental uncertainty

2) Sorption process chemistry uncertainty
   a) variation in solution chemistry
      - complexation
      - competitive adsorption
      - alteration of the adsorption-site chemistry
   b) variation in surface adsorption sites
      - mineralogy
      - surface coatings and fracture fillings

3) Uncertainty resulting from scaling of Kd measurements determined in the laboratory to intact sediments/soil in the field
   c) effective surface area
      - surface sites in hydrologic contact with moving radionuclides
      - diffusion

The experimental uncertainty is the sum of the measurement errors that occur during the Kd value determination. This is generally the most easily quantifiable component of the uncertainty and can be determined using statistical methods. Both the uncertainty in the Kd value that results from variation in the sorption process chemistry, and the uncertainty resulting from the scaling of laboratory Kd values to intact sediments/soil in the field, could be considered to be conceptual model uncertainties. This is because, for a particular Kd value, the solution chemistry, sediment/soil mineralogy and surface area per unit weight of the laboratory sample used for the Kd value determination are assumed to be identical to those of the site (or portion of the site) that is being modeled with the reactive transport code. If any of these parameters vary enough to change the Kd value significantly, then the conceptual model would have to be considered unrepresentative.
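For the experimental component, one simple statistical treatment is to summarize replicate Kd determinations by their mean, sample standard deviation, and standard error. The Python sketch below does this with the standard library. The replicate values are hypothetical, and the choice of these particular statistics is an assumption for illustration rather than a method prescribed in this appendix.

```python
import math
import statistics


def summarize_replicates(kd_values):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    mean_kd = statistics.mean(kd_values)
    std_kd = statistics.stdev(kd_values)          # n - 1 in the denominator
    sem_kd = std_kd / math.sqrt(len(kd_values))
    return mean_kd, std_kd, sem_kd


# Hypothetical replicate Kd measurements (mL/g) on one sediment sample.
replicate_kd = [12.1, 14.8, 13.5, 15.9, 13.0]

mean_kd, std_kd, sem_kd = summarize_replicates(replicate_kd)
print(f"mean = {mean_kd:.2f} mL/g, std = {std_kd:.2f}, standard error = {sem_kd:.2f}")
```

Such a summary captures only measurement scatter. It says nothing about the chemistry and scaling uncertainties discussed in the same paragraph, which usually require additional site data to quantify.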

In order to quantify the uncertainty of a reactive transport model resulting from uncertainty in the Kd value, the uncertainties resulting from the sorption process chemistry and the uncertainty from scaling must be quantified.

Quantification of the sorption process chemistry uncertainty can be broken down into two major parts. The first part is quantification of the variation in the solution chemistry and sediment/soil mineralogy within the site being modeled. This is a site characterization task that must be conducted with expert guidance to ensure that measurements of all geochemical parameters that could potentially influence adsorption of the contaminant of interest are made. In addition to the geochemical parameter measurements, spatial frequency of the sample collection is of critical importance for quantification of the geochemical parameter variation.


The second component of the sorption process chemistry uncertainty required to quantify Kd value uncertainty is quantification of the variation in the Kd value as a function of the important geochemical parameters. This must be conducted in the laboratory over the range of values for each important geochemical parameter that occurs within the site of interest.

The uncertainties that result from scaling issues are largely the result of differences between the number of adsorption sites that are in hydrologic contact with the mobile aqueous phase within the field site and the number that are accessible to the aqueous phase in the laboratory Kd value determinations. Because the adsorbed phase concentration (Cads in Eq. A-1) of the Kd is given in terms of unit mass, as opposed to unit surface area, any difference between the surface area per unit weight of soil/sediment that occurs in situ versus that in the laboratory system will result in error.

A.3 Variability in Kd Values and the Impact on Transport Calculations

As indicated earlier, Kd values are empirical constants and, as a result, can be applied with confidence only to conditions that are the same as those under which the value was measured. If the sediment/soil mineralogy or physical properties, solution chemistry, or contaminant loading of the system to be modeled are significantly different from those for which the Kd value was determined, significant error in the estimated transport rates could result. This is because many factors can affect the degree to which a particular contaminant adsorbs to a particular sediment or soil (as discussed above). These factors include: sediment mineralogy and surface area, major ion concentration in solution (complexation and competitive adsorption), pH of the solution, and the concentration of the adsorbate in solution and on the adsorbent. Careful application of expert geochemical knowledge can often significantly reduce the number of variables that must be considered for evaluating Kd values. For example, some radionuclides may have a low tendency to form complexes with other major ions in solution or do not interact significantly with certain mineral surfaces.

In the hypothetical test case conducted by Meyer and Taira (2001), a Kd value of 15 was used for uranium. This value is a geometric mean value for loam taken from the compilation by Sheppard and Thibault (1990). A major problem with using mean Kd values from this and similar literature compilations of Kd values for conducting screening calculations is the inherently large variation in the Kd values. For example, Sheppard and Thibault (1990) report a range in Kd values for uranium of 0.03 to 2200 ml/gm. This large variability in Kd values is due largely to differences in solution chemistry and soil properties used in the various Kd value determinations included in the compilation. Because no control is placed on these variables during the statistical analysis of the Kd values, the individual impact of these variables is ignored, resulting in the large overall variation observed.

To better illustrate the impact of these values on the calculated mobility of uranium, these Kd values will be converted to retardation factors. The retardation factor is the ratio of the average linear velocity of water to the average linear velocity of the contaminant. The retardation factor can be calculated using the following equation:

$R_f = 1 + \dfrac{K_d\,\rho_b}{\theta}$ (A-4)

where $R_f$ is the retardation factor (unitless), $\rho_b$ is the bulk density (g/cm3, so that its product with Kd in ml/gm is unitless), and $\theta$ is the volumetric water content (m3/m3). By assuming a bulk density of 1.86 g/cm3 and a volumetric water content of 0.30 m3/m3, Eq. (A-4) can be simplified to:

$R_f = 1 + 6.2\,K_d$ (A-5)

Using the range of Kd values for uranium reported by Sheppard and Thibault (1990), the range in retardation factors is calculated to be 1.2 to 14,000. This range in retardation factors illustrates that, for the reported range of Kd values, uranium has the potential to vary from being essentially unretarded (Rf = 1 indicates the contaminant moves with the water or no adsorption occurs) to being essentially immobile (strongly adsorbed), depending upon the conditions encountered.
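The conversion from Kd to retardation factor can be checked with a short Python sketch of Eq. (A-4). The function name is hypothetical, and the bulk density is taken here as 1.86 g/cm3, an assumption made so that its product with Kd in ml/gm is dimensionless. The Kd values are the range endpoints and the generic loam value quoted in the text.

```python
def retardation_factor(kd, bulk_density=1.86, water_content=0.30):
    """Retardation factor Rf = 1 + Kd * rho_b / theta (Eq. A-4).

    kd:            distribution coefficient (mL/g)
    bulk_density:  rho_b, assumed here to be in g/cm3
    water_content: theta, volumetric water content (m3/m3)
    """
    if water_content <= 0:
        raise ValueError("water content must be positive")
    return 1.0 + kd * bulk_density / water_content


# Uranium Kd values quoted in the text: range endpoints and the loam mean.
for kd in (0.03, 15.0, 2200.0):
    rf = retardation_factor(kd)
    print(f"Kd = {kd:8.2f} mL/g -> Rf = {rf:10.1f}")
```

Rounded to two significant figures, the endpoint results reproduce the range of 1.2 to 14,000 given above.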

There are several factors that account for this large variation in adsorption potential. These factors include the highly variable adsorption potential of different minerals for uranium, and the strong influence of pH and carbonate concentration on uranium adsorption. For example, Turner et al. (2002) illustrate uranium Kd data for silica, montmorillonite, and clinoptilolite as a function of pH (in equilibrium with atmospheric CO2). From these data, it can be seen that for silica at pH 8 the typical Kd value is 5 ml/gm. As the pH decreases to between 6.5 and 6.0, the Kd for silica peaks at 50. As the pH decreases further to pH 4, the Kd decreases to about 0.3. In contrast, Kd values for montmorillonite are much higher. At pH 8 the Kd is approximately 300. As the pH decreases to between 6.5 and 6.0, the Kd for montmorillonite peaks at 10,000. As the pH decreases further to pH 4, the Kd decreases to about 300. These relationships are illustrated in Figure A-1.

It is clear from these illustrations that the variability in Kd values as a result of large heterogeneities in site-specific mineralogy and solution chemistry could result in highly variable adsorption behavior that could potentially result in significant error when compared to modeling results determined with a single generic Kd value.
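To put rough numbers on this comparison, the Python sketch below applies Eq. (A-5) to the approximate mineral- and pH-specific Kd values quoted above and compares each result with the retardation factor for the generic Kd of 15. The dictionary layout and names are assumptions made for illustration; the Kd values are the approximate figures cited in the text, not a re-reading of the underlying data.

```python
def retardation_factor(kd):
    """Eq. (A-5): Rf = 1 + 6.2 Kd, for the bulk density and water content assumed in the text."""
    return 1.0 + 6.2 * kd


GENERIC_KD = 15.0  # generic loam value used in the hypothetical test case

# Approximate uranium Kd values (ml/gm) quoted in the text.
kd_by_case = {
    ("silica", "pH 4"): 0.3,
    ("silica", "pH 6.0-6.5"): 50.0,
    ("silica", "pH 8"): 5.0,
    ("montmorillonite", "pH 4"): 300.0,
    ("montmorillonite", "pH 6.0-6.5"): 10000.0,
    ("montmorillonite", "pH 8"): 300.0,
}

rf_generic = retardation_factor(GENERIC_KD)
print(f"generic Kd = {GENERIC_KD} -> Rf = {rf_generic:.1f}")

for (mineral, ph), kd in kd_by_case.items():
    rf = retardation_factor(kd)
    # Contaminant travel time is proportional to Rf, so this ratio is the
    # factor by which the case-specific travel time differs from the generic one.
    print(f"{mineral:16s} {ph:11s} Rf = {rf:9.1f}  ratio to generic = {rf / rf_generic:8.3f}")
```

The ratios span several orders of magnitude, in both directions, which is the sense in which a single generic value can misrepresent transport at a heterogeneous site.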

The magnitude of variation illustrated for uranium Kd values could also be expected for other radionuclides commonly encountered at NRC decommissioning sites. Specific examples are C-14 and possibly Tc-99 and Sr-90. Most of the other radionuclides commonly encountered at NRC decommissioning sites (Cs-137, Co-60, Ni-63, Am-241, Pu-238, -239, -241, Eu-152, Nb-94, and Cm-243) are strongly adsorbing under typical conditions, and even large variability in their Kd values is not likely to result in large differences in dose uncertainty. H-3 does not adsorb; its Kd value is zero, with little uncertainty. This suggests that the greatest degree of uncertainty in dose models results from uncertainty of Kd values for a limited number of radionuclides.

A.4 Determination of Kd Values and Associated Uncertainty

Experimental determination of site-specific Kd values is likely to remain the most common method for characterizing adsorption in risk assessment models at most sites in the near term. Geochemical reasoning and thermodynamic modeling can provide valuable guidance and support for the experimental determination of Kd values and how they vary with solution chemistry and mineralogy. In some cases, surface complexation models can be used to estimate Kd values as a function of solution chemistry and mineralogy. This approach has been demonstrated by a number of researchers to support performance assessments at major radioactive waste disposal sites that have significant resources to devote to such efforts (Davis et al., 2002; Turner et al., 2002). This approach is currently gaining acceptance as the best compromise between comprehensive scientific defensibility and practical application. It is expected that this approach for determining input sorption parameters for more routine risk and performance assessment modeling efforts will become increasingly utilized as the database of thermodynamic sorption models increases. This approach typically requires a significant amount of site-specific geochemical characterization.

A.4.1 Systematic Approach for Determination of Kd Values and Associated Uncertainty

A systematic approach for determining Kd values and associated uncertainty for use in dose assessment modeling at specific sites is outlined below in general terms. The first step in this approach is to collect all available site-specific characterization data that may be useful for estimating adsorption of the contaminants of interest. This could include aqueous phase chemical data (contaminant concentrations, major ion data, Eh, and pH), aquifer material mineralogy, mineral surface coatings, stratigraphy, and spatial and temporal variability of these geochemical parameters. This information can be used to guide the selection of Kd values from generic compilations of Kd values or from other adsorption data available in the literature that could be used to calculate Kd values (such as surface complexation model data). If the uncertainty of the Kd value estimates determined in this process is acceptable, no further Kd value refinement is necessary. If the uncertainty of the Kd value estimates determined in this process is too high, or if the available characterization data and/or available adsorption data for the contaminants of concern are not adequate, then a more detailed geochemical analysis must be conducted. As part of the geochemical analysis, site-specific characterization needs would be determined and the requirements and scope of an adsorption study to develop site-specific Kd values as a function of important geochemical parameters would be outlined. The site-specific characterization work may involve an iterative process where early characterization results can be used to determine and guide further characterization needs.

[Figure A-1: plot of uranium Kd on a logarithmic axis (0.1 to 10,000) versus pH (3 to 9) for silica, montmorillonite, and clinoptilolite.]

Figure A-1. Variability in uranium Kd as a function of mineral and pH. Based on data from Turner et al. (2002).

A.4.2 Determination of Uranium Kd Values and Associated Uncertainty with Iterative Refinement to Maximize Cost Effectiveness

A brief outline is provided here to illustrate how this methodology can be applied to a specific contaminant. Uranium has been selected for illustrative purposes because it is a major contaminant of concern at a number of decommissioning sites and because it has complex adsorptive behavior that ranges from non-adsorbing to highly adsorbing, depending on geochemical conditions.

The first step in estimating a site-specific Kd value is to compile any available site characterization data that would be useful from a geochemical perspective. This would include solution chemistry data (major cation and anion concentrations, alkalinity measurements, pH, Eh, and contaminant concentrations) and mineralogy (texture, major mineral components, clay mineralogy, and hydrous metal oxide content). The geochemistry of the contaminant of interest will determine which geochemical parameters are most critical for determination of the Kd value. In the case of uranium, the carbonate concentration has a very large effect on the adsorption of uranium due to strong complex formation with carbonate. For example, Kd values for uranium(VI) adsorption on ferrihydrite at pH 8 have been shown to decrease by four orders of magnitude as the partial pressure of carbon dioxide gas, pCO2, increases from its value in air (0.032%) to 1% (Davis et al., 2002). This is an important variation to understand, because pCO2 in aquifers commonly reaches values of 1-5%, while most laboratory-determined Kd values have been determined in equilibrium with air. The carbonate concentration (or pCO2) can be determined from measurements of pH and alkalinity. In general, therefore, the two most important solution parameters to know for estimating Kd values are pH and alkalinity. Other major ions are of secondary importance, but can influence the speciation of the carbonate system.
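The statement that pCO2 can be determined from pH and alkalinity can be illustrated with a minimal sketch. It is not taken from the report: it assumes 25 °C textbook equilibrium constants, unit activity coefficients, and an alkalinity made up only of carbonate species, hydroxide, and protons. The function name and the example water composition are hypothetical.

```python
# Assumed 25 degC equilibrium constants for the carbonate system
# (illustrative textbook values, not taken from the report).
K_H = 10 ** -1.47    # Henry's law constant for CO2, mol/(L*atm)
K_1 = 10 ** -6.35    # first dissociation constant of carbonic acid
K_2 = 10 ** -10.33   # second dissociation constant of carbonic acid
K_W = 10 ** -14.0    # ion product of water


def pco2_from_ph_alkalinity(ph, alkalinity_eq_per_l):
    """Estimate pCO2 (atm) from pH and alkalinity (eq/L).

    Assumes alkalinity = [HCO3-] + 2[CO3 2-] + [OH-] - [H+] and
    unit activity coefficients (no ionic strength correction).
    """
    h = 10.0 ** -ph
    # Part of the alkalinity carried by bicarbonate and carbonate.
    carbonate_alk = alkalinity_eq_per_l - K_W / h + h
    # Carbonate alkalinity produced per atm of CO2 at this pH.
    alk_per_atm = K_H * K_1 / h + 2.0 * K_H * K_1 * K_2 / h ** 2
    return carbonate_alk / alk_per_atm


if __name__ == "__main__":
    # Hypothetical groundwater: pH 7.0, alkalinity 2 meq/L.
    p = pco2_from_ph_alkalinity(7.0, 2.0e-3)
    print(f"pCO2 = {p:.4f} atm ({100 * p:.2f}% of 1 atm)")
```

For this hypothetical water the estimate is on the order of 1% CO2, well above the value in air, which is the situation the text warns about when laboratory Kd values measured in equilibrium with air are applied to an aquifer. A site calculation would normally use a geochemical equilibrium code with activity corrections.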

After the solution parameters pH and alkalinity, the next most important geochemical parameter to know for uranium Kd estimation is the mineralogy. The mineralogical information can range from very general descriptions (sand, silt, clay, calcareous, etc.) to a complete quantitative mineralogical characterization, which would include the percentages of the major minerals present, the clay mineralogy, and the hydrous metal oxide content. Between these two extremes, one could obtain a semi-quantitative XRD scan that would characterize the major crystalline minerals present.

Once the characterization data have been assembled, this information would be used to find Kd values in the literature or in Kd compilations that best match site conditions. Alternatively, adsorption data determined for pure minerals could be used to calculate Kd values. This could involve the use of surface complexation models and geochemical equilibrium codes, combined with adsorption site densities estimated from site characterization data, to estimate Kd values for specific geochemical conditions.

Depending on the nature of the site and the adsorption data available in the literature, it may be determined that some limited additional characterization data could significantly reduce the uncertainty of the current Kd estimates. For example, if uranium adsorption data are available in the literature for ferrihydrite and montmorillonite at various pH values and carbonate (pCO2) concentrations, and it is determined that these two minerals are significant components of the aquifer material and are likely to be controlling uranium adsorption, it may then be worthwhile to conduct quantitative measurements of these components on available samples from the site. An additional step that could be taken to narrow Kd value uncertainty even further would be to conduct an adsorption study using site aquifer material over a range of parameters appropriate to site conditions. This iterative approach to narrowing the uncertainty of the Kd value may be a sensible way to address dose assessment modeling at sites that initially have little characterization data available. It also provides a means to balance the competing needs of reducing Kd value uncertainty and producing a cost-effective performance assessment.
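The iterative approach described above can be sketched as a simple stopping rule: carry out refinement steps in order of increasing effort and stop as soon as the Kd uncertainty is acceptable. The sketch below is only an illustration. The step names follow the text, but the costs, the uncertainty values (expressed as the width of the plausible Kd range in orders of magnitude), and the acceptance threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RefinementStep:
    name: str
    cost: float            # hypothetical relative cost units
    log10_kd_range: float  # hypothetical Kd uncertainty after the step,
                           # in orders of magnitude


def plan_refinement(steps, acceptable_log10_range):
    """Apply steps in order until the Kd uncertainty is acceptable.

    Returns (names of steps performed, total cost, final log10 Kd range).
    """
    performed, total_cost, current = [], 0.0, float("inf")
    for step in steps:
        performed.append(step.name)
        total_cost += step.cost
        current = step.log10_kd_range
        if current <= acceptable_log10_range:
            break  # uncertainty is acceptable; no further refinement
    return performed, total_cost, current


# Hypothetical sequence, ordered from least to most effort.
STEPS = [
    RefinementStep("literature Kd from existing site data", 1.0, 4.0),
    RefinementStep("quantitative mineralogy of site samples", 5.0, 2.0),
    RefinementStep("site-specific adsorption study", 20.0, 0.5),
]

if __name__ == "__main__":
    done, cost, spread = plan_refinement(STEPS, acceptable_log10_range=2.0)
    print(done, cost, spread)
```

With a threshold of two orders of magnitude, the hypothetical plan stops after the mineralogy step and never incurs the cost of the adsorption study. In practice the uncertainty achieved by each step is not known in advance, so the decision would be revisited as results come in.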

The methodology outlined above for determining a Kd value for uranium can also be used to determine the spatial and temporal variation in the Kd value; however, the spatial and temporal variation in the indicated critical parameters must be known or estimated. In the case of pCO2, values can increase in groundwater recharge as a result of transport through organic-rich horizons where significant decomposition is occurring. This can lead to significant spatial and temporal variation in pCO2 and therefore in uranium retardation.
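To show how a change in Kd carries through to retardation, the sketch below uses the standard retardation relation for linear, reversible, equilibrium sorption, R = 1 + (bulk density / volumetric water content) × Kd. The relation is assumed here rather than quoted from this section, and the bulk density, water content, and Kd values are hypothetical.

```python
def retardation_factor(kd_ml_per_g, bulk_density_g_per_cm3, water_content):
    """Retardation factor R = 1 + (rho_b / theta) * Kd.

    Assumes linear, reversible, equilibrium sorption. For a saturated
    aquifer, water_content equals the porosity.
    """
    if water_content <= 0:
        raise ValueError("water_content must be positive")
    return 1.0 + bulk_density_g_per_cm3 * kd_ml_per_g / water_content


if __name__ == "__main__":
    # Hypothetical aquifer properties.
    rho_b, theta = 1.6, 0.3
    # Hypothetical Kd values four orders of magnitude apart, mimicking
    # the effect of a pCO2 increase described in the text.
    for kd in (10.0, 0.001):
        r = retardation_factor(kd, rho_b, theta)
        print(f"Kd = {kd} mL/g -> R = {r:.3f}")
```

With these assumed properties, the larger Kd gives a retardation factor in the tens, while the smaller one leaves uranium moving at nearly the groundwater velocity (R close to 1).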

A significant complication that has not been addressed in this discussion is that Kd values are generally given in units based on adsorption per unit mass. Because adsorption is actually related to the site density of the adsorbent, significant differences in surface area per unit mass between the material used in the adsorption measurements and the site material can result in error. For example, Turner et al. (2002) have shown that uranium adsorption onto montmorillonite, clinoptilolite, α-alumina, and quartz yields similar adsorption coefficients when expressed on a specific surface area basis (mL/m2); on a mass basis (mL/g), however, the difference between Kd values for montmorillonite and quartz is about three orders of magnitude at near-neutral pH values.
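The relationship between the two bases can be made concrete with a short sketch, assuming that the mass-based Kd equals the area-based coefficient multiplied by the specific surface area of the solid. The function names, the area-based coefficient, and the specific surface areas below are hypothetical round numbers chosen only to show the scaling. They are not data from Turner et al. (2002).

```python
def kd_mass_basis(k_area_ml_per_m2, ssa_m2_per_g):
    """Convert an area-based coefficient (mL/m2) to a mass-based Kd (mL/g)."""
    return k_area_ml_per_m2 * ssa_m2_per_g


def rescale_kd(kd_lab_ml_per_g, ssa_lab_m2_per_g, ssa_site_m2_per_g):
    """Rescale a laboratory Kd to site material by the surface area ratio.

    Assumes the same adsorption per unit surface area in both materials.
    """
    if ssa_lab_m2_per_g <= 0:
        raise ValueError("ssa_lab_m2_per_g must be positive")
    return kd_lab_ml_per_g * ssa_site_m2_per_g / ssa_lab_m2_per_g


if __name__ == "__main__":
    k_area = 1.0                 # hypothetical, mL/m2, same for both minerals
    ssa_montmorillonite = 100.0  # hypothetical, m2/g
    ssa_quartz = 0.1             # hypothetical, m2/g
    kd_mont = kd_mass_basis(k_area, ssa_montmorillonite)
    kd_quartz = kd_mass_basis(k_area, ssa_quartz)
    print(f"Kd ratio (montmorillonite / quartz): {kd_mont / kd_quartz:.0f}")
```

Under these assumptions, two minerals with identical adsorption per unit area differ by three orders of magnitude in mass-based Kd purely because of their surface areas, which is why a literature Kd should be checked against the surface area of the site material before it is applied.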

A.5 References

Coston, J.A., C.C. Fuller, and J.A. Davis. Pb2+ and Zn2+ Adsorption by a Natural Aluminum-Bearing and Iron-Bearing Surface Coating on an Aquifer Sand. Geochim. Cosmochim. Acta, 59(17):3535-3547, 1995.

Davis, J.A., G.P. Curtis, and J.D. Randall. Application of Surface Complexation Modeling to Describe Uranium(VI) Adsorption and Retardation at the Uranium Mill Tailings Site at Naturita, Colorado. NUREG/CR-6820, U.S. Nuclear Regulatory Commission, Washington, DC, 2003.

Davis, J.A., T.E. Payne, and T.D. Waite. Simulating the pH and pCO2 Dependence of Uranium(VI) Adsorption by a Weathered Schist with Surface Complexation Models. In: Geochemistry of Soil Radionuclides (ed. P.C. Zhang and P.V. Brady), SSSA Special Publication Number 59, pp. 61-86, Soil Science Society of America, Madison, Wisconsin, 2002.

Davis, J.A., J.A. Coston, D.B. Kent, and C.C. Fuller. Application of the Surface Complexation Concept to Complex Mineral Assemblages. Environ. Sci. Technol. 32:2820-2828, 1998.

Davis, J.A. and J.O. Leckie. Surface Ionization and Complexation at the Oxide/Water Interface 3 - Adsorption of Anions. Journal of Colloid Interface Science. 74:32-43, 1980.

Davis, J.A. and J.O. Leckie. Surface Ionization and Complexation at the Oxide/Water Interface 2 - Surface Properties of Amorphous Iron Oxyhydroxide and Adsorption of Metal Ions. Journal of Colloid Interface Science. 67:90-107, 1978.

Davis, J.A., R.O. James, and J.O. Leckie. Surface Ionization and Complexation at the Oxide/Water Interface 1 - Computation of Electrical Double Layer Properties in Simple Electrolytes. Journal of Colloid Interface Science. 63:480-499, 1978.

Dzombak, D.A., and F.M.M. Morel. Surface Complexation Modeling: Hydrous Ferric Oxide. John Wiley and Sons, New York, 1990.

Freundlich, H. Colloid and Capillary Chemistry. Methuen, London, 1926.

Langmuir, I. The adsorption of gases on plane surfaces of glass, mica, and platinum. Jour. Amer. Chem. Soc. 40:1361-1403, 1918.

Meyer, P.D. and G.W. Gee. Information on Hydrologic Conceptual Models, Parameters, Uncertainty Analysis, and Data Sources for Dose Assessments at Decommissioning Sites. NUREG/CR-6656, PNNL-13091, U.S. Nuclear Regulatory Commission, Washington, DC, 1999.

Meyer, P.D. and R.Y. Taira. Hydrologic Uncertainty Assessment for Decommissioning Sites: Hypothetical Test Case Applications. NUREG/CR-6695, PNNL-13375, U.S. Nuclear Regulatory Commission, Washington, DC, 2001.

Penn, R.L., C. Shu, H.Z. Xu, and D.R. Veblen. Iron Oxide Coatings on Sand Grains from the Atlantic Coastal Plain: High-Resolution Transmission Electron Microscopy Characterization. Geology, 29(9):843-846, 2001.


Schindler, P.W., B. Furst, R. Dick, and P.U. Wolf. Ligand Properties of Surface Silanol Groups. Journal of Colloid Interface Science. 55:469-475, 1976.

Sheppard, M.I. and D.H. Thibault. Default Soil Solid/Liquid Partition Coefficients, Kds, for Four Major Soil Types: A Compendium. Health Physics 59(4):471-478, 1990.

Stumm, W., and J.J. Morgan. Aquatic Chemistry. Wiley, New York, 1996.

Turner, D.R., F.P. Bertetti, and R.T. Pabalan. Role of Radionuclide Sorption in High-Level Waste Performance Assessment: Approaches for the Abstraction of Detailed Models. In: Geochemistry of Soil Radionuclides (ed. P.C. Zhang and P.V. Brady), SSSA Special Publication Number 59, pp. 211-252, Soil Science Society of America, Madison, Wisconsin, 2002.

Waite, T.D., J.A. Davis, B.R. Fenton, and T.E. Payne. Approaches to Modeling Uranium(VI) Adsorption on Natural Mineral Assemblages. Radiochim. Acta 88:687-693, 2000.

NRC FORM 335 (2-89)
U.S. NUCLEAR REGULATORY COMMISSION
BIBLIOGRAPHIC DATA SHEET

1. REPORT NUMBER: NUREG/CR-6843, PNNL-14534

2. TITLE AND SUBTITLE: Combined Estimation of Hydrogeologic Conceptual Model and Parameter Uncertainty

3. DATE REPORT PUBLISHED: March 2004

4. FIN OR GRANT NUMBER: Y6465

5. AUTHOR(S): P.D. Meyer, M. Ye, S.P. Neuman, and K.J. Cantrell

6. TYPE OF REPORT: Technical

7. PERIOD COVERED (Inclusive Dates): March 2003 - March 2004

8. PERFORMING ORGANIZATION - NAME AND ADDRESS:
Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA 99352
Department of Hydrology and Water Resources, University of Arizona, Tucson, AZ 85721

9. SPONSORING ORGANIZATION - NAME AND ADDRESS:
Division of Systems Analysis and Regulatory Effectiveness, Office of Nuclear Regulatory Research, U.S. Nuclear Regulatory Commission, Washington, DC 20555-0001

10. SUPPLEMENTARY NOTES: T.J. Nicholson, NRC Project Manager

11. ABSTRACT (200 words or less):
The objective of this research is the development and application of a methodology for comprehensively assessing the hydrogeologic conceptual model, parameter, and scenario uncertainties involved in dose assessment. This report describes and applies a statistical method, Maximum Likelihood Bayesian Model Averaging (MLBMA), to quantitatively estimate the combined uncertainty in model predictions arising from hydrogeologic conceptual model and parameter uncertainties. The method relies on model averaging to combine the predictions of a set of alternative models and uses model calibration to update prior parameter estimates and model probabilities based on the correspondence between model predictions and site observations. MLBMA was applied to the geostatistical modeling of air permeability at a fractured rock site. Seven alternative variogram models of log air permeability were considered. Unbiased maximum likelihood estimates of variogram and drift parameters were obtained for each model. Standard information criteria provided an ambiguous ranking of the models, which would not justify selecting one of them and discarding all others as is commonly done in practice. Instead, three of the models were eliminated based on their negligibly small updated probabilities. The remaining four models were averaged using the posterior model probabilities as weights. Using two quantitative measures of comparison, model-averaged predictions were superior to any individual geostatistical model of log permeability considered.

12. KEY WORDS/DESCRIPTORS: Bayesian Model Averaging; decommissioning; dose assessment; ground-water modeling; hydrogeologic conceptual model; model calibration; model uncertainty; parameter uncertainty; uncertainty

13. AVAILABILITY STATEMENT: unlimited

14. SECURITY CLASSIFICATION: (This Page) unclassified; (This Report) unclassified

15. NUMBER OF PAGES:

16. PRICE:


NUREG/CR-6843 COMBINED ESTIMATION OF HYDROGEOLOGIC CONCEPTUAL MODEL AND PARAMETER UNCERTAINTY

MARCH 2004

UNITED STATES NUCLEAR REGULATORY COMMISSION
WASHINGTON, DC 20555-0001

OFFICIAL BUSINESS