MITRE
Developing a Common Framework for Model Evaluation
Panel Discussion
7th Transport and Diffusion Conference
George Mason University
19 June 2003
Dr. Priscilla A. Glasow
The MITRE Corporation
Topics
- Joint Effects Model (JEM) evaluation approach
- Model evaluation in the Department of Defense
  - Methods
  - Evaluation method deficiencies
- Lessons learned in building a common evaluation framework
System Description
Models the transport and diffusion of hazardous materials in near real-time:
- CBRN hazards
- Toxic industrial chemicals / materials
Modeling approach includes:
- Weather
- Terrain / vegetation / marine environment
- Interaction with materials
Depicts the hazard area, concentrations, lethality, etc.:
- Overlays on command and control system maps
Will deploy on C4I systems and standalone:
- Where JWARN is not needed, but hazard prediction is needed
System Description (continued)
Legacy systems being replaced:
- HPAC - a DTRA S&T effort
- VLSTRACK - a NAVSEA S&T effort
- D2PUFF - a USA S&T effort
Improved capabilities:
- One model, accredited for all uses currently supported by the three Interim Accredited DoD Hazard Prediction Models
- Warfighter-validated graphical user interface
- Uses only authoritative data sources
- Integrated into Service command and control systems
- Flexible and extensible open architecture
- Seamless weather data transfer from provider of choice
- Planned program sustainment and warfighter support
- School-house and embedded training
- Reach back
Calculating Hazard Predictions
How might a hazard probability be calculated?

P_HAZ = P_HAZ/VULN × P_VULN/OCC × P_OCC

where:
- P_HAZ = probability of the hazard
- P_HAZ/VULN = probability of the hazard for a given vulnerability
- P_VULN/OCC = probability of vulnerability for a given occurrence
- P_OCC = probability of the occurrence
Example: Calculating Casualties
The probability of casualties can be defined as:

P_CAS = P_CAS/CONT × P_CONT/EXP × P_EXP

where:
- P_CAS = probability of casualties
- P_CAS/CONT = probability of casualties for a given contamination
- P_CONT/EXP = probability of contamination for a given exposure
- P_EXP = probability of exposure
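The chained conditional probabilities above multiply out directly. A minimal sketch in Python; the function name and the example probabilities are illustrative, not values from the JEM program:

```python
def casualty_probability(p_cas_given_cont: float,
                         p_cont_given_exp: float,
                         p_exp: float) -> float:
    """Chain the conditional probabilities from the slide:
    P_CAS = P_CAS/CONT * P_CONT/EXP * P_EXP."""
    for p in (p_cas_given_cont, p_cont_given_exp, p_exp):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return p_cas_given_cont * p_cont_given_exp * p_exp

# Illustrative numbers only (not from the JEM program):
p_cas = casualty_probability(0.9, 0.5, 0.2)
print(round(p_cas, 6))  # 0.09
```

The range check is an added safeguard, not part of the slide's definition.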
Probability of Casualties
- P_CAS/CONT (probability of hazard for a given vulnerability):
  - Toxicity is already a probability
  - Data for inhalation or contact
- P_CONT/EXP (probability of vulnerability for a given occurrence):
  - Can be empirical or computed by JOEF
  - Can account for Protection Levels (by inverting protection factors, i.e. ratios of concentrations outside a system to inside a system)
- P_EXP (probability of occurrence):
  - Performed by JEM's Atmospheric Transport and Dispersion algorithms
[Figure: MET data, source terms, and T&D physics feeding the calculation]
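The protection-factor adjustment mentioned under P_CONT/EXP can be sketched as follows. The function and numbers are hypothetical; only the relationship comes from the slide: a protection factor is the ratio of the concentration outside a system to the concentration inside it, so inverting it scales an outside concentration down to what a protected individual sees. The range check is an added assumption.

```python
def inside_concentration(outside_concentration: float,
                         protection_factor: float) -> float:
    """Invert the protection factor (outside/inside concentration
    ratio) to scale an outside concentration down to the
    concentration inside the protective system."""
    if protection_factor < 1.0:  # assumption: protection never amplifies
        raise ValueError("protection factor must be >= 1")
    return outside_concentration * (1.0 / protection_factor)

# Illustrative: a system with protection factor 50 reduces a
# 10 mg/m^3 outside concentration to roughly 0.2 mg/m^3 inside.
print(inside_concentration(10.0, 50.0))
```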
Falsely and Falsely Not Warned
Adapted from IDA work by Dr. Steve Warner
[Figure: observed vs. predicted hazard areas, with regions labeled A_FW, A_OL, and A_FNW]
- Can compute fractions of areas (under curves) of FW and FNW to observed and/or predicted areas.
- Given enough field trial data, means of FW and FNW can be used as probabilities.
- Accuracy is then 1 - FW and 1 - FNW (false positive accuracy and false negative accuracy).
- Propose a tentative ~0.2 FW accuracy standard and a tentative ~0.4 FNW accuracy standard, which 1) HPAC has demonstrated and 2) is apparently accepted by HPAC users.
- Comparisons to ATP-45 (as current JORD language reads) may be problematic for FNW.
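The area fractions described above can be computed mechanically once the observed area, predicted area, and their overlap are known. This is a sketch under stated assumptions: the choice of denominators (predicted area for FW, observed area for FNW) is consistent with, but not spelled out by, the slide, and the function names are invented for illustration.

```python
def fw_fnw_fractions(area_observed: float,
                     area_predicted: float,
                     area_overlap: float) -> tuple[float, float]:
    """Falsely-warned (FW) region: predicted but not observed.
    Falsely-not-warned (FNW) region: observed but not predicted.
    Each is normalized by the area it is a subset of (an assumed
    convention; the slide allows observed and/or predicted areas)."""
    a_fw = area_predicted - area_overlap   # warned, but no hazard occurred
    a_fnw = area_observed - area_overlap   # hazard occurred, no warning
    return a_fw / area_predicted, a_fnw / area_observed

def accuracies(fw: float, fnw: float) -> tuple[float, float]:
    """Accuracy as defined on the slide: 1 - FW and 1 - FNW."""
    return 1.0 - fw, 1.0 - fnw

# Illustrative areas in arbitrary units: observed 100, predicted 120,
# overlap 90 -> FW = 30/120 = 0.25, FNW = 10/100 = 0.1.
fw, fnw = fw_fnw_fractions(100.0, 120.0, 90.0)
print(fw, fnw)
print(accuracies(fw, fnw))
```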
Transport and Diffusion Statistical Comparisons
- ASTM D 6589-00, Standard Guide for Statistical Evaluation of Atmospheric Dispersion Model Performance
- Geometric mean, geometric variance, bias (absolute, fractional), normalized mean square error, correlation coefficient, robust highest concentration
- Use all available Validation Database data
- Allows comparison to other T&D models and legacy codes
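Several of the listed statistics can be computed from paired observed/predicted concentrations. The forms below follow the usual textbook definitions and should be checked against ASTM D 6589 before use; robust highest concentration is omitted here, and the geometric measures require strictly positive concentrations.

```python
import math

def td_statistics(observed: list[float], predicted: list[float]) -> dict:
    """Paired-comparison statistics commonly used in dispersion
    model evaluation: fractional bias (FB), normalized mean square
    error (NMSE), geometric mean bias (MG), geometric variance (VG),
    and Pearson correlation (R)."""
    n = len(observed)
    assert n == len(predicted) and n > 1
    mo = sum(observed) / n
    mp = sum(predicted) / n
    # Fractional bias: 0 means unbiased; bounded in [-2, 2].
    fb = 2.0 * (mo - mp) / (mo + mp)
    # Normalized mean square error.
    nmse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / (n * mo * mp)
    # Log-space measures: MG = VG = 1 for perfect agreement.
    logs = [math.log(o) - math.log(p) for o, p in zip(observed, predicted)]
    mg = math.exp(sum(logs) / n)
    vg = math.exp(sum(d * d for d in logs) / n)
    # Pearson correlation coefficient.
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    r = cov / (so * sp)
    return {"FB": fb, "NMSE": nmse, "MG": mg, "VG": vg, "R": r}

# Perfect agreement gives FB = 0, NMSE = 0, MG = VG = 1, R = 1:
stats = td_statistics([1.0, 2.0, 4.0], [1.0, 2.0, 4.0])
print(stats)
```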
Anticipated Approach
- Use both the IDA methodology and the T&D statistics with the validation database
- The IDA metrics make it easy for the warfighter to understand and visualize the threat
- T&D statistics provide a basis for comparison to legacy codes (comparison of observed and predicted means)
Model Evaluation in the Department of Defense
Model Evaluation Methods
- The Department of Defense (DoD) assesses the credibility of models and simulations through a process of Verification, Validation and Accreditation (VV&A)
- The JEM model evaluation approach will follow DoD VV&A policy and procedures:
  - DoD Instruction 5000.61, May 2003
  - DoD VV&A Recommended Practices Guide, November 1996
Current Methods Deficiencies: Science View
- Inadequate use of statistical methods for assessing the credibility of models and simulations
- Over-reliance on subjective methods, such as subject matter experts, whose credentials are not documented and who may not be current or unbiased
Current Method Deficiencies: User View
- Insufficient guidance on what techniques are available and how to select suitable techniques for a given application
- Overemphasis on using models for their own sake rather than as tools to support analysis
- Failure to carefully and clearly define the problem for which models are being used to provide insights or solutions
- Not understanding the inherent limitations of models
Is Developing a Common Framework an Achievable Goal?
- The Military Operations Research Society (MORS) sponsored a workshop in October 2002 to address the status and health of VV&A practices in DoD
- An ad hoc committee was also formed by Mr. Walt Hollis, Deputy Undersecretary of the Army (Operations Research), to address general concerns regarding capability and expectations for using M&S in acquisition
Best Practices: An Initial Set of Goals for Developing a Common Framework for Model Evaluation
- Understand the problem
- Determine use of M&S
- Focus accreditation criteria
- Scope the V&V
- Contract to get what you need
Best Practices: Understand the Problem
- Understand the problem/questions so that you can develop sound criteria against which to evaluate possible solutions.
- This first step is frequently disregarded:
  - Too hard to do
  - Assumption that the problem is "clearly evident"
  - No real understanding of what the problem is
- Failure to understand the problem results in:
  - Unfocused use of M&S
  - Unbounded VV&A
  - Unnecessary expenditure of time and money without meaningful results
Best Practices: Determine Use of M&S
M&S should be used to:
- help critical thinking
- generate hypotheses
- provide insights
- perform sensitivity analyses to identify logical consequences of assumptions
- generate imaginable scenarios
- visualize results
- crystallize thinking
- suggest experiments whose importance can only be demonstrated by the use of M&S
Know up front what you will use M&S for:
- What part of the problem can be answered by M&S?
- What requirements do you have that M&S can address?
- Use M&S appropriately: if the model isn't sufficiently accurate vis-à-vis the test issues, it may lead to flawed information for decisions, such as test planning and execution.
Best Practices: Focus Accreditation Criteria
- Focus the accreditation on the criteria; focus the VV&A effort on reducing program risk.
- Accreditation criteria are best developed through collaboration among all stakeholders:
  - Get System Program Office buy-in on the M&S effort
  - Foster collaboration with testers, analysts, and model developers...and key decisionmakers
  - Build the training simulation first to elicit the user's needs and ideas about "key criteria" before building the system itself. If this is not feasible, think "training" and ultimate use to define key criteria.
Best Practices: Scope the V&V
- Focus the accreditation on the criteria, as that will properly focus the verification and validation (V&V) effort.
- Write the accreditation plan so that it scopes and bounds the V&V effort.
- Plan the V&V effort in advance to use test data to validate the model:
  - Need to ensure that the data are used correctly
  - Need to document the conditions under which the data were collected
Best Practices: Contract to Get What You Need
- Include M&S and VV&A requirements in Requests for Proposals to obtain bids and estimated costs:
  - Require that bidders provide a conceptual model that can be used to assess the bidder's understanding of the problem
  - Make documentation of M&S and VV&A deliverables under the contract
  - Competition among bidders will generate alternative models and a broader range of ideas
  - Require that the Government announce what M&S it will use as part of its source selection
- Ensure that the VV&A team is multidisciplinary and includes experts who understand the mathematical and physical underpinnings of the model
- Ensure that Subject Matter Experts (SMEs) have relevant and current expertise, that these qualifications are documented, and that each SME's rationale for his or her judgments and recommendations is recorded
Other Goals for Developing a Common Approach for Model Evaluation
- Consensus and commitment to developing and using credible models
- Adequate resources to conduct credible model evaluation efforts:
  - Qualified evaluators
  - Funding
- Provide program incentives to those who are expected to perform model evaluations
Other Goals for Developing a Common Approach for Model Evaluation (continued)
- Use independent evaluators, not the model developers, to obtain unbiased evaluation results
- Institutionalize the use and reuse of models that have documented credibility
- Establish standards for model evaluation performance to ensure quality and integrity of evaluation efforts