Verification and Validation of Agent-based and Equation-based Simulations and Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome. Ryan C. Kennedy, Department of Computer Science and Engineering, University of Notre Dame.
04/19/23
Verification and Validation of Agent-based and Equation-based Simulations
and
Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome

Ryan C. Kennedy
Department of Computer Science and Engineering
University of Notre Dame

Verification and Validation of Agent-based and Equation-based Simulations

Overview
- Introduction
  - Motivation
  - Concepts of Verification and Validation
  - Research Objectives and Methods
- Case Study I: An Agent-based Scientific Model
- Case Study II: An Equation-based Economic Model
- Conclusion
- Future Work

Motivation
- NSF Blue Ribbon Panel (February 2006):
  “New theory and methods are needed for handling stochastic models and for developing meaningful and efficient approaches to the quantification of uncertainties. As they stand now, verification, validation, and uncertainty quantification are challenging and necessary research areas that must be actively pursued.”
- Dr. Richard W. Amos
  - Deputy to the Commanding General, U.S. Army Aviation and Missile Command (AMCOM)
  - Previously the Director of the System Simulation and Development Directorate in the Aviation and Missile Research, Development and Engineering Center (AMRDEC)
- Verification and validation: 10-15% of the total cost of model development, but often overlooked in the overall lifecycle
*Oden: “Simulation-Based Engineering Science: Revolutionizing Engineering Science through Simulation”

Model Verification & Validation (V & V)
- V & V
  - Verification: solve the model right
  - Validation: solve the right model
- The cost and value of V & V influence confidence in the model
- Want optimal cost-effectiveness of V & V
*Adapted from Sargent: “Verification and Validation of Simulation Models”

Verification and Validation Process
*Adapted from Sargent: “Verification and Validation of Simulation Models” and Huang: “Agent-Based Scientific Simulation”

Applicable Verification and Validation Methods
*Balci: “Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice” lists more than 75 methods

V & V: Subjective Analysis
- Examples of V & V techniques
  - Face Validity
  - Animation
  - Graphical Representation
  - Turing Test
  - Internal Validity
  - Tracing
  - Black-Box Testing

V & V: Quantitative Analysis
- Examples of V & V techniques
  - Docking (Model-to-Model Comparison)
  - Historical Data Validation
  - Sensitivity Analysis/Parameter Variability
  - Prediction Validation

What and How
- Research objective
  - Perform V & V on distinct models and identify the more cost-effective techniques
- How
  - Two very different projects as case studies
  - Evaluate and adapt the formalized V & V techniques used in industrial and systems engineering

Case Study I: An Agent-based Scientific Model
- NSF-funded interdisciplinary project
  - Understanding the evolution and heterogeneous structure of Natural Organic Matter (NOM)
- E-science example
  - Chemists, biologists, ecologists, and computer scientists
- Agent-based stochastic model
- Web-based simulation model

Case Study I: NOM
- What is NOM?
  - Heterogeneous mixture of molecules in terrestrial and aquatic ecosystems
- Why study NOM?
  - Plays a crucial role in the evolution of soils, the transport of pollutants, and the global carbon cycle
  - Understanding NOM helps us better understand natural ecosystems
  - Hard to study in the laboratory

Case Study I: The Conceptual Model I
- Agents
  - A large number of molecules
  - Heterogeneous properties
    - Elemental composition
    - Molecular weight
    - Characteristic functional groups
- Behaviors
  - Transport through soil pores (spatial mobility)
  - Chemical reactions: first order and second order
  - Sorption

Case Study I: The Conceptual Model II
- Stochastic model
  - Individual behaviors and interactions are stochastically determined by:
    - Internal attributes
      - Molecular structure
      - State (adsorbed, desorbed, reacted, etc.)
    - External conditions
      - Environment (pH, light intensity, etc.)
      - Proximity to other molecules
    - Length of time step, Δt
- Space
  - 2D grid structure
- Emergent properties
  - Distribution of molecular properties over time

Case Study I: Internal Validity I
[Figure: number of molecules at the end step (y-axis, 0 to 4000) for each of 25 random seeds (x-axis, seeds 1 to 25), with two series, "Before" and "After".]
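
An internal validity check replicates a stochastic model under different random seeds and looks at how much the outcome varies between runs. The sketch below illustrates the idea only: `run_toy_model` is a hypothetical stand-in, not the NOM simulator, and its parameters (`n_molecules`, `steps`, `p_split`, `p_loss`) are assumptions chosen for illustration.

```python
import random
import statistics


def run_toy_model(seed, n_molecules=200, steps=30, p_split=0.02, p_loss=0.01):
    """Hypothetical stand-in for one stochastic simulation run.

    At every step each molecule may split (count + 1) or be lost (count - 1).
    Returns the number of molecules at the end step.
    """
    rng = random.Random(seed)  # private generator, so a seed reproduces a run
    count = n_molecules
    for _ in range(steps):
        splits = sum(rng.random() < p_split for _ in range(count))
        losses = sum(rng.random() < p_loss for _ in range(count))
        count = max(count + splits - losses, 0)
    return count


def internal_validity(seeds):
    """Run the model once per seed and summarize the spread of the outcomes."""
    finals = [run_toy_model(seed) for seed in seeds]
    mean = statistics.mean(finals)
    stdev = statistics.stdev(finals)
    return {"finals": finals, "mean": mean, "stdev": stdev, "cv": stdev / mean}


summary = internal_validity(range(1, 26))  # 25 seeds, matching the chart's x-axis
```

A large coefficient of variation (`cv`) across seeds would suggest the result depends more on the random stream than on the model itself.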

Case Study I: Docking I
- Compare the model with a validated one
- Compare the model with a non-validated one
- Different implementations
  - Different programming languages
  - Different packages
- Different modeling approaches
  - Agent-based approach vs. equation-based approach
- Powerful method

Case Study I: Docking II

Features                 AlphaStep                                           No-Flow Reaction
Developing group         University of New Mexico, Department of Chemistry   University of Notre Dame, Computer Science and Engineering
Programming language     Pascal                                              Java (Sun JDK 1.4.2)
Platforms                Delphi 6, Windows                                   Red Hat Linux cluster
Running mode             Standalone                                          Web based, standalone
Simulation package       None                                                Repast toolkit
Animation                None                                                Yes
Spatial representation   None                                                2D grid
Second order reaction    Randomly pick one from list                         Choose the nearest neighbor
First order with split   Add to list                                         Find empty cell nearby
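
The "Second order reaction" row is where the two implementations differ most: one picks a reaction partner at random from a list, the other picks the nearest neighbor on the 2D grid. A minimal sketch of the two strategies follows. The function names and the representation of a molecule as an `(x, y)` tuple are assumptions for illustration, not code from either model.

```python
import math
import random


def pick_random_partner(others, rng):
    """List-based strategy: every other molecule is an equally likely partner."""
    return rng.choice(others)


def pick_nearest_partner(molecule, others):
    """Grid-based strategy: the partner is the closest molecule on the 2D grid."""
    return min(others, key=lambda other: math.dist(molecule, other))


# Molecules are represented only by hypothetical (x, y) grid positions.
molecule = (0, 0)
others = [(5, 5), (1, 2), (9, 0)]

nearest = pick_nearest_partner(molecule, others)
random_pick = pick_random_partner(others, random.Random(42))
```

Docking the two models then amounts to checking whether aggregate outputs still agree despite this difference in how partners are chosen.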

Case Study II: An Economic Model
- Interdisciplinary project
  - Initially written in Matlab within the Department of Finance
  - Converted to C++ by computer scientists
- Equation-based system
- Concerned with identifying ideal economic variables, such as debt, money growth, and tax rate

Case Study II: The Conceptual Model
- Equation-based system
- Nonlinear projection methods used to solve Ramsey problems in a stochastic money economy
- Goal is to generate the best social welfare for a given economy
Motivation

Case Study II: Face Verification

               Lagrange Multiplier   Labor   Money Growth   Tax Rate   Cash Good   Credit Good
Matlab         0.138                 0.309   -0.009         0.188      0.486       0.621
C++            0.138                 0.309   -0.009         0.188      0.486       0.621
Steady State   0.138                 0.309   -0.009         0.188      0.485       0.620
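
The comparison in this table can also be made mechanical. The sketch below takes the values from the slide and checks agreement within a tolerance. The helper names and the tolerance values are assumptions chosen for illustration.

```python
COLUMNS = ["lagrange_multiplier", "labor", "money_growth",
           "tax_rate", "cash_good", "credit_good"]

# Values as reported on the slide, in COLUMNS order.
RESULTS = {
    "Matlab":       [0.138, 0.309, -0.009, 0.188, 0.486, 0.621],
    "C++":          [0.138, 0.309, -0.009, 0.188, 0.486, 0.621],
    "Steady State": [0.138, 0.309, -0.009, 0.188, 0.485, 0.620],
}


def max_abs_diff(a, b):
    """Largest absolute difference between two result rows."""
    return max(abs(x - y) for x, y in zip(a, b))


def agree(a, b, tol):
    """True when every variable matches within the given tolerance."""
    return max_abs_diff(a, b) <= tol


# Hypothetical tolerances: the two implementations should match as reported,
# while the steady state may differ in the third decimal place.
implementations_match = agree(RESULTS["Matlab"], RESULTS["C++"], tol=1e-9)
near_steady_state = agree(RESULTS["C++"], RESULTS["Steady State"], tol=0.0015)
```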

Case Study II: Tracing

C++:
it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, m -0.0092286139, t 0.1881024991, h 0.3093668925
cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442
it 45, af 2.64653e-08, rc 0, timer 11.0, l 0.1382704643, m -0.0092286175, t 0.1881024947, h 0.3093668931
cc1 0.4861695553, cc2 0.6212795120, rl 1.0092221442

it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 t: 0.188252 h: 0.309338
cc1: 0.486205 cc2: 0.621244 rl: -0.65888
it: 45 af: 0.00144784 rc: 0 l: 0.138401 m: -0.00937062 t: 0.188239 h: 0.30934
cc1: 0.486208 cc2: 0.621241 rl: -0.665511
Matlab:
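
Comparing traces by eye does not scale past a few iterations. The sketch below parses both trace formats shown on the slide into dictionaries and reports the per-variable differences for one iteration. The regular expression and function names are assumptions, and the two sample strings are shortened copies of the iteration 44 lines.

```python
import re

# A name, an optional colon, then a number (plain or scientific notation).
PAIR = re.compile(r"\b([A-Za-z]\w*):?\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_trace_line(line):
    """Turn 'it 44, af 3.7e-08, ...' or 'it: 44 af: 0.0014 ...' into a dict."""
    return {key: float(value) for key, value in PAIR.findall(line)}


def compare_traces(a, b, keys):
    """Absolute difference per variable between two parsed trace records."""
    return {key: abs(a[key] - b[key]) for key in keys}


trace_a = ("it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, "
           "m -0.0092286139, t 0.1881024991, h 0.3093668925")
trace_b = ("it: 44 af: 0.00144839 rc: 0 l: 0.138359 "
           "m: -0.00936025 t: 0.188252 h: 0.309338")

record_a = parse_trace_line(trace_a)
record_b = parse_trace_line(trace_b)
diffs = compare_traces(record_a, record_b, keys=["l", "m", "t", "h"])
```

Running this over every iteration would show where two implementations start to diverge, which is the point of tracing.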

Case Study II: Docking

Features           Matlab                                            C++
Developing group   University of Notre Dame, Department of Finance   University of Notre Dame, Computer Science and Engineering
Language           High-level                                        Lower-level
Compiler           Interpreted                                       GNU Compiler
Good for           Prototyping                                       Speed
Platforms          Linux, Windows                                    Linux
Running mode       Standalone                                        Standalone
Packages           LAPACK, etc.                                      STL, GSL
Variables          Implicit                                          Declared

Case Study II: Performance

          5 Iterations   50 Iterations   500 Iterations
Matlab    58 s           568 s           8872 s
C++       2 s            17 s            176 s
Speedup   29x            33.4x           50.4x

[Figure: Performance. Running time in seconds (y-axis, 0 to 10000) versus number of iterations (x-axis, 0 to 500) for Matlab and C++.]
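
The speedup row is the Matlab running time divided by the C++ running time at each iteration count. The short sketch below reproduces that arithmetic from the timings on the slide. The variable and function names are illustrative.

```python
ITERATIONS = [5, 50, 500]
MATLAB_SECONDS = [58, 568, 8872]
CPP_SECONDS = [2, 17, 176]


def speedups(baseline, optimized):
    """Ratio of baseline time to optimized time, rounded to one decimal."""
    return [round(b / o, 1) for b, o in zip(baseline, optimized)]


for n, s in zip(ITERATIONS, speedups(MATLAB_SECONDS, CPP_SECONDS)):
    print(f"{n} iterations: {s}x")
```

The ratio grows with the iteration count, so the C++ version's advantage is not a fixed startup saving.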

Summary & Conclusion
- Applied V & V techniques to distinct case studies to increase model confidence
- Some techniques are more cost-effective

                               Agent-based (Stochastic)     Equation-based (Deterministic)
Technique                      Cost       Effectiveness     Cost       Effectiveness
Face Validation/Verification   Low        Very Good         Low        Very Good
Turing Test                    Low        Very Good         Low        Good
Internal Validity              Moderate   Very Good         n/a        n/a
Tracing                        Moderate   Fair              Moderate   Excellent
Black-Box Testing              Low        Good              Low        Very Good
Docking                        Moderate   Very Good         Moderate   Good
Historical Data Verification   Moderate   Very Good         Moderate   Very Good
Sensitivity Analysis           Moderate   Good              Moderate   Good
Prediction Validation          Moderate   Good              Moderate   Fair

Future Work
- More in-depth survey of V & V methods
- More rigorous quantitative methods
- Compare simulation results against empirical data
- Invalidation testing
- More general and formalized V & V process model

Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome

Overview
- Introduction
  - Motivation
  - Basic Biological Concepts
  - Bioinformatics
- Aedes aegypti
- Transposable Elements
- Approaches to Identifying Transposable Elements
- Conclusion
- Future Work

Motivation
- Bioinformatics field is rapidly growing
  - Computer scientists can help advance its study
- A better understanding of the biology of organisms would be helpful to scientists
- Transposable elements can be useful tools to scientists
- Computer scientists can help biologists develop advanced techniques to find transposable elements

Biological Foundations
- All cells contain DNA, RNA, and protein molecules
- DNA
  - Composed of four nucleotides
  - Building block of life
- RNA
  - Carries the genetic information transcribed from DNA throughout a cell
- Protein
  - Laborer of the cell
- Central Dogma of Molecular Biology: DNA → RNA → Protein
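
The flow of information in the central dogma (DNA is transcribed to RNA, RNA is translated to protein) can be sketched in a few lines. This is a simplified illustration, not part of the original study: it treats the input as the coding strand, uses the standard genetic code, and ignores real-world details such as introns and reading-frame detection.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """DNA coding strand to mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA to protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
protein = translate(mrna)
```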

Bioinformatics
- Collective study of numerous fields and techniques to solve biological problems
- Focused on the study of DNA and its underlying characteristics
- Computer science lends itself well to bioinformatics

Bioinformatics Research Topics
- Genome Annotation
  - Assigning biological meaning to regions of a sequence
- Sequence Alignment
  - Comparing two or more sequences
- Sequencing
  - Finding the structure of a given sequence
- Genome Assembly
  - Assembling many short sequences of DNA
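
Of these topics, sequence alignment is the easiest to show in code. The sketch below computes a global alignment score with the classic Needleman-Wunsch dynamic program. The scoring values (`match`, `mismatch`, `gap`) are assumptions for illustration. Production tools such as BLAST use faster heuristics and substitution matrices instead.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of sequences a and b (Needleman-Wunsch).

    Only the previous row of the score matrix is kept, so memory use is
    proportional to len(b).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                         # aligning a[:i] against ""
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


identical = global_alignment_score("GATTACA", "GATTACA")
one_gap = global_alignment_score("ACGT", "AGT")
```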

Bioinformatics Tools
- Perl
  - BioPerl
- BLAST
  - Popular alignment tool
- Hidden Markov Model
- Clustal X
- Phylogenetic Tree
  - Relationships between sequences
- Bioinformatics Collaboratories
  - NCBI, Ensembl, VectorBase

Aedes aegypti
- Tropical mosquito
- Vector for dengue and yellow fever viruses
- Its unannotated genome was recently released
- Much larger genome than that of other mosquitoes

Transposable Elements
- Often referred to as “jumping genes”
- Can make up large portions of a genome
- Can transfer genetic material
- Useful when performing evolutionary studies
- Typically divided into Class I, Class II, and Class III elements

Transposons
- Class II transposable elements
- Divided into many families
  - piggyBac, Tc1, pogo, mariner, P element
- Typical structure of a transposon:
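
The slide's structure diagram is not available in this transcript. As general background, Class II (DNA) transposons are typically flanked by terminal inverted repeats (TIRs): the end of the element is the reverse complement of its start. The sketch below checks a candidate sequence for that signature. The function names, the `max_len` limit, and the example sequence are all hypothetical, and exact matching is a simplification because real TIRs are often imperfect.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence given as A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(seq, max_len=30):
    """Length of the longest prefix whose reverse complement is the suffix.

    Returns 0 when the two ends do not form an exact inverted repeat.
    """
    seq = seq.upper()
    limit = min(max_len, len(seq) // 2)
    for length in range(limit, 0, -1):
        if reverse_complement(seq[:length]) == seq[-length:]:
            return length
    return 0


# Hypothetical element: 6 bp repeat, internal region, inverted copy of the repeat.
left_repeat = "CAGTGC"
candidate = left_repeat + "ATATTTAACCGA" + reverse_complement(left_repeat)
tir_length = terminal_inverted_repeat_length(candidate)
```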

Typical Approach
- BLAST known transposons against a new genome
- Good for identifying known or similar transposons in new genomes
- Does not account for sequence variations

Approach I
- Focused on identifying P elements
- Utilized multiple tools and scripts
- Able to identify previously unknown transposons
- Clustal X and the HMMER suite allowed us to perform a more thorough search
- Cannot account for frame shifts

Approach II
- Used for five families of transposons
- Utilized GeneWise
- Did not search for new transposons

Hybrid Approach: A Transposable Element Discovery Methodology
- Proposed approach
- Utilize the better aspects of the first two approaches
- Can be used for all families described in this study

Phylogenetic Tree
- mariner family
- Clustered clades indicate close relationships

Summary & Conclusion
- Found a reasonable number of transposons
- Utilized novel approaches to finding transposons
  - First such study using this type of approach on the Aedes aegypti genome
- Proposed a hybrid approach

TE          Number
piggyBac    12
Tc1         72
pogo        50
mariner     25
P element   9

Future Work
- Utilize hybrid approach
- Automate process
- Comparison of transposable elements found in Aedes aegypti and Anopheles gambiae