04/19/23

Verification and Validation of Agent-based and Equation-based Simulations
and
Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome

Ryan C. Kennedy
Department of Computer Science and Engineering
University of Notre Dame

Verification and Validation of Agent-based and Equation-based Simulations

Overview
- Introduction
  - Motivation
  - Concepts of Verification and Validation
  - Research Objectives and Methods
- Case Study I
  - An Agent-based Scientific Model
- Case Study II
  - An Equation-based Economic Model
- Conclusion
- Future Work

Motivation
- NSF Blue Ribbon Panel (February 2006):
  “New theory and methods are needed for handling stochastic models and for developing meaningful and efficient approaches to the quantification of uncertainties. As they stand now, verification, validation, and uncertainty quantification are challenging and necessary research areas that must be actively pursued.”
- Dr. Richard W. Amos
  - Deputy to the Commanding General, U.S. Army Aviation and Missile Command (AMCOM)
  - Previously the Director of the System Simulation and Development Directorate in the Aviation and Missile Research, Development and Engineering Center (AMRDEC)
- Verification and Validation
  - 10-15% of total cost of model development, but often overlooked in overall lifecycle

*Oden: “Simulation-Based Engineering Science: Revolutionizing Engineering Science through Simulation”

Model Verification & Validation (V & V)
- V & V
  - Verification: solve the model right
  - Validation: solve the right model
- The cost and value influence confidence of model
- Want optimal cost-effectiveness of V & V

*Adapted from Sargent: “Verification and Validation of Simulation Models”

Verification and Validation Process

*Adapted from Sargent: “Verification and Validation of Simulation Models” and Huang: “Agent-Based Scientific Simulation”


Applicable Verification and Validation Methods

*Balci: “Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice” lists more than 75 Methods

V & V: Subjective Analysis
Examples of V & V Techniques
- Face Validity
- Animation
- Graphical Representation
- Turing Test
- Internal Validity
- Tracing
- Black-Box Testing

V & V: Quantitative Analysis
Examples of V & V Techniques
- Docking (Model-to-Model Comparison)
- Historical Data Validation
- Sensitivity Analysis/Parameter Variability
- Prediction Validation

What and How
- Research objective
  - Perform V & V on distinct models and identify the more cost-effective techniques
- How
  - Two very different projects as case studies
  - Evaluate and adapt the formalized V & V techniques in industrial and system engineering

Case Study I: An Agent-based Scientific Model
- NSF-funded interdisciplinary project
  - Understanding the evolution and heterogeneous structure of Natural Organic Matter (NOM)
- E-science example
  - Chemists, biologists, ecologists, and computer scientists
- Agent-based stochastic model
- Web-based simulation model

Case Study I: NOM
- What is NOM?
  - Heterogeneous mixture of molecules in terrestrial and aquatic ecosystems
- Why study NOM?
  - Plays a crucial role in the evolution of soils, the transport of pollutants, and the global carbon cycle
  - Understanding NOM helps us better understand natural ecosystems
  - Hard to study in laboratory

Case Study I: The Conceptual Model I
- Agents
  - A large number of molecules
  - Heterogeneous properties
    - Elemental composition
    - Molecular weight
    - Characteristic functional groups
  - Behaviors
    - Transport through soil pores (spatial mobility)
    - Chemical reactions: first order and second order
    - Sorption

Case Study I: The Conceptual Model II
- Stochastic Model
  - Individual behaviors and interactions are stochastically determined by:
    - Internal attributes
      - Molecular structure
      - State (adsorbed, desorbed, reacted, etc.)
    - External conditions
      - Environment (pH, light intensity, etc.)
      - Proximity to other molecules
    - Length of time step, Δt
- Space
  - 2D Grid Structure
- Emergent properties
  - Distribution of molecular properties over time

Case Study I: Implementations

Case Study I: Face Validity

Case Study I: Internal Validity I

[Figure: number of molecules at the end step (y-axis, 0 to 4000) for seeds 1 to 25 (x-axis); series: Before, After]
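
Internal validity asks how much a stochastic model's output varies between replications that differ only in the random seed. The sketch below illustrates the idea with a hypothetical toy model, not the NOM simulator: `toy_run`, its parameters, and the reaction probability are all assumptions made for illustration.

```python
import random
import statistics


def toy_run(seed, n_molecules=4000, steps=50, p_react=0.01):
    """Hypothetical stand-in model: each molecule independently reacts
    (and is removed) with probability p_react at every time step."""
    rng = random.Random(seed)
    count = n_molecules
    for _ in range(steps):
        count -= sum(1 for _ in range(count) if rng.random() < p_react)
    return count


def internal_validity(seeds):
    """Run one replication per seed and summarize the end-of-run spread."""
    results = [toy_run(s) for s in seeds]
    mean = statistics.mean(results)
    stdev = statistics.stdev(results)
    return results, mean, stdev / mean  # last value: coefficient of variation


if __name__ == "__main__":
    results, mean, cv = internal_validity(range(1, 26))
    print(f"mean end count = {mean:.1f}, coefficient of variation = {cv:.4f}")
```

A small coefficient of variation across seeds suggests the stochastic variability is under control; a large one would call for a closer look at the model or its random number handling.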

Case Study I: Internal Validity II

Case Study I: Docking I
- Compare the model with a validated one
- Compare the model with a non-validated one
- Different implementations
  - Different programming languages
  - Different packages
- Different modeling approaches
  - Agent-based approach vs. Equation-based approach
- Powerful method
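
Docking an agent-based model against an equation-based one can be sketched with a first-order reaction. This is a minimal illustration under stated assumptions, not either implementation from the case study: the rate constant `k`, the time step `dt`, and the per-step reaction probability 1 - exp(-k dt) are chosen here so that the expected agent count matches the solution of dN/dt = -kN.

```python
import math
import random


def equation_based(n0, k, dt, steps):
    """Deterministic solution of dN/dt = -k N sampled every dt."""
    return [n0 * math.exp(-k * dt * i) for i in range(steps + 1)]


def agent_based(n0, k, dt, steps, seed=0):
    """Each agent reacts in a step with probability 1 - exp(-k dt)."""
    rng = random.Random(seed)
    p = 1.0 - math.exp(-k * dt)
    counts = [n0]
    alive = n0
    for _ in range(steps):
        alive -= sum(1 for _ in range(alive) if rng.random() < p)
        counts.append(alive)
    return counts


def max_relative_gap(a, b):
    """Largest relative difference between two equally long trajectories."""
    return max(abs(x - y) / y for x, y in zip(a, b))


if __name__ == "__main__":
    eq = equation_based(5000, 0.05, 1.0, 20)
    ab = agent_based(5000, 0.05, 1.0, 20)
    print(f"max relative gap = {max_relative_gap(ab, eq):.3%}")
```

If the two trajectories stay within the expected stochastic noise of each other, the models are said to dock; a persistent gap points to a difference in assumptions or a defect in one implementation.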

Case Study I: Docking II

Features               | AlphaStep                                         | No-Flow Reaction
Developing group       | University of New Mexico, Department of Chemistry | University of Notre Dame, Computer Science and Engineering
Programming language   | Pascal                                            | Java (Sun JDK 1.4.2)
Platforms              | Delphi 6, Windows                                 | Red Hat Linux cluster
Running mode           | Standalone                                        | Web based, standalone
Simulation package     | None                                              | Repast toolkit
Animation              | None                                              | Yes
Spatial representation | None                                              | 2D grid
Second order reaction  | Randomly pick one from list                       | Choose the nearest neighbor
First order with split | Add to list                                       | Find empty cell nearby

Case Study I: Docking III

Case Study I: Docking IV

Case Study I: Docking V

Case Study II: An Economic Model
- Interdisciplinary project
  - Initially written in Matlab within Department of Finance
  - Converted to C++ by Computer Scientists
- Equation-based system
- Concerned with identifying ideal economic variables, such as debt, money growth, and tax rate

Case Study II: The Conceptual Model
- Equation-based system
- Nonlinear projection methods used to solve Ramsey problems in a stochastic money economy
- Goal is to generate the best social welfare for a given economy
- Motivation

Case Study II: Face Verification

             | Lagrange Multiplier | Labor | Money Growth | Tax Rate | Cash Good | Credit Good
Matlab       | 0.138               | 0.309 | -0.009       | 0.188    | 0.486     | 0.621
C++          | 0.138               | 0.309 | -0.009       | 0.188    | 0.486     | 0.621
Steady State | 0.138               | 0.309 | -0.009       | 0.188    | 0.485     | 0.620

Case Study II: Tracing

C++:

it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, m -0.0092286139, t 0.1881024991, h 0.3093668925
cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442
it 45, af 2.64653e-08, rc 0, timer 11.0, l 0.1382704643, m -0.0092286175, t 0.1881024947, h 0.3093668931
cc1 0.4861695553, cc2 0.6212795120, rl 1.0092221442

it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 t: 0.188252 h: 0.309338
cc1: 0.486205 cc2: 0.621244 rl: -0.65888
it: 45 af: 0.00144784 rc: 0 l: 0.138401 m: -0.00937062 t: 0.188239 h: 0.30934
cc1: 0.486208 cc2: 0.621241 rl: -0.665511

Matlab:
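
Comparing two traces field by field can be automated. The sketch below parses the iteration-44 lines from both excerpts on this slide and flags the fields that differ by more than a tolerance. Which program produced which excerpt is not assumed here, so the lines are simply named `TRACE_A` and `TRACE_B`; the regular expression, the helper names, and the tolerance of 1e-3 are illustrative assumptions.

```python
import re

# One numeric field: a name, an optional colon, then a number.
FIELD = re.compile(r"\b([A-Za-z]\w*):?\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_trace(line):
    """Turn a trace line in either format into a {name: float} dict."""
    return {name: float(value) for name, value in FIELD.findall(line)}


def compare_traces(a, b, tol):
    """Return the shared fields whose values differ by more than tol."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys()
            if abs(a[k] - b[k]) > tol}


# Iteration 44 from each excerpt, copied from the slide.
TRACE_A = ("it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, "
           "m -0.0092286139, t 0.1881024991, h 0.3093668925 "
           "cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442")
TRACE_B = ("it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 "
           "t: 0.188252 h: 0.309338 cc1: 0.486205 cc2: 0.621244 rl: -0.65888")

if __name__ == "__main__":
    mismatches = compare_traces(parse_trace(TRACE_A), parse_trace(TRACE_B), tol=1e-3)
    for name in sorted(mismatches):
        print(name, mismatches[name])
```

With this tolerance the fields `l`, `m`, `t`, `h`, `cc1`, and `cc2` agree, while `af` and `rl` are flagged, which is the kind of discrepancy tracing is meant to surface for further inspection.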

Case Study II: Docking

Features         | Matlab                                          | C++
Developing group | University of Notre Dame, Department of Finance | University of Notre Dame, Computer Science and Engineering
Language         | High-Level                                      | Lower-Level
Compiler         | Interpreted                                     | GNU Compiler
Good for         | Prototyping                                     | Speed
Platforms        | Linux, Windows                                  | Linux
Running mode     | Standalone                                      | Standalone
Packages         | LAPACK, etc…                                    | STL, GSL
Variables        | Implicit                                        | Declared

Case Study II: Performance

        | 5 Iterations | 50 Iterations | 500 Iterations
Matlab  | 58 s         | 568 s         | 8872 s
C++     | 2 s          | 17 s          | 176 s
Speedup | 29x          | 33.4x         | 50.4x

[Figure: Performance. Running time (s), 0 to 10000, versus number of iterations, 0 to 500; series: Matlab, C++]
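
The speedup row is the Matlab running time divided by the C++ running time. A short check of that arithmetic, using the times from the table (the dictionary layout and function name are illustrative):

```python
# Running times in seconds, copied from the performance table:
# iterations -> (Matlab, C++)
TIMES = {5: (58, 2), 50: (568, 17), 500: (8872, 176)}


def speedup(matlab_s, cpp_s):
    """Speedup factor of the C++ version over the Matlab version."""
    return matlab_s / cpp_s


if __name__ == "__main__":
    for iterations, (matlab_s, cpp_s) in TIMES.items():
        print(f"{iterations:>3} iterations: {speedup(matlab_s, cpp_s):.1f}x")
```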

Summary & Conclusion
- Applied V & V techniques to distinct case studies to increase model confidence
- Some techniques are more cost-effective

Technique                    | Agent-based (Stochastic): Cost | Agent-based (Stochastic): Effectiveness | Equation-based (Deterministic): Cost | Equation-based (Deterministic): Effectiveness
Face Validation/Verification | Low                            | Very Good                               | Low                                  | Very Good
Turing Test                  | Low                            | Very Good                               | Low                                  | Good
Internal Validity            | Moderate                       | Very Good                               | n/a                                  | n/a
Tracing                      | Moderate                       | Fair                                    | Moderate                             | Excellent
Black-Box Testing            | Low                            | Good                                    | Low                                  | Very Good
Docking                      | Moderate                       | Very Good                               | Moderate                             | Good
Historical Data Verification | Moderate                       | Very Good                               | Moderate                             | Very Good
Sensitivity Analysis         | Moderate                       | Good                                    | Moderate                             | Good
Prediction Validation        | Moderate                       | Good                                    | Moderate                             | Fair

Future Work
- More in-depth survey of V & V methods
- More rigorous quantitative methods
- Compare simulation results against empirical data
- Invalidation Testing
- More general and formalized V & V process model

Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome

Overview
- Introduction
  - Motivation
  - Basic Biological Concepts
  - Bioinformatics
- Aedes aegypti
- Transposable Elements
- Approaches to Identifying Transposable Elements
- Conclusion
- Future Work

Motivation
- Bioinformatics field is rapidly growing
  - Computer scientists can help advance its study
- A better understanding of the biology of organisms would be helpful to scientists
- Transposable elements can be useful tools to scientists
- Computer scientists can help biologists develop advanced techniques to find transposable elements

Biological Foundations
- All cells contain DNA, RNA, and protein molecules
- DNA
  - Composed of four nucleotides
  - Building block of life
- RNA
  - Transfers genetic information from DNA throughout a cell
- Protein
  - Laborer of the cell
- Central Dogma of Molecular Biology: DNA → RNA → Protein
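
The central dogma (DNA is transcribed to RNA, and RNA is translated to protein) can be illustrated with a few lines of code. This is a simplified sketch: it treats the input as the coding strand, and `CODON_TABLE` holds only the four codons needed for the example rather than the full genetic code.

```python
# Small excerpt of the standard genetic code (RNA codon -> amino acid);
# only the codons used in the example are listed.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop
}


def transcribe(dna):
    """DNA coding strand -> mRNA: replace thymine (T) with uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein, reading codons until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCTTGGTAA")
    print(mrna, "->", translate(mrna))
```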

Bioinformatics
- Collective study of numerous fields and techniques to solve biological problems
- Focused on the study of DNA and its underlying characteristics
- Computer science lends itself well to bioinformatics

Bioinformatics Research Topics
- Genome Annotation
  - Assigning biological meaning to regions of a sequence
- Sequence Alignment
  - Comparing two or more sequences
- Sequencing
  - Finding the structure of a given sequence
- Genome Assembly
  - Assembling many short sequences of DNA

Bioinformatics Tools
- Perl
  - BioPerl
- BLAST
  - Popular alignment tool
- Hidden Markov Model
- Clustal X
- Phylogenetic Tree
  - Relationships between sequences
- Bioinformatics Collaboratories
  - NCBI, Ensembl, VectorBase

Aedes aegypti
- Tropical Mosquito
- Vector for dengue and yellow fever viruses
- Its unannotated genome recently released
- Much larger genome than that of other mosquitoes

Transposable Elements
- Often referred to as “jumping genes”
- Can make up large portions of a genome
- Can transfer genetic material
- Useful when performing evolutionary studies
- Typically divided into Class I, Class II, and Class III elements

Transposons
- Class II transposable elements
- Divided into many families
  - piggyBac, Tc1, pogo, mariner, P element
- Typical structure of a transposon:

Typical Approach
- BLAST known transposons against a new genome
- Good for identifying known or similar transposons in new genomes
- Does not account for sequence variations
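
The limitation in the last bullet can be illustrated with a toy similarity scan. This is not BLAST: it is a simple ungapped sliding-window comparison, and the sequences, function names, and thresholds are hypothetical. It only shows how a strict similarity cutoff finds a known copy but misses a diverged one.

```python
def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def scan(genome, query, min_identity):
    """Slide the query along the genome (no gaps) and report every window
    whose identity to the query reaches the threshold."""
    hits = []
    for start in range(len(genome) - len(query) + 1):
        score = identity(genome[start:start + len(query)], query)
        if score >= min_identity:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    query = "ACGTTGCAAC"
    # Hypothetical genome: one exact copy and one copy with three substitutions.
    genome = "TTTT" + "ACGTTGCAAC" + "GGGG" + "ACCTTGGAAT" + "TTTT"
    print(scan(genome, query, min_identity=0.9))
    print(scan(genome, query, min_identity=0.7))
```

Lowering the threshold recovers the diverged copy at the cost of more spurious hits in a real genome, which is part of the motivation for the profile-based approaches on the following slides.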

Approach I
- Focused on identifying P elements
- Utilized multiple tools and scripts
- Able to identify previously unknown transposons
- Clustal X and the HMMER suite allowed us to perform a more thorough search
- Cannot account for frame shifts

Approach II
- Used for five families of transposons
- Utilized GeneWise
- Did not search for new transposons

Hybrid Approach: A Transposable Element Discovery Methodology
- Proposed approach
- Utilize better aspects of first two approaches
- Can be used for all families described in this study

Phylogenetic Tree
- mariner family
- Clustered clades indicate close relationships

Summary & Conclusion
- Found a reasonable number of transposons
- Utilized novel approaches to finding transposons
  - First such study using this type of approach on the Aedes aegypti genome
- Proposed a hybrid approach

TE        | Number
piggyBac  | 12
Tc1       | 72
pogo      | 50
mariner   | 25
P element | 9

Future Work
- Utilize hybrid approach
- Automate process
- Comparison of transposable elements found in Aedes aegypti and Anopheles gambiae


Questions or Comments?