Comparative Biology
[Figure: observable present-day sequences (ATTGCGTATATAT….CAG) connected by an unobservable evolutionary path to their Most Recent Common Ancestor (?). Parameters: time, rates, selection. Arrow: time direction.]
Key Questions:
•Which phylogeny?
•Which ancestral states?
•Which process?
Key Generalisations:
•Homologous objects
•Co-modelling
•Genealogical Structures?
Structure of Biology: Physical Systems and Evolution
•Data: Sequences, Structures, Expression Levels, …
•Models (M1, .., Mk): Framework for model formulation
•Knowledge & Representation: Scientific Texts, Systems Biology Markup Language, Process Algebras, …
•Structure of Biological Systems: Atoms, Molecules, Networks, Motors, Central Dogma, Genetic Code, …
Dynamics - the system as a physical entity
Evolution - the system has evolved
Part of individuals in a population
Part of species in the tree of life
The Data
• Sequence Data
• Metabonomics/Metabolomics and Small Molecule Detection
• Expression Data
• Proteomics and Protein Interactions
• Structures from Crystallography, NMR and Cryo-EM
• Single Molecule Measurements
• Microscopy
Example of Reduction/Levels
Enzyme catalysis:
Such reductions are based on “biological concepts”.
A molecular dynamics sample path involving one catalysis event:
Set of E + S initial states -> ES states? -> Set of E + P final states
10^9 time steps, 10^4 atoms
Reduction to discrete models of one catalysis event:
E + S -> ES -> E + P (3-5 steps)
Other clear reductions:
Individual molecules -> Concentration of molecules
Set of atoms -> Nucleotide
Lipid molecules -> Membrane
Elements of Physical Dynamic Modeling
Time: Continuous Time; Discrete Time (0, 1, 2, ..., k); No Time - Equilibrium
State & Space: Continuous Space; Discrete Space; No Space or Space Homogeneity
Time/Space dependency:
Discrete Time: 0, 1, ..., k-i, ..., k-1, k
Deterministic or Stochastic (p0, p1, p2, p3)
Discrete Time vs. Continuous Time: Complicated & contentious.
Physical Dynamic Modeling: Key Models
Molecular Dynamics: Quantum Mechanics, Classical Potential
Continuous Time Markov Chains/ Gillespie Algorithm
Ordinary Differential Equations - ODE
Partial Differential Equations - PDE (Turing Model)
Stochastic Ordinary Differential Equations - SODE
Stochastic Partial Differential Equations - SPDE
Models on Networks: Boolean Networks, Kinetic Models
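As an illustration of the Continuous Time Markov Chain / Gillespie entry in the list above, here is a minimal sketch applied to the enzyme catalysis reduction E + S <-> ES -> E + P. The function name, the rate constants `k_on`, `k_off`, `k_cat` and the initial molecule counts are hypothetical values chosen for illustration; none of them come from the slides.

```python
import random

def gillespie_enzyme(e=10, s=100, k_on=0.01, k_off=0.1, k_cat=0.1,
                     t_max=1000.0, seed=0):
    """Simulate E + S <-> ES -> E + P with the Gillespie direct method.

    All rate constants and initial counts are illustrative assumptions.
    Returns a list of (time, E, S, ES, P) states, one per reaction event.
    """
    rng = random.Random(seed)
    es, p, t = 0, 0, 0.0
    trajectory = [(t, e, s, es, p)]
    while True:
        # Propensities of the three reactions in the current state.
        rates = [k_on * e * s,   # E + S -> ES
                 k_off * es,     # ES -> E + S
                 k_cat * es]     # ES -> E + P
        total = sum(rates)
        if total == 0.0:
            break                # nothing can react any more
        # Exponentially distributed waiting time to the next event.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which reaction fires, proportionally to its propensity.
        u = rng.random() * total
        if u < rates[0]:
            e, s, es = e - 1, s - 1, es + 1
        elif u < rates[0] + rates[1]:
            e, s, es = e + 1, s + 1, es - 1
        else:
            e, es, p = e + 1, es - 1, p + 1
        trajectory.append((t, e, s, es, p))
    return trajectory
```

Each trajectory is one sample path of the chain; averaging many runs with different seeds approximates what an ODE model of the concentrations would describe deterministically.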
Elusive Biological Concepts: Emergence
Other EBCs: function, robustness, modularity, purpose, top-down, downward causation.
Strong emergence (never observed): The dynamic laws for k components are not deducible from their properties and their relationships.
Weak emergence: something “new” emerges.
Lower level (high dimensional, detailed description) -> Reduction -> Higher level (low dimensional): “Surprising” stable, robust properties
Ex.1 Network Dynamics: Large set of enzymes and atoms -> Oscillations, sensitive amplification
Ex.2 Neural Networks: Large set of cells -> Ability to calculate, consciousness
Questions: Automatic detection of emergence? How frequent is it? Does selection pull out emergent systems?
Levels & Objects
Level | Example(s) | Data | Modelling Techniques
Atomic, Molecules | globin, water, cell membrane | Single molecule measurements, X-ray diffraction of crystals, NMR | Classical potentials and Newtonian Dynamics, Quantum Mechanics
Molecular complexes | Ribosome, hemoglobin | Single molecule measurements | Mechanical analogue models, Continuous Time Markov Chain with finite state space
Molecule concentrations | | Concentration of metabolites, fate of isotopes in different molecules | ODEs (many molecules/concentrations), kinetics
Metabolic Network | Citric Acid Cycle | Enzyme and metabolite concentrations and metabonomics | ODEs, Kinetic Models, Flux Analysis
Regulatory Network | globins and their regulators | Expression data | Boolean networks, Petri Nets, ODEs
Signal Transduction | Mitogen-activated protein kinase (MAPK) | Protein Interaction and Expression Data | ODEs, Continuous Time Markov Chains
Protein Interaction Network | Yeast PIN | Mass Spectroscopy | No dynamics involved, i.e. a data type.
Motors | Flagellar Motor | Microscopy, single molecule fluorescence | Mechanical Analogue Models
Cell(s) | B-Cell, zygote, E. coli | Microscopy, expression data, proteomics, .. | Integration of genetic, mechanical and network models.
Tissue | Cancer | | Partial differential equations (PDEs), cellular automata.
Organ | Liver, lung, heart | Mechanical measurements | Multilevel integrated modelling, including mechanics.
How to Compare?
Examples: Protein Structures, Networks, Craniums/Shape
Homologous - Non-Homologous?
Homologous components, e.g. aligned nucleotides:
A C G T
A - T T
Matching - Similarity - Distance
Distance from shortest paths
The ideal: the probability of one observation, multiplied by the sum over possible evolutionary trajectories to the second observation.
[Figure: informal illustration of P( ) for a pair and for a set of sequences]
“Natural” Evolutionary Modeling
Components: Birth and Death Process. Components are born at a birth rate and die at a death rate.
Discrete states: Continuous Time Finite State Markov Chains. Initially all rates are the same.
Continuous states: Continuous Time Continuous States Markov Process - specifically Diffusion. Initially simplest Diffusion: Brownian Motion, then Ornstein-Uhlenbeck.
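For the discrete-state case with all rates equal, the four-nucleotide version is the Jukes-Cantor model, whose transition probabilities have a closed form. The sketch below uses that closed form to compute "probability of one observation times the probability of evolving into the second" for two aligned sequences. The function names, the uniform (1/4) stationary distribution for the first sequence, and the assumption of a gap-free alignment with independent sites are illustrative choices, not taken from the slides.

```python
import math

def jc69_prob(a, b, t, rate=1.0):
    """P(nucleotide b at time t | nucleotide a at time 0) under Jukes-Cantor.

    `rate` is the total substitution rate per site per unit time (assumed).
    """
    decay = math.exp(-4.0 * rate * t / 3.0)
    if a == b:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay

def pair_likelihood(x, y, t, rate=1.0):
    """P(x) * P(x -> y in time t) for two aligned, gap-free sequences.

    Sites are treated as independent; P(x) uses the uniform stationary
    distribution of the Jukes-Cantor chain.
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    lik = 1.0
    for a, b in zip(x, y):
        lik *= 0.25 * jc69_prob(a, b, t, rate)
    return lik
```

Because the chain is reversible, `pair_likelihood(x, y, t)` equals `pair_likelihood(y, x, t)`, so the position of the root along the path between the two sequences does not matter in this simple setting.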
Comparative Biology
Nucleotides/Amino Acids
Continuous Quantities
Sequences
Gene Structure
Structure: RNA, Protein
Networks: Metabolic Pathways, Protein Interaction, Regulatory Pathways, Signal Transduction
Macromolecular Assemblies
Motors
Shape
Patterns
Tissue/Organs/Skeleton/….
Dynamics: MD movements of proteins, Locomotion
Culture
Language: Vocabulary, Grammar, Phonetics, Semantics
• Observed or predicted?
• Choice of Representation.
Comparative Biology: Evolutionary Models
Object | Type | Reference
Nucleotides/Amino Acids/codons | CTFS (continuous time, finite state) | Jukes-Cantor 69 + 500 other
Continuous Quantities | CTCS | Felsenstein 68 + 50 other
Sequences | CT countable S | Thorne, Kishino, Felsenstein, 91 + 40
Gene Structure | Matching | DeGroot, 07
Genome Structure | CTCS | MM
Structure: RNA | SCFG-model like | Holmes, I. 06 + few others
Structure: Protein | |
Networks (Metabolic Pathways, Protein Interaction, Regulatory Pathways, Signal Transduction) | CT countable S | Snijder, T
Macromolecular Assemblies | |
Motors | |
Shape | |
Patterns | |
Tissue/Organs/Skeleton/…. | |
Dynamics: MD movements of proteins, Locomotion | |
Culture | |
Language: Vocabulary | “Infinite Allele Model” (CTCS) | Swadesh, 52; Sankoff, 72, …
Language: Grammar | - |
Language: Phonetics, Semantics | |
Phenotype | |
“Natural” Co-Modeling
• Joint evolutionary modeling of X(t), Y(t). The ideal, rarely if ever done.
• Conditional evolutionary modeling of X(t) given Y(t). The standard in comparative genomics. The distribution of Y(t) is not derived from evolution, but from practicality. Examples: protein gene prediction, RNA structure prediction, regulatory signal prediction.
• Y(t) is a deterministic function of X(t). Examples: movement of proteins, protein structures.
Examples
•RNA structure prediction
•Comparative Genomics
•Networks
•Patterns
•Protein Structures
Structure Dependent Molecular Evolution: RNA Secondary Structure
From Durbin et al. (1998), Biological Sequence Analysis
Secondary Structure: Set of paired positions.
A-U and C-G can base pair. Some other pairings can occur, and triple interactions exist.
Pseudoknot: non-nested pairing, i < j < k < l with pairs i-k and j-l.
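The pseudoknot condition can be checked directly on a set of paired positions. The following is a minimal sketch; the function name and the representation of a secondary structure as a list of (position, position) tuples are assumptions made for illustration.

```python
def has_pseudoknot(pairs):
    """Return True if two base pairs cross, i.e. pairs i-k and j-l with i < j < k < l.

    `pairs` is an iterable of (position, position) tuples; the order within
    a tuple does not matter.
    """
    # Normalise each pair to (smaller, larger) and sort by the left position,
    # so that for any later pair (j, l) we already know i <= j.
    ordered = sorted((min(p), max(p)) for p in pairs)
    for idx, (i, k) in enumerate(ordered):
        for j, l in ordered[idx + 1:]:
            if i < j < k < l:
                return True
    return False
```

For example, `has_pseudoknot([(1, 10), (2, 9)])` is `False` because the second pair is nested inside the first, while `has_pseudoknot([(1, 5), (3, 8)])` is `True`. The check is quadratic in the number of pairs, which is adequate for a sketch.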
Simple String Generators
Variables (capital), Letters (small)
Regular Grammar: Start with S. S --> aT | bS, T --> aS | bT | ε
One sentence (odd # of a’s): S --> aT --> aaS --> aabS --> aabaT --> aaba
Context Free Grammar: S --> aSa | bSb | aa | bb
One sentence (even length palindromes): S --> aSa --> abSba --> abaaba
Stochastic Grammars
The grammars above classify every string as belonging to the language or not.
Each variable has a finite set of substitution rules. Assigning probabilities to the use of each rule assigns probabilities to the strings in the language.
If a string has a unique derivation (creation), its probability is the product of the probabilities of the applied rules.
i. Start with S. S --> (0.3)aT | (0.7)bS, T --> (0.2)aS | (0.4)bT | (0.2)ε
S --> aT --> aaS --> aabS --> aabaT --> aaba, with probability 0.3 * 0.2 * 0.7 * 0.3 * 0.2
ii. S --> (0.3)aSa | (0.5)bSb | (0.1)aa | (0.1)bb
S --> aSa --> abSba --> abaaba, with probability 0.3 * 0.5 * 0.1
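The product rule for derivation probabilities can be sketched as follows. The rule probabilities are copied as printed on the slide (the T rules as printed sum to 0.8 and are used unchanged); the assumption that the unlabeled (0.2) rule for T produces the empty string, and the names `REGULAR`, `CONTEXT_FREE` and `derive`, are illustrative.

```python
from math import prod

# Rule probabilities as printed on the slide; "" stands for the empty string
# (assumed to be the right-hand side of the unlabeled 0.2 rule for T).
REGULAR = {("S", "aT"): 0.3, ("S", "bS"): 0.7,
           ("T", "aS"): 0.2, ("T", "bT"): 0.4, ("T", ""): 0.2}
CONTEXT_FREE = {("S", "aSa"): 0.3, ("S", "bSb"): 0.5,
                ("S", "aa"): 0.1, ("S", "bb"): 0.1}

def derive(rules, start, steps):
    """Apply each (variable, right-hand side) step to the leftmost occurrence
    of the variable; return the final string and the product of the
    probabilities of the applied rules."""
    string = start
    for lhs, rhs in steps:
        if lhs not in string:
            raise ValueError(f"variable {lhs!r} not present in {string!r}")
        string = string.replace(lhs, rhs, 1)
    return string, prod(rules[step] for step in steps)

# Derivation i:  S -> aT -> aaS -> aabS -> aabaT -> aaba
regular_string, regular_prob = derive(
    REGULAR, "S",
    [("S", "aT"), ("T", "aS"), ("S", "bS"), ("S", "aT"), ("T", "")])

# Derivation ii: S -> aSa -> abSba -> abaaba
cf_string, cf_prob = derive(
    CONTEXT_FREE, "S", [("S", "aSa"), ("S", "bSb"), ("S", "aa")])
```

With these inputs the first derivation yields the string `aaba` and the second `abaaba`, each with the product of its rule probabilities.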
Secondary Structure Generators
S --> LS (.869) | L (.131)
F --> dFd (.788) | LS (.212)
L --> s (.895) | dFd (.105)
Knudsen & Hein, 2003
From Knudsen & Hein (1999)
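A stochastic grammar of this form can be used generatively. The sketch below samples secondary structures from the three rules with the probabilities printed on the slide, writing an unpaired position `s` as `.` and a paired `d...d` as `( ... )`. The dot-bracket output, the function name and the stack-based expansion are illustrative assumptions; this generates structures only and is not the structure prediction method of the cited papers.

```python
import random

# Each variable has two rules: (probability of the first rule, first rule, second rule).
RULES = {
    "S": (0.869, ["L", "S"], ["L"]),
    "F": (0.788, ["(", "F", ")"], ["L", "S"]),
    "L": (0.895, ["."], ["(", "F", ")"]),
}

def sample_structure(seed=None):
    """Sample one secondary structure in dot-bracket notation.

    Uses an explicit stack instead of recursion so that deeply nested
    structures do not hit Python's recursion limit.
    """
    rng = random.Random(seed)
    stack = ["S"]          # symbols still to be expanded, leftmost on top
    out = []
    while stack:
        symbol = stack.pop()
        if symbol in RULES:
            p_first, first, second = RULES[symbol]
            rhs = first if rng.random() < p_first else second
            # Push in reverse so the leftmost symbol is expanded next.
            stack.extend(reversed(rhs))
        else:
            out.append(symbol)   # terminal: ".", "(" or ")"
    return "".join(out)
```

Every sampled string is a balanced dot-bracket expression, and because F must eventually expand through LS, every stem encloses at least two positions.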
RNA Structure Application
Co-Modelling and Conditional Modelling
[Figure: observable and unobservable components]
Goldman, Thorne & Jones, 96
Knudsen.., 99
Eddy & co.
Meyer and Durbin 02 Pedersen …, 03 Siepel & Haussler 03
Pedersen, Meyer, Forsberg…, Simmonds 2004a,b
McCauley ….
Firth & Brown
i. P(Sequence | Structure)
ii. P(Structure)
$$P(\text{Structure} \mid \text{Sequence}) = \frac{P(\text{Sequence} \mid \text{Structure})\, P(\text{Structure})}{P(\text{Sequence})}$$
• Conditional Modelling
Needs: Footprinting - Signals (Blanchette)
AGGTATATAATGCG..... Pcoding{ATG-->GTG}
or
AGCCATTTAGTGCG..... Pnon-coding{ATG-->GTG}
Network Evolution
Statistics of Networks
Comparing Networks
Networks in Cellular Biology
A. Metabolic Pathways
B. Regulatory Networks
C. Signaling Pathways
D. Protein Interaction Networks - PIN
Empirical Facts
Dynamics on Networks (models)
Models of Network Evolution
A Model for Network Inference
•A core metabolism:
•A given set of metabolites:
•A given set of possible reactions - arrows not shown.
•A set of present reactions - M (black and red arrows)
Let $\mu$ be the rate of deletion and $\lambda$ the rate of insertion.
Restriction R: A metabolism must define a connected graph.
M + R defines
1. a set of deletable (dashed) edges D(M),
2. and a set of addable edges A(M).
Then
$$\frac{dP(M)}{dt} = \lambda \sum_{M' \in D(M)} P(M') + \mu \sum_{M'' \in A(M)} P(M'') - P(M)\left[\mu\,|D(M)| + \lambda\,|A(M)|\right]$$
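The sets D(M) and A(M) under the connectivity restriction can be computed by brute force for small examples. The sketch below makes a simplifying assumption that is not in the slides: reactions are treated as undirected edges between metabolites, and "connected" means all metabolites lie in one component. The function names and the parameter names `deletion_rate` and `insertion_rate` are hypothetical.

```python
def is_connected(nodes, edges):
    """Depth-first search on an undirected graph whose edges are frozensets {u, v}."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for edge in edges:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(adjacency[n] - seen)
    return seen == nodes

def deletable_and_addable(nodes, possible, present):
    """Return (D(M), A(M)): present edges whose removal keeps the graph connected,
    and absent possible edges whose addition keeps it connected."""
    deletable = {e for e in present if is_connected(nodes, present - {e})}
    addable = {e for e in possible - present
               if is_connected(nodes, present | {e})}
    return deletable, addable

def exit_rate(nodes, possible, present, deletion_rate, insertion_rate):
    """Total rate of leaving the current metabolism M."""
    deletable, addable = deletable_and_addable(nodes, possible, present)
    return deletion_rate * len(deletable) + insertion_rate * len(addable)
```

For three metabolites with all three reactions possible, a present path 1-2-3 has no deletable edge and one addable edge, while the full triangle has three deletable edges and none addable.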
Likelihood of Homologous Pathways
Number of Metabolisms: [Figure: metabolisms 1, 2, 3, 4, + 2 symmetrical versions]
P(M1, M2) = P(M1) P(M1 -> M2), for two homologous metabolisms M1 and M2
Eleni Giannoulatou
Approaches: Continuous Time Markov Chains with computational tricks.
MCMC
Importance Sampling
PIN Network Evolution
Barabasi & Oltvai, 2004; Berg et al., 2004; Wiuf et al., 2006
•A gene duplicates
•Inherits its connections
•The connections can change
Berg et al., 2004:
•Gene duplication slow ~10^-9/year
•Connection evolution fast ~10^-6/year
•Observed networks can be modeled as if node number was fixed.
Likelihood of PINs
•Can only handle 1 graph.
•Limited Evolution Model
Data: 2386 nodes and 7221 links
De-DAing, de-connecting
Irreducible (and isomorphic): 735 nodes
)0,33,.66,.1(0
Wiuf etal., 2006
The Phylogenetic Turing Patterns I
Stripes: p small Spots: p large
The Phylogenetic Turing Patterns II
Reaction-Diffusion Equations:
Analysis Tasks:
1. Choose Class of Mechanisms
2. Observe Empirical Patterns
3. Choose Closest set of Turing Patterns T1, T2, .., Tk
4. Choose parameters p1, p2, .., pk (sets?) behind T1, ..
Evolutionary Modelling Tasks:
1. p(t1)-p(t2) ~ N(0, (t1-t2))
2. Non-overlapping intervals have independent increments
I.e. Brownian Motion
Scientific Motivation:
1. Is there evolutionary information on pattern mechanisms?
2. How do patterns evolve?
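The Brownian motion assumption for a pattern parameter p(t) can be simulated directly from its two defining properties: Gaussian increments with variance proportional to elapsed time, and independence across non-overlapping intervals. In this sketch the function name, the `sigma` scale parameter and the starting value `p0` are assumptions added for illustration; with `sigma=1.0` the increment variance is exactly the elapsed time, as on the slide.

```python
import math
import random

def brownian_path(times, p0=0.0, sigma=1.0, seed=0):
    """Sample p(t) at the increasing time points in `times`.

    Increments over non-overlapping intervals are drawn independently from
    N(0, sigma^2 * dt). `sigma` and `p0` are illustrative assumptions.
    """
    rng = random.Random(seed)
    values = [p0]
    for t_prev, t_next in zip(times, times[1:]):
        dt = t_next - t_prev
        if dt < 0:
            raise ValueError("times must be increasing")
        # Standard deviation grows with the square root of elapsed time.
        values.append(values[-1] + rng.gauss(0.0, sigma * math.sqrt(dt)))
    return values
```

A call such as `brownian_path([0.0, 0.5, 1.0, 2.0])` returns one value per time point, starting at `p0`. Simulating along the branches of a phylogeny would reuse the value at each internal node as the starting point of its descendant branches.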
Protein Structure
[Figure labels: Known, Known, Unknown (?)]
globin and Myoglobin: 300 amino acid changes, 800 nucleotide changes, 1 structural change, 1.4 Gyr
1. Given Structure, what are the possible events that could happen?
2. What are their probabilities? Old-fashioned substitution + indel process with bias.
Bias: Folding (Sequence -> Structure) & Fitness of Structure
3. Summation over all paths.
Summary: The Virtues of Comparative Modeling
• It is the natural setup for much modeling and transfer of knowledge from one species/system to another.
• Even 1 system/species is an evolutionary observation:
[Figure: P(x) and P(Further history of x) for an observed x]