Schedule

Day 1: Molecular Evolution
9.00-10.15 Lecture: Models of Sequence Evolution and Statistical Alignment
10.30-12.00 Practical: Molecular Evolution (Phylogenies - PHYLIP+)
2.00-3.30 Lecture: Molecular Evolution & Comparative Genomics
3.30-5.00 Student Activity: Prepare projects

Day 2: Population Biology and Mapping
9.00-10.00 Lecture: Population Genetics and Gene Genealogies
10.00-11.00 Exercise: Jukes-Cantor and Rate Matrix
11.00-12.00 Lecture: Inferring Recombination Histories
2.00-3.30 Practical: DNA Sequence Analysis (PAML Phase +)
3.30-5.00 Student Activity: Prepare projects

Day 3: Integrative Genomics (IG)
9.00-10.15 Lecture: High Throughput Data, the structure of IG, GF
10.30-12.00 Practical: Statistical Alignment & Footprinting
2.00-3.30 Lecture (L): Grammars and RNA Prediction
3.30-5.00 Student Activity: Prepare projects

Day 4: Integrative Genomics (IG)
9.00-10.00 Lecture: Networks and other concepts
10.00-11.00 Exercise: Stochastic Context Free Grammars
11.00-12.00 Lecture: Concepts, Data Analysis and Functional Studies
2.00-3.30 Practical: Detecting Recombinations
3.30-5.00 Student Activity: Prepare projects

Day 5: Project Discussion/Presentation
9.00-10.00 Project 1 - Population Genomics: 1000 genomes
10.00-11.00 Practical - Integrative Data Analysis - Mapping
11.00-12.00 Project 2 - Comparative Genomics: Signals
2.00-3.00 Project 3 - Integrative Genomics: Basic data types
3.00-4.00 Exercise: Statistical Alignment
4.00-5.00 Project 4 - Comparative Biology: Networks


The Data & its growth

1976/79 The first viral genome - MS2/φX174
1995 The first prokaryotic genome - H. influenzae
1996 The first unicellular eukaryotic genome - Yeast
1997 The first multicellular eukaryotic genome - C. elegans
2000 Arabidopsis thaliana, Drosophila
2001 The human genome
2002 Mouse genome
2005+ Dog, Marsupial, Rat, Chicken, 12 Drosophilas

1.5.08: Known
>10000 viral genomes
2000 prokaryotic genomes
80 Archaebacterial genomes

A general increase in data involving higher structures and dynamics of biological systems

The Human Genome (Harding & Sanger)

[Figure: successive zooms from genome to protein. Labels: genome, 3*10^9 bp; *50.000; β-globin region (chromosome 11), 6*10^4 bp; *20; gene of 3*10^3 bp with 5' flanking, Exon 1, Exon 2, Exon 3, 3' flanking; *10^3; DNA: ATTGCCATGTCGATAATTGGACTATTTGGA (30 bp); Protein: aa aa aa aa aa aa aa aa aa aa. Also labelled: Myoglobin, globin.]

Central Problems: History cannot be observed, only end products.

Even if history could be observed, the underlying process could not!

[Figure: a sequence history drawn twice, first as a single lineage and then as a branching tree. Sequences shown: ACGTC, ACGCC, AGGCC, AGGCT, AGGTT, and on the second branch AGGGC, AGTGC.]

Some Definitions

State space: the set of possible observations, e.g. {A,C,G,T}.

A random variable X takes values in the state space with given probabilities, e.g. P{X=A} = ¼. The value taken is often indicated by a small letter, x.

A stochastic process is a set of time-labelled random variables X_t, e.g. P{X0=A, X1=C, .., X5=G} = .00122.

Time can be discrete or continuous. In our context it will almost always mean the natural numbers, N = {0,1,2,3,4,..}, or an interval on the real line, R.

Time homogeneity: the process is the same for all t.

Markov property: the next state depends only on the current state, i.e.

$P\{X_i \mid X_{i-1},\dots,X_0\} = P\{X_i \mid X_{i-1}\}$

so that

$P\{X_i, X_{i-1},\dots,X_0\} = P\{X_0\}\,P\{X_1 \mid X_0\}\cdots P\{X_i \mid X_{i-1}\}$
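The factorisation given by the Markov property can be sketched in a few lines of Python. The chain used here is purely hypothetical (uniform start, probability 0.7 of staying, 0.1 of moving to each other nucleotide); only the product structure comes from the definition above.

```python
STATES = "ACGT"

def path_probability(path, initial, transition):
    """P{X0=x0,...,Xn=xn} = P{X0=x0} * product over i of P{Xi=xi | Xi-1=xi-1}."""
    p = initial[path[0]]
    for prev, cur in zip(path, path[1:]):
        p *= transition[prev][cur]
    return p

# Hypothetical discrete-time chain, chosen only for illustration.
initial = {s: 0.25 for s in STATES}
transition = {a: {b: (0.7 if a == b else 0.1) for b in STATES} for a in STATES}

p = path_probability("AACG", initial, transition)  # 0.25 * 0.7 * 0.1 * 0.1
```

Because each factor only looks at the previous state, the cost grows linearly with the length of the path.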

Simplifying Assumptions I

Data: s1 = TCGGTA, s2 = TGGTT

1) Only substitutions. The gapped column is dropped from the alignment:

s1 TCGGTA        s1 TCGGA
s2 TGGT-T        s2 TGGTT

2) Processes in different positions of the molecule are independent, so the probability for the whole alignment will be the product of the probabilities of the individual patterns.

Biological setup: an unknown ancestral sequence a = a1 a2 a3 a4 a5 has evolved into both s1 and s2.

[Figure: the unknown ancestor a above the two observed sequences, drawn first for whole sequences and then site by site (a1..a5 above the observed pairs).]

Probability of Data

$P = \sum_a P(a)\,P(a \to \mathrm{TCGGTA})\,P(a \to \mathrm{TGGTT})$

With assumption 1:

$P = \sum_a P(a)\,P(a \to \mathrm{TCGGA})\,P(a \to \mathrm{TGGTT})$

With assumption 2:

$P = \prod_{i=1}^{5} \sum_{a_i} P(a_i)\,P(a_i \to s1_i)\,P(a_i \to s2_i)$

Simplifying Assumptions II

3) The evolutionary process is the same in all positions.

4) Time reversibility: virtually all models of sequence evolution are time reversible, i.e. $\pi_i P_{i,j}(t) = \pi_j P_{j,i}(t)$, where $\pi_i$ is the stationary probability of state i and $P_{i,j}(t)$ is the probability that state i has changed into state j after time t. This implies that

$\sum_a P(a)\,P_{l_1}(a \to s1_i)\,P_{l_2}(a \to s2_i) = P(s1_i)\,P_{l_1+l_2}(s1_i \to s2_i)$

[Figure: an ancestor a joined to s1_i and s2_i by branches of length l1 and l2 is equivalent to a single branch of length l1+l2 from s1_i to s2_i.]

The probability of the data therefore simplifies from

$P = \prod_{i=1}^{5} \sum_{a_i} P(a_i)\,P(a_i \to s1_i)\,P(a_i \to s2_i)$

to

$P = \prod_{i=1}^{5} P(s1_i)\,P(s1_i \to s2_i)$
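The consequence of reversibility can be checked numerically. The sketch below assumes the Jukes-Cantor transition probabilities ¼(1+3e^{-4a}) and ¼(1-e^{-4a}) as a concrete reversible model with uniform stationary distribution; the branch lengths l1 and l2 are hypothetical.

```python
from math import exp

STATES = "ACGT"

def jc69(x, y, a):
    """Jukes-Cantor probability of x -> y after distance a = alpha * t."""
    e = exp(-4.0 * a)
    return 0.25 * (1.0 + 3.0 * e) if x == y else 0.25 * (1.0 - e)

def via_ancestor(x, y, l1, l2):
    """Sum over the unknown ancestor: sum_a P(a) P_l1(a -> x) P_l2(a -> y)."""
    return sum(0.25 * jc69(a, x, l1) * jc69(a, y, l2) for a in STATES)

def direct(x, y, l1, l2):
    """P(x) * P_{l1+l2}(x -> y): start in one leaf and evolve to the other."""
    return 0.25 * jc69(x, y, l1 + l2)

l1, l2 = 0.1, 0.25  # hypothetical branch lengths
gap = max(abs(via_ancestor(x, y, l1, l2) - direct(x, y, l1, l2))
          for x in STATES for y in STATES)
```

Here `gap` is the largest disagreement over all 16 nucleotide pairs; it should be zero up to rounding, which is why the position of the root does not matter for a reversible model.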

Simplifying assumptions III

5) The nucleotide at any position evolves following a continuous time Markov chain.

6) The rate matrix, Q, for the continuous time Markov chain is the same at all times (and often at all positions). However, it is possible to let the rate of events, r_i, vary from site to site; the term for elapsed time, t, is then replaced by r_i*t.

P_{i,j}(t): continuous time Markov chain on the state space {A,C,G,T}.

Q - rate matrix:

FROM \ TO   A                     C                     G                     T
A           -(qA,C+qA,G+qA,T)     qA,C                  qA,G                  qA,T
C           qC,A                  -(qC,A+qC,G+qC,T)     qC,G                  qC,T
G           qG,A                  qG,C                  -(qG,A+qG,C+qG,T)     qG,T
T           qT,A                  qT,C                  qT,G                  -(qT,A+qT,C+qT,G)

$\lim_{\varepsilon \to 0} \frac{P_{i,j}(\varepsilon)}{\varepsilon} = q_{i,j} \quad (i \ne j)$

$\lim_{\varepsilon \to 0} \frac{P_{i,i}(\varepsilon) - 1}{\varepsilon} = q_{i,i}$

[Figure: a lineage observed at times t1 and t2, with nucleotides A, C, C.]

Q and P(t)

What is the probability of going from i (C?) to j (G?) in time t with rate matrix Q?

$P(t) = \exp(tQ) = \sum_{i=0}^{\infty} \frac{(tQ)^i}{i!} = I + tQ + \frac{(tQ)^2}{2!} + \frac{(tQ)^3}{3!} + \cdots$

i. P(0) = I
ii. P(ε) is close to I + εQ for small ε
iii. P'(0) = Q
iv. lim_{t→∞} P(t) has the equilibrium frequencies of the 4 nucleotides in each row
v. Waiting time in state j, T_j: P(T_j > t) = e^{q_jj t}
vi. QE = 0, where E_ij = 1 (all i,j)
vii. PE = E
viii. If AB = BA, then e^{A+B} = e^A e^B

Expected number of events at equilibrium: $t \sum_{i \in \text{nucleotides}} -q_{ii}\,\pi_i$
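A few of the listed properties can be checked with a direct, truncated evaluation of the series for exp(tQ). This is a minimal sketch in plain Python; the rate matrix `Q` is a hypothetical example whose rows sum to zero, and the helper names are illustrative only.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm_series(q, t, terms=30):
    """Approximate P(t) = exp(tQ) by I + tQ + (tQ)^2/2! + ... (truncated)."""
    tq = [[t * x for x in row] for row in q]
    result = identity(len(q))
    term = identity(len(q))
    for i in range(1, terms + 1):
        # term holds (tQ)^i / i!, built from the previous term
        term = [[x / i for x in row] for row in mat_mul(term, tq)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Hypothetical rate matrix on (A, C, G, T); every row sums to zero.
Q = [[-0.9, 0.2, 0.5, 0.2],
     [0.3, -1.0, 0.1, 0.6],
     [0.4, 0.2, -0.8, 0.2],
     [0.1, 0.7, 0.1, -0.9]]

P0 = expm_series(Q, 0.0)        # property i:   P(0) = I
P_small = expm_series(Q, 1e-4)  # property ii:  P(eps) close to I + eps*Q
P1 = expm_series(Q, 1.0)
row_sums = [sum(row) for row in P1]  # property vii: PE = E, rows sum to 1
```

A plain series is adequate for small t*Q such as this one; for large t the terms grow before they shrink, and an eigenvalue decomposition or a library routine is usually the safer choice.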

Jukes-Cantor (JC69): Total Symmetry

Rate matrix, R:

FROM \ TO    A      C      G      T
A           -3α     α      α      α
C            α     -3α     α      α
G            α      α     -3α     α
T            α      α      α     -3α

Stationary distribution: (1,1,1,1)/4.

Transition probabilities after time t, with a = α*t:

P(equal) = ¼(1 + 3e^{-4a}) ~ 1 - 3a
P(specific difference) = ¼(1 - e^{-4a}) ~ a

For the example data s1 = TCGGA, s2 = TGGTT (two identical sites, three different):

$P = P(s1)\prod_{i=1}^{5} P(s1_i \to s2_i) = (\tfrac14)^5\,P(T \to T)\,P(C \to G)\,P(G \to G)\,P(G \to T)\,P(A \to T) = (\tfrac14)^5 (\tfrac14)^5 (1 + 3e^{-4a})^2 (1 - e^{-4a})^3$

Principle of Inference: Likelihood

Likelihood function L(): the probability of the data as a function of the parameters: L(Θ,D)

LogLikelihood function l(): ln(L(Θ,D))

If the data is a series of independent experiments, L() becomes a product of the likelihoods of each experiment and l() becomes the sum of the log-likelihoods of each experiment.

In likelihood analysis the parameter is not viewed as a random variable.

Consistency: $\hat{\Theta}(D) \to \Theta_{true}$ as data increases.

[Figure: plots of the likelihood and the log-likelihood as functions of the parameter.]
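As a worked example of the likelihood principle, the sketch below maximises the Jukes-Cantor log-likelihood for the gap-free example alignment s1 = TCGGA, s2 = TGGTT over the distance a = α*t. The golden-section search and the function names are illustrative choices, not something prescribed by the slides; the closed-form value is the standard JC69 distance for a proportion 3/5 of differing sites.

```python
from math import exp, log

def jc69(x, y, a):
    """Jukes-Cantor probability of x -> y after distance a = alpha * t."""
    e = exp(-4.0 * a)
    return 0.25 * (1.0 + 3.0 * e) if x == y else 0.25 * (1.0 - e)

def log_likelihood(a, s1, s2):
    """l(a) = sum over sites of ln[ (1/4) * P_a(s1_i -> s2_i) ]."""
    return sum(log(0.25 * jc69(x, y, a)) for x, y in zip(s1, s2))

def maximise(f, lo, hi, iters=200):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    g = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(c) > f(d):
            hi = d   # the maximum lies in [lo, d]
        else:
            lo = c   # the maximum lies in [c, hi]
    return (lo + hi) / 2

s1, s2 = "TCGGA", "TGGTT"
a_hat = maximise(lambda a: log_likelihood(a, s1, s2), 1e-6, 5.0)

# Closed form for comparison: a = -(1/4) ln(1 - (4/3) p), with p = 3/5 differing sites.
a_closed = -0.25 * log(1.0 - (4.0 / 3.0) * (3 / 5))
```

Because the sites are independent, the log-likelihood is a sum over sites, which is exactly the product-to-sum step described above.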

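From Q to P for Jukes-Cantor

Write the rate matrix as $Q = \alpha M$:

$\begin{pmatrix} -3\alpha & \alpha & \alpha & \alpha \\ \alpha & -3\alpha & \alpha & \alpha \\ \alpha & \alpha & -3\alpha & \alpha \\ \alpha & \alpha & \alpha & -3\alpha \end{pmatrix} = \alpha \begin{pmatrix} -3 & 1 & 1 & 1 \\ 1 & -3 & 1 & 1 \\ 1 & 1 & -3 & 1 \\ 1 & 1 & 1 & -3 \end{pmatrix}$

Powers of M are multiples of M:

$M^i = (-4)^{i-1} M \quad (i \ge 1)$

Hence

$P(t) = \sum_{i=0}^{\infty} \frac{Q^i t^i}{i!} = I - \frac14 \Big[\sum_{i=1}^{\infty} \frac{(-4\alpha t)^i}{i!}\Big] M = I - \frac14 \big(e^{-4\alpha t} - 1\big) M$

$P(t) = \frac14 \left[ J + e^{-4\alpha t} \begin{pmatrix} 3 & -1 & -1 & -1 \\ -1 & 3 & -1 & -1 \\ -1 & -1 & 3 & -1 \\ -1 & -1 & -1 & 3 \end{pmatrix} \right]$

where J is the 4x4 matrix with all entries 1. The diagonal entries are ¼(1 + 3e^{-4αt}) and the off-diagonal entries are ¼(1 - e^{-4αt}).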

Exponentiation/Powering of Matrices

By eigenvalues:

If $Q = B \Lambda B^{-1}$, where

$\Lambda = \begin{pmatrix} \lambda_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & 0 & 0 \\ 0 & 0 & \lambda_3 & 0 \\ 0 & 0 & 0 & \lambda_4 \end{pmatrix}$

then

$Q^i = B\Lambda B^{-1} B\Lambda B^{-1} \cdots B\Lambda B^{-1} = B \Lambda^i B^{-1}$

and

$\sum_{i=0}^{\infty} \frac{(tQ)^i}{i!} = \sum_{i=0}^{\infty} \frac{(t B \Lambda B^{-1})^i}{i!} = B \Big[\sum_{i=0}^{\infty} \frac{(t\Lambda)^i}{i!}\Big] B^{-1} = B \begin{pmatrix} e^{t\lambda_1} & 0 & 0 & 0 \\ 0 & e^{t\lambda_2} & 0 & 0 \\ 0 & 0 & e^{t\lambda_3} & 0 \\ 0 & 0 & 0 & e^{t\lambda_4} \end{pmatrix} B^{-1}$

Finding Λ: det(Q - λI) = 0

Numerically:

$\sum_{i=0}^{\infty} \frac{(tQ)^i}{i!} \approx \sum_{i=0}^{k} \frac{(tQ)^i}{i!}$, where k ~ 6-10

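Finding B: $(Q - \lambda_i I)\,b_i = 0$

JC69:

$P(t) = \begin{pmatrix} 1 & 1/4 & 0 & 1 \\ 1 & 1/4 & 0 & -1 \\ 1 & -1/4 & 1 & 0 \\ 1 & -1/4 & -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & e^{-4t\alpha} & 0 & 0 \\ 0 & 0 & e^{-4t\alpha} & 0 \\ 0 & 0 & 0 & e^{-4t\alpha} \end{pmatrix} \begin{pmatrix} 1/4 & 1/4 & 1/4 & 1/4 \\ 1 & 1 & -1 & -1 \\ 0 & 0 & 1/2 & -1/2 \\ 1/2 & -1/2 & 0 & 0 \end{pmatrix}$

The columns of the first matrix are eigenvectors of Q for the eigenvalues 0, -4α, -4α, -4α, and the last matrix is its inverse.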
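Both routes to P(t) can be compared for JC69 in plain Python. In this sketch `B` holds eigenvectors of the JC69 rate matrix as columns and `B_INV` is taken to be the matrix satisfying B * B_INV = I (an assumption that the code itself verifies in the tests); `alpha` and `t` are hypothetical values.

```python
from math import exp

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Eigenvectors of the JC69 rate matrix as columns (eigenvalues 0, -4a, -4a, -4a).
B = [[1, 0.25, 0, 1],
     [1, 0.25, 0, -1],
     [1, -0.25, 1, 0],
     [1, -0.25, -1, 0]]
# Inverse of B, so that B * B_INV is the identity.
B_INV = [[0.25, 0.25, 0.25, 0.25],
         [1, 1, -1, -1],
         [0, 0, 0.5, -0.5],
         [0.5, -0.5, 0, 0]]

def p_eigen(alpha, t):
    """P(t) = B * diag(exp(t*lambda_i)) * B^-1."""
    lam = [0.0, -4 * alpha, -4 * alpha, -4 * alpha]
    d = [[exp(t * lam[i]) if i == j else 0.0 for j in range(4)] for i in range(4)]
    return mat_mul(mat_mul(B, d), B_INV)

def p_series(alpha, t, k=10):
    """Truncated series: sum over i = 0..k of (tQ)^i / i!."""
    tq = [[t * (-3 * alpha if i == j else alpha) for j in range(4)] for i in range(4)]
    result = [[float(i == j) for j in range(4)] for i in range(4)]
    term = [row[:] for row in result]
    for i in range(1, k + 1):
        term = [[x / i for x in row] for row in mat_mul(term, tq)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

alpha, t = 0.1, 1.0  # hypothetical values
PE, PS = p_eigen(alpha, t), p_series(alpha, t)
closed_equal = 0.25 * (1 + 3 * exp(-4 * alpha * t))
closed_diff = 0.25 * (1 - exp(-4 * alpha * t))
```

For this small value of α*t, ten series terms already agree with the eigenvalue route to many decimal places, in line with the k ~ 6-10 rule of thumb.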

Kimura 2-parameter model - K80

Q (α: transition rate, β: transversion rate):

FROM \ TO   A   C   G   T
A           -   β   α   β
C           β   -   β   α
G           α   β   -   β
T           β   α   β   -

a = α*t, b = β*t

P(t), from the start nucleotide to:

the same nucleotide:             .25(1 + e^{-4b} + 2e^{-2(a+b)})
each of the two transversions:   .25(1 - e^{-4b})
the transition:                  .25(1 + e^{-4b} - 2e^{-2(a+b)})

Felsenstein81 & Hasegawa, Kishino & Yano 85

Unequal base composition (Felsenstein, 1981; F81):

Q_{i,j} = C*π_j, i unequal j

Rates to frequent nucleotides are high, π = (πA, πC, πG, πT).

Tv/Tr = (πT πC + πA πG)/[(πT+πC)(πA+πG)]

Tv/Tr & composition bias (Hasegawa, Kishino & Yano, 1985; HKY85):

Q_{i,j} = (α/β)*C*π_j   if i->j is a transition
Q_{i,j} = C*π_j         if i->j is a transversion

Tv/Tr = (α/β)(πT πC + πA πG)/[(πT+πC)(πA+πG)]

[Figure: the four nucleotides A, G, T, C with the substitutions between them.]

Measuring Selection

Certain events have functional consequences and will be selected out. The strength and localization of this selection is of great interest.

The selection criteria could in principle be anything, but selection against amino acid changes is by far the most important.

[Figure: a history of a two-codon sequence, each step labelled on the slide with its translation: ThrSer ACGTCA; ThrPro ACGCCA; ArgSer AGGCCG; ThrSer ACGCCG; ThrSer ACTCTG; AlaSer GCTCTG; AlaSer GCACTG. Two of the branches are marked "-".]

The Genetic Code

3 classes of sites:
4
2-2
1-1-1-1

[Figure: the genetic code table with example sites marked: i. 4 (3rd) and 1-1-1-1 (3rd); ii. TA (2nd).]

Problems:
i. Not all sites fit into those categories.
ii. A change in one site can change the status of another.

Possible events in the genetic code (remade from Li, 1997)

Substitutions          Number   Percent
Total in all codons    549      100
Synonymous             134      25
Nonsynonymous          415      75
  Missense             392      71
  Nonsense             23       4

Possible number of substitutions: 61 (codons)*3 (positions)*3 (alternative nucleotides).
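The counts in the table can be reproduced by enumerating every single-nucleotide substitution from the 61 sense codons. This sketch assumes the standard genetic code, written as the usual 64-letter string in TCAG order; the variable names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
for codon, aa in CODE.items():
    if aa == "*":
        continue  # only the 61 sense codons are starting points
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            new_aa = CODE[codon[:pos] + base + codon[pos + 1:]]
            if new_aa == aa:
                counts["synonymous"] += 1
            elif new_aa == "*":
                counts["nonsense"] += 1
            else:
                counts["missense"] += 1

total = sum(counts.values())  # 61 * 3 * 3
```

Every substitution is counted in its direction of change, so a synonymous pair such as TTT and TTC contributes twice.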

Kimura's 2 parameter model & Li's Model.

Rates from the start nucleotide: α (transition), β (each transversion).

Probabilities, with a = α*t and b = β*t:

the same nucleotide:             .25(1 + e^{-4b} + 2e^{-2(a+b)})
each of the two transversions:   .25(1 - e^{-4b})
the transition:                  .25(1 + e^{-4b} - 2e^{-2(a+b)})

Selection on the 3 kinds of sites: (a,b) becomes (?,?)

1-1-1-1   (f*α, f*β)
2-2       (α, f*β)
4         (α, β)

Page 20: Schedule Day 1: Molecular Evolution 9.00-10.15 Lecture: Models of Sequence Evolution and Statistical Alignment 10.30-12.00 Practical: Molecular Evolution

Sites Total Conserved Transitions Transversions1-1-1-1 274 246 (.8978) 12(.0438) 16(.0584)2-2 77 51 (.6623) 21(.2727) 5(.0649)4 78 47 (.6026) 16(.2051) 15(.1923)

alpha-globin from rabbit and mouse.Ser Thr Glu Met Cys Leu Met Gly GlyTCA ACT GAG ATG TGT TTA ATG GGG GGA * * * * * * * **TCG ACA GGG ATA TAT CTA ATG GGT ATASer Thr Gly Ile Tyr Leu Met Gly Ile

Z(t,t) = .50[1+exp(-2t) - 2exp(-t(+)] transition Y(t,t) = .25[1-exp(-2t )] transversionX(t,t) = .25[1+exp(-2t) + 2exp(-t(+)] identity

L(observations,a,b,f)= C(429,274,77,78)* {X(a*f,b*f)246*Y(a*f,b*f)12*Z(a*f,b*f)16}* {X(a,b*f)51*Y(a,b*f)21*Z(a,b*f)5}*{X(a,b)47*Y(a,b)16*Z(a,b)15}

where a = at and b = bt.

Estimated Parameters: a = 0.3003 b = 0.1871 2*b = 0.3742 (a + 2*b) = 0.6745 f = 0.1663

Transitions Transversions1-1-1-1 a*f = 0.0500 2*b*f = 0.06222-2 a = 0.3004 2*b*f = 0.06224 a = 0.3004 2*b = 0.3741

Expected number of: replacement substitutions 35.49 synonymous 75.93Replacement sites : 246 + (0.3742/0.6744)*77 = 314.72Silent sites : 429 - 314.72 = 114.28 K s = .6644 Ka = .1127

Approaches to Sequence Analysis

Data: {GTCAT, GTTGGT, GTCA, CTCA}

Actual Practice: 2 phase analysis. The sequences are first aligned by parsimony, similarity or optimisation,

GT-CAT
GTTGGT
GT-CA-
CT-CA-

and statistics are then done on the fixed alignment.

[Figure: a tree with leaves s1, s2, s3, s4.]

Ideal Practice: 1 phase analysis.

1. TKF91 - The combined substitution/indel process.
2. Acceleration of Basic Algorithm
3. Many Sequence Algorithm
4. MCMC Approaches

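Thorne-Kishino-Felsenstein (1991) Process

λ (birth rate), μ (death rate)

[Figure: a sequence drawn as links (#) carrying letters, e.g. A # C G, evolving from T = 0 to T = t by births and deaths of links; an immortal link * at the left end, drawn as * - - - over # # # #; sequences s1 and s2 with root r.]

1. P(s) = (1 - λ/μ)(λ/μ)^l * π_A^{#A} * .. * π_T^{#T}, where l = length(s) and #A is the number of A's in s.

2. Time reversible: the process from s1 to s2 is the same as the process from s2 to s1, so the root r can be placed anywhere between them.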

λ & μ into Alignment Blocks

β(t) = [1 - e^{(λ-μ)t}] / [μ - λe^{(λ-μ)t}]

A. Amino Acids Ignored:

# - - -
# # # #        (k)    p_k(t) = e^{-μt}[1-λβ](λβ)^{k-1}

# - - - -
- # # # #      (k)    p'_k(t) = [1-e^{-μt}-μβ][1-λβ](λβ)^{k-1},  p'_0(t) = μβ(t)

* - - - -
* # # # #      (k)    p''_k(t) = [1-λβ](λβ)^k

B. Amino Acids Considered:

T - - -
R Q S W        (4)    P_t(T-->R)*π_Q*..*π_W*p_4(t)

T - - - -
- R Q S W      (4)    π_R*π_Q*..*π_W*p'_4(t)
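The block probabilities can be written down directly and checked for normalisation. The sketch below uses the forms published for the TKF91 model, p_k = e^{-μt}(1-λβ)(λβ)^{k-1}, p'_k = (1-e^{-μt}-μβ)(1-λβ)(λβ)^{k-1} with p'_0 = μβ, and p''_k = (1-λβ)(λβ)^k with k counting descendants besides the immortal link; the function names and the rates are hypothetical.

```python
from math import exp

def beta(lam, mu, t):
    """beta(t) = [1 - e^{(lam-mu)t}] / [mu - lam * e^{(lam-mu)t}], for lam != mu."""
    e = exp((lam - mu) * t)
    return (1.0 - e) / (mu - lam * e)

def p_survive(k, lam, mu, t):
    """p_k(t): the link survives and has k descendants, itself included (k >= 1)."""
    lb = lam * beta(lam, mu, t)
    return exp(-mu * t) * (1.0 - lb) * lb ** (k - 1)

def p_die(k, lam, mu, t):
    """p'_k(t): the link dies and leaves k descendants (k >= 0)."""
    b = beta(lam, mu, t)
    if k == 0:
        return mu * b
    lb = lam * b
    return (1.0 - exp(-mu * t) - mu * b) * (1.0 - lb) * lb ** (k - 1)

def p_immortal(k, lam, mu, t):
    """p''_k(t): the immortal link has k descendants besides itself (k >= 0)."""
    lb = lam * beta(lam, mu, t)
    return (1.0 - lb) * lb ** k

lam, mu, t = 1.0, 2.0, 1.0  # hypothetical rates with lam < mu
mortal_total = (sum(p_survive(k, lam, mu, t) for k in range(1, 200))
                + sum(p_die(k, lam, mu, t) for k in range(0, 200)))
immortal_total = sum(p_immortal(k, lam, mu, t) for k in range(0, 200))
```

A mortal link either survives or dies, so its two families of probabilities should sum to 1 together, while the immortal link's probabilities sum to 1 on their own.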

Differential Equations for p-functions

# - - ... -
# # # ... #

Δp_k = Δt*[λ*(k-1)*p_{k-1} + μ*k*p_{k+1} - (λ+μ)*k*p_k]

# - - - ... -
- # # # ... #

Δp'_k = Δt*[λ*(k-1)*p'_{k-1} + μ*(k+1)*p'_{k+1} - (λ+μ)*k*p'_k + μ*p_{k+1}]

* - - - ... -
* # # # ... #

Δp''_k = Δt*[λ*k*p''_{k-1} + μ*(k+1)*p''_{k+1} - [(k+1)*λ + k*μ]*p''_k]

Initial Conditions:
p_k(0) = p''_k(0) = p'_k(0) = 0 for k > 1
p_1(0) = p''_0(0) = 1
p'_0(0) = 0
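The first of these equations can be integrated numerically and compared with the closed form p_k(t) = e^{-μt}(1-λβ)(λβ)^{k-1}. This is a rough sketch using forward Euler steps and a truncated state space; the step count, the truncation `kmax`, and the rates are assumptions made for illustration.

```python
from math import exp

def p_closed(k, lam, mu, t):
    """Closed form p_k(t) = e^{-mu t} (1 - lam*beta) (lam*beta)^(k-1)."""
    e = exp((lam - mu) * t)
    lb = lam * (1.0 - e) / (mu - lam * e)
    return exp(-mu * t) * (1.0 - lb) * lb ** (k - 1)

def p_euler(lam, mu, t, kmax=60, steps=20000):
    """Euler steps for dp_k/dt = lam(k-1)p_{k-1} + mu*k*p_{k+1} - (lam+mu)k*p_k."""
    p = [0.0] * (kmax + 2)   # indices 0..kmax+1; p[kmax+1] stays 0 (truncation)
    p[1] = 1.0               # initial condition p_1(0) = 1
    dt = t / steps
    for _ in range(steps):
        new = p[:]
        for k in range(1, kmax + 1):
            dp = (lam * (k - 1) * p[k - 1] + mu * k * p[k + 1]
                  - (lam + mu) * k * p[k])
            new[k] = p[k] + dt * dp
        p = new
    return p

lam, mu, t = 1.0, 2.0, 1.0   # hypothetical rates with lam < mu
p_num = p_euler(lam, mu, t)
worst = max(abs(p_num[k] - p_closed(k, lam, mu, t)) for k in range(1, 6))
```

Forward Euler is only first-order accurate, so `worst` shrinks roughly in proportion to the step size; a higher-order integrator would be the natural refinement.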

Basic Pairwise Recursion (O(length^3))

P(s1_i → s2_j), the probability that the first i links of s1 evolve into the first j links of s2, is found by conditioning on the fate of link i of s1.

Survives: the link accounts for the last 1…j links of s2 (j cases). For example, with two descendants:

$P(s1_{i-1} \to s2_{j-2}) \cdot p_2 \cdot f(s1[i], s2[j-1]) \cdot \pi(s2[j])$

Dies: the link accounts for the last 0…j links of s2 (j+1 cases). For example, with one descendant:

$P(s1_{i-1} \to s2_{j-1}) \cdot p'_1 \cdot \pi(s2[j])$

Here p_k = e^{-μt}[1-λβ](λβ)^{k-1}, where β = [1-e^{(λ-μ)t}]/[μ-λe^{(λ-μ)t}].

[Figure: the two families of cases drawn as alignment blocks ending at (i, j).]

Basic Pairwise Recursion (O(length^3))

[Figure: dynamic programming table indexed by i (position in s1) and j (position in s2). Cell (i,j) is computed from (i-1,j) (death), (i-1,j-1) (survive), and so on down to (i-1,j-k).]

Initial condition: P(s1_0 → s2_j) = p''_j * π(s2[1:j])

Acceleration of Pairwise Algorithm (from Hein, Wiuf, Knudsen, Moeller & Wiebling 2000)

Speed-up factors:

Corner cutting             ~100-1000
Better numerical search    ~10-100 (e.g. good start guess, 28 evaluations, 3 iterations)
Simpler recursion          ~3-10
Faster computers           ~250

1991-->2000                ~10^6

α-globin (141) and β-globin (146) (from Hein, Wiuf, Knudsen, Moeller & Wiebling 2000)

430.108 : -log(α-globin)
327.320 : -log(α-globin --> β-globin)
747.428 : -log(α-globin, β-globin) = -log(l(sumalign))

λ*t: 0.0371805 +/- 0.0135899
μ*t: 0.0374396 +/- 0.0136846
s*t: 0.91701 +/- 0.119556

E(Length)   E(Insertions,Deletions)   E(Substitutions)
143.499     5.37255                   131.59

Maximum contributing alignment:

V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS--H---GSAQVKGHGKKVADALT
VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFS

NAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
DGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

Ratio l(maxalign)/l(sumalign) = 0.00565064

The invasion of the immortal link

[Figure: a sequence 141 AA long (VLSPADNAL.....DLHAHKR) and an unknown sequence k AA long (????????????????????), shown at time depths of 2*10^7, 2*10^8 and 2*10^9 years; below, three sequences of links *########### …. ### (141 AA long), labelled 10^9 years.]

Algorithm for alignment on star tree (O(length^6)) (Steel & Hein, 2001)

$P(S) = \Big(1 - \frac{\lambda}{\mu}\Big)\Big[P^{*}(S) + \frac{\lambda}{\mu} \sum_{Tail} P^{\#}(Tail)\,P(S - Tail)\Big]$

[Figure: star tree with ancestor a, drawn as * (λ/μ)*######, and leaves s1, s2, s3; example leaf sequences *ACGC, *TT GT, *ACG GT.]

Binary Tree Problem

[Figure: binary tree with leaves s1, s2, s3, s4 and internal nodes a1, a2; leaf sequences ACCT, GTT, TGA, ACG.]

The problem would be simpler if:

i. The ancestral sequences & their alignment were known.
ii. The alignment of ancestral alignment columns to leaf sequences were known.

How to sum over all possible ancestral sequences and their alignments?

A Markov chain generating ancestral alignments can solve the problem!

[Figure: an example alignment of a1 and a2 built from columns of the types */*, #/#, -/# and #/-.]

Generating Ancestral Alignments

Transition probabilities of the Markov chain on alignment columns (a1 over a2), for a branch of length 1. Rows give the current column, columns the next column; E is the end state.

        -/#                          #/#                      #/-                        E
*/*     λβ                           (λ/μ)(1-λβ)e^{-μ}        (λ/μ)(1-λβ)(1-e^{-μ})      (1-λ/μ)(1-λβ)
#/#     λβ                           (λ/μ)(1-λβ)e^{-μ}        (λ/μ)(1-λβ)(1-e^{-μ})      (1-λ/μ)(1-λβ)
-/#     λβ                           (λ/μ)(1-λβ)e^{-μ}        (λ/μ)(1-λβ)(1-e^{-μ})      (1-λ/μ)(1-λβ)
#/-     (1-e^{-μ}-μβ)/(1-e^{-μ})     λβe^{-μ}/(1-e^{-μ})      λβ                         (μ-λ)β/(1-e^{-μ})

Example path:

a1   *   -     #                      E
a2   *   #     #                      E
         λβ    (λ/μ)(1-λβ)e^{-μ}      (1-λ/μ)(1-λβ)
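A quick sanity check on such a transition table is that every row sums to 1. In the sketch below the first three rows use the entries λβ, (λ/μ)(1-λβ)e^{-μ}, (λ/μ)(1-λβ)(1-e^{-μ}) and (1-λ/μ)(1-λβ); the row for the #/- column is a reconstruction under the assumption that a dead link leaves at least one descendant with probability 1 - μβ/(1-e^{-μ}). The rates are hypothetical.

```python
from math import exp

def transition_rows(lam, mu):
    """Rows of the chain over next columns (-/#, #/#, #/-, E), branch length 1."""
    e = exp(lam - mu)
    b = (1.0 - e) / (mu - lam * e)   # beta at t = 1
    em = exp(-mu)                    # survival probability of a link
    common = [lam * b,
              (lam / mu) * (1 - lam * b) * em,
              (lam / mu) * (1 - lam * b) * (1 - em),
              (1 - lam / mu) * (1 - lam * b)]
    # After a deleted link (#/-): reconstructed row, see the assumption above.
    after_death = [(1 - em - mu * b) / (1 - em),
                   lam * b * em / (1 - em),
                   lam * b,
                   (mu - lam) * b / (1 - em)]
    return {"*/*": common, "#/#": common, "-/#": common, "#/-": after_death}

rows = transition_rows(1.0, 2.0)   # hypothetical rates with lam < mu
row_sums = {state: sum(probs) for state, probs in rows.items()}
```

If a reconstructed entry were wrong, the corresponding row would generally fail to sum to 1, so this check catches most transcription slips.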

The Basic Recursion

[Figure: paths of the Markov chain from the start state S to the end state E.]

"Remove 1st step" - recursion:

"Remove last step" - recursion:

Last/First step removal are inequivalent, but have the same complexity. The first step algorithm is the simplest.

Sequence Recursion: First Step Removal

P_α(S_k): probability of the epifixes (S[k+1:l]) starting in k, given that the Markov chain starts in α.

$P_\alpha(S_k) = \sum_{\varepsilon} \sum_{i \in S_\alpha} \sum_{H \in C_\alpha} P'(kSi, H, \alpha)\,P(\alpha \to \varepsilon)\,P_\varepsilon(S_i)$

where

$P'(kSi, H) = \prod_{j: H(j)=0} p'_{k}(t_j)\,\pi(s_j[i(j):k(j)]) \prod_{j: H(j)=1} p_{k}(t_j)\,\pi(s_j[i(j)+1:k(j)]) \; F(kSi, H)$

Maximum likelihood phylogeny and alignment

Sequences: Human alpha hemoglobin; Human beta hemoglobin; Human myoglobin; Bean leghemoglobin

Probability of data: e^{-1560.138}
Probability of data and alignment: e^{-1593.223}
Probability of alignment given data: 4.279 * 10^{-15} = e^{-33.085}
Ratio of insertion-deletions to substitutions: 0.0334

Gerton Lunter, Istvan Miklos, Alexei Drummond, Yun Song

Metropolis-Hastings Statistical Alignment (Lunter, Drummond, Miklos, Jensen & Hein, 2005)