MAT 4830 Mathematical Modeling 4.5 Phylogenetic Distances I http://myhome.spu.edu/lauw


Preview

Phylogenetic: of or relating to the evolutionary development of organisms

Estimate the total number of mutations (observed and hidden).


Example from 4.1

S0 : Ancestral sequence

S1 : Descendant of S0

S2 : Descendant of S1

S0 : ATGTCGCCTGATAATGCC

S1 : ATGCCGCTTGACAATGCC

S2 : ATGCCGCGTGATAATGCC


Observed mutations (S0 vs S2): 2


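Actual mutations: 5 (3 from S0 to S1 and 2 from S1 to S2); some are hidden mutations. At site 8 the base changed C → T → G, which appears as a single difference between S0 and S2; at site 12 it changed T → C → T, which appears as no difference at all.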
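The counts above can be checked with a short Python sketch (Python is used here only for illustration; the course itself works in Maple). It assumes each difference between consecutive sequences is a single substitution; the helper name `hamming` is hypothetical.

```python
# Sequences from the Example from 4.1: S0 -> S1 -> S2
S0 = "ATGTCGCCTGATAATGCC"
S1 = "ATGCCGCTTGACAATGCC"
S2 = "ATGCCGCGTGATAATGCC"


def hamming(a: str, b: str) -> int:
    """Number of sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


observed = hamming(S0, S2)                  # differences visible between S0 and S2
actual = hamming(S0, S1) + hamming(S1, S2)  # substitutions along S0 -> S1 -> S2
print(observed, actual)  # 2 5

# Sites where the path S0 -> S1 -> S2 has more changes than S0 vs S2 shows
hidden_sites = [
    i + 1  # 1-based site numbers
    for i in range(len(S0))
    if (S0[i] != S1[i]) + (S1[i] != S2[i]) > (S0[i] != S2[i])
]
print(hidden_sites)  # [8, 12]
```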


Distance of Two Sequences

We want to define the “distance” between two sequences.

It should measure the average number of mutations per site that occurred, including the hidden ones.

S0 : ATGTCGCCTGATAATGCC

S : ATGCCGCGTGATAATGCC


Let d(S0, S) be the distance between sequences S0 and S. What properties “should” it have?

1.

2.

3.


Jukes-Cantor Model

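Assume α is small, so mutations per time step are “rare”. In each time step, a base changes with probability α, and it changes to each of the other three bases with probability α/3.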

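$$
M = \begin{pmatrix}
1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
\alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
\alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
\alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
\end{pmatrix},
\qquad
p_0 = \left(\frac14, \frac14, \frac14, \frac14\right)^T
$$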
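A minimal Python sketch of the Jukes-Cantor matrix M, assuming the base order A, G, C, T (M is symmetric, so the order does not matter) and an illustrative value α = 0.01. It checks that each column of M sums to 1 and that the uniform distribution p_0 is unchanged by one step. The function names are hypothetical.

```python
def jc_matrix(alpha: float) -> list[list[float]]:
    """Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha/3 everywhere else."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)] for i in range(4)]


def mat_vec(M: list[list[float]], v: list[float]) -> list[float]:
    """Matrix-vector product M v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


alpha = 0.01                    # assumed value, for illustration only
M = jc_matrix(alpha)
p0 = [0.25, 0.25, 0.25, 0.25]   # uniform base distribution p_0

# Each column sums to 1 (a base ends up as some base), and M p_0 = p_0.
col_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
p1 = mat_vec(M, p0)
print(col_sums, p1)
```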


q(t) = conditional probability that the base at a site at time t is the same as the base at time 0; it equals each diagonal entry of M^t.

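$$
M^t = \begin{pmatrix}
\frac14 + \frac34\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t \\
\frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 + \frac34\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t \\
\frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 + \frac34\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t \\
\frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 - \frac14\left(1-\frac43\alpha\right)^t & \frac14 + \frac34\left(1-\frac43\alpha\right)^t
\end{pmatrix}
$$

$$
q(t) = \frac14 + \frac34\left(1 - \frac43\alpha\right)^t
$$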
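The closed form of M^t can be checked numerically. This Python sketch, with illustrative values α = 0.01 and t = 50 (assumptions, not from the slides), compares t repeated multiplications by M against the formula with diagonal 1/4 + 3/4(1 - 4α/3)^t and off-diagonal 1/4 - 1/4(1 - 4α/3)^t. All function names are hypothetical.

```python
def jc_matrix(alpha):
    """Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha/3 elsewhere."""
    return [[1 - alpha if i == j else alpha / 3 for j in range(4)] for i in range(4)]


def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_pow(M, t):
    """M^t by t successive multiplications (fine for small t)."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(t):
        result = mat_mul(result, M)
    return result


def jc_closed_form(alpha, t):
    """Diagonal 1/4 + 3/4 r, off-diagonal 1/4 - 1/4 r, where r = (1 - 4 alpha / 3)^t."""
    r = (1 - 4 * alpha / 3) ** t
    return [[0.25 + 0.75 * r if i == j else 0.25 - 0.25 * r for j in range(4)] for i in range(4)]


alpha, t = 0.01, 50             # assumed values, for illustration only
numeric = mat_pow(jc_matrix(alpha), t)
closed = jc_closed_form(alpha, t)
max_err = max(abs(numeric[i][j] - closed[i][j]) for i in range(4) for j in range(4))
q_t = closed[0][0]              # q(t): probability the base is unchanged after t steps
print(max_err, q_t)
```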


Over many sites, q(t) = fraction of sites with no observed mutations.


p(t) = 1 - q(t) = fraction of sites with observed mutations


$$
p(t) = \frac34 - \frac34\left(1 - \frac43\alpha\right)^t
$$
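A small Python sketch tabulating p(t) next to αt, the expected number of substitutions per site, for an assumed α = 0.01. The function name `p_of_t` is hypothetical.

```python
def p_of_t(alpha: float, t: int) -> float:
    """Expected fraction of sites showing an observed difference after t steps."""
    return 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t


alpha = 0.01                    # assumed value, for illustration only
for t in (0, 10, 100, 1000):
    # alpha * t = expected substitutions per site; p(t) = observed fraction
    print(t, alpha * t, round(p_of_t(alpha, t), 4))
```

p(t) never exceeds αt, because some substitutions are hidden. It also levels off at 3/4 instead of growing without bound: two unrelated random sequences still agree at about 1/4 of their sites.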


p can be estimated from the two sequences: it is the fraction of sites at which they differ.


Example from 4.1

S0 : ATGTCGCCTGATAATGCC

S1 : ATGCCGCTTGACAATGCC

S2 : ATGCCGCGTGATAATGCC

Observed mutations (S0 vs S2): 2

Fraction of sites with observed mutations:

$$
p = \frac{2}{18} \approx 0.11
$$
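Estimating p from two aligned sequences only requires counting differing sites. A minimal Python sketch using S0 and S2 from this example; the function name `fraction_differing` is hypothetical.

```python
S0 = "ATGTCGCCTGATAATGCC"
S2 = "ATGCCGCGTGATAATGCC"


def fraction_differing(a: str, b: str) -> float:
    """Estimate of p: fraction of aligned sites where the two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


p = fraction_differing(S0, S2)
print(round(p, 2))  # 0.11
```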


Jukes-Cantor Distance

Given p, the fraction of sites at which the two sequences differ, the Jukes-Cantor (J-C) distance between two sequences S0 and S1 is defined as

$$
d_{JC}(S_0, S_1) = -\frac34 \ln\left(1 - \frac43 p\right)
$$

S0 : ATGTCGCCTGATAATGCC

S1 : ATGCCGCGTGATAATGCC
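The definition translates directly into a function. This Python sketch (the name `jc_distance` is hypothetical) applies it to the two sequences on this slide, where p = 2/18. The logarithm is only defined for p < 3/4, so the sketch raises an error otherwise.

```python
import math


def jc_distance(s0: str, s1: str) -> float:
    """Jukes-Cantor distance -3/4 ln(1 - 4p/3), with p = fraction of differing sites."""
    if len(s0) != len(s1):
        raise ValueError("sequences must have the same length")
    p = sum(x != y for x, y in zip(s0, s1)) / len(s0)
    if p >= 0.75:
        raise ValueError("d_JC is undefined for p >= 3/4")
    return -0.75 * math.log(1 - 4 * p / 3)


d = jc_distance("ATGTCGCCTGATAATGCC", "ATGCCGCGTGATAATGCC")
print(round(d, 4))  # 0.1203
```

The distance (about 0.1203) is slightly larger than p (about 0.1111), reflecting the correction for hidden mutations.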


α = rate of base substitution (substitutions per site per time step)

t = number of time steps

αt = total number of substitutions in t time steps (substitutions per site)


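$$
p = \frac34 - \frac34\left(1 - \frac43\alpha\right)^t
\;\Longrightarrow\;
\left(1 - \frac43\alpha\right)^t = 1 - \frac43 p
\;\Longrightarrow\;
t\ln\left(1 - \frac43\alpha\right) = \ln\left(1 - \frac43 p\right)
$$

$$
\ln\left(1 - \frac43\alpha\right) \approx -\frac43\alpha \quad \text{when } \alpha \text{ is small}
$$

$$
-\frac43\alpha t \approx \ln\left(1 - \frac43 p\right)
$$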


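$$
\alpha t \approx -\frac34\ln\left(1 - \frac43 p\right)
$$

So the Jukes-Cantor distance estimates αt, the total number of substitutions per site, hidden ones included.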
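A quick numerical check of this approximation in Python, with assumed values α = 0.01 and t = 50. Feeding the model's p(t) into the J-C formula gives exactly -3/4 t ln(1 - 4α/3), which is close to αt when α is small.

```python
import math

alpha, t = 0.01, 50                               # assumed values, for illustration only
p = 0.75 - 0.75 * (1 - 4 * alpha / 3) ** t        # p(t) from the Jukes-Cantor model
d = -0.75 * math.log(1 - 4 * p / 3)               # Jukes-Cantor distance from p
exact = -0.75 * t * math.log(1 - 4 * alpha / 3)   # same quantity, without approximation
print(alpha * t, d, exact)
```

Since ln(1 - x) ≈ -x - x²/2, the relative gap between d and αt is roughly 2α/3, so it shrinks as α gets smaller.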


Example from 4.3

Suppose the 40-base ancestral and descendant DNA sequences are

S0 : ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT

S1 : ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC

Number of sites with each base pair (row: base in S1, column: base in S0):

S1\S0   A   G   C   T
A       7   0   1   1
G       1   9   2   0
C       0   2   7   2
T       1   0   1   6
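The table can be rebuilt from the two sequences. A Python sketch, assuming rows index the base in S1 and columns the base in S0, in the order A, G, C, T used above.

```python
S0 = "ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT"  # ancestral sequence
S1 = "ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC"  # descendant sequence
BASES = "AGCT"  # row/column order used in the table

# counts[(r, c)] = number of sites with base r in S1 and base c in S0
counts = {(r, c): 0 for r in BASES for c in BASES}
for anc, desc in zip(S0, S1):
    counts[(desc, anc)] += 1

print("S1\\S0", *BASES)
for r in BASES:
    print(r, *(counts[(r, c)] for c in BASES))
# S1\S0 A G C T
# A 7 0 1 1
# G 1 9 2 0
# C 0 2 7 2
# T 1 0 1 6
```

The diagonal entries count unchanged sites; everything off the diagonal is an observed substitution.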


$$
p = \frac{11}{40} = 0.275,
\qquad
d_{JC} = -\frac34 \ln\left(1 - \frac43 \cdot \frac{11}{40}\right) \approx 0.3426
$$
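The same numbers follow from the count table alone. In this Python sketch, p is the off-diagonal total divided by the number of sites, and multiplying d_JC by the sequence length gives the estimated total number of substitutions.

```python
import math

table = [
    [7, 0, 1, 1],  # S1 = A   (columns: S0 = A, G, C, T)
    [1, 9, 2, 0],  # S1 = G
    [0, 2, 7, 2],  # S1 = C
    [1, 0, 1, 6],  # S1 = T
]
n = sum(map(sum, table))                   # 40 sites
same = sum(table[i][i] for i in range(4))  # 29 unchanged sites
p = (n - same) / n                         # observed substitutions per site
d = -0.75 * math.log(1 - 4 * p / 3)        # estimated substitutions per site
print(p, round(d, 4), round(n * d, 1))     # 0.275 0.3426 13.7
```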


0.275 observed substitutions per site.

0.3426 substitutions per site estimated (hidden ones included).


11 observed substitutions.

13.7 substitutions estimated (40 × 0.3426 ≈ 13.7).


Performance of JC distance (Homework Problem 4)

Write a program to simulate the mutations of a sequence over t time steps using the Jukes-Cantor model with parameter α.


Count the number of base substitutions that occurred.


Compute the Jukes-Cantor distance between the initial and final sequences.


Compare the actual number of base substitutions and the estimation from the Jukes-Cantor distance.
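One possible way to organize these steps, sketched in Python rather than Maple. The sequence length n = 1000, α = 0.01, t = 50, the random seed, and all function names are assumptions for illustration. Each time step, every site independently mutates with probability α to one of the other three bases chosen uniformly, which is the Jukes-Cantor model.

```python
import math
import random

BASES = "AGCT"


def simulate_jc(seq: str, alpha: float, t: int, rng: random.Random) -> tuple[str, int]:
    """Run t Jukes-Cantor steps; return (final sequence, number of substitutions)."""
    bases = list(seq)  # Python strings are immutable, so mutate a list of bases
    subs = 0
    for _ in range(t):
        for i, b in enumerate(bases):
            if rng.random() < alpha:
                bases[i] = rng.choice([x for x in BASES if x != b])
                subs += 1  # every change counts, including later back-mutations
    return "".join(bases), subs


def jc_distance(a: str, b: str) -> float:
    """Jukes-Cantor distance between two equal-length sequences."""
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    return -0.75 * math.log(1 - 4 * p / 3)


rng = random.Random(0)                       # fixed seed, for reproducibility
n, alpha, t = 1000, 0.01, 50                 # assumed parameters
s0 = "".join(rng.choice(BASES) for _ in range(n))
s_t, actual = simulate_jc(s0, alpha, t, rng)
estimate = n * jc_distance(s0, s_t)          # estimated total substitutions
print(actual, round(estimate, 1))
```

With these parameters the expected number of substitutions is nαt = 500, and the J-C estimate typically lands near the actual count; rerunning with other seeds and larger t shows how the estimate degrades as p approaches 3/4.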


Maple: String Handling II

Concatenating two strings


However, there is no “re-assignment”: individual characters of a string cannot be changed in place.


Classwork

Work on HW #1, 2