Exercise
• Show that in UPGMA, for a new cluster k formed as $C_k = C_i \cup C_j$, the distances $d_{kl}$ to any other cluster l are given by:
$$d_{kl} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|}$$
Solution
• Since the members of k are the members of i and j, the sum of distances between members of k and l can be written as:
$$\sum_{x \in C_k,\, y \in C_l} d_{xy} = \sum_{x \in C_i,\, y \in C_l} d_{xy} + \sum_{x \in C_j,\, y \in C_l} d_{xy}$$
• This is equal to:
$$d_{il}\,|C_i|\,|C_l| + d_{jl}\,|C_j|\,|C_l|$$
Solution
• By the definition of distance between clusters, we divide the latter sum by |Ck|·|Cl|:
$$d_{kl} = \frac{d_{il}\,|C_i|\,|C_l| + d_{jl}\,|C_j|\,|C_l|}{|C_k|\,|C_l|} = \frac{d_{il}\,|C_i|\,|C_l| + d_{jl}\,|C_j|\,|C_l|}{(|C_i| + |C_j|)\,|C_l|} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|}$$
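The identity can also be checked numerically. The sketch below is only an illustration, assuming a random symmetric distance matrix and an arbitrary choice of clusters (the names `average_distance` and `updated_distance` are made up for this example): it compares the directly averaged distance between C_k and C_l with the value given by the weighted-average formula.

```python
import itertools
import random


def average_distance(cluster_a, cluster_b, d):
    """Mean pairwise distance between two clusters (UPGMA cluster distance)."""
    total = sum(d[x][y] for x in cluster_a for y in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def updated_distance(c_i, c_j, c_l, d):
    """d_kl obtained from d_il and d_jl with the weighted-average formula."""
    d_il = average_distance(c_i, c_l, d)
    d_jl = average_distance(c_j, c_l, d)
    return (d_il * len(c_i) + d_jl * len(c_j)) / (len(c_i) + len(c_j))


# Assumed toy data: 7 leaves with random symmetric distances.
random.seed(0)
n = 7
d = [[0.0] * n for _ in range(n)]
for x, y in itertools.combinations(range(n), 2):
    d[x][y] = d[y][x] = random.uniform(1.0, 10.0)

c_i, c_j, c_l = [0, 1], [2, 3, 4], [5, 6]
c_k = c_i + c_j  # the merged cluster

direct = average_distance(c_k, c_l, d)
via_formula = updated_distance(c_i, c_j, c_l, d)
print(abs(direct - via_formula) < 1e-9)
```

The two values agree up to floating-point rounding for any disjoint choice of clusters, which is what the derivation predicts.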
Exercise
• Show that a parent node in a tree constructed by UPGMA is never lower than its daughter nodes
Solution
• Since $h_n = d_{kl}/2$ and $h_k = d_{ij}/2$, it is enough to show that $d_{kl} \ge d_{ij}$ for every k and l; node n is then at least as high as node k
• According to the previous exercise:
$$d_{kl} = \frac{d_{il}\,|C_i| + d_{jl}\,|C_j|}{|C_i| + |C_j|} \ge \min(d_{il}, d_{jl})$$
Solution
• Since i and j were merged and not i and l or j and l, we can conclude that
$$d_{kl} \ge \min(d_{il}, d_{jl}) \ge d_{ij}$$
Exercise
• Show an example in which the parent node height is equal to the child node height (UPGMA).
Solution
• Suppose all three pairs among three sequences have the same distance d.
• We choose to merge leaves 1 and 2 and produce node 4, with height d/2.
• The new distance, $d_{43} = (d_{13} + d_{23})/2$, is exactly d
• So when we merge node 4 and leaf 3, we create a new node 5, also of height d/2
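A minimal UPGMA sketch makes both results concrete. It is an illustration only, not an efficient implementation: it recomputes average distances from the leaves at every step, and the function names and the two distance matrices are assumptions made for this example.

```python
from itertools import combinations


def cluster_distance(a, b, dist):
    """Average pairwise distance between the leaves of clusters a and b."""
    return sum(dist[x][y] for x in a for y in b) / (len(a) * len(b))


def upgma_heights(dist):
    """Return the heights of the internal nodes in the order UPGMA creates them."""
    clusters = [[i] for i in range(len(dist))]
    heights = []
    while len(clusters) > 1:
        # Closest pair of clusters; ties are broken by taking the first pair found.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(pair[0], pair[1], dist),
        )
        heights.append(cluster_distance(a, b, dist) / 2)
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return heights


# Three leaves, all at distance d = 2: the parent is as high as its child.
print(upgma_heights([[0, 2, 2], [2, 0, 2], [2, 2, 0]]))  # [1.0, 1.0]

# A generic case: heights never decrease from child to parent.
print(upgma_heights([[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]))  # [1.0, 3.0, 5.0]
```

Because each new node is created from the currently closest pair, the list of heights is non-decreasing, and the equidistant case shows that equality can occur.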
Exercise
• The famous paleontologist R. Geller argued to his sister that the last common ancestor of birds and dinosaurs lived 100 million years ago.
• His sister claimed that the ancestor lived 200 million years ago.
• The evidence is a pair of homologous genes, 1000 nt long, with 350 differences (it's not contamination this time…)
Exercise
• Both accept the Jukes-Cantor model
• Both accept the assumption of a molecular clock
• If mutations occur independently, with rate $10^{-9}$ mutations per year, whose theory is more likely to be correct?
Solution
• According to Jukes-Cantor, the probability of a nucleotide remaining unchanged over t time units is:
$$P_{x \to x}(t) = \frac{1}{4}\left(1 + 3e^{-4\alpha t}\right)$$
• The probability for a specific change:
$$P_{x \to y}(t) = \frac{1}{4}\left(1 - e^{-4\alpha t}\right), \quad y \ne x$$
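As a quick sanity check, the two probabilities can be written as small functions. This is a sketch, assuming the exercise's rate 3α = 10^-9 per year; the function names are invented for illustration.

```python
import math


def jc_same(alpha, t):
    """P(x -> x): the nucleotide is the same after time t."""
    return 0.25 * (1.0 + 3.0 * math.exp(-4.0 * alpha * t))


def jc_diff(alpha, t):
    """P(x -> y), y != x: one specific different nucleotide after time t."""
    return 0.25 * (1.0 - math.exp(-4.0 * alpha * t))


alpha = 1e-9 / 3   # assumption: total rate 3*alpha = 1e-9 per year
t = 100e6          # 100 million years

# One "same" outcome plus three "different" outcomes must cover all cases.
row_sum = jc_same(alpha, t) + 3 * jc_diff(alpha, t)
print(math.isclose(row_sum, 1.0))

# At t = 0 nothing has changed yet.
print(jc_same(alpha, 0.0), jc_diff(alpha, 0.0))  # 1.0 0.0
```

For very large t both probabilities approach 1/4, the Jukes-Cantor equilibrium frequency.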
Solution
• The likelihood of the tree at site i is:
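$$\begin{aligned}
L_i(t) &= P(\text{bird}_i, \text{dinosaur}_i \mid T, \text{time from parent is } t) \\
&= \sum_{\text{ancestor}_i} q_{\text{ancestor}_i}\, P(\text{bird}_i \mid \text{ancestor}_i, t)\, P(\text{dinosaur}_i \mid \text{ancestor}_i, t) && \text{(likelihood of a tree)} \\
&= \sum_{\text{ancestor}_i} q_{\text{bird}_i}\, P(\text{dinosaur}_i \mid \text{ancestor}_i, t)\, P(\text{ancestor}_i \mid \text{bird}_i, t) && \text{(Jukes-Cantor reversibility property)} \\
&= q_{\text{bird}_i}\, P(\text{dinosaur}_i \mid \text{bird}_i, 2t) && \text{(Jukes-Cantor additivity)}
\end{aligned}$$
Less work to do: the sum over ancestral states is gone.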
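The shortcut can be verified by brute force. The sketch below assumes uniform base frequencies q = 1/4 and the exercise's rate 3α = 10^-9 per year (all function names are illustrative); it sums over every possible ancestral nucleotide and compares the result with the single-branch expression of length 2t.

```python
import math

NUCLEOTIDES = "ACGT"


def jc_prob(x, y, alpha, t):
    """Jukes-Cantor probability of observing y after time t, starting from x."""
    decay = math.exp(-4.0 * alpha * t)
    if x == y:
        return 0.25 * (1.0 + 3.0 * decay)
    return 0.25 * (1.0 - decay)


def site_likelihood_sum(bird, dino, alpha, t, q=0.25):
    """Explicit sum over the unknown ancestral nucleotide."""
    return sum(
        q * jc_prob(a, bird, alpha, t) * jc_prob(a, dino, alpha, t)
        for a in NUCLEOTIDES
    )


def site_likelihood_direct(bird, dino, alpha, t, q=0.25):
    """Shortcut: one branch of length 2t from bird to dinosaur."""
    return q * jc_prob(bird, dino, alpha, 2.0 * t)


alpha = 1e-9 / 3  # assumption: total rate 3*alpha = 1e-9 per year
t = 100e6

for bird, dino in [("A", "A"), ("A", "G")]:
    full = site_likelihood_sum(bird, dino, alpha, t)
    short = site_likelihood_direct(bird, dino, alpha, t)
    print(bird, dino, math.isclose(full, short))
```

Both the matching and the mismatching site give the same value either way, so only the total distance 2t between the two species matters.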
Solution
• Since the distance between the species is 2t, the probability of every site in which there is a match is:
$$L_i(t) = q_i \cdot \frac{1}{4}\left(1 + 3e^{-4\alpha(2t)}\right) = \frac{1}{16}\left(1 + 3e^{-4\alpha(2t)}\right)$$
• For a mismatch, the probability is:
$$L_i(t) = \frac{1}{16}\left(1 - e^{-4\alpha(2t)}\right)$$
Solution
• The log likelihood of the trees suggested by Dr. Geller and his sister is:
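$$\ln\frac{L(T_1)}{L(T_2)} = \ln\frac{\left(1 + 3e^{-4\alpha(2 \cdot 100 \cdot 10^6)}\right)^{650}\left(1 - e^{-4\alpha(2 \cdot 100 \cdot 10^6)}\right)^{350}}{\left(1 + 3e^{-4\alpha(2 \cdot 200 \cdot 10^6)}\right)^{650}\left(1 - e^{-4\alpha(2 \cdot 200 \cdot 10^6)}\right)^{350}}$$
$$= 650 \ln\frac{1 + 3e^{-4\alpha(2 \cdot 100 \cdot 10^6)}}{1 + 3e^{-4\alpha(2 \cdot 200 \cdot 10^6)}} + 350 \ln\frac{1 - e^{-4\alpha(2 \cdot 100 \cdot 10^6)}}{1 - e^{-4\alpha(2 \cdot 200 \cdot 10^6)}}$$
Here $T_1$ is Dr. Geller's tree (100 million years), $T_2$ is his sister's (200 million years), and the rate is $3\alpha = 10^{-9}$, i.e. $\alpha = \frac{1}{3} \cdot 10^{-9}$.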
Solution
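$$\ln\frac{L(T_1)}{L(T_2)} \approx 650 \ln\frac{1 + 3e^{-0.26}}{1 + 3e^{-0.52}} + 350 \ln\frac{1 - e^{-0.26}}{1 - e^{-0.52}}$$
$$= 650 \ln\left(1 + 3e^{-0.26}\right) - 650 \ln\left(1 + 3e^{-0.52}\right) + 350 \ln\left(1 - e^{-0.26}\right) - 350 \ln\left(1 - e^{-0.52}\right)$$
$$\approx 779 - 666 - 516 + 317 = -86$$
The log-likelihood ratio is negative, so the tree with the ancestor 200 million years ago (the sister's theory) is the more likely one.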
Yay!
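The comparison is easy to reproduce. The sketch below is an illustration under the exercise's assumptions (650 matching sites, 350 mismatching sites, 3α = 10^-9 per year); the names are invented for this example. It uses the unrounded exponent 4α·2t rather than the rounded 0.26 and 0.52, so the value differs slightly from the hand calculation, but it has the same sign and a similar size.

```python
import math

ALPHA = 1e-9 / 3      # assumption: total rate 3*alpha = 1e-9 per year
SITES = 1000
MISMATCHES = 350
MATCHES = SITES - MISMATCHES


def log_likelihood(t_years):
    """Log-likelihood of the data if the ancestor lived t_years ago."""
    decay = math.exp(-4.0 * ALPHA * 2.0 * t_years)
    match = (1.0 + 3.0 * decay) / 16.0      # per-site likelihood, same nucleotide
    mismatch = (1.0 - decay) / 16.0         # per-site likelihood, different nucleotide
    return MATCHES * math.log(match) + MISMATCHES * math.log(mismatch)


log_ratio = log_likelihood(100e6) - log_likelihood(200e6)
print(log_ratio)
print("200 million years fits better" if log_ratio < 0 else "100 million years fits better")
```

The factor 1/16 cancels in the ratio, which is why it does not appear in the hand calculation.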
Exercise
• Assume that the substitution cost for a weighted parsimony algorithm is a metric, i.e. it satisfies S(a,a)=0, S(a,b)=S(b,a) and S(a,c)≤S(a,b)+S(b,c).
• Show that the minimal cost of the tree is independent of the position of the root.
Solution
• We have a set of species and we are given a minimal weight tree for it. Denote the root in this tree by k
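[Figure: tree with root k and daughter nodes i and j; node i has daughter nodes l and m. An arrow points to the edge between i and l.]
We will show that deleting k and moving it to this edge does not change the cost of the tree.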
Solution
• What is the cost of the tree before translocation of the root?
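[Figure: the original tree, with root k above i and j, and l and m below i.]
For a specific choice of character c at the root, the cost of this tree is:
$$S_T(c) = \min_a\left[S(c,a) + S_i(a)\right] + \min_b\left[S(c,b) + S_j(b)\right] = \min_a\left[S(a,c) + S_i(a)\right] + \min_b\left[S(c,b) + S_j(b)\right]$$
where $S_i(a)$ is the minimal cost of the subtree below i when i carries character a.
• And the minimal cost of the tree is:
$$\min_c S_T(c)$$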
Solution
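• Due to the triangle inequality, S(a,b)≤S(a,c)+S(c,b), so no choice of c at the root can cost less than connecting i and j directly.
• If we set c to a (or equivalently to b), the root contributes S(a,a)=0 on one side and we get:
$$S(a,a) + S_i(a) + \min_b\left[S(a,b) + S_j(b)\right] = S_i(a) + \min_b\left[S(a,b) + S_j(b)\right]$$
• Minimizing over a, the minimal cost of the tree is:
$$\min_{a,b}\left(S_i(a) + S(a,b) + S_j(b)\right)$$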
Solution
• Denote the character at l as d
[Figure: the tree after moving the root: k now has daughter nodes l and i; node i has daughter nodes j and m.]
• The new cost is:
$$S'_T = \min_{a,d}\left(S'_i(a) + S(a,d) + S_l(d)\right)$$
where $S'_i$ replaces $S_i$ because the subtree below i has changed.
Solution
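[Figure: the new tree (k above l and i, with j and m below i) next to the original tree (k above i and j, with l and m below i).]
New tree:
$$\min_{a,d}\left(S'_i(a) + S(a,d) + S_l(d)\right)$$
Original tree:
$$\min_{a,b}\left(S_i(a) + S(a,b) + S_j(b)\right)$$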
Solution
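Expanding $S'_i(a)$ in the new tree:
$$\min_{a,d}\left(S'_i(a) + S(a,d) + S_l(d)\right) = \min_{a,d}\left[\min_b\left(S_j(b) + S(a,b)\right) + \min_e\left(S_m(e) + S(a,e)\right) + S(a,d) + S_l(d)\right]$$
Expanding $S_i(a)$ in the original tree:
$$\min_{a,b}\left(S_i(a) + S(a,b) + S_j(b)\right) = \min_{a,b}\left[\min_d\left(S_l(d) + S(a,d)\right) + \min_e\left(S_m(e) + S(a,e)\right) + S(a,b) + S_j(b)\right]$$
Both expressions minimize the same sum, $S_j(b) + S(a,b) + S_l(d) + S(a,d) + S_m(e) + S(a,e)$, over a, b, d and e, so the two costs are equal.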
Solution
• We proved that moving the root to an adjacent position does not change the minimal cost.
• Why is the case of moving the root to a non-adjacent position easier to prove?
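The result can be checked on a small example. The sketch below is an illustration, not the algorithm from the slides: the tree shape, the leaf characters and the cost function (transitions cost 1, transversions cost 2, which is a metric) are all assumptions. It computes the minimal weighted-parsimony cost with the root placed on every edge in turn.

```python
import math

ALPHABET = "ACGT"
PURINES = {"A", "G"}

# Hypothetical unrooted tree: internal nodes u, v, w and five leaves.
EDGES = [("u", "v"), ("v", "w"), ("u", "leaf1"), ("u", "leaf2"),
         ("v", "leaf3"), ("w", "leaf4"), ("w", "leaf5")]
LEAF_STATE = {"leaf1": "A", "leaf2": "C", "leaf3": "G", "leaf4": "T", "leaf5": "A"}


def cost(a, b):
    """Assumed metric substitution cost: 0 same, 1 transition, 2 transversion."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2


def subtree_cost(node, parent, state, adj):
    """Minimal cost of the subtree at `node` (seen from `parent`) if `node` carries `state`."""
    if node in LEAF_STATE:
        return 0 if LEAF_STATE[node] == state else math.inf
    total = 0
    for child in adj[node]:
        if child != parent:
            total += min(cost(state, s) + subtree_cost(child, node, s, adj)
                         for s in ALPHABET)
    return total


def rooted_cost(edge, adj):
    """Minimal cost of the tree when a new root is placed on `edge`."""
    u, v = edge
    return min(
        min(cost(c, a) + subtree_cost(u, v, a, adj) for a in ALPHABET)
        + min(cost(c, b) + subtree_cost(v, u, b, adj) for b in ALPHABET)
        for c in ALPHABET
    )


adj = {}
for u, v in EDGES:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

costs = {edge: rooted_cost(edge, adj) for edge in EDGES}
print(len(set(costs.values())) == 1)
```

Every root position gives the same minimal cost here. Replacing `cost` with a function that violates the triangle inequality is a simple way to see where the argument relies on the metric assumption.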