Algorithms for phylogeny construction
A Hybrid Micro-Macroevolutionary Approach to Gene Tree
Reconstruction
ICE-TCS Inaugural Symposium
Bjarni V. Halldorsson
April 30, 2005
1
[Figure: character-based decision tree. Node "Has Intelligence?" with branches "no" and "yes"; node "Has Body Hair?" with branches "no" and "yes".]
3
Genes, genomes
• Gene - a sequence having functional importance, e.g. AACG,
CACC, TACT.
• Genome - a sequence containing genes as subsequences, e.g.
TATAACGTTTCTACTCTATTACTCC.
4
Evolution - changes in the genome
Original:
TATAACGTTTCTACTCTATTACTCC
Mutation:
TATAACGTTTCTAATCTCTTACTCC
Duplication:
TATAACGTTTCTACTCTATTACTCCTCTACTCT
Loss:
TA-TCTACTCTATTACTCC
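The three kinds of change above can be reproduced with simple string operations. The sketch below is one reading of the slide's example: the helper names (`mutate`, `duplicate`, `lose`) and the exact positions are assumptions inferred by comparing each variant with the original sequence, not something the slide states.

```python
original = "TATAACGTTTCTACTCTATTACTCC"


def mutate(seq, changes):
    """Apply point mutations given as {index: new_base} (0-based indices)."""
    bases = list(seq)
    for idx, base in changes.items():
        bases[idx] = base
    return "".join(bases)


def duplicate(seq, start, end):
    """Append a copy of seq[start:end] to the end of the sequence."""
    return seq + seq[start:end]


def lose(seq, start, end):
    """Delete the segment seq[start:end]."""
    return seq[:start] + seq[end:]


# Positions below are inferred from the slide, not given by it.
mutated = mutate(original, {13: "A", 17: "C"})   # two point mutations
duplicated = duplicate(original, 9, 17)          # copies the segment TCTACTCT
lost = lose(original, 2, 9)                      # removes TAACGTT, which contains the gene AACG
```

Under these assumed positions, the three results coincide with the Mutation, Duplication and Loss lines of the slide (with the gap character "-" removed from the Loss line).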
5
Phylogenies
A species phylogeny shows the evolutionary history of a set of
species.
[Figure: species tree with leaves human, monkey and mouse.]
A gene phylogeny shows the evolutionary history of a single gene.
[Figure: gene tree for a gene P, with internal nodes labeled P, P1 and P2 and leaves Pmouse, P1human, P1monkey, P2human and P2monkey.]
6
Why are gene phylogenies interesting?
• The same gene in different species is likely to play the same
role.
• We want to determine the function of a gene in human.
• Experiments in mouse, yeast or flies are less controversial
and take less time than in human.
7
Phylogeny construction considering mutations
A very large number of algorithms exist for this problem.
• Character-based algorithms (as mentioned before).
• Distance-based algorithms: distances between the sequences are
computed (such as the number of mutations that occurred between
the sequences).
• If the phylogeny has the ultrametric property, an efficient
algorithm can be employed.
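As a small illustration of the distance-based view, the sketch below counts differing positions between aligned sequences (the Hamming distance) and tests the ultrametric property with the three-point condition: in every triple of sequences, the two largest pairwise distances are equal. The sequences are made up for this example and the function names are assumptions; the slides do not specify which distance or which algorithm was used.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def is_ultrametric(dist, names):
    """Three-point condition: the two largest distances in each triple are equal."""
    for x, y, z in combinations(names, 3):
        d = sorted([dist[frozenset((x, y))],
                    dist[frozenset((x, z))],
                    dist[frozenset((y, z))]])
        if d[1] != d[2]:
            return False
    return True


# Hypothetical aligned sequences, not taken from the slides.
seqs = {"human": "AACGTT", "monkey": "AACGTA", "mouse": "TACGCC"}
dist = {frozenset((p, q)): hamming(seqs[p], seqs[q])
        for p, q in combinations(seqs, 2)}
ultrametric = is_ultrametric(dist, list(seqs))
```

Here human and monkey differ in one position and each differs from mouse in three, so the distances satisfy the three-point condition.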
8
Macroevolutionary phylogeny
Input: A rooted species tree $T_S$ with $s$ leaves; a list of multiplicities $m_1, \ldots, m_s$, where $m_l$ is the number of gene family members found in species $l$; weights $c_\lambda$ and $c_\delta$ (the costs of a loss and of a duplication, respectively).
Output: A rooted gene tree $T_G$ with $\sum_{l=1}^{s} m_l$ leaves such that the D/L Score of $T_G$ is minimal.
[Figure: example rooted species tree with seven leaves, each annotated with its multiplicity: 2A, 1B, 2C, 1D, 2E, 1F, 2G.]
9
Phylogenies considering only cost of loss
• If the cost of losing a gene is much higher than the cost of
duplication we will construct a phylogeny that minimizes the
number of lost genes.
• All duplications then occur after all speciations.
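When no losses are allowed, the number of duplications follows directly from the multiplicities. The sketch below assumes a single ancestral gene copy at the root and at least one copy in every species, so that a species with m copies needs m - 1 duplications on its own leaf branch. The function name is an assumption; the multiplicities are those of the running example (2A, 1B, 2C, 1D, 2E, 1F, 2G).

```python
# Multiplicities of the running example in the slides.
multiplicities = {"A": 2, "B": 1, "C": 2, "D": 1, "E": 2, "F": 1, "G": 2}


def duplications_without_loss(mult):
    """Duplications needed if every duplication happens after all speciations.

    Assumes one gene copy at the root and no species with zero copies
    (a species with zero copies would force a loss).
    """
    return sum(max(m - 1, 0) for m in mult.values())


n_dups = duplications_without_loss(multiplicities)
```

For the running example this gives 4, one duplication each in A, C, E and G.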
10
4 Duplications
[Figure: the species tree with multiplicities 2A, 1B, 2C, 1D, 2E, 1F, 2G, and below it a gene tree with four nodes labeled "Dupl".]
11
Phylogenies considering only cost of duplication
• If the cost of a duplication is much higher than the cost of a
loss we will construct a phylogeny that minimizes the number
of duplications.
• All duplications can then be assumed to occur before any
speciation occurs.
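If all duplications are placed before the first speciation, the counts can again be read off the multiplicities. The sketch below assumes a single ancestral gene: the root must then hold as many copies as the largest multiplicity, which takes (max multiplicity - 1) duplications. The loss figure is only an upper bound, because one loss on an internal branch can account for a missing copy in a whole subtree; the exact number depends on the species tree topology. Function names are assumptions.

```python
# Multiplicities of the running example in the slides.
multiplicities = {"A": 2, "B": 1, "C": 2, "D": 1, "E": 2, "F": 1, "G": 2}


def duplications_before_speciation(mult):
    """Minimum duplications when all of them occur at the root."""
    return max(mult.values()) - 1


def loss_upper_bound(mult):
    """Losses if every missing copy is lost separately at a leaf (an upper bound)."""
    top = max(mult.values())
    return sum(top - m for m in mult.values())


n_dups = duplications_before_speciation(multiplicities)
max_losses = loss_upper_bound(multiplicities)
```

For the running example this gives 1 duplication and at most 3 losses (one each for B, D and F).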
12
1 Duplication, 3 Losses
[Figure: the species tree with multiplicities 2A, 1B, 2C, 1D, 2E, 1F, 2G, and below it a gene tree in which three lineages are labeled "Lost".]
13
Phylogenies considering duplication and loss
Reconstruct[T_S, {m_1 ... m_s}]
  Ascend[root(T_S)];
  Descend[root(T_S), 1];
  Construct[root(T_S)];
14
Ascend[v]
  if v is not a leaf: Ascend[l(v)]; Ascend[r(v)];
  if v is a leaf:
    ∀i s.t. 1 ≤ i ≤ m:
      costmin_v[i] ← c_δ * max(m_v − i, 0) + c_λ * max(i − m_v, 0);
  if v is not a leaf:
    ∀i, j s.t. 1 ≤ i, j ≤ m:
      cost_v[i, j] ← c_δ * max(j − i, 0) + c_λ * max(i − j, 0) + costmin_l(v)[j] + costmin_r(v)[j];
    ∀i: costmin_v[i] ← min_j { cost_v[i, j] };
15
Descend
Descend[v, i]
  if v is a leaf:
    v.losses ← max(i − m_v, 0); v.dups ← max(m_v − i, 0);
    v.out ← 0;
  else:
    repeat { v.out++ } until ( cost_v[i, v.out] == costmin_v[i] );
    Descend[l(v), v.out]; Descend[r(v), v.out];
    v.losses ← max(i − v.out, 0); v.dups ← max(v.out − i, 0);
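The Ascend and Descend passes can be sketched in Python as follows. This is a minimal transcription of the pseudocode under stated assumptions, not the authors' implementation: the `SpeciesNode` class, the three-species tree ((A,B),C) with multiplicities 2, 1, 2, and the weights c_δ = 2 and c_λ = 1 are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SpeciesNode:
    name: str
    mult: int = 0                      # m_v, used only at leaves
    left: Optional["SpeciesNode"] = None
    right: Optional["SpeciesNode"] = None
    cost: Dict[Tuple[int, int], int] = field(default_factory=dict)
    costmin: Dict[int, int] = field(default_factory=dict)
    out: int = 0
    dups: int = 0
    losses: int = 0

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def ascend(v, m, c_dup, c_loss):
    """Bottom-up pass: fill cost_v[i, j] and costmin_v[i] for 1 <= i, j <= m."""
    if v.is_leaf():
        for i in range(1, m + 1):
            v.costmin[i] = c_dup * max(v.mult - i, 0) + c_loss * max(i - v.mult, 0)
        return
    ascend(v.left, m, c_dup, c_loss)
    ascend(v.right, m, c_dup, c_loss)
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            v.cost[i, j] = (c_dup * max(j - i, 0) + c_loss * max(i - j, 0)
                            + v.left.costmin[j] + v.right.costmin[j])
        v.costmin[i] = min(v.cost[i, j] for j in range(1, m + 1))


def descend(v, i, m):
    """Top-down pass: i copies enter v; record dups, losses and copies passed on."""
    if v.is_leaf():
        v.losses, v.dups, v.out = max(i - v.mult, 0), max(v.mult - i, 0), 0
        return
    # Smallest j that attains the minimum, as in the repeat/until loop.
    v.out = next(j for j in range(1, m + 1) if v.cost[i, j] == v.costmin[i])
    descend(v.left, v.out, m)
    descend(v.right, v.out, m)
    v.losses, v.dups = max(i - v.out, 0), max(v.out - i, 0)


# Hypothetical instance: species tree ((A,B),C), multiplicities 2, 1, 2.
a, b, c = SpeciesNode("A", 2), SpeciesNode("B", 1), SpeciesNode("C", 2)
x = SpeciesNode("X", left=a, right=b)
root = SpeciesNode("R", left=x, right=c)
M = 2                                  # maximum multiplicity
ascend(root, M, c_dup=2, c_loss=1)
descend(root, 1, M)                    # one ancestral gene enters the root
events = {n.name: (n.dups, n.losses) for n in (root, x, a, b, c)}
```

With these assumed weights the minimal score is `root.costmin[1] == 3`: one duplication at the root (cost 2) and one loss on the branch to B (cost 1), which is cheaper than duplicating separately in A and C (cost 4).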
16
Construct
Construct[s]
  g ← new gene node; g.species ← s
  if (s.currDup < s.dups)
    s.currDup++; l(g) ← Construct[s]; r(g) ← Construct[s];
  else if (s.currLoss < s.losses)
    s.currLoss++;
  else if (s.currSpec < s.out)
    s.currSpec++;
    if s is not a leaf: l(g) ← Construct[l(s)]; r(g) ← Construct[r(s)];
  return g;
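A runnable sketch of Construct is given below. The species tree ((A,B),C) and its annotations (one duplication at the root, one loss at B) are set by hand and are hypothetical. The `kind` tag on gene nodes and the `render` helper are additions made for this sketch so that the result can be printed; they are not part of the pseudocode.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeciesNode:
    name: str
    dups: int = 0
    losses: int = 0
    out: int = 0                       # copies passed to each child (0 at leaves)
    left: Optional["SpeciesNode"] = None
    right: Optional["SpeciesNode"] = None
    curr_dup: int = 0
    curr_loss: int = 0
    curr_spec: int = 0


@dataclass
class GeneNode:
    species: str
    kind: str = "gene"                 # "gene", "dup", "loss" or "spec" (added tag)
    left: Optional["GeneNode"] = None
    right: Optional["GeneNode"] = None


def construct(s):
    """Build the gene subtree for one gene lineage entering species node s."""
    g = GeneNode(species=s.name)
    if s.curr_dup < s.dups:
        s.curr_dup += 1
        g.kind = "dup"
        g.left = construct(s)
        g.right = construct(s)
    elif s.curr_loss < s.losses:
        s.curr_loss += 1
        g.kind = "loss"
    elif s.curr_spec < s.out:
        s.curr_spec += 1
        g.kind = "spec"
        if s.left is not None and s.right is not None:
            g.left = construct(s.left)
            g.right = construct(s.right)
    return g


def render(g):
    """Newick-like string; duplication nodes carry the label Dupl."""
    if g.kind == "loss":
        return "Lost"
    if g.kind == "gene":
        return g.species
    inner = f"({render(g.left)},{render(g.right)})"
    if g.kind == "dup":
        return inner + "Dupl"
    return inner


# Hand-annotated hypothetical species tree ((A,B),C).
a, b, c = SpeciesNode("A"), SpeciesNode("B", losses=1), SpeciesNode("C")
x = SpeciesNode("X", out=2, left=a, right=b)
root = SpeciesNode("R", dups=1, out=2, left=x, right=c)
gene_tree = construct(root)
newick = render(gene_tree)
```

The resulting string is `(((A,Lost),C),((A,B),C))Dupl`: the root duplication creates two gene lineages, each of which follows the species tree, and one of the two copies is lost on the branch to B. The gene tree therefore has two A leaves, one B leaf and two C leaves.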
17
2 Duplications, 1 loss
[Figure: the species tree with multiplicities 2A, 1B, 2C, 1D, 2E, 1F, 2G, and below it a gene tree with two nodes labeled "Dupl" and one lineage labeled "Lost".]
18
Time Complexity
An optimal history can be found in $O(nm^2)$ time, where $n$ is the
number of nodes in the species tree and $m$ is the maximum
number of genes drawn from any species.
In Ascend, the leaves of the species tree can be annotated with
multiplicities in $O(nm)$ time. The cost vector in each node is of
length $m + 1$ and each entry can be computed in $O(m)$ time,
for a total of $O(nm^2)$.
Descend requires $O(m)$ time at each node, for a total of $O(nm)$.
Construct inserts duplication and loss nodes in the new tree, which can
number in total no more than $m$ per node in $T_S$, for a total of $O(nm)$.
19
Extensions
• Combining duplication and loss cost with cost of mutations.
– Some edges of a phylogenetic tree are well supported by
micro-evolutionary phylogeny construction algorithms.
– Edges that are not as well supported can be rearranged to
minimize duplication and loss.
• Consider and display all possible optimal histories.
20
Acknowledgements
• R. Ravi, Carnegie Mellon University
• Dannie Durand, Carnegie Mellon University
A Hybrid Micro-Macroevolutionary Approach to Gene Tree
Reconstruction. D. Durand, B. V. Halldorsson, B. Vernot, 2005.
Proceedings of the Ninth Annual International Conference on
Computational Molecular Biology (RECOMB), to appear.
21