New Tools for Visualizing Genome Evolution
Lutz Hamel, Dept. of Computer Science and Statistics, University of Rhode Island
J. Peter Gogarten, Dept. of Molecular and Cell Biology, University of Connecticut
Motivation
Early life on Earth has left a variety of traces that can be used to reconstruct the history of life:
the fossil and geological records
information retained in living organisms
Our research focuses on how information can be gained from the molecular record: information about the history of life that is retained in the structure and sequences of macromolecules found in extant organisms.
Analyses of the mosaic nature of genomes using phylogenetics will be a key ingredient in unraveling life's early history.
Relevance to NASA
The analyses are relevant in the context of NASA's Origin theme:
Understand the origin and evolution of life on Earth.
We address questions that are central to NASA's Astrobiology program:
Understand how past life on Earth interacted with its changing planetary and Solar System environment.
Understand the evolutionary mechanisms and environmental limits of life.
Phylogenetics
Phylogenetics (Greek: phylon = race and genetic = birth) is the taxonomical classification of organisms based on how closely they are related in terms of evolutionary differences.
[Figure: SSU-rRNA Tree of Life, with the ORIGIN marked and the three domains ARCHAEA, BACTERIA, and EUCARYA. Fig. modified from Norman Pace.]
Phylogenetics: Classic View
All genes are inherited from an ancestor.
Branching reflects speciation events.
The evolutionary tree follows the SSU-rRNA tree very closely.
Horizontal Gene Transfer (HGT) leads to Mosaic Genomes, where different parts of the genome have different histories.
(a) concordant genes, (b) according to 16S (and other conserved genes) (c) according to phylogenetically discordant genes
Gophna, U., Doolittle, W.F. & Charlebois, R.L.: Weighted genome trees: refinements and applications. J. Bacteriol. (in press)
Evolutionary Processes Analogous to the Ones Proposed to Occur in the Microbial World
Cartoons from Science Made Stupid, T. Weller, 1986, from http://www.besse.at/sms/
Visualizing Phylogenies
Visualize the relationship of four organisms at a time:
four organisms can be related by three unrooted trees
plot the support of various genes for each of the three tree topologies in an equilateral triangle
Orthologous Gene Families
Visualizing Phylogenies
Synechocystis sp. (cyanobact.)
Chlorobium tepidum (GSB)
Rhodobacter capsulatus (α-prot)
Rhodopseudomonas palustris (α-prot)
Constructing the Visualization
Download four genomes (genome quartet) [a.a. sequences]
“BLAST” every genome against every other genome
Select top hit of every BLAST search
Detect quartets of orthologs
Align quartets of orthologs using ClustalW
Calculate maximum-likelihood values and posterior probabilities for all three tree topologies
Convert probabilities (barycentric coordinates) into Cartesian coordinates
Plot all points onto equilateral triangle
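The last computational steps above (likelihoods to posterior probabilities, then barycentric to Cartesian coordinates) can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it assumes equal prior probabilities for the three topologies, a unit-side triangle with an arbitrary assignment of topologies to corners, and made-up log-likelihood values. The function names are hypothetical.

```python
import math

# Corners of an equilateral triangle with unit side. Which topology sits at
# which corner is an arbitrary choice made for this sketch.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def posteriors_from_log_likelihoods(log_liks):
    """Convert log-likelihoods into posterior probabilities (equal priors assumed)."""
    top = max(log_liks)  # subtract the maximum to avoid underflow in exp()
    weights = [math.exp(value - top) for value in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


def barycentric_to_cartesian(probs, vertices=VERTICES):
    """Map probabilities (p1, p2, p3) that sum to 1 to a point in the triangle."""
    x = sum(p * vx for p, (vx, _) in zip(probs, vertices))
    y = sum(p * vy for p, (_, vy) in zip(probs, vertices))
    return x, y


# Illustrative (invented) log-likelihoods for one quartet of orthologs.
probs = posteriors_from_log_likelihoods([-1234.5, -1236.8, -1240.1])
point = barycentric_to_cartesian(probs)
```

A gene family that strongly supports one topology lands near the corresponding corner, while a family with no preference lands near the center of the triangle.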
Visualizing Five Genomes
Five genomes => fifteen unrooted trees
Rather than a triangle, the map is a dekapentagon
A: Archaeoglobus, S: Sulfolobus, Y: Yeast, R: Rhodobacter, B: Bacillus
Zhaxybayeva O, Hamel L, Raymond J, Gogarten JP: Visualization of Phylogenetic Content of Five Genomes with Dekapentagonal Maps. Genome Biology 2004, 5:R20
Visualizing Multiple Genomes
Number of Genomes Number of Unrooted Trees
3 1
4 3
5 15
6 105
7 945
8 10395
9 135135
10 2027025
20 2.22E+020
30 8.69E+036
40 1.31E+055
50 2.84E+074
60 5.01E+094
70 5.00E+115
80 2.18E+137
For comparison, the universe contains only about 10^89 protons and has an age of about 5*10^17 seconds or 5*10^29 picoseconds.
Given this explosion, plotting all possible relationships as unrooted trees is impossible.
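The counts in the table follow the standard formula for unrooted, fully resolved (bifurcating) trees on n labeled tips, (2n-5)!! = 3 * 5 * ... * (2n-5). The short sketch below reproduces them; the function name is hypothetical, and the restriction to bifurcating trees is an assumption that matches the numbers shown.

```python
def count_unrooted_trees(n):
    """Number of unrooted bifurcating tree topologies for n labeled genomes."""
    if n < 3:
        raise ValueError("need at least three genomes")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for odd in range(3, 2 * n - 4, 2):
        count *= odd
    return count


for n in (3, 4, 5, 6, 10, 13):
    print(n, count_unrooted_trees(n))
```

Each added genome multiplies the count by the next odd number, which is why the table grows faster than exponentially.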
Visualizing Multiple Genomes: SOMs
SOM: Self-Organizing Map
An artificial neural network approach to clustering
We are looking for clusters of genes which favor certain tree topologies
Advantages over other clustering approaches:
No a priori knowledge of how many clusters to expect
Explicit summary of commonalities and differences between clusters
Cluster membership is not exclusive – a gene can indicate membership in multiple clusters at the same time
Visually appealing representation
T. Kohonen, Self-organizing maps, 3rd ed. Berlin ; New York: Springer, 2001.
Training a SOM
Gene set                                               Tree #1 (T1)   …   Tree #k (Tk)
Support value vector for set #1 of orthologous genes   P11            …   P1k
Support value vector for set #2 of orthologous genes   P21            …   P2k
…                                                      …              …   …
Support value vector for set #n of orthologous genes   Pn1            …   Pnk
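SOM Regression Equations:
$$c = \arg\min_i \lVert \mathbf{x}(t) - \mathbf{m}_i(t) \rVert$$
$$\mathbf{m}_i(t+1) = \mathbf{m}_i(t) + h_{ci}\,[\mathbf{x}(t) - \mathbf{m}_i(t)]$$
$$h_{ci} = \begin{cases} 0 & \text{if } \lVert c - i \rVert > \beta, \\ \eta & \text{if } \lVert c - i \rVert \le \beta \end{cases}$$
η - learning rate, β - neighborhood distance
[Figure: the data set and the SOM neural elements m_i (axis labels x, y, k)]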
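The SOM regression equations can be sketched in plain Python as below. This is a simplified illustration, not the software used for the slides. It assumes a rectangular grid, a fixed learning rate `eta` and neighborhood distance `beta` (practical SOM implementations usually shrink both during training), Chebyshev distance between grid positions, and invented support-value vectors. All function and parameter names are hypothetical.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def find_winner(models, x):
    """Best-matching unit: the grid node whose model vector is closest to x."""
    return min(models, key=lambda node: squared_distance(x, models[node]))


def train_som(data, rows, cols, steps=2000, eta=0.1, beta=1, seed=0):
    """Train a rows-by-cols SOM with a step-function neighborhood."""
    rng = random.Random(seed)
    k = len(data[0])
    # One randomly initialised k-dimensional model vector per grid node.
    models = {(r, c): [rng.random() for _ in range(k)]
              for r in range(rows) for c in range(cols)}
    for _ in range(steps):
        x = rng.choice(data)                 # pick one training vector
        win_r, win_c = find_winner(models, x)
        for (r, c), m in list(models.items()):
            # Grid distance between this node and the winner (assumed metric).
            grid_dist = max(abs(r - win_r), abs(c - win_c))
            h = eta if grid_dist <= beta else 0.0
            # Move the model vector toward x by the fraction h.
            models[(r, c)] = [mi + h * (xi - mi) for mi, xi in zip(m, x)]
    return models


# Invented support-value vectors for three tree topologies (k = 3).
data = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
]
som = train_som(data, rows=3, cols=3)
winner = find_winner(som, data[0])
```

After training, gene families with similar support vectors tend to map to the same or neighboring nodes, which is what makes clusters visible on the map.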
SOM Visualization
Visualizing a Larger Number of Genomes
13 gamma-proteobacterial genomes (258 putative orthologs):
• E. coli
• Buchnera
• Haemophilus
• Pasteurella
• Salmonella
• Yersinia pestis (2 strains)
• Vibrio
• Xanthomonas (2 sp.)
• Pseudomonas
• Wigglesworthia
• Vibrio cholerae
There are 13,749,310,575 possible unrooted tree topologies for 13 genomes, so we switch to bipartitions.
Bipartition of a Phylogenetic Tree
Bipartition – a division of a phylogenetic tree into two parts that are connected by a single branch. It divides a dataset into two groups, but it does not consider the relationships within each of the two groups.
Here, 95 represents the bootstrap support for the internal branch.
The number of bipartitions for N genomes is equal to 2^(N-1) - N - 1.
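As a rough check of this count, the sketch below enumerates every bipartition that places at least two genomes on each side, which is the kind of split an internal branch can produce, and compares the total with the formula. The function names are hypothetical, and genome labels are assumed to be unique.

```python
from itertools import combinations


def count_bipartitions(n):
    """Closed-form count of bipartitions with at least two genomes per side."""
    return 2 ** (n - 1) - n - 1


def nontrivial_bipartitions(genomes):
    """List all splits of the genomes into two groups of size >= 2."""
    genomes = list(genomes)
    anchor, rest = genomes[0], genomes[1:]
    splits = []
    # Keep the first genome on the left so each split is listed only once.
    for size in range(1, len(rest) - 1):
        for chosen in combinations(rest, size):
            left = (anchor,) + chosen
            right = tuple(g for g in rest if g not in chosen)
            splits.append((left, right))
    return splits


quartet_splits = nontrivial_bipartitions("ABCD")
print(quartet_splits)
print(count_bipartitions(13))
```

For 13 genomes this gives 4,082 bipartitions, a far more manageable number than the 13,749,310,575 possible tree topologies.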
Bipartitions: Lento Plot & SOM
Strongly supported bipartitions in SOM
Strongly supported bipartitions
Conclusions & Future Work
Self-Organizing Maps seem to be an effective way to visualize mosaic genome evolution.
Corroborate findings with other methodologies
Scalable
In the future:
Larger data sets
Locally Linear Embedding (LLE)