DESCRIPTION
The reconciliation of gene trees to species trees uses the species tree to infer the history of evolutionary events, such as gene duplication and loss, in an individual gene family. A cyberinfrastructure for tree reconciliation (TR) has been developed that includes an extensible pipeline for high-throughput reconciliation of gene trees to species trees, database utilities, and a visualization tool. The TR database schema extends the Ensembl Compara database to include species trees and the mapping between the nodes of a gene tree and the species tree used for that reconciliation. This permits large-scale analysis of the distribution of gene tree events on the species tree, and comparison of the evolutionary timing of events between gene trees. The Chado controlled vocabulary module was also incorporated to support the use of OBO ontologies to tag attribute values within the database. The schema supports multiple reconciliations for each gene tree, and an ontology for TR was developed to support storage of metadata for TR methodologies. Additions to the BioPerl Tree API allow for direct import of reconciled trees in PRIME format, and utilities have been provided to populate the database from de novo analyses of gene tree reconciliations. Queries against the database are facilitated by a RESTful web API that allows for BLAST searches against gene sequences in the database, as well as searches for GO term assignments among gene families. These tools support comparative analysis of reconciliation methodologies, which we illustrate by reporting an evaluation of the accuracy of methods that reconcile gene trees individually relative to synteny-informed reconstructions of genome duplication history. We also illustrate a novel visualization tool for interactively exploring the mapping between gene trees and the species tree.
Extending Cyberinfrastructure for Gene Tree Reconciliation
James Estill, John Bowers, Hariolf Haefele, Adam Kubach, Naim Matasci, Sheldon McKay, Andrew Muir, Dennis Roberts, Sriram Srinivasan, Cécile Ané, Jim Leebens-Mack, Todd Vision
iEvoBio, June 21, 2011
iPlant Tree of Life (iPTOL)
• Tree Reconciliation
• Big Trees
• Data Assembly
• Trait Evolution
• Data Integration
• Tree Visualization
Gene Tree Reconciliation
Projection of gene trees onto a species tree
• gene duplications
• gene losses
• lineage sorting
• horizontal transfer
Gene Tree Reconciliation
• Locating gene duplications allows us to identify orthologs and paralogs
• Identify gene composition in inferred ancestral genomes
• Map the positions of ancestral polyploidy events
• Contribute to the study of the “fate” of duplicated genes
• Address questions of gene family coevolution
Existing Cyberinfrastructure
[Diagram: Gene Trees (EC); Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
Extending Cyberinfrastructure
• Increased interoperability among the component pieces
• Query the location of gene duplications on the species tree
• Integrate tree visualization tools that scale to many thousands of nodes
• Allow for the storage and analysis of multiple reconciliations for a single gene tree within a single database structure
Extending Cyberinfrastructure
[Diagram: the existing components plus Species Trees]
Adding Species Trees
Extending Cyberinfrastructure
[Diagram: the existing components plus Species Trees and Reconciled trees]
[Figure: a Reconciled Tree built from a Gene Tree (12 genes from 3 species, 23 nodes numbered 1-23) and a Species Tree (3 species, 5 nodes labeled A-E)]
Mapping Host to Guest
• Map the guest tree onto the host tree by defining the position on a host tree edge to which each guest tree node maps
[Figure: a host tree edge between host tree nodes A (parent node) and B (child node), with a guest tree edge between guest tree nodes 1 and 2]
Mapping Host to Guest
• Guest nodes can map to four general locations on host edges
1. Inside Parent Node
2. Inside Child Node
3. Edge Between Host Nodes
4. Outside of Host Edge
Mapping Host to Guest
• Locations stored in a reconciliation map table
[Figure: guest nodes 1-4 at the four locations relative to host edge A-B]
map_id  guest_node  host_parent_node  host_child_node
1001    1           A                 A
1002    2           B                 B
1003    3           A                 B
1004    4           NULL              A
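The four locations can be read back from the table with a small helper. The following is a minimal sketch, not the project's own code: the `MapRow` type and the `classify_location` function are hypothetical names, and the rule that a NULL host parent means "outside of the host edge" is inferred from row 1004 of the table.

```python
from typing import NamedTuple, Optional


class MapRow(NamedTuple):
    """One row of the reconciliation map table (hypothetical Python mirror)."""
    map_id: int
    guest_node: int
    host_parent: Optional[str]  # None stands for NULL
    host_child: Optional[str]


def classify_location(row: MapRow, edge_parent: str, edge_child: str) -> str:
    """Name the location of a guest node relative to one host edge.

    Assumption: the rules below are inferred from the four example rows
    on the slide, not taken from the project's source code.
    """
    if row.host_parent is None:
        return "outside of host edge"
    if row.host_parent == row.host_child == edge_parent:
        return "inside parent node"
    if row.host_parent == row.host_child == edge_child:
        return "inside child node"
    if (row.host_parent, row.host_child) == (edge_parent, edge_child):
        return "edge between host nodes"
    return "not on this host edge"


# Rows copied from the reconciliation map table on the slide.
rows = [
    MapRow(1001, 1, "A", "A"),
    MapRow(1002, 2, "B", "B"),
    MapRow(1003, 3, "A", "B"),
    MapRow(1004, 4, None, "A"),
]

for row in rows:
    print(row.guest_node, classify_location(row, "A", "B"))
```

Passing the host edge explicitly matters because rows 1001 and 1002 both have equal parent and child values; only the edge under consideration tells the parent node apart from the child node.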
Reconciliation
• Reconciliation is a mapping of the nodes of the guest tree (gene tree) onto the nodes and edges of the host tree (species tree)
• The topologies of the two trees are stored separately from the mapping of the reconciliation itself
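Keeping the topologies apart from the mapping is what allows several reconciliations of one gene tree to sit side by side. The sketch below illustrates the idea with an in-memory SQLite database. It is not the TR schema: the table names, the `method` column, the guest tree parent pointers, and the second run (`run_2`, map_id 105) are all assumptions made for illustration. Only the four `run_1` rows are taken from the Gene Duplication example in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tree topologies: one row per node, with a pointer to its parent.
CREATE TABLE host_node  (node_id TEXT PRIMARY KEY, parent_id TEXT);
CREATE TABLE guest_node (node_id INTEGER PRIMARY KEY, parent_id INTEGER);

-- The reconciliation itself: where each guest node sits on the host tree.
CREATE TABLE reconciliation_map (
    map_id      INTEGER PRIMARY KEY,
    method      TEXT NOT NULL,              -- hypothetical run label
    guest_node  INTEGER NOT NULL REFERENCES guest_node(node_id),
    host_parent TEXT REFERENCES host_node(node_id),
    host_child  TEXT REFERENCES host_node(node_id)
);
""")

# Host edge A-B as drawn on the slides.
conn.executemany("INSERT INTO host_node VALUES (?, ?)",
                 [("A", None), ("B", "A")])
# Assumed guest topology: the slides do not give the parent pointers.
conn.executemany("INSERT INTO guest_node VALUES (?, ?)",
                 [(1, None), (2, 1), (3, 2), (4, 2)])

conn.executemany(
    "INSERT INTO reconciliation_map VALUES (?, ?, ?, ?, ?)",
    [
        (101, "run_1", 1, "A", "A"),   # rows 101-104: Gene Duplication slide
        (102, "run_1", 2, "A", "B"),
        (103, "run_1", 3, "B", "B"),
        (104, "run_1", 4, "B", "B"),
        (105, "run_2", 2, "B", "B"),   # hypothetical alternative placement
    ],
)

# Compare where guest node 2 is placed by each run.
placements = conn.execute(
    "SELECT method, host_parent, host_child FROM reconciliation_map "
    "WHERE guest_node = 2 ORDER BY method"
).fetchall()
print(placements)
```

Because neither tree table mentions the other, adding a second reconciliation only adds rows to the map table.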
Speciation
[Figure: guest nodes 1-4 mapped to host nodes A-D]
map_id  guest  host_parent  host_child
301     1      A            A
302     2      B            B
303     3      C            C
304     4      D            D
Gene Duplication
[Figure: guest nodes 1-4 on host edge A-B]
map_id  guest  host_parent  host_child
101     1      A            A
102     2      A            B
103     3      B            B
104     4      B            B
Horizontal Transfer to Node
[Figure: guest nodes 1-3 on a host tree with nodes A-D; guest node 3 maps to host node D]
map_id  guest  host_parent  host_child
201     1      A            A
202     2      B            B
203     3      D            D
Horizontal Transfer to Edge
[Figure: guest nodes 1-3 on a host tree with nodes A-D; guest node 3 maps to host edge C-D]
map_id  guest  host_parent  host_child
201     1      A            A
202     2      B            B
203     3      C            D
Alien Horizontal Transfer
Gene Source Outside of Species Tree
[Figure: guest nodes 1 and 2 lie outside the species tree; guest node 3 maps to host edge C-D]
map_id  guest  host_parent  host_child
201     1      NULL         NULL
202     2      NULL         NULL
203     3      C            D
Gene Nodes Beyond Species LCA
[Figure: guest nodes 1-7 on a host tree with nodes A, C, and D]
map_id  guest  host_parent  host_child
301     1      NULL         A
302     2      A            A
303     3      A            A
304     4      C            C
305     5      C            C
306     6      D            D
307     7      D            D
Extending Cyberinfrastructure
[Diagram: the existing components plus Species Trees, Reconciled trees, and Ontology (TRON)]
Reconciled Node Attributes
[Figure: reconciled tree with guest nodes 1-23 mapped onto host nodes A-E]
map_id  node_id  host_parent  host_child
1001    1        A            A
1002    2        A            A
1003    3        A            A
1004    4        B            B
1005    5        B            B
1006    6        B            B
1007    7        B            B
1008    8        C            C
1009    9        C            C
1010    10       C            C
1011    11       C            C
1012    12       C            C
1013    13       D            B
1014    14       D            B
1015    15       E            C
1016    16       D            D
1017    17       D            D
1018    18       E            C
1019    19       E            D
1020    20       E            C
1021    21       E            D
1022    22       NULL         E
1023    23       NULL         E
Reconciliation node attributes stored in a separate table
[Figure: detail of the reconciled tree showing guest nodes 1-8, 13, 14, 16, and 17 on host nodes A, B, and D]
Reconciliation Map Table: map_id, node_id, host_parent, host_child (rows 1001-1023, as listed above)
Reconciliation Node Attributes
map_id  term  value
1001    1     leaf
1002    1     leaf
1003    1     leaf
1013    1     duplication
1014    1     duplication
1016    1     speciation
1017    1     speciation
• This could hold type of node, distance to host child node, etc.
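With node types held in the attribute table, the location of duplications on the species tree becomes a join between the two tables. The following is a minimal sketch using SQLite. The table names `reconciliation_map` and `reconciliation_attribute` are assumptions, and the rows are the subset of the slide's data for which an attribute value is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reconciliation_map (
    map_id INTEGER PRIMARY KEY, node_id INTEGER,
    host_parent TEXT, host_child TEXT
);
CREATE TABLE reconciliation_attribute (
    map_id INTEGER REFERENCES reconciliation_map(map_id),
    term INTEGER, value TEXT
);
""")

# Map rows and attribute rows copied from the slides.
conn.executemany("INSERT INTO reconciliation_map VALUES (?, ?, ?, ?)", [
    (1001, 1, "A", "A"), (1002, 2, "A", "A"), (1003, 3, "A", "A"),
    (1013, 13, "D", "B"), (1014, 14, "D", "B"),
    (1016, 16, "D", "D"), (1017, 17, "D", "D"),
])
conn.executemany("INSERT INTO reconciliation_attribute VALUES (?, ?, ?)", [
    (1001, 1, "leaf"), (1002, 1, "leaf"), (1003, 1, "leaf"),
    (1013, 1, "duplication"), (1014, 1, "duplication"),
    (1016, 1, "speciation"), (1017, 1, "speciation"),
])

# Count duplication nodes per host location (host_parent, host_child).
duplications = conn.execute("""
    SELECT m.host_parent, m.host_child, COUNT(*)
    FROM reconciliation_map AS m
    JOIN reconciliation_attribute AS a ON a.map_id = m.map_id
    WHERE a.value = 'duplication'
    GROUP BY m.host_parent, m.host_child
""").fetchall()
print(duplications)
```

On this subset the query reports two duplications, both on host edge D-B.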
Extending Cyberinfrastructure
[Diagram: the existing components plus Species Trees, Reconciled trees, Ontology (TRON), and Functional Annotation (annot8r)]
Gene Ontology Terms
• Gene Ontology terms are assigned to individual gene models using annot8r
• Additional tools are provided to import annot8r output into the database
Populating the Database
• The analysis pipeline currently supports TreeBeST reconciliations
• Added PRIME format support to BioPerl Bio::TreeIO so that TreeBeST output can be imported into the database
((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])snode0 [&&PRIME ID=11 D=0 AC=(10)],
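To make the structure of the PRIME annotations in the fragment above concrete, the following sketch pulls the `[&&PRIME ...]` tags out of a Newick-style string with regular expressions. It is an illustration only, not the BioPerl parser: the function name `parse_prime_tags` is hypothetical, the field names (ID, S, D, AC) are read as opaque keys without interpreting them, and nodes that carry a tag but no label are skipped.

```python
import re

# A node label followed by a PRIME tag, e.g. "snode0 [&&PRIME ID=11 D=0 AC=(10)]".
TAG_RE = re.compile(r"([\w.\-]+)\s*\[&&PRIME\s+([^\]]*)\]")
# key=value pairs, where the value is a bare token or a parenthesized list.
FIELD_RE = re.compile(r"(\w+)=(\([^)]*\)|\S+)")


def parse_prime_tags(newick):
    """Return (label, fields) pairs for every labeled node with a PRIME tag."""
    nodes = []
    for label, body in TAG_RE.findall(newick):
        fields = {}
        for key, raw in FIELD_RE.findall(body):
            if raw.startswith("("):
                # Parenthesized values become lists of integers.
                fields[key] = [int(x) for x in raw[1:-1].split()]
            else:
                fields[key] = raw
        nodes.append((label, fields))
    return nodes


# The (truncated) fragment shown on the slide.
fragment = (
    "((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],"
    "((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],"
    "V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])"
    "snode0 [&&PRIME ID=11 D=0 AC=(10)],"
)

for label, fields in parse_prime_tags(fragment):
    print(label, fields)
```

The fragment yields four tagged nodes: the three gene leaves and the internal node `snode0`.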
Extending Cyberinfrastructure
[Diagram: the existing components plus Species Trees, Reconciled trees, Ontology (TRON), Functional Annotation (annot8r), and Query Functions]
Tree Reconciliation GUI
Queries
• BLAST
• GO Term
• Locus Name
• Gene Family Name
Extending Cyberinfrastructure
[Diagram: all of the components above, plus the script high_throughput.pl]
Current Development
• Extending analysis pipeline to additional reconciliation software (primeGSR/Notung)
• Evaluating the accuracy of reconciliation software against synteny-informed reconstruction
• XML Representation of reconciled trees
• Further refinement of GUI and integration with iPlant Discovery Environment
Availability
Repository: http://tinyurl.com/iPlantOS
• Iplant-treerec: Back end services
• Tr-standalone: TR Viewer
Documentation: http://tinyurl.com/TRDocs