Extending Cyberinfrastructure for
Gene Tree Reconciliation
James Estill, John Bowers, Hariolf Haefele, Adam Kubach, Naim Matasci, Sheldon McKay, Andrew Muir,
Dennis Roberts, Sriram Srinivasan, Cécile Ané, Jim Leebens-Mack, Todd Vision
iEvoBio, June 21, 2011
iPlant Tree of Life (iPTOL)
• Tree Reconciliation
• Big Trees
• Data Assembly
• Trait Evolution
• Data Integration
• Tree Visualization
Gene Tree Reconciliation
Projection of gene trees onto a species tree
• gene duplications
• gene losses
• lineage sorting
• horizontal transfer
Gene Tree Reconciliation
• Locating gene duplications allows us to identify orthologs and paralogs
• Identify gene composition in inferred ancestral genomes
• Map the positions of ancestral polyploidy events
• Contribute to the study of the “fate” of duplicated genes
• Address questions of gene family coevolution
Existing Cyberinfrastructure
[Diagram: Gene Trees; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
Extending Cyberinfrastructure
• Increased interoperability among the component pieces
• Query the location of gene duplications on the species tree
• Integrate tree visualization tools that scale to many thousands of nodes
• Allow for the storage and analysis of multiple reconciliations for a single gene tree within a single database structure
Extending Cyberinfrastructure
[Diagram: Species Trees; Gene Trees; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
Adding Species Trees
Extending Cyberinfrastructure
[Diagram: Species Trees; Gene Trees; Reconciled; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
[Figure: Reconciled Tree, Gene Tree, and Species Tree]
• Species tree: 3 species, 5 nodes (A to E)
• Gene tree: 12 genes from 3 species, 23 nodes (1 to 23)
Mapping Host to Guest
• Map the guest tree onto the host tree by defining the position on a host tree edge to which each guest (gene) tree node maps
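[Figure: host tree nodes A and B joined by a host tree edge, with guest tree nodes 1 and 2 joined by a guest tree edge. Labels: Host Tree Nodes, Host Tree Edge, Guest Tree Nodes, Guest Tree Edge, Parent Node, Child Node]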
Mapping Host to Guest
• Guest nodes can map to four general locations on host edges
1. Inside Parent Node
2. Inside Child Node
3. Edge Between Host Nodes
4. Outside of Host Edge
Mapping Host to Guest
• Locations stored in a reconciliation map table
[Figure: guest nodes 1 to 4 at the four locations relative to the host edge between A and B]
map_id guest_node host_parent_node host_child_node
1001 1 A A
1002 2 B B
1003 3 A B
1004 4 NULL A
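The four locations can be read back from a single map row. The following is a minimal Python sketch of that reading, not code from the project: the function name `classify_location` and the returned labels are illustrative assumptions, and `None` stands in for NULL.

```python
from typing import Optional


def classify_location(host_parent: Optional[str], host_child: Optional[str]) -> str:
    """Classify one reconciliation map row (hypothetical helper, labels are illustrative)."""
    if host_parent is None and host_child is None:
        # Neither end of a host edge is known: the guest node lies outside the host tree.
        return "outside host tree"
    if host_parent is None:
        # Only the child end is known: the guest node lies beyond that host node.
        return "outside host edge"
    if host_child is None:
        # This combination does not appear in the slides, so it is rejected here.
        raise ValueError("host_child is NULL but host_parent is not")
    if host_parent == host_child:
        # Both columns name the same host node: the guest node sits inside it.
        return "inside host node"
    # Two different host nodes: the guest node sits on the edge between them.
    return "on host edge"


# Rows copied from the table above: (map_id, guest_node, host_parent, host_child).
rows = [
    (1001, 1, "A", "A"),
    (1002, 2, "B", "B"),
    (1003, 3, "A", "B"),
    (1004, 4, None, "A"),
]
locations = {guest: classify_location(parent, child) for _, guest, parent, child in rows}
for guest, location in locations.items():
    print(guest, location)
```

Under this reading, "Inside Parent Node" and "Inside Child Node" are the same case seen from either end of the edge, which is why rows 1001 and 1002 receive the same label.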
Reconciliation
• Reconciliation is a mapping of the nodes of the guest tree (gene tree) onto the nodes and edges of the host tree (species tree)
• The topologies of the two trees are stored separately from the reconciliation mapping itself
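One way to picture this separation is a small relational layout in which tree topology lives in one table and each reconciliation is a set of rows pointing at it. The sketch below uses Python's built-in `sqlite3`; every table name, column name, and data row is a hypothetical illustration, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: topology (tree, node) is kept apart from the mapping.
SCHEMA = """
CREATE TABLE tree (
    tree_id INTEGER PRIMARY KEY,
    kind    TEXT NOT NULL CHECK (kind IN ('host', 'guest'))
);
CREATE TABLE node (
    node_id   INTEGER PRIMARY KEY,
    tree_id   INTEGER NOT NULL REFERENCES tree(tree_id),
    label     TEXT NOT NULL,
    parent_id INTEGER REFERENCES node(node_id)   -- NULL for the root
);
CREATE TABLE reconciliation (
    reconciliation_id INTEGER PRIMARY KEY,
    guest_tree_id     INTEGER NOT NULL REFERENCES tree(tree_id),
    host_tree_id      INTEGER NOT NULL REFERENCES tree(tree_id),
    method            TEXT
);
CREATE TABLE reconciliation_map (
    map_id            INTEGER PRIMARY KEY,
    reconciliation_id INTEGER NOT NULL REFERENCES reconciliation(reconciliation_id),
    guest_node        INTEGER NOT NULL REFERENCES node(node_id),
    host_parent       INTEGER REFERENCES node(node_id),
    host_child        INTEGER REFERENCES node(node_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Toy data, made up for illustration: host tree A -> B, guest tree g1 -> g2.
conn.executemany("INSERT INTO tree VALUES (?, ?)", [(1, "host"), (2, "guest")])
conn.executemany(
    "INSERT INTO node VALUES (?, ?, ?, ?)",
    [(10, 1, "A", None), (11, 1, "B", 10), (20, 2, "g1", None), (21, 2, "g2", 20)],
)
# Two alternative reconciliations of the same guest tree against the same host tree.
conn.executemany(
    "INSERT INTO reconciliation VALUES (?, ?, ?, ?)",
    [(1, 2, 1, "method-1"), (2, 2, 1, "method-2")],
)
conn.executemany(
    "INSERT INTO reconciliation_map VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 20, 10, 10),  # reconciliation 1: g1 inside host node A
        (2, 1, 21, 11, 11),  # reconciliation 1: g2 inside host node B
        (3, 2, 20, 10, 11),  # reconciliation 2: g1 on the edge A -> B
        (4, 2, 21, 11, 11),  # reconciliation 2: g2 inside host node B
    ],
)

n_reconciliations = conn.execute(
    "SELECT COUNT(*) FROM reconciliation WHERE guest_tree_id = ?", (2,)
).fetchone()[0]
n_node_rows = conn.execute("SELECT COUNT(*) FROM node").fetchone()[0]
print(n_reconciliations, n_node_rows)
```

Because the node rows are written once and only the map rows differ, several reconciliations of a single gene tree can share one stored topology.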
Speciation
[Figure: host tree nodes A, B, C, and D with guest nodes 1, 2, 3, and 4]
map_id guest host_parent host_child
301 1 A A
302 2 B B
303 3 C C
304 4 D D
Gene Duplication
[Figure: host nodes A and B with guest nodes 1, 2, 3, and 4]
map_id guest host_parent host_child
101 1 A A
102 2 A B
103 3 B B
104 4 B B
Horizontal Transfer to Node
[Figure: host nodes A, B, C, and D with guest nodes 1, 2, and 3]
map_id guest host_parent host_child
201 1 A A
202 2 B B
203 3 D D
Horizontal Transfer to Edge
[Figure: host nodes A, B, C, and D with guest nodes 1, 2, and 3]
map_id guest host_parent host_child
201 1 A A
202 2 B B
203 3 C D
Alien Horizontal Transfer
Gene Source Outside of Species Tree
[Figure: host nodes C and D with guest nodes 1, 2, and 3]
map_id guest host_parent host_child
201 1 NULL NULL
202 2 NULL NULL
203 3 C D
Gene Nodes Beyond Species LCA
[Figure: host nodes A, C, and D with guest nodes 1 to 7]
map_id guest host_parent host_child
301 1 NULL A
302 2 A A
303 3 A A
304 4 C C
305 5 C C
306 6 D D
307 7 D D
Extending Cyberinfrastructure
[Diagram: Ontology (TRON); Species Trees; Gene Trees; Reconciled; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
Reconciled Node Attributes
[Figure: reconciled tree with guest nodes 1 to 23 drawn inside host tree nodes A to E]
map_id node_id host_parent host_child
1001 1 A A
1002 2 A A
1003 3 A A
1004 4 B B
1005 5 B B
1006 6 B B
1007 7 B B
1008 8 C C
1009 9 C C
1010 10 C C
1011 11 C C
1012 12 C C
1013 13 D B
1014 14 D B
1015 15 E C
1016 16 D D
1017 17 D D
1018 18 E C
1019 19 E D
1020 20 E C
1021 21 E D
1022 22 NULL E
1023 23 NULL E
• Reconciliation node attributes are stored in a separate table
[Figure: detail of the reconciled tree showing host nodes A, B, and D with guest nodes 1 to 8, 13, 14, 16, and 17]
Reconciliation Map Table
map_id node_id host_parent host_child
(rows 1001 to 1023, identical to the table on the previous slide)
Reconciliation Node Attributes
map_id term value
1001 1 leaf
1002 1 leaf
1003 1 leaf
1013 1 duplication
1014 1 duplication
1016 1 speciation
1017 1 speciation
• The attributes table could hold the type of node, the distance to the host child node, etc.
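With the map and the attributes in separate tables, locating duplications on the species tree becomes a join. The sketch below is one possible form of such a query using Python's `sqlite3`; the table names `reconciliation_map` and `reconciliation_attribute` are assumptions made for illustration, and the rows are the subset of the slide's tables that have a listed attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table names; columns follow the headers shown on the slides.
conn.executescript("""
CREATE TABLE reconciliation_map (
    map_id INTEGER PRIMARY KEY, node_id INTEGER, host_parent TEXT, host_child TEXT
);
CREATE TABLE reconciliation_attribute (
    map_id INTEGER REFERENCES reconciliation_map(map_id), term INTEGER, value TEXT
);
""")

# Rows copied from the slide tables (only nodes with a listed attribute).
map_rows = [
    (1001, 1, "A", "A"), (1002, 2, "A", "A"), (1003, 3, "A", "A"),
    (1013, 13, "D", "B"), (1014, 14, "D", "B"),
    (1016, 16, "D", "D"), (1017, 17, "D", "D"),
]
attribute_rows = [
    (1001, 1, "leaf"), (1002, 1, "leaf"), (1003, 1, "leaf"),
    (1013, 1, "duplication"), (1014, 1, "duplication"),
    (1016, 1, "speciation"), (1017, 1, "speciation"),
]
conn.executemany("INSERT INTO reconciliation_map VALUES (?, ?, ?, ?)", map_rows)
conn.executemany("INSERT INTO reconciliation_attribute VALUES (?, ?, ?)", attribute_rows)

# Count duplication nodes per host location (host_parent, host_child).
duplications = conn.execute("""
    SELECT m.host_parent, m.host_child, COUNT(*)
    FROM reconciliation_map AS m
    JOIN reconciliation_attribute AS a ON a.map_id = m.map_id
    WHERE a.value = 'duplication'
    GROUP BY m.host_parent, m.host_child
    ORDER BY m.host_parent, m.host_child
""").fetchall()
print(duplications)
```

For these rows the query reports both duplications (guest nodes 13 and 14) on the host edge from D to B.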
Extending Cyberinfrastructure
[Diagram: Ontology (TRON); Functional Annotation (annot8r); Species Trees; Gene Trees; Reconciled; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
Gene Ontology Terms
• Gene Ontology terms are assigned to individual gene models using annot8r
• Additional tools are provided to import annot8r output into the database
Populating the Database
• The analysis pipeline currently supports TreeBeST reconciliations
• PRIME format support was added to BioPerl's tree I/O module (Bio::TreeIO) so that TreeBeST output can be imported into the database
((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])snode0 [&&PRIME ID=11 D=0 AC=(10)],
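The `[&&PRIME ...]` comments in the fragment above carry per-node key/value fields. The following Python sketch shows one way to pull them out with regular expressions; it is an illustration only, not the BioPerl implementation, and it does not interpret what the keys (`ID`, `S`, `D`, `AC`) mean. The function name `parse_prime_tags` is hypothetical.

```python
import re

# A node label followed by its PRIME comment, e.g. "snode0 [&&PRIME ID=11 D=0 AC=(10)]".
TAG_RE = re.compile(r"(\w+)\s*\[&&PRIME\s+([^\]]*)\]")
# One key=value field; the value is either a parenthesized list or a single token.
FIELD_RE = re.compile(r"(\w+)=(\([^)]*\)|\S+)")


def parse_prime_tags(newick: str) -> dict:
    """Return {node label: {key: value}} for every PRIME comment found in the string."""
    nodes = {}
    for label, body in TAG_RE.findall(newick):
        fields = {}
        for key, value in FIELD_RE.findall(body):
            if value.startswith("("):
                # Parenthesized lists such as AC=(6 8 9 10) become lists of integers.
                fields[key] = [int(item) for item in value[1:-1].split()]
            else:
                # Everything else is kept as the raw string.
                fields[key] = value
        nodes[label] = fields
    return nodes


# The fragment from the slide, reproduced as-is (it is truncated in the source).
fragment = (
    "((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],"
    "((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],"
    "V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])"
    "snode0 [&&PRIME ID=11 D=0 AC=(10)],"
)
tags = parse_prime_tags(fragment)
for label, fields in tags.items():
    print(label, fields)
```

The sketch only extracts annotations; recovering the tree topology would still need a proper Newick parser.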
Extending Cyberinfrastructure
[Diagram: Ontology (TRON); Functional Annotation (annot8r); Query Functions; Species Trees; Gene Trees; Reconciled; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]
Tree Reconciliation GUI
Queries
• BLAST
• GO Term
• Locus Name
• Gene Family Name
Extending Cyberinfrastructure
[Diagram: Ontology (TRON); Functional Annotation (annot8r); Query Functions; Species Trees; Gene Trees; Reconciled; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest); command line: >high_throughput.pl]
Current Development
• Extending analysis pipeline to additional reconciliation software (primeGSR/Notung)
• Evaluating the accuracy of reconciliation software compared to synteny-informed reconstruction
• XML Representation of reconciled trees
• Further refinement of GUI and integration with iPlant Discovery Environment
Availability
Repository: http://tinyurl.com/iPlantOS
• Iplant-treerec – Back end services
• Tr-standalone – TR Viewer
Documentation: http://tinyurl.com/TRDocs