Extending Cyberinfrastructure for Gene Tree Reconciliation James Estill, John Bowers, Hariolf Haefele, Adam Kubach, Naim Matasci, Sheldon McKay, Andrew Muir, Dennis Roberts, Sriram Srinivasan, Cécile Ané, Jim Leebens-Mack, Todd Vision iEvoBio June 21, 2011

iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation


DESCRIPTION

The reconciliation of gene trees to species trees uses the species tree to infer the history of evolutionary events, such as gene duplication and loss, in an individual gene family. A cyberinfrastructure for tree reconciliation (TR) has been developed that includes an extensible pipeline for high-throughput reconciliation of gene trees to species trees, database utilities, and a visualization tool. The TR database schema extends the Ensembl Compara database to include species trees and the mapping between the nodes of a gene tree and the species tree used for that reconciliation. This permits large-scale analysis of the distribution of gene tree events on the species tree, and comparison of the evolutionary timing of events between gene trees. The Chado controlled vocabulary module was also incorporated to support the use of OBO ontologies to tag attribute values within the database. The schema supports multiple reconciliations for each gene tree, and an ontology for TR was developed to support storage of metadata for TR methodologies. Additions to the BioPerl Tree API allow for direct import of reconciled trees in PRIME format, and utilities have been provided to populate the database from de novo analyses of gene tree reconciliations. Queries against the database are facilitated by a RESTful web API that allows for BLAST searches against gene sequences in the database, as well as searches for GO term assignments among gene families. These tools support comparative analysis of reconciliation methodologies, which we illustrate by reporting an evaluation of the accuracy of methods that reconcile gene trees individually relative to synteny-informed reconstructions of genome duplication history. We also illustrate a novel visualization tool for interactively exploring the mapping between gene trees and the species tree.


Page 1: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure for Gene Tree Reconciliation

James Estill, John Bowers, Hariolf Haefele, Adam Kubach, Naim Matasci, Sheldon McKay, Andrew Muir, Dennis Roberts, Sriram Srinivasan, Cécile Ané, Jim Leebens-Mack, Todd Vision

iEvoBio, June 21, 2011

Page 2: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

iPlant Tree of Life (iPTOL)

• Tree Reconciliation

• Big Trees

• Data Assembly

• Trait Evolution

• Data Integration

• Tree Visualization

Page 3: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Gene Tree Reconciliation

Projection of gene trees onto a species tree

• gene duplications

• gene losses

• lineage sorting

• horizontal transfer
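
The slides do not say which algorithm performs this projection. As a minimal sketch of one standard approach, LCA mapping, the Python below maps each internal gene tree node to the last common ancestor of its children's species and calls the node a duplication when it maps to the same species node as one of its children. The species tree topology, gene names, and function names are illustrative assumptions, and the sketch ignores losses, lineage sorting, and horizontal transfer.

```python
# Toy species tree as child -> parent (root has parent None). Hypothetical data.
SPECIES_PARENT = {"A": "D", "B": "D", "C": "E", "D": "E", "E": None}


def ancestors(node, parent):
    """Return the path from a node up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def lca(a, b, parent):
    """Last common ancestor of two species tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, leaf_species, parent):
    """Label internal gene tree nodes by LCA mapping.

    gene_tree is a nested 2-tuple with gene names (strings) at the leaves.
    Returns (species_node, event) pairs for internal nodes in postorder.
    """
    events = []

    def visit(node):
        if isinstance(node, str):          # leaf: map gene to its species
            return leaf_species[node]
        left, right = node
        m_left, m_right = visit(left), visit(right)
        m = lca(m_left, m_right, parent)
        # Duplication if the node maps to the same place as a child.
        event = "duplication" if m in (m_left, m_right) else "speciation"
        events.append((m, event))
        return m

    visit(gene_tree)
    return events


genes = {"a1": "A", "a2": "A", "b1": "B"}
result = reconcile((("a1", "a2"), "b1"), genes, SPECIES_PARENT)
print(result)
```

Here the two genes from species A coalesce inside A (a duplication), and their ancestor joins the gene from B at the species node D (a speciation).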

Page 4: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Gene Tree Reconciliation

• Locating gene duplications allows us to identify orthologs and paralogs

• Identify gene composition in inferred ancestral genomes

• Map the positions of ancestral polyploidy events

• Contribute to the study of the “fate” of duplicated genes

• Address questions of gene family coevolution

Page 5: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Existing Cyberinfrastructure

[Diagram components: Gene Trees; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest); EC]

Page 6: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

• Increased interoperability among the component pieces

• Query the location of gene duplications on the species tree

• Integrate tree visualization tools that scale to many thousands of nodes

• Allow for the storage and analysis of multiple reconciliations for a single gene tree within a single database structure

Page 7: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

[Diagram components: Species Trees; Gene Trees; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]

Page 8: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Adding Species Trees

Page 9: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

[Diagram components: Species Trees; Gene Trees; Reconciled; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]

Page 10: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

[Figure: a gene tree and a species tree shown alongside the resulting reconciled tree. Species tree: 3 species, 5 nodes (A to E). Gene tree: 12 genes from 3 species, 23 nodes (1 to 23).]
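
The node counts on this page match what a rooted, strictly bifurcating tree gives: a tree with n leaves has 2n - 1 nodes. That the example trees are rooted and binary is an assumption here, and the function name is illustrative.

```python
def total_nodes(leaves: int) -> int:
    """Node count of a rooted, fully bifurcating tree with this many leaves."""
    if leaves < 1:
        raise ValueError("a tree needs at least one leaf")
    return 2 * leaves - 1


species_tree_nodes = total_nodes(3)   # 3 species
gene_tree_nodes = total_nodes(12)     # 12 genes
print(species_tree_nodes, gene_tree_nodes)
```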

Page 11: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Mapping Host to Guest

• Map the guest tree onto the host tree by defining the position on a host tree edge to which each gene tree node maps

[Figure labels: Host Tree Nodes (A, B); Host Tree Edge; Guest Tree Nodes (1, 2); Guest Tree Edge; Parent Node; Child Node]

Page 12: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Mapping Host to Guest

• Guest nodes can map to four general locations on host edges

1: Inside Parent Node

2: Inside Child Node

3: Edge Between Host Nodes

4: Outside of Host Edge

Page 13: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Mapping Host to Guest

• Locations stored in a reconciliation map table

[Figure: the four guest node locations (1 to 4) drawn on the host edge from A to B]

map id   guest node   host parent node   host child node
1001     1            A                  A
1002     2            B                  B
1003     3            A                  B
1004     4            NULL               A
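
As a rough sketch of how the four locations can be read back from the two host columns, the Python function below classifies a row relative to one host edge. The function name and the use of None for NULL are assumptions for illustration, not part of the described schema.

```python
from typing import Optional, Tuple


def classify_location(host_parent: Optional[str],
                      host_child: Optional[str],
                      edge: Tuple[str, str]) -> str:
    """Classify a guest node's position relative to a host edge (parent, child)."""
    parent, child = edge
    if host_parent is None and host_child == parent:
        return "outside of host edge"
    if host_parent == host_child == parent:
        return "inside parent node"
    if host_parent == host_child == child:
        return "inside child node"
    if (host_parent, host_child) == edge:
        return "edge between host nodes"
    return "not on this edge"


# Rows from the table above: (guest node, host parent, host child).
rows = [(1, "A", "A"), (2, "B", "B"), (3, "A", "B"), (4, None, "A")]
labels = [classify_location(p, c, ("A", "B")) for _, p, c in rows]
for (guest, _, _), label in zip(rows, labels):
    print(guest, label)
```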

Page 14: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Reconciliation

• Reconciliation is a mapping of the nodes of the guest tree (gene tree) onto the nodes and edges of the host tree (species tree)

• The topologies of the two trees are stored separately from the reconciliation mapping itself
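
One consequence of this separation is that several reconciliations can share a single stored copy of each tree. The sketch below illustrates the idea with an in-memory SQLite database. The table and column names (`tree_node`, `reconciliation_map`, `reconciliation`), the guest tree topology, and the second reconciliation `r2` are all hypothetical and do not reproduce the actual TR schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree_node (            -- topology only, one row per node
    tree   TEXT NOT NULL,
    node   TEXT NOT NULL,
    parent TEXT,
    PRIMARY KEY (tree, node)
);
CREATE TABLE reconciliation_map (   -- mapping only, one row per guest node
    map_id         INTEGER PRIMARY KEY,
    reconciliation TEXT NOT NULL,
    guest_node     TEXT NOT NULL,
    host_parent    TEXT,
    host_child     TEXT
);
""")

# Each tree is stored once (hypothetical toy topologies).
conn.executemany("INSERT INTO tree_node VALUES (?, ?, ?)", [
    ("host", "A", None), ("host", "B", "A"),
    ("guest", "1", None), ("guest", "2", "1"),
    ("guest", "3", "2"), ("guest", "4", "2"),
])

# Two alternative reconciliations of the same pair of trees.
conn.executemany("INSERT INTO reconciliation_map VALUES (?, ?, ?, ?, ?)", [
    (101, "r1", "1", "A", "A"), (102, "r1", "2", "A", "B"),
    (103, "r1", "3", "B", "B"), (104, "r1", "4", "B", "B"),
    (201, "r2", "1", "A", "A"), (202, "r2", "2", "B", "B"),
    (203, "r2", "3", "B", "B"), (204, "r2", "4", "B", "B"),
])

topology_rows = conn.execute("SELECT COUNT(*) FROM tree_node").fetchone()[0]
placements = {
    rec: (hp, hc)
    for rec, hp, hc in conn.execute(
        "SELECT reconciliation, host_parent, host_child "
        "FROM reconciliation_map WHERE guest_node = '2' "
        "ORDER BY reconciliation"
    )
}
print(topology_rows, placements)
```

The two reconciliations disagree only about where guest node 2 sits, and neither needs its own copy of the tree topology.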

Page 15: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Speciation

[Figure: guest nodes 1 to 4 drawn on host nodes A to D]

map_id   guest   host_parent   host_child
301      1       A             A
302      2       B             B
303      3       C             C
304      4       D             D

Page 16: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Gene Duplication

[Figure: guest nodes 1 to 4 drawn on host nodes A and B]

map_id   guest   host_parent   host_child
101      1       A             A
102      2       A             B
103      3       B             B
104      4       B             B

Page 17: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Horizontal Transfer to Node

[Figure: guest nodes 1 to 3 drawn on host nodes A, B, C, and D]

map_id   guest   host_parent   host_child
201      1       A             A
202      2       B             B
203      3       D             D

Page 18: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Horizontal Transfer to Edge

[Figure: guest nodes 1 to 3 drawn on host nodes A, B, C, and D]

map_id   guest   host_parent   host_child
201      1       A             A
202      2       B             B
203      3       C             D

Page 19: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Alien Horizontal Transfer

Gene Source Outside of Species Tree

[Figure: guest nodes 1 to 3 and host nodes C and D]

map_id   guest   host_parent   host_child
201      1       NULL          NULL
202      2       NULL          NULL
203      3       C             D

Page 20: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Gene Nodes Beyond Species LCA

[Figure: guest nodes 1 to 7 drawn on host nodes A, C, and D]

map_id   guest   host_parent   host_child
301      1       NULL          A
302      2       A             A
303      3       A             A
304      4       C             C
305      5       C             C
306      6       D             D
307      7       D             D

Page 21: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

[Diagram components: Species Trees; Gene Trees; Reconciled; Ontology (TRON); Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]

Page 22: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Reconciled Node Attributes

Reconciliation node attributes stored in a separate table

[Figure: reconciled tree with guest nodes 1 to 23 drawn on host nodes A to E]

map_id   node_id   host_parent   host_child
1001     1         A             A
1002     2         A             A
1003     3         A             A
1004     4         B             B
1005     5         B             B
1006     6         B             B
1007     7         B             B
1008     8         C             C
1009     9         C             C
1010     10        C             C
1011     11        C             C
1012     12        C             C
1013     13        D             B
1014     14        D             B
1015     15        E             C
1016     16        D             D
1017     17        D             D
1018     18        E             C
1019     19        E             D
1020     20        E             C
1021     21        E             D
1022     22        NULL          E
1023     23        NULL          E

Page 23: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Reconciliation Map Table

map_id   node_id   host_parent   host_child
1001     1         A             A
1002     2         A             A
1003     3         A             A
1004     4         B             B
1005     5         B             B
1006     6         B             B
1007     7         B             B
1008     8         C             C
1009     9         C             C
1010     10        C             C
1011     11        C             C
1012     12        C             C
1013     13        D             B
1014     14        D             B
1015     15        E             C
1016     16        D             D
1017     17        D             D
1018     18        E             C
1019     19        E             D
1020     20        E             C
1021     21        E             D
1022     22        NULL          E
1023     23        NULL          E

[Figure: part of the reconciled tree, showing host nodes A, B, and D with guest nodes 1 to 8, 13, 14, 16, and 17]

Reconciliation Node Attributes

map_id   term   value
1001     1      leaf
1002     1      leaf
1003     1      leaf
1013     1      duplication
1014     1      duplication
1016     1      speciation
1017     1      speciation

This table could hold the node type, the distance to the host child node, etc.
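
Keeping attributes in their own table means that a question such as "where on the species tree do duplications fall?" becomes a join between the two tables on map_id. The Python sketch below does that join in memory for the rows shown on this page. The dictionary layout and function name are illustrative assumptions, and the `value` column is read here as the node's event type.

```python
from collections import Counter

# map_id -> (host_parent, host_child), copied from the map table rows above.
reconciliation_map = {
    1001: ("A", "A"), 1002: ("A", "A"), 1003: ("A", "A"),
    1013: ("D", "B"), 1014: ("D", "B"),
    1016: ("D", "D"), 1017: ("D", "D"),
}

# map_id -> attribute value, copied from the node attributes rows above.
node_attributes = {
    1001: "leaf", 1002: "leaf", 1003: "leaf",
    1013: "duplication", 1014: "duplication",
    1016: "speciation", 1017: "speciation",
}


def count_events(map_rows, attributes, event):
    """Count guest nodes with a given event type per host location."""
    counts = Counter()
    for map_id, location in map_rows.items():
        if attributes.get(map_id) == event:
            counts[location] += 1
    return counts


duplications = count_events(reconciliation_map, node_attributes, "duplication")
speciations = count_events(reconciliation_map, node_attributes, "speciation")
print(dict(duplications), dict(speciations))
```

In this excerpt both duplications sit on the host edge from D to B, while both speciations sit on host node D.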

Page 24: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

[Diagram components: Species Trees; Gene Trees; Reconciled; Ontology (TRON); Functional Annotation (annot8r); Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]

Page 25: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Gene Ontology Terms

• Gene Ontology terms are assigned to individual gene models using annot8r

• Additional tools are provided to import annot8r output into the database

Page 26: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

[Diagram components: Species Trees; Gene Trees; Reconciled; Ontology (TRON); Functional Annotation (annot8r); Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]

Page 27: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Populating the Database

• The analysis pipeline currently supports TreeBeST reconciliations

• Added PRIME format support to BioPerl's Bio::TreeIO so that TreeBeST output can be imported into the database

((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])snode0 [&&PRIME ID=11 D=0 AC=(10)],
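
To make the structure of these annotations concrete, here is a minimal Python sketch that pulls the `[&&PRIME ...]` tags out of the excerpt above with regular expressions. It is not the BioPerl importer mentioned on this page and not a full Newick parser. The function name and regular expressions are assumptions, and the field names (ID, S, AC, D) are kept as found without interpreting their meaning.

```python
import re

# A node label followed by its PRIME tag, e.g. "snode0 [&&PRIME ID=11 ...]".
TAG = re.compile(r"([\w.]+)\s*\[&&PRIME\s+([^\]]*)\]")
# key=value pairs, where a value is either "(...)" or a single token.
FIELD = re.compile(r"(\w+)=(\([^)]*\)|\S+)")

EXCERPT = (
    r"((AT1G79430_Arabidopsis [&&PRIME ID=13 S=Arabidopsis AC=(6 8 9 10)],"
    r"((CPS0032G081_papaya [&&PRIME ID=10 S=papaya AC=(7 8 9)],"
    r"V19G1171_grape [&&PRIME ID=9 S=grape AC=(0)])"
    r"snode0 [&&PRIME ID=11 D=0 AC=(10)],"
)


def parse_prime_tags(text):
    """Return {node label: {field: value}} for every PRIME tag in the text."""
    nodes = {}
    for label, body in TAG.findall(text):
        fields = {}
        for key, value in FIELD.findall(body):
            if value.startswith("("):
                # Parenthesised values become lists of integers.
                fields[key] = [int(item) for item in value[1:-1].split()]
            else:
                fields[key] = value
        nodes[label] = fields
    return nodes


tags = parse_prime_tags(EXCERPT)
for label, fields in tags.items():
    print(label, fields)
```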

Page 28: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

[Diagram components: Species Trees; Gene Trees; Reconciled; Ontology (TRON); Functional Annotation (annot8r); Query Functions; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest)]

Page 29: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Tree Reconciliation GUI

Page 30: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Tree Reconciliation GUI

Page 31: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Tree Reconciliation GUI

Page 32: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Tree Reconciliation GUI

Page 33: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Tree Reconciliation GUI

Queries

• BLAST

• GO Term

• Locus Name

• Gene Family Name

Page 34: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Extending Cyberinfrastructure

[Diagram components: Species Trees; Gene Trees; Reconciled; Ontology (TRON); Functional Annotation (annot8r); Query Functions; Generate Reconciliations (TreeBeST, primeGSR); Visualize Reconciliations (primeTV, fltreebest); >high_throughput.pl]

Page 35: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Current Development

• Extending analysis pipeline to additional reconciliation software (primeGSR/Notung)

• Evaluating the accuracy of reconciliation software against synteny-informed reconstructions

• XML Representation of reconciled trees

• Further refinement of GUI and integration with iPlant Discovery Environment

Page 36: iEvoBio2011 : Extending Cyberinfrastructure for Gene Tree Reconciliation

Availability

Repository: http://tinyurl.com/iPlantOS

• Iplant-treerec – Back end services

• Tr-standalone – TR Viewer

Documentation: http://tinyurl.com/TRDocs