Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database Module
Hilmar Lapp and Todd Vision, National Evolutionary Synthesis Center (NESCent)

Rich diversity of online data repositories

Most data is not online

Clark J.R. et al. (2008) A Comparative Study in Ancestral Range Reconstruction Methods: Retracing the Uncertain Histories of Insular Lineages. Systematic Biology 57(5): 693-707.

Syst. Biol. Data Archive

Little standards support

Accelerating knowledge dissemination: A Story

• Jane and her lab have accumulated molecular data to resolve the phylogeny of a certain clade of frogs, many of which are endangered species.

• Her group assembles a multiple alignment and reconstructs the phylogeny using a variety of methods, some developed by her lab, resulting in thousands of trees.

• The results show overwhelming support for several new branch points, and they are interesting and solid enough to be useful to others working on these species.

• Jane downloads and installs PhyloDOM, a freely available open-source software package. The software creates a database, and Jane uses the programs that come with it to import all her data.

• As a result, Jane’s lab now has a web interface to her results that others can use to query for novel topologies and to explore her data.

• Her lab also keeps the database up to date with its ongoing work, and uses it to add provenance data and links to protocols, publications, and taxonomic concepts.

• Other researchers easily download and integrate her results in their own analyses.

• Even where Jane used new methods, other software understands the meaning of the metadata and can take advantage of it.

• Soon afterwards, her results appear in data aggregators such as iSpecies, EOL, or Scratchpads, along with those from other labs.

• Jane herself uses the LifeMap widget to map her trees onto geo-coordinates and to link branches to ecological and biodiversity parameters of the respective areas.

How to get there?

Architecture components:
• Data: Phylogenetic Trees (Gene, Species); Taxonomies (ITIS, NCBI); Character Data; Molecular Data (Sequences, Annotation); Ontologies; Metadata (Evolutionary, Biodiversity, Computational)
• Phylogenetic Database supporting ontologies and arbitrary metadata (PhyloDB / BioSQL)
• Precompute Query Optimization
• Topology-oriented Queries
• Data Management Tools
• Data loading tools (BioSQL)
• Language bindings for the database model (BioPerl, BioJava, Biopython, BioRuby)
• Parser libraries for data and semantics standards (NeXML, CDAO)
• Middleware: Query & Persistence Management
• Data and other services API (PhyloWS) supporting exchange standards (NeXML, CDAO)
• Client-based Query Interfaces
• Embeddable Tools (PhyloWidget, GBrowse TreeWidget)
• Data Aggregators, Mash-up Applications
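The "Precompute Query Optimization" and "Topology-oriented Queries" components can be illustrated with a minimal Python sketch. It assumes a rooted tree given as (parent, child) edge pairs, precomputes every ancestor/descendant pair together with its path length in edges (the kind of closure a table such as Node_Path can hold), and then answers a most-recent-common-ancestor query by lookup rather than by walking the tree. The function names are hypothetical and are not part of BioSQL, PhyloDB, or any Bio* toolkit API.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Precompute (ancestor, descendant, distance) rows for a rooted tree.

    `edges` is a list of (parent, child) pairs; distance counts edges.
    Hypothetical helper, not a BioSQL/PhyloDB API.
    """
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))
    rows = []
    for ancestor in nodes:
        # Depth-first walk below `ancestor`, tracking path length in edges.
        stack = [(child, 1) for child in children[ancestor]]
        while stack:
            node, dist = stack.pop()
            rows.append((ancestor, node, dist))
            stack.extend((grandchild, dist + 1) for grandchild in children[node])
    return sorted(rows)


def mrca_from_closure(rows, a, b):
    """Most recent common ancestor of nodes a and b, using only precomputed rows."""
    # Ancestors of a with their distance from a; a counts as its own ancestor.
    anc_a = {anc: dist for anc, desc, dist in rows if desc == a}
    anc_a[a] = 0
    anc_b = {anc for anc, desc, _ in rows if desc == b} | {b}
    common = [node for node in anc_a if node in anc_b]
    # The shared ancestor closest to a is the MRCA; None if a and b are unrelated.
    return min(common, key=anc_a.get, default=None)


# Example tree ((A,B)X,C)R as parent -> child edges.
edges = [("R", "X"), ("R", "C"), ("X", "A"), ("X", "B")]
rows = transitive_closure(edges)
print(rows)
print(mrca_from_closure(rows, "A", "B"))  # X
print(mrca_from_closure(rows, "A", "C"))  # R
```

The trade-off is storage for speed: the closure holds one row per ancestor/descendant pair (the sum of all node depths), so it is built once at load time instead of on every query.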

Achieving the Vision: Coordinated & open development, nurturing & harnessing existing efforts

Database: PhyloDB module

PhyloDB tables (selected columns):
• Tree: Name, Identifier, Is_Rooted
• Tree_Root: Is_Alternate, Significance
• Node: Label, Left_Idx, Right_Idx
• Edge
• Node_Path: Distance
• Tree_Qualifier_Value: Value, Rank
• Node_Qualifier_Value: Value, Rank
• Edge_Qualifier_Value: Value, Rank
• Tree_Dbxref
• Node_Dbxref
• Node_Taxon: Rank
• Node_Bioentry: Rank

Linked BioSQL core tables:
• Biodatabase, Bioentry, Taxon, Term, Ontology, Dbxref
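The Left_Idx and Right_Idx columns on Node suggest a nested-set encoding, in which a depth-first traversal numbers each node on entry and on exit, so that a node's descendants are exactly the nodes whose interval nests inside its own. The Python/SQLite sketch below assumes that reading. The two tables are trimmed-down, hypothetical stand-ins for PhyloDB's Node and Edge (not the actual DDL), and the helper functions are illustrative only.

```python
import sqlite3

# Hypothetical, trimmed-down stand-ins for PhyloDB's node and edge tables;
# the real schema has more columns (tree_id, foreign keys, and so on).
SCHEMA = """
CREATE TABLE node (node_id INTEGER PRIMARY KEY, label TEXT,
                   left_idx INTEGER, right_idx INTEGER);
CREATE TABLE edge (child_node_id INTEGER, parent_node_id INTEGER);
"""


def assign_nested_set(conn, root_id):
    """Number nodes depth-first so every subtree spans [left_idx, right_idx]."""
    counter = 0

    def visit(node_id):
        nonlocal counter
        counter += 1
        left = counter
        children = conn.execute(
            "SELECT child_node_id FROM edge WHERE parent_node_id = ? "
            "ORDER BY child_node_id", (node_id,)).fetchall()
        for (child_id,) in children:
            visit(child_id)  # recursion: fine for a sketch, not for very deep trees
        counter += 1
        conn.execute("UPDATE node SET left_idx = ?, right_idx = ? WHERE node_id = ?",
                     (left, counter, node_id))

    visit(root_id)


def descendants(conn, label):
    """Labels of all nodes whose interval nests inside the named node's interval."""
    rows = conn.execute(
        "SELECT d.label FROM node AS a JOIN node AS d "
        "ON d.left_idx > a.left_idx AND d.right_idx < a.right_idx "
        "WHERE a.label = ? ORDER BY d.left_idx", (label,))
    return [name for (name,) in rows]


def mrca_sql(conn, a, b):
    """Deepest node whose interval encloses both a and b (labels assumed unique)."""
    row = conn.execute(
        "SELECT label FROM node "
        "WHERE left_idx <= (SELECT MIN(left_idx) FROM node WHERE label IN (?, ?)) "
        "AND right_idx >= (SELECT MAX(right_idx) FROM node WHERE label IN (?, ?)) "
        "ORDER BY left_idx DESC LIMIT 1", (a, b, a, b)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Example tree ((A,B)X,C)R.
conn.executemany("INSERT INTO node (node_id, label) VALUES (?, ?)",
                 [(1, "R"), (2, "X"), (3, "A"), (4, "B"), (5, "C")])
conn.executemany("INSERT INTO edge (child_node_id, parent_node_id) VALUES (?, ?)",
                 [(2, 1), (5, 1), (3, 2), (4, 2)])
assign_nested_set(conn, root_id=1)
print(descendants(conn, "X"))    # ['A', 'B']
print(mrca_sql(conn, "A", "C"))  # R
```

With the intervals stored, subtree and common-ancestor queries become plain range comparisons instead of recursive walks; the cost is that the indices must be recomputed whenever the topology changes.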

Semantics: CDAO

http://www.evolutionaryontology.org

Service API: PhyloWS http://evoinfo.nescent.org/PhyloWS

Nurturing the community

Phyloinformatics Hackathon, Dec 2006

• James Estill (U. Georgia): “A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes”

Acknowledgments

• Phyloinformatics Hackathon participants

• BioHackathon 2008 participants

• EvoInformatics Working Group participants

• Google Summer of Code students: Jamie Estill

• Sponsors & support:

• NESCent

• BioSynC

• TDWG

• DBCLS, CBRC (Japan)