Towards a Simple, Standards Compliant, and
Generic Phylogenetic Database Module
Hilmar Lapp and Todd Vision
National Evolutionary Synthesis Center (NESCent)
Rich diversity of online data repositories
Most data is not online
Clark J.R. et al. (2008) A Comparative Study in Ancestral Range Reconstruction Methods: Retracing the Uncertain Histories of Insular Lineages. Systematic Biology, 57(5), 693-707
Syst. Biol. Data Archive
Little standards support
Accelerating knowledge dissemination: A Story
• Jane and her lab have accumulated molecular data to resolve the phylogeny of a certain clade of frogs, many of which are endangered species.
• Her group assembles a multiple alignment and reconstructs the phylogeny using a variety of methods, some developed by her lab, resulting in 1000s of trees.
• The results show overwhelming support for several new branch points. The results are interesting and solid enough to be useful for others working on those species.
• Jane downloads and installs PhyloDOM, a freely available open source software package. The software creates a database and Jane uses the programs that come with it to import all her data.
• As a result, Jane’s lab now has a web interface to her results that others can use to query for novel topologies and to explore her data.
• Her lab also updates the database from their ongoing work, and uses it to add provenance data and links to protocols, publications, and taxonomic concepts.
• Other researchers easily download and integrate her results in their own analyses.
• Even where Jane used new methods, other software understands the meaning of the metadata and can take advantage of it.
• Before long, her results appear in data aggregators such as iSpecies, EOL, or Scratchpads, along with those from other labs.
• Jane herself uses the LifeMap widget to map her trees onto geo-coordinates and to link branches to ecological and biodiversity parameters of the respective areas.
How to get there?
[Architecture diagram; component labels:]
• Phylogenetic Database supporting ontologies and arbitrary metadata (PhyloDB / BioSQL)
• Precompute, Query Optimization
• Data loading tools (BioSQL)
• Language binding for database model (BioPerl, Biojava, Biopython, Bioruby)
• Topology-oriented Queries
• Embeddable Tools (PhyloWidget, GBrowse TreeWidget)
• Phylogenetic Trees (Gene, Species)
• ITIS, NCBI Taxonomies
• Parser libraries for data and semantics standards (NeXML, CDAO)
• Middleware: Query & Persistence Management
• Data and other services API (PhyloWS) supporting exchange standards (NeXML, CDAO)
• Taxonomies
• Character Data
• Metadata (Evolutionary, Biodiversity, Computational)
• Client-based Query Interfaces
• Data Aggregators, Mash-up Applications
• Molecular Data (Sequences, Annotation)
• Ontologies
• Data Management Tools
Achieving the Vision: Coordinated & open development, nurturing & harnessing existing efforts
Database: PhyloDB module
[Schema diagram; entities and the attributes shown:]
• Tree: Name, Identifier, Is_Rooted
• Tree_Root: Is_Alternate, Significance
• Tree_Qualifier_Value: Value, Rank
• Tree_Dbxref
• Node: Label, Left_Idx, Right_Idx
• Node_Path: distance
• Node_Qualifier_Value: Value, Rank
• Node_Dbxref
• Node_Taxon: Rank
• Node_Bioentry: Rank
• Edge
• Edge_Qualifier_Value: Value, Rank
• Linked BioSQL entities: Biodatabase, Term, Taxon, Bioentry, Ontology, Dbxref
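The Left_Idx and Right_Idx attributes on Node suggest a nested-set encoding, which would let topology-oriented queries (all descendants of a node, last common ancestor of two nodes) run as plain range comparisons instead of recursive traversals. The following is a minimal sketch of that idea in Python with SQLite. The table definitions are heavily simplified stand-ins rather than the actual PhyloDB DDL, and the function names (`build_example_tree`, `assign_nested_set`, `descendants`, `last_common_ancestor`) and the five-node example tree are hypothetical.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the Node and Edge entities.
# The real PhyloDB tables carry more columns and constraints.
SCHEMA = """
CREATE TABLE node (
    node_id   INTEGER PRIMARY KEY,
    label     TEXT,
    left_idx  INTEGER,
    right_idx INTEGER
);
CREATE TABLE edge (
    child_node_id  INTEGER REFERENCES node(node_id),
    parent_node_id INTEGER REFERENCES node(node_id)
);
"""


def build_example_tree(conn):
    """Load the toy tree ((A,B)AB,C)root as nodes plus child-to-parent edges."""
    conn.executescript(SCHEMA)
    rows = [(1, "root", None), (2, "AB", 1), (3, "A", 2), (4, "B", 2), (5, "C", 1)]
    for node_id, label, parent_id in rows:
        conn.execute("INSERT INTO node (node_id, label) VALUES (?, ?)", (node_id, label))
        if parent_id is not None:
            conn.execute("INSERT INTO edge VALUES (?, ?)", (node_id, parent_id))


def assign_nested_set(conn, root_id):
    """Precompute left/right indexes with one depth-first walk over the edges."""
    counter = 0

    def visit(node_id):
        nonlocal counter
        counter += 1
        left = counter
        children = conn.execute(
            "SELECT child_node_id FROM edge WHERE parent_node_id = ? "
            "ORDER BY child_node_id",
            (node_id,),
        ).fetchall()
        for (child_id,) in children:
            visit(child_id)
        counter += 1
        conn.execute(
            "UPDATE node SET left_idx = ?, right_idx = ? WHERE node_id = ?",
            (left, counter, node_id),
        )

    visit(root_id)


def descendants(conn, label):
    """All nodes whose index interval lies strictly inside the given node's."""
    rows = conn.execute(
        "SELECT d.label FROM node AS a JOIN node AS d "
        "ON d.left_idx > a.left_idx AND d.right_idx < a.right_idx "
        "WHERE a.label = ? ORDER BY d.left_idx",
        (label,),
    ).fetchall()
    return [row[0] for row in rows]


def last_common_ancestor(conn, label_x, label_y):
    """The innermost node whose interval encloses both given nodes."""
    row = conn.execute(
        "SELECT a.label FROM node AS a, node AS x, node AS y "
        "WHERE x.label = ? AND y.label = ? "
        "AND a.left_idx <= MIN(x.left_idx, y.left_idx) "
        "AND a.right_idx >= MAX(x.right_idx, y.right_idx) "
        "ORDER BY a.left_idx DESC LIMIT 1",
        (label_x, label_y),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_example_tree(conn)
    assign_nested_set(conn, root_id=1)
    print(descendants(conn, "AB"))
    print(last_common_ancestor(conn, "A", "C"))
```

The trade-off in this sketch is typical of precomputed indexes: reads become simple range joins, but the indexes have to be recomputed whenever the topology changes.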
Semantics: CDAO
http://www.evolutionaryontology.org
Service API: PhyloWS
http://evoinfo.nescent.org/PhyloWS
Embeddable tools:
Community-owned, reusable software
Nurturing the community
Phyloinformatics Hackathon, Dec 2006
• James Estill (U. Georgia): “A Perl-based Command Line Interface to a Topological Query Application for BioSQL in Support of High Throughput Classification and Analysis of LTR Retrotransposons in Plant Genomes”
Acknowledgments
• Phyloinformatics Hackathon participants
• BioHackathon 2008 participants
• EvoInformatics Working Group participants
• Google Summer of Code Students: Jamie Estill
• Sponsors & support:
• NESCent
• BioSynC
• TDWG
• DBCLS, CBRC (Japan)