Building a Foundation to Enable Semantic Technologies for Phylogenetically-Based Comparative Analyses
Maryam Panahiazar (1), Arlin Stoltzfus (2), Rutger Vos (3), Enrico Pontelli (4) and Jim Leebens-Mack (1)
(1) University of Georgia, USA; (2) NIST & University of Maryland, USA; (3) University of Reading, UK; (4) New Mexico State University
Phyloinformatics
04/14/23 1
Motivation
“Nothing in biology makes sense except in the light of evolution” (Theodosius Dobzhansky, 1973) ... and
nothing in evolution makes sense except in the light of phylogeny
Example 1 – Prediction of gene and protein function
Jonathan A. Eisen, 1998, Genome Research, 8:163-167
1. Choose gene of interest
2. Identify homologs
3. Align sequences
4. Calculate gene tree
5. Overlay known functions onto tree
6. Hypothesize function for all genes
7. Reconcile gene and species trees
After Eisen 1998, Genome Research
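Steps 5 and 6 above can be illustrated with a toy sketch. This is not Eisen's method: it assumes a very crude rule (take the function shared by all annotated genes in the smallest enclosing clade, and leave the gene unresolved if they disagree), and the gene names, functions and tree shape are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> List[str]:
    """Return all leaf names below a node."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


def predict_functions(tree: Tree, known: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Propose a function for each unannotated gene.

    For every unannotated leaf, look at the smallest enclosing clade that
    contains at least one annotated gene. If those annotations agree, propose
    that function; otherwise record None (unresolved).
    """
    predictions: Dict[str, Optional[str]] = {}

    def visit(node: Tree, ancestors: List[Tree]) -> None:
        if isinstance(node, str):
            if node in known:
                return
            # Walk outward from the nearest enclosing clade to the root.
            for clade in reversed(ancestors):
                funcs = {known[g] for g in leaves(clade) if g in known}
                if funcs:
                    predictions[node] = funcs.pop() if len(funcs) == 1 else None
                    return
            predictions[node] = None
            return
        for child in node:
            visit(child, ancestors + [node])

    visit(tree, [])
    return predictions


# Invented example data: three annotated genes, three unannotated ones.
gene_tree: Tree = ((("geneA", "geneB"), "geneC"), ("geneD", "geneE"), "geneF")
known_functions = {"geneA": "kinase", "geneC": "kinase", "geneD": "phosphatase"}
predictions = predict_functions(gene_tree, known_functions)
```

Here geneB and geneE inherit the function of their annotated sister genes, while geneF stays unresolved because its smallest annotated clade mixes two functions. A real analysis would also use branch lengths and the gene/species tree reconciliation of step 7.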
Example 2 – Testing congruence among phylogeographic analyses
Knowles 2009 after Avise 1992
1. Compile results of phylogeographic analyses for multiple species from the same geographic region
2. Apply demographic models to account for variation in generation times and substitution rates
After Knowles 2009, Annu. Rev. Ecol. Evol. Syst.
Applying Semantics to Bioinformatics
Integrative bioinformatics experimentation cycle
1. Problem Definition
2. Experimental Design
3. Data Integration
4. Data Analysis
5. Interpretation
Biological hypothesis
Protocol
Raw integration result
Analysis result
Knowledge
1. Import or create data and knowledge models
2. Use data models to transform raw data to RDF data
3. Link data models to knowledge models
4. Select common domain
5. Construct and run semantic query
Raw integration result
Lennart J.G. Post, Marco Roos, M. Scott Marshall, Roel van Driel and Timo M. Breit. A semantic web approach applied to integrative bioinformatics experimentation: a biological use case with genomics data. Bioinformatics, Vol. 23 no. 22, 2007, pages 3080–3087. doi:10.1093/bioinformatics/btm461
Requirements for data reuse in comparative analyses:
• Easy access to machine-readable trees, data matrices and meta-data (e.g. sample characteristics including sample locality)
• A minimum reporting standard for phylogenetic analyses (MIAPA).
• A controlled vocabulary for describing components of phylogenetic workflows
Bioinformatics and phylogeny
Proposed components of a minimum reporting standard for phylogenetic analyses:
Leebens-Mack et al. 2006 OMICS
Developing an ontology for describing phylogenetic workflows:
1. Catalogue published methods of phylogenetic analysis (https://www.nescent.org/sites/evoio/MIAPA/PhyloWays)
2. Develop an ontology that accommodates published phylogenetic workflows
3. Evaluate the utility of the ontology for describing published phylogenetic workflows
4. Use the ontology to construct NeXML files with annotated trees and data matrices
5. Elicit feedback from the systematics community
PhyloWays entry:
Publication: Soltis DE, Smith SA, Cellinese N, Wurdack KJ, Tank DC, Brockington SF, Refulio-Rodriguez NF, Walker JB, Moore MJ, Carlsward BS, et al. 2011. Angiosperm phylogeny: 17 genes, 640 taxa. Am J Bot 2011:ajb.1000404. - http://www.amjbot.org/cgi/reprint/ajb.1000404v1
Data: concatenated alignments for a superset of 14 loci/17 genes (nucleotide sequences) sampled from 640 species. Genes included 18S rDNA (nuc), 26S rDNA (nuc), atpB (cp), atp1 (mito), matK (cp), matR (mito), nad5 (mito), ndhF (cp), psbBTNH (cp 4 gene region), rbcL (cp), rpoC2 (cp), rps16 (cp), rps3 (mito), and rps4 (cp).
Alignment method: MAFFT used to align each of 14 loci; "adjustments were made by eye when there were obvious alignment errors due to particularly divergent or “gappy” sequences"; sites (columns) with > 50% missing data (including gaps due to indels) were removed using Phyutility (Smith and Dunn, 2008). All or subsets of gene alignments were concatenated for phylogenetic analysis.
Tree estimation: ML analyses were performed on the following data matrices: nuclear rDNA genes; cp genes; mito genes; nuclear+cp genes; all 17 genes; 10 independent runs for each data matrix. Program - RAxML (vers. 7.1; Stamatakis, 2006). - Model of sequence evolution - GTRGAMMA with parameters estimated separately (unlinked) for each gene partition. - Method for evaluating support - 100-300 bootstrap replicates
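One way to make a free-text entry like this machine-readable is to capture it as a structured record and check that no field is left empty. The sketch below assumes its own field names (they are not taken from PhyloWays, PhylOnt or MIAPA); the values are copied from the entry above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class WorkflowRecord:
    """Hypothetical structured form of one PhyloWays-style entry."""
    publication: str
    n_taxa: int
    n_genes: int
    alignment_program: str
    alignment_filtering: str
    tree_program: str
    tree_program_version: str
    optimality_criterion: str
    substitution_model: str
    support_method: str
    independent_runs: int


def missing_fields(record: WorkflowRecord) -> List[str]:
    """List fields left empty, as a toy completeness check."""
    return [name for name, value in asdict(record).items() if value in ("", None)]


# Values transcribed from the Soltis et al. 2011 entry.
entry = WorkflowRecord(
    publication="Soltis et al. 2011, Am J Bot",
    n_taxa=640,
    n_genes=17,
    alignment_program="MAFFT",
    alignment_filtering="columns with > 50% missing data removed with Phyutility",
    tree_program="RAxML",
    tree_program_version="7.1",
    optimality_criterion="maximum likelihood",
    substitution_model="GTRGAMMA, unlinked per gene partition",
    support_method="bootstrap, 100-300 replicates",
    independent_runs=10,
)

entry_json = json.dumps(asdict(entry), indent=2)
unfilled = missing_fields(entry)
```

A record of this kind could then be mapped onto ontology terms once the vocabulary is fixed; the completeness check only stands in for what a reporting standard would require.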
Current components of PhylOnt, an ontology for describing phylogenetics workflows:
• Tree estimation program
• Method of analysis
• Construction of data matrix
• Alignment…
• Tree estimation
• optimality criterion….
• branch swapping…
• support assessment…
Tree estimation program ontology
Data analysis ontology diagram
Models for character state transitions (e.g. nucleotide substitution model)
Next steps:
1. Complete PhylOnt
2. Develop NeXML file builder that uses PhylOnt concepts
3. Formalize Minimum Information about Phylogenetic Analyses (MIAPA) reporting standard
4. Evaluate and refine PhylOnt for construction of MIAPA-compliant NeXML files
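To give a feel for what a NeXML file builder using PhylOnt concepts might produce, here is a minimal sketch that attaches literal `meta` annotations to a tree element. It is only a simplified fragment: the PhylOnt namespace IRI and the property names are placeholders (the ontology was still being completed), and required NeXML parts such as `otus`, nodes, edges and the tree's `xsi:type` are omitted, so the output is not expected to validate against the NeXML schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NEX = "http://www.nexml.org/2009"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
PHYLONT = "http://example.org/phylont#"  # placeholder IRI, not the real ontology

ET.register_namespace("nex", NEX)
ET.register_namespace("xsi", XSI)


def build_annotated_tree(tree_id: str, annotations: Dict[str, str]) -> ET.Element:
    """Build a skeletal NeXML-like document with one annotated tree.

    Each annotation becomes a <meta> element of type nex:LiteralMeta whose
    property is a (hypothetical) PhylOnt term and whose content is the value.
    """
    root = ET.Element(f"{{{NEX}}}nexml", {"version": "0.9"})
    # Declare the prefix used inside the 'property' attribute values.
    root.set("xmlns:phylont", PHYLONT)
    trees = ET.SubElement(root, f"{{{NEX}}}trees", {"id": "trees1"})
    tree = ET.SubElement(trees, f"{{{NEX}}}tree", {"id": tree_id})
    for index, (term, value) in enumerate(annotations.items(), start=1):
        ET.SubElement(
            tree,
            f"{{{NEX}}}meta",
            {
                "id": f"meta{index}",
                f"{{{XSI}}}type": "nex:LiteralMeta",
                "property": f"phylont:{term}",
                "content": value,
            },
        )
    return root


# Hypothetical term names; values taken from the PhyloWays example entry.
document = build_annotated_tree(
    "tree1",
    {"TreeEstimationProgram": "RAxML", "SubstitutionModel": "GTRGAMMA"},
)
xml_text = ET.tostring(document, encoding="unicode")
```

Keeping the annotations as property/content pairs means the same builder can be pointed at the final PhylOnt term IRIs without changing its structure.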