Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

DESCRIPTION

Quest for Orthologs (QfO) is a volunteer effort to jointly create a firmer foundation for comparative biology research. Extrapolating knowledge from the handful of organisms for which experimental data are available to other living organisms is the heart of comparative biology, and orthology assignments are axiomatic. Whether the aim is to find the gene in a model organism that corresponds to a human disease gene, to infer the function of a newly sequenced gene from experimental assays of its orthologs, or to infer species phylogenies by tracing the evolution of orthologous groups, reliable ortholog predictions are needed. Any improvement to this essential base will offer enormous benefit to the entire biological community. Currently more than 30 different phylogenomic databases provide their ortholog analysis results to the scientific community, and they differ in many ways: number of species, taxonomic range, sampling density, applied methodology, and more. In addition, phylogenomic databases differ in their concepts, which makes direct benchmarking problematic and presents major obstacles to users looking for relationships between genes. One of the prerequisites for the QfO consortium is a shared species phylogeny, which is used for the final determination of gene relationships during construction of sequence-based gene trees. A cladogram was therefore constructed for the 147 species of the reference proteomes (http://www.ebi.ac.uk/reference_proteomes/) based on information from various resources (http://wiki.isb-sib.ch/swisstree/Species_tree_for_Quest_for_Orthologs_reference_proteomes_2013). This talk presents the current progress of the QfO consortium and ideally will lead to a discussion of common needs and open questions.

Page 1: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Quest for Orthologs*

anchoring comparative biology research

Sharing and delivery of reusable phylogenetic knowledge

TDWG 2013 Annual Conference - October 2013 Florence, Italy

*http://questfororthologs.org/

Wednesday, October 23, 13

Page 2: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Evolutionary conservation allows knowledge transfer from well-characterized model organisms to humans and other organisms, and is the basis for comparative genomic studies

Page 3: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

The barriers

More than 30 phylogenomic databases provide their analysis results to the scientific community.

The content of these databases differs

The concepts of these databases also differ

Complex/slow pipelines

Unavailability as stand-alone programs

Different output formats

Lack of benchmarking data sets

Consequently, comparing and choosing is difficult

Page 4: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Who we are

SWISS INSTITUTE OF BIOINFORMATICS

EUROPEAN BIOINFORMATICS INSTITUTE

STOCKHOLMS UNIVERSITET

UNIVERSITY OF LAUSANNE

JACKSON LABORATORY

EIDGENÖSSISCHE TECHNISCHE HOCHSCHULE ZÜRICH

INSTITUT DE GÉNÉTIQUE ET MICROBIOLOGIE

UNIVERSITÉ DE GENEVE

NATIONAL INSTITUTE FOR BASIC BIOLOGY, JAPAN

EUROPEAN MOLECULAR BIOLOGY LABORATORY

INSTITUT NATIONAL DE LA RECHERCHE AGRONOMIQUE

UNIVERSITÄT BONN

UNIVERSIDAD DE MURCIA

CENTRE INTERNATIONALE POUR LA RECHERCHE AGRONOMIQUE POUR LE DÉVELOPPEMENT

JOINT GENOME INSTITUTE

SYNGENTA

UNIVERSITY OF CAMBRIDGE

UNIVERSITÉ DE LYON

CENTRE DE REGULACIÓ GENOMICS, BARCELONA

PRINCETON UNIVERSITY

UNIVERSITY OF PENNSYLVANIA

SANGER INSTITUTE

Page 5: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Quest for Orthologs’ objectives

A collaboration of phylogenomic databases

Use shared reference datasets (proteomes and species trees)

Benchmark orthology predictions

Use an agreed format

Evaluate emerging new methods

Page 6: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

QfO - proteomes

Criterion 1: include the major experimental model organisms

Criterion 2: include a broad taxonomic range of genomes

Common dataset: QfO Reference Proteomes: http://www.ebi.ac.uk/reference_proteomes

Currently 147 species; the proteomes are publicly available and are generated using UniProtKB, Ensembl and Ensembl Genomes.

Additional species on request, annual release in April

Page 8: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

QfO - Benchmark

Compared: Ensembl Compara, InParanoid (Full, core), MetaPhOrs (Missing 3 genomes), OMA (Pairs, Groups, HOGs), Orthoinspector 1.30, PANTHER 8.0 (LDO only, all), PhylomeDB, RSD 0.8 1e-5 (RoundUp)

OrthoBench: http://orthology.benchmarkservice.org

Battery of approaches:

Species-tree discordance test

Gold standard gene trees

Gold standard (hierarchical) orthologous groups

Minimum standard and sanity check (already useful): Minimum Information for an Orthology Prediction Algorithm?

Guide to improve algorithms
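To make the idea of scoring predictions against a gold standard concrete, here is a minimal Python sketch. It assumes that both the predictions and the gold standard can be reduced to sets of unordered gene pairs and that precision and recall are the measures of interest; the slide does not say which measures the benchmark service actually reports, and the gene names and the `score` helper are hypothetical.

```python
def pair(gene_a, gene_b):
    """Represent an ortholog relation as an unordered pair of gene IDs."""
    return frozenset((gene_a, gene_b))


def score(predicted, gold):
    """Return (precision, recall) of predicted pairs against gold pairs.

    Assumption: the gold standard is complete for the genes it covers,
    so any predicted pair absent from it counts as a false positive.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: pairs taken from a trusted reference gene tree.
gold = {pair("h1", "m1"), pair("h1", "y1"), pair("m1", "y1"), pair("h2", "m2")}
# Hypothetical output of one prediction method.
predicted = {pair("h1", "m1"), pair("h1", "y1"), pair("h1", "m2"), pair("h2", "m2")}

precision, recall = score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Storing pairs as `frozenset` objects makes the comparison independent of the order in which a method lists the two genes, which matters when outputs from different databases are compared.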

Page 9: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

A species tree is key

A reliable species phylogeny enhances prediction of gene relationships

The current cladogram comprises 147 species from the reference datasets and is based on information from various resources

Newick format

3 identifiers: UniProtKB species code, scientific name, NCBI taxid

Relevant publications for speciation nodes possible

For QfO benchmarking, the tree only needs to cover currently accepted models of species evolution

A time-tree would be desirable to define rules for the introduction of multi-furcating nodes for benchmarking purposes

Ortholog DB providers use it for gene/species tree reconciliation
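As an illustration of how a shared species tree feeds into gene/species tree reconciliation, the following Python sketch applies the common lowest-common-ancestor mapping rule: a gene-tree node that maps to the same species clade as one of its children is treated as a duplication, and otherwise as a speciation. This is a simplified sketch and not the pipeline of any particular QfO database. The trees are written as nested tuples rather than parsed from Newick, and the gene names and helper functions are hypothetical.

```python
from itertools import product


def leaves(tree):
    """Return the set of leaf labels below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """List the leaf set of every node in the tree, root first."""
    found = [leaves(tree)]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def reconcile(gene_tree, gene_to_species, species_clades, orthologs):
    """Map a gene-tree node to the smallest species clade that covers it.

    Speciation nodes add cross-child gene pairs to `orthologs`; a node that
    maps to the same clade as one of its children is a duplication.
    """
    species = frozenset(gene_to_species[g] for g in leaves(gene_tree))
    mapped = min((c for c in species_clades if species <= c), key=len)
    if isinstance(gene_tree, str):
        return mapped
    child_maps = [
        reconcile(child, gene_to_species, species_clades, orthologs)
        for child in gene_tree
    ]
    if mapped not in child_maps:  # speciation: genes across children are orthologs
        for i, left in enumerate(gene_tree):
            for right in gene_tree[i + 1:]:
                for a, b in product(leaves(left), leaves(right)):
                    orthologs.add(frozenset((a, b)))
    return mapped


# Toy species tree using UniProtKB species codes as leaf labels.
SPECIES_TREE = (("HUMAN", "MOUSE"), "YEAST")
# Hypothetical gene family with a duplication before the human/mouse split.
GENE_TREE = ((("h1", "m1"), ("h2", "m2")), "y1")
GENE_TO_SPECIES = {"h1": "HUMAN", "h2": "HUMAN", "m1": "MOUSE",
                   "m2": "MOUSE", "y1": "YEAST"}

orthologs = set()
reconcile(GENE_TREE, GENE_TO_SPECIES, clades(SPECIES_TREE), orthologs)
print(sorted(sorted(p) for p in orthologs))
```

In this toy family, h1 and m2 are separated by the duplication and are therefore not reported as orthologs, whereas every human and mouse copy is orthologous to the single yeast gene. A wrong species tree would change which nodes are called duplications, which is why a shared, reliable species phylogeny matters for comparing methods.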

Page 10: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Ontology based annotation of trees

Page 11: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Resources

Quest for Orthologs: http://questfororthologs.org/

Alan Wilter Sousa da Silva: http://www.ebi.ac.uk/reference_proteomes

Erik Sonnhammer & Matthieu Muffato: http://seqxml.org

Adrian Altenhoff & Christophe Dessimoz: http://orthology.benchmarkservice.org

Brigitte Boeckmann: http://wiki.isb-sib.ch/swisstree

Page 12: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Our questions

Are model differences among the different ToLs documented and, if yes, is this info made public or can it be made available to QfO?

Which tree format is best to use for data comparison and update?

Are confidence values for internal nodes in the ToLs made available?

Are ToLs available for download (formats, update frequencies, release identifiers, ...)?

Are species identifiers of ToL projects in sync with the NCBI TaxIds?

Is there a point of contact and communication?

How can we productively engage with the ToL & taxonomy community on a cooperative effort?

Page 13: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Special thanks to

SWISS INSTITUTE OF BIOINFORMATICS: Brigitte Boeckmann

UNIVERSITÉ DE LYON: Vincent Daubin

EUROPEAN MOLECULAR BIOLOGY LABORATORY: Kristoffer Forslund

CENTRE DE REGULACIÓ GENOMICS, BARCELONA: Toni Gabaldon

EUROPEAN BIOINFORMATICS INSTITUTE: Matthieu Muffato

SANGER INSTITUTE: Fabian Schreiber
