Transcript
Page 1: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Quest for Orthologs*

Anchoring comparative biology research

Sharing and delivery of reusable phylogenetic knowledge

TDWG 2013 Annual Conference - October 2013, Florence, Italy

*http://questfororthologs.org/

Wednesday, October 23, 2013

Page 2: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Evolutionary conservation allows knowledge transfer from well-characterized model organisms to human and other organisms, and is the basis for comparative genomic studies

Page 3: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

The barriers

More than 30 phylogenomic databases provide their analysis results to the scientific community.

The content of these databases differs

The concepts of these databases also differ

Complex/slow pipelines

Unavailability as stand-alone programs

Different output formats

Lack of benchmarking data sets

Consequently, comparing the databases and choosing among them is difficult

Page 4: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Who we are

SWISS INSTITUTE OF BIOINFORMATICS

EUROPEAN BIOINFORMATICS INSTITUTE

STOCKHOLMS UNIVERSITET

UNIVERSITY OF LAUSANNE

JACKSON LABORATORY

EIDGENÖSSISCHE TECHNISCHE HOCHSCHULE ZÜRICH

INSTITUT DE GÉNÉTIQUE ET MICROBIOLOGIE

UNIVERSITÉ DE GENEVE

NATIONAL INSTITUTE FOR BASIC BIOLOGY, JAPAN

EUROPEAN MOLECULAR BIOLOGY LABORATORY

INSTITUT NATIONAL DE LA RECHERCHE AGRONOMIQUE

UNIVERSITÄT BONN

UNIVERSIDAD DE MURCIA

CENTRE INTERNATIONALE POUR LA RECHERCHE AGRONOMIQUE POUR LE DÉVELOPPEMENT

JOINT GENOME INSTITUTE

SYNGENTA

UNIVERSITY OF CAMBRIDGE

UNIVERSITÉ DE LYON

CENTRE DE REGULACIÓ GENOMICS, BARCELONA

PRINCETON UNIVERSITY

UNIVERSITY OF PENNSYLVANIA

SANGER INSTITUTE

Page 5: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Quest for Orthologs’ objectives

A collaboration of phylogenomic databases

Use shared reference datasets (proteomes and species trees)

Benchmark orthology predictions

Use an agreed format

Evaluate emerging new methods

Page 6: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

QfO - proteomes

Criterion 1: include the major experimental model organisms

Criterion 2: include a broad taxonomic range of genomes

Common dataset: QfO Reference Proteome: http://www.ebi.ac.uk/reference_proteomes

Currently 147 species; the proteomes are publicly available and are generated from UniProtKB, Ensembl and Ensembl Genomes.

Additional species on request, annual release in April

Page 8: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

QfO - Benchmark

Compared: Ensembl Compara, InParanoid (Full, core), MetaPhOrs (Missing 3 genomes), OMA (Pairs, Groups, HOGs), Orthoinspector 1.30, PANTHER 8.0 (LDO only, all), PhylomeDB, RSD 0.8 1e-5 (RoundUp)

OrthoBench: http://orthology.benchmarkservice.org

Battery of approaches:

Species-tree discordance test

Gold standard gene trees

Gold standard (hierarchical) orthologous groups

Minimum standard and sanity check (already useful): Minimum Information for an Orthology Prediction Algorithm?

Guide to improve algorithms
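As an illustration of what scoring predictions against a gold standard can look like, the sketch below computes precision and recall of predicted ortholog pairs against a reference set of pairs. This is a generic evaluation, not necessarily the metric used by the benchmark service; the gene identifiers and the function names `pair_set` and `precision_recall` are hypothetical.

```python
def pair_set(pairs):
    """Normalise ortholog pairs so that (a, b) and (b, a) count as the same pair."""
    return {frozenset(pair) for pair in pairs}


def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted pairs against reference pairs."""
    pred = pair_set(predicted)
    ref = pair_set(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall


# Hypothetical gold-standard pairs (made-up identifiers, for illustration only).
reference = [
    ("geneA_sp1", "geneA_sp2"),
    ("geneB_sp1", "geneB_sp2"),
    ("geneC_sp1", "geneC_sp2"),
]

# Hypothetical predictions: two correct pairs (one with reversed order), one wrong.
predicted = [
    ("geneA_sp2", "geneA_sp1"),
    ("geneB_sp1", "geneB_sp2"),
    ("geneB_sp1", "geneC_sp2"),
]

precision, recall = precision_recall(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Storing each pair as a `frozenset` makes the comparison independent of the order in which a method reports the two genes.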

Page 9: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

A species tree is key

A reliable species phylogeny enhances prediction of gene relationships

The current cladogram comprises the 147 species from the reference datasets and is based on information from various resources

Newick format

3 identifiers: UniProtKB species code, scientific name, NCBI taxid

Relevant publications for speciation nodes possible

For QfO benchmarking, the tree only needs to cover currently accepted models of species evolution

A time-tree would be desirable to define rules for the introduction of multi-furcating nodes for benchmarking purposes

Ortholog DB providers use it for gene/species tree reconciliation
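To make the three-identifier idea concrete, here is a minimal sketch that reads leaf labels from a Newick string and splits each into UniProtKB species code, scientific name and NCBI taxid. The label layout `CODE|Scientific_name|taxid` is an assumption made for this example; the slides do not say how the QfO tree encodes the three identifiers. The function names are likewise hypothetical.

```python
import re

# Assumed label layout (not taken from the slides): CODE|Scientific_name|taxid
newick = (
    "((HUMAN|Homo_sapiens|9606,MOUSE|Mus_musculus|10090),"
    "DROME|Drosophila_melanogaster|7227);"
)


def leaf_labels(tree):
    """Return leaf labels, i.e. tokens that directly follow '(' or ','.

    Internal node labels follow ')' and are therefore skipped.
    """
    return re.findall(r"[(,]([^(),:;]+)", tree)


def parse_label(label):
    """Split one assumed 'CODE|Scientific_name|taxid' label into its parts."""
    code, name, taxid = label.split("|")
    return {"code": code, "name": name.replace("_", " "), "taxid": int(taxid)}


species = [parse_label(label) for label in leaf_labels(newick)]
for entry in species:
    print(entry["code"], entry["name"], entry["taxid"])
```

Carrying all three identifiers on every leaf lets a consumer match species by whichever key its own database uses.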

Page 10: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Ontology-based annotation of trees

Page 11: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Resources

Quest for Orthologs: http://questfororthologs.org/

Alan Wilter Sousa da Silva: http://www.ebi.ac.uk/reference_proteomes

Erik Sonnhammer & Matthieu Muffato: http://seqxml.org

Adrian Altenhoff & Christophe Dessimoz: http://orthology.benchmarkservice.org

Brigitte Boeckmann: wiki.isb-sib.ch/swisstree

Page 12: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Our questions

Are model differences among the different ToLs documented and, if yes, is this info made public or can it be made available to QfO?

Which tree format is best to use for data comparison and update?

Are confidence values for internal nodes in the ToLs made available?

Are ToLs available for download (formats, update frequencies, release identifiers, ...)?

Are species identifiers of ToL projects in sync with the NCBI TaxIds?

Is there a point of contact and communication?

How can we productively engage with the ToL & taxonomy community on a cooperative effort?

Page 13: Quest for Orthologs: anchoring comparative biology research (TDWG 2013)

Special thanks to

Brigitte Boeckmann, SWISS INSTITUTE OF BIOINFORMATICS

Vincent Daubin, UNIVERSITÉ DE LYON

Kristoffer Forslund, EUROPEAN MOLECULAR BIOLOGY LABORATORY

Toni Gabaldon, CENTRE DE REGULACIÓ GENOMICS, BARCELONA

Matthieu Muffato, EUROPEAN BIOINFORMATICS INSTITUTE

Fabian Schreiber, SANGER INSTITUTE
