From genes, to genomes, to networks, with community-aided curation
Fiona Brinkman Simon Fraser University
Biocuration conference April 2013
with a little help from my friends …
Targeting major players resulting in infectious disease:
o Pathogen virulence: ID anti-infectives (don’t kill the pathogen, disarm them)
o Host immune system failure/over-activity: Immune modulators that dampen damaging inflammation and boost “good” immune response
o Changes in environment/social factors: Integrating pathogen genome data with environment, microbiome, and social network data. Better identify source/cause of disease outbreaks
My Primary Research Interest
Developing more sustainable approaches for infectious disease control …using novel computational tools, integrated data and interdisciplinary approaches
o Pathogen virulence
  PSORTb – Protein localization analysis (ID cell surface/secreted drug targets)
  IslandViewer – Genomic island analysis, pathogen-associated genes
  Ortholuge DB – Precomputed assessments of bacterial orthologs
  Genera-specific DBs like Pseudomonas Genome Database
o Host immune system failure/over-activity
  InnateDB – Human/Mouse interactome + curated innate immunity-associated interactions
o Changes in environment/social factors
  Metagenomics projects
  Integrated Rapid Infectious Disease Analysis Pipeline (IRIDA)
Some of our lab’s tools…
High quality analyses are only as good as the robust data, effective data organization and accurate analysis methods used.
Want high accuracy – usually erring on the side of high precision at the expense of recall.
To attain high accuracy, biocuration is often KEY
Research Philosophy
Robust data
Data organization
Accurate analysis methods
The Nexus
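The precision/recall trade-off mentioned on this slide can be made concrete with a minimal sketch. The counts below are hypothetical and chosen only for illustration; they are not results from any of the tools named in this talk.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Return (precision, recall) from raw prediction counts."""
    predicted = true_pos + false_pos   # everything the method called positive
    actual = true_pos + false_neg      # everything that really is positive
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical conservative method: few calls, almost all of them correct.
conservative = precision_recall(true_pos=80, false_pos=2, false_neg=20)

# Hypothetical permissive method: finds more true cases, but many wrong calls.
permissive = precision_recall(true_pos=95, false_pos=40, false_neg=5)

print("conservative precision/recall:", conservative)
print("permissive precision/recall:", permissive)
```

In this toy case the conservative method misses more true cases but makes far fewer wrong calls, which is the side of the trade-off the slide favours.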
Overview
• Community-based → Community-aided gene/genome annotation
  • 1997 – present: Pseudomonas Genome Project and PseudoCAP (Pseudomonas Community Annotation Project)
• Community-aided → Multiple community-aided contextual curation of molecular interactions
  • 2006 – present: InnateDB project
• What we’re doing next…
• Funding it all!
Pseudomonas Community Annotation Project
Goals
Critical and conservative genome annotation
Minimize project costs
Capitalize on large Pseudomonas aeruginosa research community
Solution
Community-based, Internet-based approach for (continually updated) genome annotation
“Crowdsourcing” in the 90’s!
Initial PseudoCAP leading to genome publication (1997 – 2000)
61 researchers from 13 countries, 1741 annotations
Focus on conservative annotation
Need to capture researchers’ excellent, diverse biological knowledge, NOT their diverse ways of annotating!
Initial 1741 community-based annotations…
Annotations incorporated by 3 annotators through web-based tool
1st fully internet-based community annotation effort
Current PseudoCAP – continually updated annotation (2000 – present)
151 researchers, 2356 curated gene annotations (not incl. computational analyses)
Movement from gene-based to genes plus other genome features (2,590 other genome features added in the last year alone)
Found we needed to further modify our community-based approach…
Winsor et al 2011 PMID: 20929876
Winsor et al 2005 PMID: 15608211
Annotations incorporated by one part-time project coordinator
Subject to review process (peer-reviewed paper or other peer review)
Increasing movement from Community-based to Community-aided
- Coordinator contacts researchers more to get input
- Capitalize on expertise most efficiently
- Coordinator ensures consistency
Coordinator and community collectively ensure quality
Challenges and Solutions
- Disputes between researchers regarding an annotation
  - Go with first published and have alternate annotations
- Researchers are busy!
  - Keep submission system/input process simple!
  - We now contact them more than they contact us
  - Have rounds of major annotation pushes
Future: Will try again the “paper carrot” for another annotation push – authorship on a NAR update paper (as a consortium) to encourage participation
InnateDB: Curating molecular interactions, networks
Community-aided → “Multiple community-aided”
Highly contextual annotation
Mouse Model Datasets:
Cerebral Malaria mouse model (IMR, Australia)
Tuberculosis mouse model (AECM)
Shigella xenograft model (Pasteur)
Human Clinical Datasets:
Typhoid & Malaria Vietnam (OUCRU/Stanford/Sanger)
Non Typhoidal Salmonella Malawi (Sanger)
Chronic/Acute Helminth Ecuador (USF de Quito/Sanger)
Dengue (OUCRU)
Modulating innate immune response via Host Defense peptides (Hancock lab, UBC)
Mouse KOs (Sanger)
+
InnateDB Developed to Aid Two Large International Systems Biology Projects
Novel insight into host response and mechanism of peptides. Common Pathways, networks and transcriptional regulation.
Thompson et al PNAS December 2009
Systems Biology & The Innate Immune Response:
Many layers of complexity.
Layers of regulation: transcriptional; post-transcriptional (miRNAs); post-translational (ubiquitination, phosphorylation)
Host-pathogen interactions
100s – 1000s DE genes
Not simple pathways - networks of molecular interactions
Gardy*, Lynn*, Brinkman, Hancock (2009). Enabling a systems biology approach to immunology: focus on innate immunity. Trends in Immunology PMID: 19428301
Breuer et al., 2013 InnateDB: systems biology of innate immunity and beyond… NAR (DB issue) PMID: 23180781
Manual Curation of Interaction Data From Literature to Database Greatly Enhances Coverage of Innate Immunity Interactome
[Figure: InnateDB curated interactome. Legend: interactions also curated by the top 5 other interaction databases (BIND, INTACT, DIP, BIOGRID & MINT); interactions only curated by InnateDB.]
Lynn et al., Curating the Innate Immunity Interactome. BMC Systems Biology 2010 PMID: 20727158
Manual Curation of Interaction Data From Literature to Database – Enhancing coverage of Innate Immunity Interactome
Breuer et al., 2013 InnateDB: systems biology of innate immunity and beyond… Nucleic Acids Research (Database issue)
The InnateDB curated interactome in July 2012. Red edges represent interactions that have been added in 2011 and 2012.
Contextually Curating Innate Immunity-Relevant Interactions
Annotated fields include:
Molecule type; organism; biological role; interaction detection method; the host system (in vitro, in vivo, ex vivo); host organism; interaction type; cell, cell-line and tissue types; cell status (primary/cell line); experimental role; participant identification method and sub-cellular localization, plus a variety of additional curator comments.
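To show how these annotated fields could sit together in one record, here is a minimal Python sketch. The class names, field names and placeholder values are assumptions made for illustration; this is not InnateDB's actual schema or submission format.

```python
from dataclasses import dataclass, field
from typing import List

# Host systems named on the slide; used here as a tiny controlled vocabulary.
HOST_SYSTEMS = {"in vitro", "in vivo", "ex vivo"}


@dataclass
class Participant:
    """One interactor in a curated interaction (hypothetical layout)."""
    name: str
    molecule_type: str            # e.g. protein, DNA, RNA
    organism: str
    biological_role: str
    experimental_role: str
    identification_method: str
    subcellular_localization: str = ""


@dataclass
class CuratedInteraction:
    """A contextually annotated interaction record (hypothetical layout)."""
    participants: List[Participant]
    interaction_type: str
    detection_method: str
    host_system: str
    host_organism: str
    cell_type: str = ""
    cell_line: str = ""
    tissue_type: str = ""
    cell_status: str = ""         # primary / cell line
    curator_comments: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject free-text values outside the controlled list.
        if self.host_system not in HOST_SYSTEMS:
            raise ValueError(f"unknown host system: {self.host_system!r}")

    def is_cross_species(self) -> bool:
        """True when the participants come from more than one organism."""
        return len({p.organism for p in self.participants}) > 1


# Placeholder values only; not real curated data.
example = CuratedInteraction(
    participants=[
        Participant("PROTEIN_A", "protein", "human", "unspecified", "bait", "placeholder"),
        Participant("PROTEIN_B", "protein", "mouse", "unspecified", "prey", "placeholder"),
    ],
    interaction_type="placeholder",
    detection_method="placeholder",
    host_system="in vitro",
    host_organism="human",
    curator_comments=["Species of PROTEIN_B confirmed with authors (hypothetical note)."],
)
print(example.is_cross_species())
```

Restricting a field such as the host system to a fixed list is one simple way to capture researchers' knowledge without also capturing their different ways of writing it down.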
Curating Innate Immunity-Relevant Interactions
71% human, 22% mouse, 7% human-mouse
~80% interactions in innate immunity interactome not annotated by other major databases
Protein (69%), DNA and RNA interactions
Developed InnateDB submission system software to allow submission of interaction annotation in an OBO ontology-controlled and MIMIx & PSI-MI 2.5 compliant manner.
Lynn et al., Curating the Innate Immunity Interactome. BMC Systems Biology 2010 PMID: 20727158
Which journals are curated?
>4,400 journal articles curated to date
Don’t focus on specific journals - relevant articles are curated if they meet appropriate quality standards for the interaction evidence.
Indeed, at least one protein has been curated from >200 different journals.
More than 70% of curated articles have come from 20 journals.
Note many journals in top 20 are not “immunology journals”, underscoring importance of not limiting curation efforts to journals perceived as “relevant”.
Curating Innate Immunity-Relevant Interactions – 4-pronged approach
Pathway-centric: Curation is primarily pathway-centric. Systematically review all literature describing interactions for a particular innate immunity pathway. Curate all other interactors regardless of whether the interacting molecule is a member of the pathway or has any known role in innate immunity; this expands the network outside of known innate immunity players. Systematically curated pathways are scheduled for frequent re-curation as the field is moving quickly.
New publications: New publications on innate immunity are also assessed on a daily basis to identify novel interactions of interest. Priority is given to the most recent publications, which incorporates new information on the most current research.
Immunology Community-aided: Curators consult with researchers to confirm unclear literature data. Most common issue: unclear what species the protein/DNA/RNA interactors come from.
Curation Community-aided: InnateDB curators review each other’s curations as an error check. IMEx consortium!
http://www.innatedb.com/doc/InnateDB_2010_curation_guide.pdf
• InnateDB is a member of IMEx – an international consortium of interaction databases involved in curation
• Goal: Develop common standards, avoid too much redundancy in data collection/curation, central registry, single search interface
• Orchard et al Nature Methods 9:345-350 PMID: 22453911
• Stay tuned for Sandra Orchard's talk!
Going Beyond Innate Immunity – An Integrative Biology Resource
>196,000 human and mouse interactions extracted & loaded from BIND, INTACT, DIP, BIOGRID & MINT DBs
Cross-referenced genes to >3,000 pathways from KEGG, PID, BIOCARTA, INOH, NetPath & Reactome DBs
Visualize/analyze interactions associated with specific pathway
Pathway over-representation analysis
Ensembl annotation provides details of all human & mouse genes/transcripts/proteins
UniProt, Entrez, Gene Ontology, etc.: rich protein & gene annotation
Transcription factor–DNA interactions experimentally confirmed from Transfac, TransCompel
Robust orthology & gene synteny analysis facilitate human-mouse comparisons
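Pathway over-representation analysis is commonly done with a hypergeometric test. The sketch below shows that generic calculation using only the Python standard library. The slides do not say which statistic InnateDB uses, so treat this as an illustration rather than InnateDB's method; the function name and the example counts are hypothetical.

```python
from math import comb


def ora_p_value(n_background: int, n_in_pathway: int, n_de: int, n_de_in_pathway: int) -> float:
    """Upper-tail hypergeometric p-value.

    Probability of seeing at least `n_de_in_pathway` pathway genes when
    `n_de` genes are drawn at random from `n_background` genes, of which
    `n_in_pathway` belong to the pathway.
    """
    total = comb(n_background, n_de)
    upper = min(n_in_pathway, n_de)
    favourable = 0
    for k in range(n_de_in_pathway, upper + 1):
        # k pathway genes and (n_de - k) non-pathway genes in the DE list.
        favourable += comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_de - k)
    return favourable / total


# Hypothetical counts: 20,000 background genes, a 100-gene pathway,
# 500 differentially expressed (DE) genes, 10 of them in the pathway.
p = ora_p_value(n_background=20000, n_in_pathway=100, n_de=500, n_de_in_pathway=10)
print(f"p = {p:.3g}")
```

When many pathways are tested at once, as with >3,000 pathways, the resulting p-values would normally also need a multiple-testing correction.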
InnateDB – Advanced Yet User-Friendly Searching – Find & Analyze Relevant Interactions, Pathways & Genes/Proteins.
InnateDB – Facilitating Systems-Level Analyses of Gene Expression Data
Upload Your Own Gene Expression Data - Up to 10 conditions/timepoints at 1 time.
Overlay Gene Expression Data from Multiple Conditions on Networks/Pathways
Pathway, Gene Ontology & TF ORA tools: Find DE Pathways/Functionally Related Genes/TFs
Go Beyond Pathway Analysis – Differentially Expressed Sub-networks – New Pathways? How Are DE Genes Actually Inter-connected? Central Regulators (Network Hubs)
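A minimal sketch of the "DE sub-network" and "network hub" ideas: keep only interactions between differentially expressed genes, then rank genes by how many interactions they take part in. Degree is just one simple hub measure, and the gene names and edges below are placeholders. The slides do not describe InnateDB's own algorithm, so this is not it.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def de_subnetwork(edges: Iterable[Edge], de_genes: Set[str]) -> List[Edge]:
    """Keep only interactions where both partners are differentially expressed."""
    return [(a, b) for a, b in edges if a in de_genes and b in de_genes]


def hubs(edges: Iterable[Edge], top_n: int = 3) -> List[Tuple[str, int]]:
    """Rank nodes by degree; assumes the edge list has no duplicate edges."""
    degree: Counter = Counter()
    for a, b in edges:
        if a != b:                # ignore self-interactions
            degree[a] += 1
            degree[b] += 1
    return degree.most_common(top_n)


# Placeholder interaction list and DE gene set (not real data).
interactions = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
de = {"G1", "G2", "G3", "G4"}

sub = de_subnetwork(interactions, de)
print(sub)
print(hubs(sub, top_n=1))
```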
InnateDB and curated data aided study of an immune modulator – host-directed adjunctive therapy coupled with anti-malarial
What we’re doing next…
Need to develop more ontologies and data standards to integrate microbial genomic data from a disease outbreak with epidemiological data.
Curating pathogen status for complete microbial genomes
Will try the “paper carrot” again for next Pseudomonas Genome Database curation project
InnateDB – expanding to Allergy and Asthma
Identify genes unique to/shared between strains, species, genera, any selected bacteria….
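Once genes from different genomes have been mapped to common ortholog groups, finding genes shared between or unique to the selected genomes is a set operation. The sketch below assumes that mapping already exists; the strain names and group IDs are hypothetical placeholders.

```python
from typing import Dict, Set, Tuple


def shared_and_unique(gene_sets: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (groups shared by all genomes, groups unique to each genome)."""
    if not gene_sets:
        raise ValueError("need at least one genome")
    shared = set.intersection(*gene_sets.values())
    unique = {}
    for name, genes in gene_sets.items():
        # Union of every other genome's ortholog groups.
        others = set().union(*(g for n, g in gene_sets.items() if n != name))
        unique[name] = genes - others
    return shared, unique


# Placeholder ortholog-group IDs per strain (not real data).
strains = {
    "strain_A": {"OG1", "OG2", "OG3"},
    "strain_B": {"OG1", "OG2", "OG4"},
    "strain_C": {"OG1", "OG5"},
}

shared, unique = shared_and_unique(strains)
print("shared:", shared)
print("unique:", unique)
```

The same function works at any level (strains, species, genera) because it only depends on which sets are passed in.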
Funding! Grants!
One of the biggest challenges is to secure long term, reliable funding.
We've found:
Need to target curation to specific bio projects (i.e., innate immunity, then to allergy and asthma; aiding a specific Pseudomonas analysis).
Limits what we can do, but good in the sense that curation benefits are more quickly felt as they are needed/used by others
Concluding comments
Using a community-aided, expert curator-centered approach to balance consistency and reliability while maximizing knowledge. Degree of community involvement depends on nature of data.
Capitalize on both bio community and curation community – keep linked
Researchers are busy! Make it super easy for them to provide input. A little contribution can go a long way
Paper carrots!
Link curation to bio research to secure funding
Indoctrinate young minds! Get biocuration and its challenges into undergrad curriculums
Acknowledgements - InnateDB
InnateDB Principal Investigators: Fiona Brinkman (SFU), Bob Hancock (UBC), David Lynn (Teagasc)
InnateDB Development: Karin Breuer Geoff Winsor Matthew Laird Calvin Chan Amir Foroushani Brian Meredith Nathan Lawless Nicolas Richard Avinash Chikatamarla Fiona Roche Timothy Chan Naisha Shah Michael Acab
InnateDB Curation: Raymond Lo Anastasia Sribnaia Carol Chan Misbah Naseer Melissa Yau Giselle Ring Kathleen Wee Jaimmie Que
www.innatedb.com
Cerebral network visualizer:
Aaron Barsky Jennifer Gardy Tamara Munzner
FNIH/GCGH Collaborators:
Gordon Dougan (Sanger) Fernanda Schreiber (Sanger) Melita Gordon (U. Liverpool) Bill Jacobs (AECM) Dee Dao (AECM) Philip Cooper (St. Georges) Louis Schofield (WEHI) Sandra Pilat (WEHI) Sarah Dunstan (OUCRU) Brett Finlay (UBC)
Acknowledgements – PseudoCAP
Geoff Winsor Ray Lo Matt Laird Bhav Dhillon Matthew Whiteside
151 PseudoCAP participants
www.pseudomonas.com