Discovering virulence genes present in novel strains and metagenomes

Chris Stubben, IC postdoc, B-7

Bioscience
Operated by Los Alamos National Security, LLC for NNSA



Overview

• Review current functional classification systems

• Discuss Virulence Factor Ontology

• Identify virulence genes in novel strains and metagenomes


Functional classification systems

• EC numbers for enzymes (1956)

• Swiss-Prot keywords (1986)

• E. coli gene functions, M. Riley (1993)

• TIGR role categories (1995)

• Gene Ontology (1998)

[Figure: diagram labeled "gene" and "function"]

What functions are related to virulence?

• Some systems have a few terms
  – Swiss-Prot keywords = virulence, toxin, antibiotic resistance
  – TIGR roles = pathogenesis, toxin production and resistance

• Gene Ontology (GO) also has pathogenesis, resistance to antibiotics, plus many more

Figure: GO terms related to the enzymatic activity of toxins

Gene Ontology (GO)

• 25,688 terms in three structured controlled vocabularies (ontologies)
  – 15,098 biological processes
  – 2,186 cellular components
  – 8,404 molecular functions

• Standard for eukaryotic gene annotation

• Increasingly used for prokaryotes
  – TIGR (2002)
  – Plant pathogens by PAMGO at VBI (2005)
  – Human pathogens at 8 BRCs (2006)


Bioinformatics Resource Centers (BRC)

• NIAID-funded, $100 million effort to create eight bioinformatics centers for human pathogens

• Goal is to provide easy access to genomic data from multiple strains, as eukaryotic model organism databases do

BRCs = ?

Example: Toxin annotation in GO

Step 1, assign GO terms, maybe:
  – activation of Rho GTPase activity
  – N-terminal peptidyl-glutamine deamination
  – actin cytoskeleton reorganization
  – stress fiber formation

Step 2, add references and evidence codes

Evidence linking a virulence protein to its function:

• Experimental
  – Knockout mutants (IMP)
  – Overexpression phenotypes (IDA)
  – Genetic interactions (IGI)
  – Microarrays (IEP or RCA)

• Computational, sequence similarity
  – BLAST alignments (ISA)
  – Orthologous proteins (ISO)
  – Hidden Markov models of protein families or domains (ISM)

• Computational, genomic context
  – Phylogenetic profiles, conserved neighborhoods, gene fusion, shared regulatory sites, etc. (IGC)
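The grouping of evidence codes above can be sketched as a small lookup table. This is only an illustration: the code-to-group assignments follow the slide (including RCA under microarrays), while the dictionary layout, the function name, and the "other" bucket for unlisted codes are assumptions.

```python
# Evidence codes grouped as on the slide; the data layout is an assumption.
EVIDENCE_GROUPS = {
    "experimental": ("IMP", "IDA", "IGI", "IEP", "RCA"),
    "sequence similarity": ("ISA", "ISO", "ISM"),
    "genomic context": ("IGC",),
}

# Invert the table so each code points at its group.
CODE_TO_GROUP = {
    code: group for group, codes in EVIDENCE_GROUPS.items() for code in codes
}


def summarize(annotations):
    """Count (gene, GO term, evidence code) annotations per evidence group.

    Codes not listed on the slide are counted under 'other'.
    """
    counts = {}
    for _gene, _term, code in annotations:
        group = CODE_TO_GROUP.get(code.upper(), "other")
        counts[group] = counts.get(group, 0) + 1
    return counts
```

A summary like this makes it easy to see what share of a genome's annotations rests on experiments rather than on computation.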


Example: Toxin searches in GO


• If a gene is annotated to ‘adenylate cyclase activity’, how do you know it’s a toxin?

• It may also be annotated to “cell killing” or a related term, but is that enough?

• An alternative is to define virulence factors and toxins (both outside the scope of GO) in a new ontology
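The search problem can be made concrete with a toy example. The gene names and annotation sets below are hypothetical and exist only to show that a molecular function term alone returns more genes than the combination of terms does; nothing here reflects real annotation data.

```python
# Hypothetical annotations: gene name -> set of GO term names (all made up).
ANNOTATIONS = {
    "gene_A": {"adenylate cyclase activity", "cell killing"},
    "gene_B": {"adenylate cyclase activity"},
    "gene_C": {"cell killing"},
}


def genes_with_terms(annotations, required_terms):
    """Return the genes annotated to every term in required_terms, sorted."""
    required = set(required_terms)
    return sorted(
        gene for gene, terms in annotations.items() if required <= terms
    )
```

Searching on 'adenylate cyclase activity' alone returns both gene_A and gene_B, and only the combined search narrows the result to gene_A. Even then, the match says nothing definitive about whether gene_A is a toxin, which is the gap a dedicated ontology would address.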


Why we need a Virulence Factor ontology

• Lots of effort to characterize pathogenic processes and systems (e.g., BRCs)

• Many different definitions of pathogen, virulence and virulence factors

• Not clear what terms in GO may be related to toxins and virulence (BRCs have already assigned 750,000 GO terms to 300,000 genes)


Virulence Factor Ontology working group

• Goal is to combine existing toxin and virulence terms from various groups into a single ontology
  – TVFac and antibiotic resistance (AR) terms at LANL
  – Gemina virulence factors and AR terms at U. of Maryland
  – PAMGO terms in GO

• Participants
  – MITRE: Lynette Hirschman, Marc Colosimo, and others
  – LANL: Chris Stubben, Murray Wolinsky and Jian Song
  – U. of Maryland IGS: Lynn Schriml and Michelle Gwinn


Virulence Factor Ontology (VFO)

• Three new ontologies, one very simple that points to additional terms in GO or to new ontologies

• Virulence factor (definition needed!)
  – toxin associated processes
  – antibiotic resistance
  – adhesion
  – entry into host
  – acquisition of nutrients from host
  – avoidance of host defenses
  – growth within host
  – modification of host morphology
  – dissemination from host

[Figure annotations: "New", "New", and "simplified GO trees (slims)"]

Virulence genes in novel strains

• Emerging, engineered and novel strains will most likely be sequenced quickly using next-generation sequencing technologies

• They will then be compared to near-neighbor strains using sequence similarity (BLAST) or models (HMMs like PFams, TIGRFams, FIGFams, EnteroFams, etc.)


Compare novel strains to what?

• Very few manual annotations available for prokaryotes, especially in public databases like NCBI and UniProt

“Curated information from the literature serves as the gold-standard data set for comparative analyses” (Nature, Sep 10, 2008)

Table 1. Percentage of genes in UniProt with functional assignments to Gene Ontology terms based on experimental evidence in the primary literature.

Use BRCs!


BRC annotations

• Genome annotations should have references and evidence codes signifying whether annotations were produced experimentally or computationally

3.8% of Y. pestis CO92 with manual annotations


Y. pestis CO92 annotations at ERIC

Tables 1 and 2. Sequence features and coding sequence annotations for Y. pestis CO92 at ERIC


Yersinia antibiotic resistance genes

Tables 1 and 2. Antibiotic resistance genes found using Swiss-Prot keyword search ‘antibiotic resistance’ in UniProt and using GO term search ‘response to antibiotic’ in ERIC.

Only one gene in common!
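The overlap between the two search results is a plain set comparison. The sketch below assumes each search returns a list of gene identifiers; the function name and the identifiers used in the usage note are hypothetical placeholders, not the actual Yersinia genes from the tables.

```python
def compare_hits(keyword_hits, go_hits):
    """Split two lists of gene identifiers into shared and source-specific hits."""
    keyword_set, go_set = set(keyword_hits), set(go_hits)
    return {
        "shared": sorted(keyword_set & go_set),
        "keyword_only": sorted(keyword_set - go_set),
        "go_only": sorted(go_set - keyword_set),
    }
```

For instance, `compare_hits(["gene_1", "gene_2"], ["gene_2", "gene_3"])` reports `gene_2` as the single shared hit, which is the kind of result the slide describes.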


Vibrio toxins in GO, UniProt, and NMPDR


Virulence genes in metagenomes

• Recent comparison of virulence genes in chicken, cow, mouse and human gut metagenomes (metavirulomes) was based on SEED subsystem categories at NMPDR

• An alternative is to use GO term mappings to protein family and domain databases like Pfam
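One way to use such mappings is to parse a Pfam-to-GO mapping file and look up families by term. The sketch below assumes the one-mapping-per-line layout used by GO's external2go files (`Pfam:<id> <name> > GO:<term> ; <GO id>`, with `!` comment lines); the sample lines in the usage note are illustrative of that layout rather than copied from a specific release, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout: "Pfam:PF00144 Beta-lactamase > GO:term name ; GO:0000000"
LINE_RE = re.compile(r"^Pfam:(PF\d+)\s+(\S+)\s+>\s+GO:(.+?)\s+;\s+(GO:\d+)$")


def parse_pfam2go(lines):
    """Return {pfam_id: [(go_id, term_name), ...]} from mapping-file lines."""
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comments
            continue
        match = LINE_RE.match(line)
        if match:
            pfam_id, _name, term, go_id = match.groups()
            mapping[pfam_id].append((go_id, term))
    return dict(mapping)


def pfams_for_term(mapping, term_name):
    """Return the Pfam IDs mapped to a GO term with exactly this name."""
    return sorted(
        pfam_id
        for pfam_id, terms in mapping.items()
        if any(term == term_name for _go_id, term in terms)
    )
```

With the two illustrative lines `Pfam:PF00144 Beta-lactamase > GO:response to antibiotic ; GO:0046677` and `Pfam:PF01289 Thiol_cytolysin > GO:pathogenesis ; GO:0009405`, a lookup for "pathogenesis" returns only PF01289. An exact name match like this does not follow the ontology hierarchy, so child terms would need to be handled separately.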


IMG/metagenomes from JGI


• Select metagenomes and save


Create abundance profiles


• Compare using Pfam, COG, or TIGRfam abundance profiles


Find virulence genes

• Use GO term mappings to PFAM database to find virulence genes

ID       Map to GO term          Pfam                                             Air 1   Air 2   Soil    Whalefall  Human 7
PF00144  response to antibiotic  Beta-lactamase                                   0.3094  0.2349  0.2757  0.1087     0.0191
PF05139  response to antibiotic  Erythromycin esterase                            0.0041  0.0114  0.0240  0.0000     0.0064
PF05223  response to antibiotic  NTF2-like N-terminal transpeptidase domain       0.0000  0.0000  0.0010  0.0000     0.0000
PF07091  pathogenesis            Ribosomal RNA methyltransferase (FmrO)           0.0000  0.0000  0.0021  0.0000     0.0000
PF01289  pathogenesis            Thiol-activated cytolysin                        0.0000  0.0000  0.0021  0.0000     0.0000
PF01376  pathogenesis            Heat-labile enterotoxin beta chain               0.0000  0.0000  0.0000  0.0000     0.0000
PF03023  pathogenesis            MviN-like protein                                0.0247  0.0341  0.0459  0.0225     0.1146
PF03077  pathogenesis            Putative vacuolating cytotoxin                   0.0000  0.0000  0.0000  0.0037     0.0000
PF03945  pathogenesis            delta endotoxin, N-terminal domain               0.0041  0.0000  0.0000  0.0000     0.0000
PF05394  pathogenesis            Avirulence protein                               0.0000  0.0000  0.0010  0.0000     0.0000
PF05480  pathogenesis            Staphylococcus haemolytic protein                0.0000  0.0000  0.0010  0.0000     0.0000
PF05658  pathogenesis            Hep_Hag                                          0.0289  0.0265  0.0073  0.0112     0.0127
PF05662  pathogenesis            Haemagglutinin                                   0.0289  0.0379  0.0021  0.0000     0.0191
PF05932  pathogenesis            Tir chaperone protein (CesT)                     0.0000  0.0000  0.0010  0.0000     0.0000
PF07269  pathogenesis            T-complex transport apparatus lipoprotein VirB7  0.0000  0.0000  0.0010  0.0000     0.0000
PF07675  pathogenesis            Cleaved Adhesin Domain                           0.0000  0.0000  0.0000  0.0000     0.0064
PF07822  pathogenesis            Neurotoxin B-IV-like protein                     0.0000  0.0000  0.0010  0.0000     0.0000
PF09025  pathogenesis            YopR Core                                        0.0000  0.0000  0.0000  0.0037     0.0000
PF09207  pathogenesis            Yeast killer toxin                               0.0000  0.0000  0.0010  0.0000     0.0000
PF06414  pathogenesis            Zeta toxin                                       0.0000  0.0076  0.0010  0.0000     0.0000
PF06769  pathogenesis            Plasmid encoded toxin Txe                        0.0041  0.0038  0.0010  0.0037     0.0191
PF02794  pathogenesis            RTX toxin acyltransferase family                 0.0000  0.0038  0.0000  0.0000     0.0000
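A table like this one can be queried per sample once it is loaded as rows. The sketch below hard-codes a handful of rows from the slide; the tuple layout and the function name are assumptions made for illustration, and "detected" simply means an abundance above zero.

```python
SAMPLES = ("Air 1", "Air 2", "Soil", "Whalefall", "Human 7")

# A few rows from the slide: (Pfam ID, mapped GO term, family name, abundances).
ROWS = [
    ("PF00144", "response to antibiotic", "Beta-lactamase",
     (0.3094, 0.2349, 0.2757, 0.1087, 0.0191)),
    ("PF05139", "response to antibiotic", "Erythromycin esterase",
     (0.0041, 0.0114, 0.0240, 0.0000, 0.0064)),
    ("PF05223", "response to antibiotic",
     "NTF2-like N-terminal transpeptidase domain",
     (0.0000, 0.0000, 0.0010, 0.0000, 0.0000)),
    ("PF01376", "pathogenesis", "Heat-labile enterotoxin beta chain",
     (0.0000, 0.0000, 0.0000, 0.0000, 0.0000)),
    ("PF03023", "pathogenesis", "MviN-like protein",
     (0.0247, 0.0341, 0.0459, 0.0225, 0.1146)),
    ("PF09025", "pathogenesis", "YopR Core",
     (0.0000, 0.0000, 0.0000, 0.0037, 0.0000)),
]


def families_detected(rows, go_term, sample):
    """Return Pfam IDs mapped to go_term with nonzero abundance in sample."""
    column = SAMPLES.index(sample)
    return [
        pfam_id
        for pfam_id, term, _name, values in rows
        if term == go_term and values[column] > 0
    ]
```

Calling `families_detected(ROWS, "response to antibiotic", "Human 7")` picks out PF00144 and PF05139 from these rows, since PF05223 has zero abundance in that sample.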


Need better mappings to virulence genes

• Current GO term mappings miss most virulence-associated genes.

Tables 1 and 2. PFAMs and TIGRfams overrepresented in air compared to soil

ID       PFAM                                                        Air 1  Air 2  Soil  Whalefall  Human 7
PF00593  TonB dependent receptor                                     0.90   0.87   0.16  0.31       0.00
PF07715  TonB-dependent Receptor Plug Domain                         0.94   0.95   0.33  0.37       0.02
PF03466  LysR substrate binding domain                               0.68   0.58   0.16  0.52       0.18
PF00126  Bacterial regulatory helix-turn-helix protein, lysR family  0.48   0.42   0.14  0.38       0.27
PF00440  Bacterial regulatory proteins, tetR family                  0.42   0.43   0.17  0.28       0.20
PF00873  AcrB/AcrD/AcrF family                                       0.77   0.58   0.43  0.64       0.03
PF00015  Methyl-accepting chemotaxis protein (MCP) signaling domain  0.29   0.22   0.05  0.42       0.00
PF07992  Pyridine nucleotide-disulphide oxidoreductase               0.78   0.79   0.58  0.77       0.49
PF00106  short chain dehydrogenase                                   0.99   0.89   0.74  0.58       0.23
PF01381  Helix-turn-helix                                            0.31   0.38   0.17  0.23       0.64

ID         TIGRFAM                                          Air 1  Air 2  Soil  Whalefall  Human 7
TIGR00014  arsenate reductase (glutaredoxin)                0.40   0.30   0.05  0.04       0.00
TIGR01297  cation diffusion facilitator family transporter  0.29   0.46   0.13  0.16       0.24
TIGR01782  TonB-dependent receptor                          0.23   0.26   0.01  0.00       0.00
TIGR02606  putative addiction module antidote protein       0.25   0.28   0.03  0.00       0.00
TIGR01552  prevent-host-death family protein                0.25   0.48   0.14  0.18       0.16
TIGR01435  putative glutamate--cysteine ligase              0.20   0.20   0.00  0.22       0.42
TIGR01352  TonB family C-terminal domain                    1.01   0.74   0.68  0.37       0.03
TIGR01509  haloacid dehalogenase superfamily                0.49   0.46   0.28  0.28       0.42
TIGR00093  pseudouridine synthase family                    0.54   0.40   0.28  0.18       0.16
TIGR02690  arsenical resistance protein ArsH                0.11   0.28   0.02  0.00       0.00
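The slide does not say how overrepresentation in air was scored. As one possible approach, the sketch below ranks families by the mean of the two air samples minus the soil value; that scoring rule and the function name are assumptions, and the rows are a subset of the Pfam table above, deliberately listed out of order.

```python
from statistics import mean

# (Pfam ID, family name, Air 1, Air 2, Soil), values taken from the Pfam table.
ROWS = [
    ("PF01381", "Helix-turn-helix", 0.31, 0.38, 0.17),
    ("PF03466", "LysR substrate binding domain", 0.68, 0.58, 0.16),
    ("PF00593", "TonB dependent receptor", 0.90, 0.87, 0.16),
    ("PF07715", "TonB-dependent Receptor Plug Domain", 0.94, 0.95, 0.33),
]


def rank_air_over_soil(rows):
    """Rank families by mean air abundance minus soil abundance, highest first.

    The difference score is an assumed measure, not the one used on the slide.
    """
    scored = [
        (pfam_id, mean((air1, air2)) - soil)
        for pfam_id, _name, air1, air2, soil in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Under this score the TonB dependent receptor family (PF00593) ranks first among the four rows. A ratio or a statistical test would be reasonable alternatives, and they may order the families differently.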