Data Vaccination:
Computational Biology Enhances Infectious Disease Research
Lisa Herron-Olson
Microbiologist
Syntiron LLC
Saint Paul, MN
SARS Spreads From China Markets
West Nile Virus Arrives in New York
Monkeypox Acquired From Prairie Dogs
Canadian Cow Tests Positive for BSE
Weaponized Anthrax Sent in US Mail
Experts warn of avian flu pandemic
Areas of Discussion
Computational tools in infectious disease research
• ID database to include molecular biology
• CASE STUDY: Staphylococcus aureus
• Comparative genomics
  • Assembly
  • Sequence analysis
• Functional genomics
  • Gene expression
  • Proteomics
• Significance: moving closer to a vaccine
Messages
• Existing tools are good.
  • Databases
  • Algorithms
  • Associated software tools
• More/better tools are continually needed.
  • Even if not necessarily requested…
  • Even if not necessarily requested in clear language…
  • Even if the current tools work…
  • Even if the newest version was launched an hour ago…
Biological Problem:
How can we improve our resistance to microbial infection in a changing world?
Focusing research on zoonotic pathogens will increase the likelihood of survival for human and other animal hosts.
Bacterial host specificity
Evidence for genetic similarity among clones associated with a specific host
– Salmonella spp. – human, cow, pig, chicken, fish
– Rhizobium spp., Bradyrhizobium spp., etc. – legumes and other plants
– Pseudomonas syringae – tomato, potato, tobacco, bean, pear
What is the genetic basis for host specificity and what can it tell us about the pathogenesis of specific diseases?
Understanding infectious disease ecology: overall challeng
Tracking infectious disease: current systems
NEDSS (National Electronic Disease Surveillance System)
Strengths: well-developed and funded; includes some non-human hosts
Weaknesses: clinically focused; no molecular biology*; various levels of implementation
ProMED-mail (Program for Monitoring Emerging Diseases)
Strengths: rapid; global; moderated by human ‘expert’ teams
Weaknesses: Internet-based; human only*; human moderators do not scale up
• Variation in amount and type of data collected during a specific case, inability to guarantee ‘complete’ records
• Variation in data format
• Integration of local, (county/state/province/region), federal health and surveillance systems
• Uneven distribution of resources
• Human focus
[Diagram: Host, Disease, Symptom, and Pathogen on the left; Tracking, Prediction, Therapy, Vaccine, and Control on the right]
Problem
For hundreds of years, surveillance systems relied upon the concept of ‘symptom’ as the entry point for tracking data.
Resolution
Improving the resolution between ‘pathogen’ and ‘symptom’ includes vital information for improving success on the right side.
[Diagram: the levels added between Pathogen and Symptom are host-pathogen interaction, component, protein, transcript, virulence/protection gene, and genomic sequence; Host, Disease, Symptom, and Pathogen on the left; Tracking, Prediction, Therapy, Vaccine, and Control on the right]
Objectives
The model is designed to remember important data types involved with infection, with an emphasis on keeping track of multiple hosts and including molecular biological data.
Through this, we aim to accomplish two major objectives:
Objective 1: Identify molecular mechanisms underlying pathogenesis
Objective 2: Improve epidemiological surveillance
Core of the model: Host – Pathogen – Disease Triangle (HPD)
[Diagram: triangle labeled Disease, Host, and Pathogen, with Environment]
The two objectives could be combined in a single model because the important data to remember about each objective are anchored by a common core.
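As a rough illustration of a shared Host, Pathogen, Disease core, the sketch below models the three corners as small Python records and links them through an affliction record. The class names, field names, and the `hosts_of` helper are assumptions made for this example; they borrow a few attribute names from the data model slides but are not the talk's actual schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Host:
    host_id: int
    common_name: str


@dataclass(frozen=True)
class Pathogen:
    pathogen_id: int
    common_name: str


@dataclass(frozen=True)
class Disease:
    disease_id: int
    clinical_name: str


@dataclass(frozen=True)
class Affliction:
    """Hypothetical link record: one host with one disease, optionally tied to a pathogen."""
    host: Host
    disease: Disease
    pathogen: Optional[Pathogen] = None
    diagnosis_reason: str = ""


def hosts_of(pathogen: Pathogen, afflictions: List[Affliction]) -> List[str]:
    """Return the sorted common names of hosts linked to the given pathogen."""
    names = {
        a.host.common_name
        for a in afflictions
        if a.pathogen is not None and a.pathogen.pathogen_id == pathogen.pathogen_id
    }
    return sorted(names)


# Illustrative records only; the ids are made up for the example.
sa = Pathogen(1, "Staphylococcus aureus")
records = [
    Affliction(Host(1, "cow"), Disease(1, "mastitis"), sa),
    Affliction(Host(2, "human"), Disease(2, "toxic shock syndrome"), sa),
]
print(hosts_of(sa, records))
```

Because both surveillance questions (which hosts?) and pathogenesis questions (which pathogen?) query the same link record, one core can serve both objectives.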
[Data model diagram. Entities and their attributes:]
host: host id, host date of birth, host date of death, host sex, host reproductive status, host current gender, host time of birth, host birth order number, host time of death, host # reproduction events, host common name, host siblings flag, host ethnicity, host contact phone number, host infertility flag, eye color, hair/fur color, coat pattern, leaf pattern, flowering flag, seeding flag, host photograph, handicapped flag, religious affiliation, guardian flag, number of offspring under care, marital status, mother maiden name, known inbreeding/self poll flag
host relationships: mothers / be mothered by; fathers / be fathered by
disease: disease id, disease clinical name, disease common name
pathogen: pathogen id, pathogen common name, pathogen original discoverer, reproductive rate, number of life stages, pathogen reproduction type, colony/community color, colony/community morphology, aerobic/anaerobic respiration flag, identified in host organism flag, pathogen gender, motile flag, sporulating flag, pathogen adult size
kind of living thing: KOLT id
genus: genus id
species: species id
sub-species: sub-species id
genome sequence: genome sequence id, GS length, GS chromosome count, GS plasmid count, GS GC content, GS insertion seq. count, GS ribosome count, GS gap count, GS nucleotide seq.
gene in sequence: start position, end position
gene: gene id, gene name, gene nucleotide length, gene amino acid length, gene nucleotide seq., gene amino acid seq., gene GC content, gene EC, gene phage flag
gene presence: GP presence flag, GP copy count
PCR product: PCRP id, PCRP length, PCRP seq., PCRP purified, PCRP forward primer, PCRP reverse primer
microarray: MA id, MA content desc, MA system, MA spot count
chip: chip id, chip mfg.
array coordinate: array x, array y
microarray spot: spot id
MA location: plate id, MAL x, MAL y
chip experiment: CE id, CE date, CE objective
CE-HS pair
hybridization sample: hybrid. sample id, RNA extraction date, hybrid. sample label type, hybrid. sample control flag, RNA source*
expression of transcript: EoT quantitation
expression of protein: EoP quantitation
PE-PT pair
protein expression experiment?: PEE id, PEE type
transcript expression experiment: TEE id, TEE type
sample: sample id, sample type, sample storage type, sample storage temp, sample harvest date, sample harvest time
host-sample pair
pathogen-sample pair
composite sample
composite sample component: CSC description
symptom: symptom id, symptom clinical name, symptom common name
symptom severity: symptom severity seq, symptom severity desc.
exhibited symptom: clinical/subjective flag, ex. symptom diagnosis date
affliction: diagnosis reason
disease-symptom pair
susceptibility: susceptibility id, susceptibility age*
transmission event: TE id
vector host in TE
isolation event: IE date
location: location id
host in location: HiL alive flag
pathogen in location: PiL alive flag
location time component: LTC id, LTC type value
LTC type: LTC type name
LTC-LTC pair
LTC-pair preposition: LTCP-preposition value, LTCP-preposition desc
location place component: LPC id, LPC type value
LPC type: LPC type name
LPC-LPC pair
LPC-pair preposition: LPCP-preposition value, LPCP-preposition desc
relationship labels: be; has / be .. of; uses / be used for; comprises / is comprised of; be subject / subjectify; be object / objectify
Challenge solution: HPD Triangle
Touring the Data Model: Molecular epidemiology
Touring the Data Model: Pathogenesis
Share
Genbank NCBI Taxonomy
Stanford MicroarrayDB
NEDSS
CASE STUDY: Staphylococcus aureus Research
Comparative Genomics
Part I: Sequencing, assembly and annotation
Staphylococcus aureus: the bug
• Gram-positive cocci
• family Micrococcaceae
• grapelike clusters
• yellow colonies
• coagulase positive
• carried by 30-40% of healthy human adults
Staphylococcus aureus infections
SA infects multiple hosts and causes many diseases
HUMANS
• septicemia
• endocarditis
• TOXIC SHOCK SYNDROME
• osteomyelitis
• pneumonia
• purpura fulminans
• food poisoning
• furuncles
• impetigo
• scalded skin syndrome
• arthritis
ANIMALS
• MASTITIS
• septicemia
• toxic shock syndrome
• pneumonia
• osteomyelitis
• snuffles
• wound infection
Staphylococcus aureus: an ideal pathogen
• Metabolic diversity
• Toxins
• Immune effectors
• Biofilms
• Clumping
• Adhesins
• Regulators
GOAL: Identify the genomic differences between bovine and human Staphylococcus aureus
HYPOTHESIS: Host-specific pathogenesis is enhanced by a subset of host-tailored virulence-related genes
Why whole-genome sequencing?
Advantages:
• Complete set of potential genes
  • Virulence factors
  • Vaccine components
  • Therapeutic targets
• Regulatory elements
• Genomic organization
Challenges:
• Data management
• Time expense
• Complexity of conducting thorough analyses
Comparative genomics methods
Sequencing:
1. Culture RF122, MSA553
2. Generate small-insert genomic library in E. coli
3. Isolate plasmids and sequence the inserts
4. Assemble sequence reads
5. Close contig gaps
6. Annotate open reading frames
7. Analyze amino acid substitution rates; compare genome content and organization
Array hybridization:
1. Spot 70mer oligonucleotides representing SA ORFs onto glass slides
2. Hybridize fluorescently labeled SA DNA (multiple strains) to array, scan, analyze
Strain       RF122      Mu50       N315       MW2
Size (bp)    2,703,713  2,878,040  2,814,816  2,820,462
GC content   33.4%      32.8%      32.8%      32.9%
ORFs         2,406      2,714      2,593      2,632
No. Reads    23,125     64,000     63,000     64,000
The Tool That Saved the Thesis
S. aureus strain RF122 genome sequence
Size (nt): 2,742,531
ORFs: 2590
GC%: 32.78%
rRNA: 5
tRNAs: 60
Plasmids: 0
Path Islands: 2
Phages: 2
Unique genes: 60
Pseudogenes: 74
S. aureus strain MSA553 genome sequence
Size (nt): 2,856,447
ORFs: 2702
GC content: 33%
Ribosomal ops: 5
tRNAs: 59
Plasmids: 1 (int)
Path Islands: 2
Phages: 1
Unique genes: 25
Isolate
Key
Size (Mbp)  2.74 2.86 2.90 2.80 2.82 2.81 2.87 2.81 2.82
Plasmids    0 0 1 1 1 0 0 1 0
ORFs        2590 2702 2671 2565 2632 2595 2697 UA UA
IS1181      0 1 2 3 3 8 11 1 1
Mu50
N315
MW2
MSSA476
MRSA252
NCTC8325
MSA553
COL
RF122
Isolate
Key
Size (Mbp)  2.74 2.86 2.90 2.80 2.82 2.81 2.87 2.81 2.82
Plasmids    0 0 1 1 1 0 0 1 0
ORFs        2589 2685 2671 2565 2632 2595 2697 UA UA
IS1181      0 1 2 3 3 8 11 1 1
Mu50
N315
MW2
MSSA476
MRSA252
NCTC8325
MSA553
COL
RF122
MSSA476
N315
Mu50
COL
NCTC8325
MRSA252
RF122 *MSA553
MW2
Completely sequenced SA isolates for comparison
Staphylococcus aureusRF122
RF122 Genome Comparison
MSA553 Genome Comparison
Staphylococcus aureusMSA553
CAT GCA AGT CGC CGT ATT
 H   A   S   R   R   I
CAT GCG AGT CGC CAT ATT
 H   A   S   R   H   I
Gene analysis: amino acid substitution rates
1. Obtain RF122 sequence
2. Line up raw RF122 sequence with sequence of human isolate
3. Identify substitutions based on the algorithm of Nei and Gojobori
4. Calculate synonymous and nonsynonymous substitution rates per gene
Synonymous: GCA to GCG (A to A). Nonsynonymous: CGT to CAT (R to H).
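The distinction between synonymous and nonsynonymous changes can be sketched in a few lines of Python using the two example sequences from the slide. This is only a simplified illustration under stated assumptions: it counts differing codons and checks whether the encoded amino acid changes. It does not implement the full Nei and Gojobori method, which also estimates the numbers of synonymous and nonsynonymous sites and converts the observed proportions into rates. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def _clean(seq: str) -> str:
    """Upper-case the sequence and drop spaces."""
    return seq.upper().replace(" ", "")


def translate(seq: str) -> str:
    """Translate an in-frame DNA sequence to one-letter amino acids."""
    s = _clean(seq)
    usable = len(s) - len(s) % 3
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, usable, 3))


def count_codon_differences(seq_a: str, seq_b: str):
    """Count differing codons as (synonymous, nonsynonymous).

    Simplification: a codon pair is classified only by whether the
    amino acid changes, regardless of how many bases differ.
    """
    a, b = _clean(seq_a), _clean(seq_b)
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    syn = nonsyn = 0
    for i in range(0, len(a) - len(a) % 3, 3):
        ca, cb = a[i:i + 3], b[i:i + 3]
        if ca == cb:
            continue
        if CODON_TABLE[ca] == CODON_TABLE[cb]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


first = "CAT GCA AGT CGC CGT ATT"
second = "CAT GCG AGT CGC CAT ATT"
print(translate(first), translate(second))
print(count_codon_differences(first, second))
```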
RF122 genes with elevated rates of nonsynonymous substit
PRODUCT                                                    FUNCTION
staphylocoagulase                                          adhesion
fibronectin binding protein A                              adhesion
host factor binding protein                                adhesion
host factor binding protein                                adhesion
host factor binding protein                                adhesion
clumping factor A                                          adhesion
secreted von Willebrand factor-binding protein precursor   adhesion
IgG-binding protein                                        immune evasion
staphylococcal enterotoxin 11                              virulence
staphylococcal enterotoxin 9                               virulence
hypothetical membrane protein (6)                          unknown
Identification of gene deletions
Staphylococcus aureusMSA553
Ebh is not conserved in RF122
[Figure: Ebh across EMRSA-16, TSS, Mu50, N315, MW2, MSSA476, COL, OK8325, and RF122; scale 0 to 10K AA]
Mobile element inserted within Ebh sequence
Ebh is a 1.1-megadalton cell wall protein capable of binding human fibronectin
High genome homology between MSA553 and MRSA252
A. phageMSA553
B. SaPI5 containing TSST-1 and Etx
C. SCCmec encoding methicillin resistance
D. phage Sa2
Strain  Type   State  Year  TSST  MRSA  etx(a)
MSA553  mTSS   PA     1978  tsst1+(b) +
PSHA    mTSS   -      1978  + +
PSMN    mTSS   MN     1986  + +
PSPA    mTSS   TN     1986  + MRSA
PS58    mTSS   CDC    1980  + +
PSWH    mTSS   MN     2000  + MRSA
PSJO    nmTSS  MN     2005  +
PSEB    nmTSS  MN     2005  + +
PSHO    nmTSS  MN     2005  + +
PSLA    nmTSS  MN     2005  + +
(a) Presence of gene confirmed by PCR = +
(b) Presence of gene confirmed; protein production unconfirmed
Novel exfoliative toxin detected in multiple SA isolated from
Equivalent gene content ≠ equivalent gene position
Plug ‘N’ Play™
Get the LATEST in Mobile Technology!
We carry the best in:
• Adhesion factors
• Invasion assistance
• Antibiotic resistance
• Altered regulators
• General nuisances
and
TOXINS TOXINS TOXINS!
Sick of your current job? Alter your host specificity with Plug ‘N’ Play HS!
Easiest Install!
Just acquire and go!
Staph & Co.®
Since long ago
Comparative Genomics
Part II: Genomic DNA hybridization
Comparative genomic DNA hybridization of diverse isolates
RF122 versus PSA1001 RF122 versus PSA72
SA oligonucleotide microarray contains 3800 probes corresponding to all of the genes from 9 sequenced SA genomes
Discovery: tools for CGH analysis are limited!
Most array tools designed for gene expression
CGH has a different set of challenges:
Normalization:
• Global won’t work on many comprehensive arrays
• Housekeeping set must be genetically conserved
Statistics:
• The concept is binary, but the reality isn’t
• Establish cut-off for present, absent, divergent
Reliability:
• What if genes are locally divergent where probe hits?
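The cut-off problem above can be sketched as a three-way call on the log ratio of test to reference signal. The function name and both thresholds below are hypothetical values chosen for illustration; they are not the cut-offs used in the study, and in practice such thresholds would need to be calibrated against genes of known status.

```python
import math


def classify_probe(test_signal: float, reference_signal: float,
                   divergent_cutoff: float = -1.0,
                   absent_cutoff: float = -2.0) -> str:
    """Call a gene present, divergent, or absent from a CGH log2 ratio.

    The cutoffs are assumptions for this sketch: a ratio at or below
    absent_cutoff is called absent, a ratio at or below divergent_cutoff
    is called divergent, and anything higher is called present.
    """
    if test_signal <= 0 or reference_signal <= 0:
        raise ValueError("signals must be positive")
    log_ratio = math.log2(test_signal / reference_signal)
    if log_ratio <= absent_cutoff:
        return "absent"
    if log_ratio <= divergent_cutoff:
        return "divergent"
    return "present"


# Made-up intensities: equal signal, moderately reduced, strongly reduced.
for test, ref in [(1000.0, 1000.0), (400.0, 1000.0), (100.0, 1000.0)]:
    print(test, ref, classify_probe(test, ref))
```

A middle "divergent" class is one way to acknowledge that the underlying question is binary while the measured signal is continuous.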
[Figure: gene content map. Region labels: vSaBov, SCCcap, BoPhi12, SaPIbov, OPTrans, SaPIbovB, eta, Ebh. Host labels: BBBBBBBHHHHBBBBHH HHH. Isolates: PSA1, RF120, RF122, PSA72, PSA6, PSA10, PSA13, Mu50, N315, MW2, MSA476, PSA20, PSA1001, PSA17, PSA4, NCTC8325, MRSA252, MSA553, MSA553A, COL. Annotation: most successful bovine isolates (ET3)]
Gene content of SA isolated from humans and cow
Comparative genomics summary
There are genetic differences between SA isolates routinely recovered from human and bovine hosts.
• Novel toxins
• Unique mobile genetic elements, genome organization
• Genes showing sequence divergence or deletion
  • Transcriptional regulators
  • Membrane proteins
  • Adhesion factors
  • Toxins
A smattering of DBs used for these analyses
Comparative genomics in the IDDB
Functional genomics
Part I: Gene expression analysis
Iron availability in the host
Mastitis: bLactoferrin, citrate
Toxic shock syndrome: hLactoferrin (mTSS), transferrin, hemoglobin, ferritin
Metal metabolism of SA
[Figure: metal metabolism components of SA. Legend: transcriptional regulator (repressor, inducer), cytosolic protein, membrane protein, surface component, divergent component, non-iron primary function. Labeled systems and proteins: Fur, Fhu, Frp, Sir, Isd, PerR, Zur, MntR, KatA, SrtB, SrtA, AhpFC, FeoB, TrxB, Ftn, SstA, Mnt, Sbn, MrgA, CzrA, Cad, Czr, AdcA, StbA (?). * FrpG = sortase B]
Iron studies in Staphylococcus aureus
Phenotypic response of bovine SA to iron limitation
CDM + iron
CDM – iron, log phase
CDM – iron, stationary phase
CDM iron added back (2 hours)
CDM + iron CDM - iron
Gene expression analysis methods
1. Grow MSA553, RF122 in chemically defined media
2. Add lactoferrin or ferric citrate during mid-log phase
3. Isolate RNA at 5, 30, 60, 120 minutes
4. Reverse transcribe to cDNA, construct fluorescently labeled probe
5. Hybridize to oligonucleotide array
6. Scan array
7. Identify strain-specific iron-responsive genes for further genetic analyses
SAB0180 ldh1 L-lactate dehydrogenase 1
SAB0164 pflB formate acetyltransferase
SAB1667 probable specificity determinant
SAB0165 pflA formate acetyltransferase activating enzyme
SAB2246 hypothetical protein
SAB0557 adh1 alcohol dehydrogenase I
SAB0242 probable formate/nitrite transport protein
SAB2363 hypothetical protein
SAB2267 narK nitrite extrusion protein
SAB2491 anaerobic ribonucleotide reductase large subunit
SAB2245 lldP1 L-lactate permease
SAB2490 anaerobic ribonucleotide reductase small subunit
SAB2029 czrA zinc and cobalt transport repressor protein
SAB2030 czrB cation-efflux system membrane protein
SAB2492 mntH probable magnesium citrate secondary transporter
SAB2280 nasD nitrite reductase
SAB0049 lldP2 L-lactate permease
putative resolvase
cadA cadmium exporting ATPase protein
cadX cadmium efflux accessory protein
SAB0209 rbsC ribose permease transport protein
SAB0210 rbsD ribose transporter
SAB1940 ilvN acetolactate synthase small subunit
SAB1901 hypothetical protein
SAB0072 sodM superoxide dismutase
SAB0361 hypothetical bovine pathogenicity island protein
SAB0332 probable nitro/flavin reductase
SAB1219 probable DNA damage repair protein
SAB0355 bovine pathogenicity island protein Orf9
SAB0354 bovine pathogenicity island protein Orf10
SAB0356 bovine pathogenicity island protein Orf8
SAB0358 bovine pathogenicity island protein Orf6
SAB0344 bovine pathogenicity island protein Orf19
SAB0235 hypothetical protein
SAB0057 sbnC probable siderophore biosynthesis protein
SAB2495 bsaA glutathione peroxidase
SAB2346 oligopeptide transporter putative ATPase domain
SAB2251 conserved hypothetical protein
SAB2057 sirB ferrichrome ABC transporter
SAB2153 hypothetical protein
SAB2154 conserved hypothetical protein
SAB0107 conserved hypothetical protein
SAB0598 fhuD ferrichrome transport permease
SAB0582 mntB cation ABC transporter
SAB2296 gpmA 2,3-bisphosphoglycerate-dependent phosphoglycerate mutase
SAB0583 mntA cation ATP-binding ABC transporter
SAB2155 iunH inosine-uridine preferring nucleoside hydrolase
SAB0597 fhuB ferrichrome transport permease
SAB2156 fhuD2 probable ferrichrome-binding protein
SAB2058 sirA ferrichrome ABC transporter lipoprotein
SAB2056 sirC ferrichrome ABC transporter
SAB0761 conserved hypothetical protein
SAB0309 metB cystathionine gamma-synthase
SAB0596 fhuA ferrichrome transport ATP-binding protein
SAB0059 sbnE probable siderophore biosynthesis protein
SAB0812 conserved hypothetical protein
SAB2286 adcA probable zinc-binding lipoprotein
SAB2345 oligopeptide transporter system protein
SAB1651 hypothetical protein
SAB2349 opp1B oligopeptide transporter putative membrane permease domain
SAB2347 opp1D oligopeptide transporter putative ATPase domain
SAB2351 oligopeptide transporter system protein
SAB2353 oligopeptide transporter system protein
SAB1316 fnbB fibronectin binding protein 2 domain
SAB2352 oligopeptide transporter system protein
SAB2350 opp1A oligopeptide transporter putative substrate binding domain
SAB2348 opp1C oligopeptide transporter putative membrane permease domain
[Figure: expression heat map; color scale is log(2) expression ratio from -12 to +12. Columns: RF122 ferric citrate (Fc) at 5, 30, 60, 120; RF122 lactoferrin (Lf) at 5, 30, 60, 120; MSA553 lactoferrin (Lf) at 5, 30, 60]
Clusters:
A. Induced by bLf
B. Induced by bLf
C. Induced by Fc
D. Induced by iron starvation
E. Induced by iron starvation
Cluster annotations: Lf-induced, both; Lf-induced, MSA553; Fc-induced; low-iron induced, stronger Fc response; low-iron induced, matching RF122 Fc, Lf response
Strain and source-specific transcriptional response clusters
Analysis of microarray data
• Global normalization
• Filtering
• Triplicate spots, duplicate experiments, dye-swap
  • 2 biological replicates
  • 6 inputs per datapoint
• Clustering
  • Hierarchical (Euclidean distance, avg. linkage, UPGMA)
  • K-means (uncentered based on meas. distance)
• Statistics
  • Significance Analysis of Microarrays (SAM)
  • Median-centered log ratios, one-class model
  • Stringent delta value
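Two of the preprocessing ideas listed above, median-centered log ratios and combining dye-swapped replicates, can be sketched as follows. The function names and the sample intensities are assumptions for this illustration; the sketch does not reproduce the filtering, clustering, or SAM steps of the actual pipeline.

```python
import math
from statistics import mean, median
from typing import List, Sequence


def median_centered_log_ratios(test: Sequence[float],
                               reference: Sequence[float]) -> List[float]:
    """Compute log2(test/reference) per spot, then subtract the array median."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same number of spots")
    ratios = [math.log2(t / r) for t, r in zip(test, reference)]
    center = median(ratios)
    return [x - center for x in ratios]


def combine_replicates(log_ratios: Sequence[float],
                       dye_swapped: Sequence[bool]) -> float:
    """Average replicate log ratios, flipping the sign of dye-swapped ones."""
    if len(log_ratios) != len(dye_swapped):
        raise ValueError("one dye-swap flag is needed per replicate")
    return mean(-x if swapped else x
                for x, swapped in zip(log_ratios, dye_swapped))


# Made-up intensities for three spots on one array.
centered = median_centered_log_ratios([200.0, 400.0, 800.0],
                                      [100.0, 100.0, 100.0])
print(centered)
print(combine_replicates([1.0, -1.0], [False, True]))
```

Centering on the median assumes that most genes on the array do not change, which is the same assumption that makes global normalization questionable for comparative hybridization of divergent genomes.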
Overlap of significantly* differentially expressed genes
Strain and source-specific responses
[Figure: Venn diagrams of RF122 Fc, RF122 Lf, and MSA553 Lf; annotated categories: adhesion proteins, DNA damage repair, purine biosynthesis]
* Significantly different across ALL timepoints by SAM analy
Genes upregulated in iron-deplete conditions
Columns: RFc, RLf, MLf
10D2 fibronectin binding protein 2 domain mult.
IJ21 ferrichrome ABC transporter SAB2056 sirC
IJ22 ferrichrome ABC transporter SAB2057 sirB
IJ23 ferrichrome ABC transport lipoprotein SAB2058 sirA
IO22 conserved hypothetical protein SAB2153
IO23 conserved hypothetical protein SAB2154
IO24 conserved hypothetical protein SAB2155
IP2 ferrichrome binding protein SAB2156 fhuD2
IP3 conserved hypothetical protein SAB0107
2I22 probable antibiotic resistance protein SAB2345
2I23 oligopeptide transporter
2I24 oligopeptide transporter ATPase domain SAB2347 opp1D
2J1 oligopeptide transporter membrane dom. SAB2348 opp1C
2J2 oligopeptide transporter membrane dom. SAB2349 opp1B
2J3 oligopeptide transporter membrane dom. SAB2350 opp1A
2J5 oligopeptide transporter SAB2351
2J6 oligopeptide transporter SAB2352
2J7 oligopeptide transporter SAB2353
4F19 cation ABC-transporter SAB0582 mntB
4F21 cation ATP-binding ABC transporter SAB0583 mntA
4G14 ferrichrome transport ATP-binding trans. SAB0596 fhuA
4G15 ferrichrome transporter permease SAB0597 fhuB
4G16 ferrichrome transporter permease SAB0598 fhuG
4G21 siderophore biosynthesis protein SAB0059 sbnE
gataatgataatcattatc  E. coli consensus
gataatgataatcattatc  B. subtilis dhb
gataatgattctcattgtc  S. aureus sirA
gttcatgataatcattatc  S. aureus fhu
cattgcacctttcattatc  S. aureus opp1X
tatcgtatcattcattatc  S. aureus opp1Z
tttaatttccttcattatc  S. aureus opuCC
Signal sequence for iron-regulated genes
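One simple way to compare the candidate sites listed above with the consensus is to count matching positions. The sketch below does only that; it is an illustrative assumption, not the method used in the study, and a real motif search would more likely use a position weight matrix built from known binding sites. The function name is hypothetical; the sequences are copied from the slide.

```python
CONSENSUS = "gataatgataatcattatc"  # E. coli consensus from the slide

CANDIDATES = {
    "B. subtilis dhb": "gataatgataatcattatc",
    "S. aureus sirA": "gataatgattctcattgtc",
    "S. aureus fhu": "gttcatgataatcattatc",
    "S. aureus opp1X": "cattgcacctttcattatc",
    "S. aureus opp1Z": "tatcgtatcattcattatc",
    "S. aureus opuCC": "tttaatttccttcattatc",
}


def matches_to_consensus(site: str, consensus: str = CONSENSUS) -> int:
    """Count positions at which a site matches the consensus (equal lengths only)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a == b for a, b in zip(site.lower(), consensus.lower()))


for name, site in CANDIDATES.items():
    print(f"{name}: {matches_to_consensus(site)} of {len(CONSENSUS)} positions match")
```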
1B11 SaPIbov protein SAB0358
1B13 SaPIbov protein SAB0356
1B14 SaPIbov protein SAB0355
1B15 SaPIbov protein SAB0354
2J21 hypothetical protein SAB0235
3A19 glutathione peroxidase SAB2495
3J16 nitro/flavin reductase SAB0332
3K8 SaPIbov protein SAB0344
3K16 SaPIbov protein SAB0361
4F20 siderophore biosynthesis protein SAB0057
4N11 superoxide dismutase [Mn/Fe] SAB0072
5K18 DNA damage repair protein SAB1219
Genes upregulated in response to ferric citrate but not lactof
Summary
Ferric citrate, but not lactoferrin, induces antioxidant response and increased transcription of SaPIbov pathogenicity island genes
RFc RLf MLf
2D14 L-lactate permease SAB2245
2D16 hypothetical protein SAB2246
2E19 nitrite extrusion protein SAB2267
2F10 nitrite reductase SAB2280
2J22 hypothetical protein SAB2363
3A12 anaerobic ribonucleotide reductase SAB2490
3A15 anaerobic ribonucleotide reductase SAB2491
3A16 probable Mg2+ transporter SAB2492
4A23 L-lactate permease SAB0049
4E13 alcohol dehydrogenase I SAB0557
7B13 formate acetyltransferase SAB0164
7B24 formate acetyltransferase activating enz. SAB0165
7C1 specificity determinant SAB1667
7J5 L-lactate dehydrogenase SAB0180
Genes upregulated in response to lactoferrin but not ferric c
Summary
Lactoferrin, but not ferric citrate, induces transcription of fermentation and anaerobic respiration system components
RFc RLf MLf
Summary of SA response to iron depletion and specific iron sources
• Multiple iron transport systems are significantly upregulated in low iron
• Steady metabolic/cellular gene expression
  • Emphasizes ability of SA to withstand iron depletion
• New iron-regulated transport operon (Opp) and Fur signal sequence
• Ferric citrate induces antioxidant response and increased transcription of pathogenicity island genes
• Lactoferrin induces transcription of fermentation and anaerobic respiration system components
Functional genomics
Part II: Proteomics
Gene expression analysis methods
1. Grow S. aureus in rich media (Fe+) and rich media + iron chelator (Fe-)
2. Extract and purify membrane proteins
3. Separate membrane proteins by SDS-PAGE
4. Use MALDI-TOF to identify peptide mass fingerprint
5. Run MASCOT search to identify corresponding protein
[Figure: MALDI-TOF spectrum; x-axis m/z (500 to 3000), y-axis a.i. (2000 to 14000)]
SA membrane proteins induced during iron-restriction
[Figure, panels A to C: SDS-PAGE of membrane proteins from SAAV1, 19636, 1477, and 2176, each grown +Fe and -Fe, with M.W. STD lane; MALDI-TOF spectrum, x-axis m/z (500 to 3000), y-axis a.i.]
Identification of SA membrane proteins
Protein match: SirA
GI#: 25329962
Mowse score: 158 (p = 2.2 x 10^-10)
Peptide coverage: 67%
S. aureus proteins upregulated under iron restriction match gene expression study
ldh L-lactate dehydrogenase I
pflB formate acetyltransferase
pflA formate acetyltransferase act. enzyme
adh1 alcohol dehydrogenase I
SAV2177 ferrichrome ABC transporter homolog
SAR1869 putative exported protein
sirA iron-regulated lipoprotein
SACOL0688 ABC transporter homolog
opp1-A oligopeptide transporter
[Figure: expression heat map; color scale is log(2) expression ratio from -12 to +12; columns RF122 Fc, RF122 Lf, MSA553 Lf]
Functional genomics summary
• Identified genes and operons not previously associated with iron metabolism in S. aureus
• Identified strain-specific differences in iron metabolism between a bovine and human isolate of SA
• Identified conserved iron-induced membrane proteins
FUTURE WORK
• Evaluate vaccine potential of membrane proteins
  • Different compositions for humans and bovines?
• Confirm regulation and function of newly identified genes
MORE DATA
Update on what we have in local databases
• Genomic sequences (2 full in-house; 7 outside sequences)
• Full sequence (Total of 25.2 million nucleotides)
• Annotated genes (aa and nt)
• 60,000 clones catalogued (id, location, date, sequence)
• Sequence similarity reports for each gene (aa and nt versus 2x8 targets)
• Substitution rates for each gene (2x8 targets)
• Microarray hybridization data for 3800 genes in 12 strains (6 reps)
• Microarray based expression data for 3800 genes in 3 strains under 5 different environmental conditions (6 reps)
• Protein gel data for 3200 proteins under 2 conditions, 2 strains
… actually, a great deal of this is stored in Excel spreadsheets.
>1,000,000 data points
Utilizing the IDDB for S. aureus studies
Host-specialized adaptations mined from comparative genomic and functional analysis
Human TSS
• Novel: Exfoliative toxin X, SaPI5
• Divergent: Membrane proteins, LytR regulator, Exotoxin 3, Staphopain protease
• Pseudogenes: Hla
Bovine mastitis
• Novel: Streptolysins, SaPIbov genes
• Divergent: Adhesins, Agr locus
• Pseudogenes: Ebh, Spa, ClfA, SdrC, SstC
Are we any closer to a vaccine?
Iron-induced proteins are cross-protective against SA challenge in mice
[Figure, panels A to D: homologous and heterologous challenge. Mortality model: percent survival versus time (hrs, 0 to 200), 19636-vaccinated or 1477-vaccinated versus placebo. Lesion model: lesion diameter (mm, 0 to 30), placebo versus vaccinated. Reported p values: 0.020, 0.012, 0.015, 0.020]
Messages
• Existing tools are good.
  • Databases
  • Algorithms
  • Associated software tools
• More/better tools are continually needed.
  • Even if not necessarily requested…
  • Even if not necessarily requested in clear language…
  • Even if the current tools work…
  • Even if the newest version was launched an hour ago…
Conclusions
Biologist (B_id) be Computer Scientist (CSci_id): nice, but not likely, nor necessary
Bio-CSci Pair of Biologist (B_id) and Computer Scientist (CSci_id), with good communication flag: imperative!
Acknowledgements
Vivek Kapur
Rajit Chakravarty
Dan Wolf
Akash Kumar
Advanced Genetic Analysis Center
Computational Analysis Projects
Nick Bollweg
John Carlis
Chris Dwan
John Freeman (3M)
Wayne Xu
SYNTIRON
Laura Wonderling
Daryll Emery
Collaborators
James Musser
JR Fitzgerald