Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

CEB - ESD - LBNL: Todd DeSantis, Sonya Murray, Jordan Moberg, Gary Andersen; Carol Stone (DSTL, U.K.)

Page 1: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray.

CEB - ESD - LBNL
Todd DeSantis, Sonya Murray, Jordan Moberg, Gary Andersen

Carol Stone (DSTL, U.K.)

What bugs are in my sample?

Page 2: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

The ponderings of a toddler

Why must Mom confiscate my “Hello Kitty” blanket on laundry day?

Will the swings be wet at the park?

How will this sausage impact the diversity in my lower G.I. bacterial community?

Will I inhale any archaeal microorganisms when I visit the hot springs?

Gianna DeSantis

Page 3: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

• Every discarded water sample, geological core, or spent air filter is lost data.

• But who wants to do all the work?
  – Culture? Anaerobes? Non-cultivable? Safety?
  – Analysis of nucleic acids isolated from environment

• Must classify or sort heterogeneous nucleic acids into bins.
  – Restriction Fragment Length Polymorphisms (RFLP)
  – Single Stranded Conformation Polymorphisms (SSCP)
  – Temp/Denat Gradient Gel Electrophoresis (T/DGGE)
  – Sequencing
    » Provides taxonomic nomenclature
    » Estimates the relative abundance
    » Need to create, clone, & process hundreds of samples

• Can we create a simple, quantitative, comprehensive microbial test?

Page 4: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Outline

• Goals

• Experimental Approach

• Organization of rDNA sequences into taxa (CASCADE-P)

• Assigning sets of probes for each taxon

• Using 16S GeneChip for quantitative aerosol analysis

Page 5: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Project Overview

• Goal
  – Create a single microarray capable of detecting and quantifying bacterial and/or archaeal organisms in a complex sample.

• Approach
  – Combinatorial power of multiple probes for sequence-specific hybridization

Page 6: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

16S rRNA gene (16S rDNA)

• Used to identify and classify organisms by gene sequence variations.

• Variations have been used in design of DNA probes for the detection of:
  – taxonomic domains, divisions, groups …
  – specific organisms

Page 7: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

The Ribosome

[Figure: the ribosome. Labels: rDNA; rRNA (functional molecule); LSU; SSU (16S or 18S).]

Page 8: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

The Ribosome

• Folded secondary structure
• Essential functional component
• Conserved spans
  – structure must be retained for viability
  – targeted for universal/group-specific PCR primers and probes
• Variable regions
  – spans not fundamental to the folded structure
  – receive less pressure from natural selection
  – probed for genus and species level discrimination

Page 9: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

What could be amplified?

• Universal 16S PCR primers yield a complex population of amplicons.

• Must define the targets to consider as the Potential Amplicon Set.

Page 10: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

[Figure: SSU rDNA drawn 5’ to 3’. Labels: positions 1390 and 1507; pA, Ccomp, 1492R; "Region interrogated on chip".]

• 20-base DNA signature segments on chip = probe set
• Sample reacts only with complementary signature sequences on chip
• First-generation rDNA array uses an 85-base highly variable region of ribosomal DNA

Page 11: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

http://greengenes.llnl.gov/16S

• Comprehensive Aligned Sequence Construction for Automated Design of Effective Probes (CASCADE-P)
• Igor Dubosarskiy – Java implementations
• Tim Harsch – RDBMS consultations
• Lisa Corsetti – Apache module management
• Kevin Melissare – Graphics

Page 12: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Hierarchical Phylocodes

2.30.9.2.10
1st Level: BACTERIA
2nd Level: GRAM_POSITIVE_BACTERIA
3rd Level: CLOSTRIDIUM_AND_RELATIVES
4th Level: C.BOTULINUM_GROUP
5th Level: C.ACETOBUTYLICUM_SUBGROUP
Members:
  Clostridium collagenovorans DSM 3089 (T)
  Clostridium sardiniensis ATCC 33455 (T)
  Clostridium acetobutylicum ATCC 824 (T)
  Clostridium acetobutylicum DSM 792 (T)
  Clostridium acetobutylicum ATCC 824 (T)
  Clostridium acetobutylicum NCDO 1712
  Clostridium acetobutylicum DSM 1731

2.28.3.27.2
1st Level: BACTERIA
2nd Level: PROTEOBACTERIA
3rd Level: GAMMA_SUBDIVISION
4th Level: ENTERICS_AND_RELATIVES (Group)
5th Level: ESCHERICHIA_SUBGROUP
Members:
  U85138 clone ACK-SA7
  AE000452 Escherichia coli str. K-12
  Er.trachep Erwinia tracheiphila LMG 2906 (T)
  E.coliK12 Escherichia coli [gene=rrnG gene]
  Haf.alvei3 Hafnia alvei
  S.tymuriu3 Salmonella typhimurium str. Stm1
  Shi.boydii Shigella boydii
  AF084835 str. KN4
  S.enterit4 Salmonella enteritidis str. SE22
  S.ptyphi6 Salmonella paratyphi
  S.typhi3 Salmonella typhi str. St111
  S.bovismrb Salmonella bovis morbificans Sbm1
  Alt.agrlyt Alterococcus agarolyticus str. ADT3
  Shi.flxne2 Shigella flexneri ATCC 29903 (T)
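A phylocode reads as a dotted path through the hierarchy, one number per level. The sketch below is only an illustration of that reading: the helper names `parse_phylocode` and `shared_depth` are hypothetical, and it assumes each dot-separated field is an integer level identifier.

```python
def parse_phylocode(code: str) -> tuple[int, ...]:
    """Split a dotted phylocode such as '2.30.9.2.10' into integer levels."""
    return tuple(int(part) for part in code.split("."))


def shared_depth(code_a: str, code_b: str) -> int:
    """Count how many leading levels two phylocodes have in common."""
    depth = 0
    for a, b in zip(parse_phylocode(code_a), parse_phylocode(code_b)):
        if a != b:
            break
        depth += 1
    return depth


clostridium = "2.30.9.2.10"    # C.ACETOBUTYLICUM_SUBGROUP on the slide
escherichia = "2.28.3.27.2"    # ESCHERICHIA_SUBGROUP on the slide
staph_otu = "2.30.7.12.1.013"  # an OTU code from the deck's spike-in table

print(shared_depth(clostridium, escherichia))  # 1 (both are BACTERIA)
print(shared_depth(clostridium, staph_otu))    # 2 (same 2nd-level group)
```

Comparing prefixes this way gives a quick measure of how far up the hierarchy two taxa must go before they meet.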

Page 13: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Chip Taxa

• Avoid groupings based on historical nomenclature.
• Sequence-dependent classification by transitive similarity clustering.
• Each sequence must end up in exactly 1 taxon.

if x R y and y R z, then x R z
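Transitive clustering can be sketched with a union-find structure: whenever two sequences are related they are merged, so chains of relations collapse into one cluster and every sequence lands in exactly one taxon. This is a minimal illustration, not the CASCADE-P implementation; the function name `transitive_clusters`, the toy sequences, and the "at most one mismatch" relation are all assumptions made for the example.

```python
from collections import defaultdict


def transitive_clusters(items, related):
    """Group items so that any chain x R y, y R z puts x, y and z together."""
    items = list(items)
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(items):
        for b in items[i + 1:]:
            if related(a, b):
                parent[find(a)] = find(b)

    groups = defaultdict(list)
    for item in items:
        groups[find(item)].append(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy sequences; "related" means at most one mismatching base.
toy = {"A": "ACGT", "B": "ACGA", "C": "ACTA", "D": "TTTT"}


def one_mismatch(a, b):
    return sum(x != y for x, y in zip(toy[a], toy[b])) <= 1


print(transitive_clusters(toy, one_mismatch))  # [['A', 'B', 'C'], ['D']]
```

A and C differ at two positions, yet they share a cluster because both are related to B.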

Page 14: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Assigning Probes for GeneChip Microarray

• Select probe sets for each taxon
• Ideal Probe
  – Present in all sequences of the taxon
  – Not present outside the taxon
  – Unable to X-hybe with seqs in other taxa
• Ideal Mis-match Control Probe
  – Unable to X-hybe to any sequence

Page 15: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Finding groupings

[Figure: grid of sequences A – O (rows) against probes 1 – 24 (columns).]

Consider A – O to be 16S sequences.

Consider 1 – 24 to be probes already embedded on the chip.

First, associate all available probes with all available sequences.

Let probe similarities drive sequence groupings.
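One way to picture the two steps above: record which probes occur in each sequence, then compare those probe profiles. The sketch assumes a probe "matches" when it appears as an exact substring, and it uses Jaccard similarity as the comparison; both are illustrative choices, and the probes and sequences are made-up toy data rather than anything from the chip.

```python
def probe_profile(sequence: str, probes: list[str]) -> frozenset[int]:
    """Indices of the probes that occur as exact substrings of the sequence."""
    return frozenset(i for i, probe in enumerate(probes) if probe in sequence)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Shared probes divided by all probes seen in either profile."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy data: 4 short probes, 3 short sequences.
probes = ["ACG", "CGT", "TTA", "GGC"]
sequences = {"A": "AACGTT", "B": "TACGTA", "C": "GGCTTA"}

profiles = {name: probe_profile(seq, probes) for name, seq in sequences.items()}

print(jaccard(profiles["A"], profiles["B"]))  # 1.0, identical probe profiles
print(jaccard(profiles["A"], profiles["C"]))  # 0.0, no probes in common
```

Sequences with identical or near-identical profiles are candidates for the same grouping, since the chip cannot tell them apart with the probes at hand.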

Page 20: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Progressive Transitive Clustering

[Figure: "Count of Solved Clusters with each Cycle's Parameters". x-axis: Cycle (ticks 1 – 77); y-axis: Count (log scale, 1 – 10000). Series: Total Clusters, Solved Clusters, uGBpplock, uPWppsep.]

DEFINE:

upp (useful probe pair): a PM,MM pair where the 20-mer PM complements all intra-cluster sequences AND the central 16-mer of the PM does not complement any extra-cluster sequences AND the central 16-mer of the MM does not complement any sequence. Probe pairs are reassessed whenever the sequence clusters are altered.

nGB-upp: number of upps for a cluster; these probe pairs globally differentiate a cluster from all other sequences.

L: the value of nGB-upp which must be met for a cluster to be locked.

nPW-uppA: number of useful probe pairs which pair-wise differentiate clustA from clustB

nPW-uppB: number of useful probe pairs which pair-wise differentiate clustB from clustA

m: the value of nPW-upp which must be met to inhibit two clusters from merging.

FOR L (11 .. 4) DO
  FOR m (1 .. 10) DO
    Determine nGB-upp for each cluster;
    Lock all clusters where nGB-upp ≥ L;
    Pair-wise compare non-locked clusters (clustA, clustB);
    UNLESS (nPW-uppA ≥ m AND nPW-uppB ≥ m)
      Merge sequences of clustA and clustB into one cluster;
    END UNLESS
  END FOR
  Uncluster non-locked clusters;
END FOR

650 clusters found
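The upp definition above can be checked mechanically. The sketch below is a simplified reading, not the authors' code: it treats "complements" as an exact substring match against sequences written in the same orientation as the probe, and the names `central`, `is_upp`, and `n_gb_upp` are hypothetical. The PM/MM pair is the one shown on the GeneChip slide (uppercased); the cluster and outside sequences are invented.

```python
def central(probe: str, width: int = 16) -> str:
    """Return the central `width` bases of a probe (16 of 20 by default)."""
    start = (len(probe) - width) // 2
    return probe[start:start + width]


def is_upp(pm: str, mm: str, cluster_seqs, other_seqs) -> bool:
    """Apply the three upp conditions to one PM,MM pair."""
    pm_core, mm_core = central(pm), central(mm)
    everything = list(cluster_seqs) + list(other_seqs)
    return (
        all(pm in seq for seq in cluster_seqs)               # PM hits whole cluster
        and not any(pm_core in seq for seq in other_seqs)    # PM core misses outsiders
        and not any(mm_core in seq for seq in everything)    # MM core hits nothing
    )


def n_gb_upp(probe_pairs, cluster_seqs, other_seqs) -> int:
    """Count the useful probe pairs for one cluster (nGB-upp)."""
    return sum(is_upp(pm, mm, cluster_seqs, other_seqs) for pm, mm in probe_pairs)


PM = "CCTAGCATGCATTCTGCATA"
MM = "CCTAGCATGGATTCTGCATA"
cluster = ["GG" + PM + "TT", "AA" + PM + "CC"]   # hypothetical cluster members
outside_clean = ["T" * 24]                        # hypothetical unrelated sequence
outside_clash = ["AC" + central(PM) + "GT"]       # shares the PM core

print(is_upp(PM, MM, cluster, outside_clean))  # True
print(is_upp(PM, MM, cluster, outside_clash))  # False
```

In the clustering loop, a count like `n_gb_upp` would be compared with L to decide whether a cluster can be locked.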

Page 21: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Approach: Custom Affymetrix GeneChip

MATCH:    cctagcatgCattctgcata
MISMATCH: cctagcatgGattctgcata

• Massive parallelism – Up to 500,000 probes in a 1.28 cm² array
• Identification of multiple species in a mixed population
• Single nucleotide mismatch resolution
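The MATCH/MISMATCH pair on this slide differs at a single base (the capitalized C becomes G). A minimal sketch of deriving a mismatch probe from a perfect-match probe follows; it assumes the mismatch is always the complement of the original base, which agrees with the one example shown but is not stated on the slide, and the function name `mismatch_probe` is hypothetical.

```python
COMPLEMENT = {"a": "t", "t": "a", "c": "g", "g": "c"}


def mismatch_probe(pm: str, position: int) -> str:
    """Return the PM probe with the base at `position` (0-based) complemented."""
    base = pm[position]
    swapped = COMPLEMENT[base.lower()]
    if base.isupper():
        swapped = swapped.upper()  # keep the slide's capitalization of the site
    return pm[:position] + swapped + pm[position + 1:]


pm = "cctagcatgCattctgcata"   # MATCH probe from the slide
mm = mismatch_probe(pm, 9)    # the capitalized base sits at index 9
print(mm)                     # cctagcatgGattctgcata
```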

Page 22: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

General Protocol

[Figure: workflow. Sample types: Air, Soil, Feces, Blood, Water. Nucleic acids: rRNA, gDNA. Step: Universal 16S rDNA PCR. Chip: contains probes adhered to glass surface in grid pattern.]

Page 23: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Locating Hybridization Events

[Figure: a 50 µ × 50 µ probe feature with example probe bases. Workflow labels: PCR Amplify DNA, Fractionate DNA, Biotin End-label, Hybridize.]

Page 24: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Parameter            Frankia   Clostridium
Positive fraction    1.00      0.64
Average difference   3720      625

[Figure: PM and MM probe intensities for Frankia sp. str. G48 and Clostridium butyricum.]
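The two parameters in the table can be illustrated with a short calculation over PM/MM intensity pairs. The slides do not state the rule used to score a pair as positive, so this sketch assumes the simplest one (PM minus MM above a threshold); the function name `score_probe_set`, the threshold, and the intensity values are hypothetical and are not the Frankia or Clostridium data.

```python
def score_probe_set(pm, mm, min_diff=0.0):
    """Return (positive fraction, average difference) for one probe set.

    A pair counts as positive when PM - MM exceeds `min_diff`
    (an assumed rule, chosen for illustration).
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    positive_fraction = sum(d > min_diff for d in diffs) / len(diffs)
    average_difference = sum(diffs) / len(diffs)
    return positive_fraction, average_difference


# Hypothetical intensities for two probe sets of four pairs each.
strong = score_probe_set([4000, 3500, 4200, 3900], [300, 250, 400, 350])
weak = score_probe_set([900, 700, 400, 800], [200, 300, 500, 850])

print(strong)  # (1.0, 3575.0)
print(weak)    # (0.5, 237.5)
```

A probe set whose target is present should show both a high positive fraction and a large average difference, as in the Frankia column.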

Page 25: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Can the chip detect more than one analyte?

Page 26: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Can the chip detect more than one analyte?

Combinatorial scoring of “Probe Sets” is able to categorize mixed samples.

OTU                % pos pairs
2.30.7.12.1.013*   100
2.30.7.12.1.014    46 – 57
2.30.7.12.1.015    54 – 61
2.30.7.12.1.016    39 – 54
2.30.7.12.1.017    18
2.30.7.12.2.002    11
2.30.7.12.2.003    14
2.30.7.12.2.005    14 – 32
2.30.7.12.2.006    18 – 32
2.30.7.12.2.007    21 – 25
2.30.7.12.2.008    14 – 29
2.30.7.12.3.001    7 – 25
2.30.7.12.3.002    8
2.30.7.12.3.003    4
2.30.7.12.3.004    7 – 11
2.30.7.12.3.005    4 – 14
2.30.7.12.3.006    11
2.30.7.12.3.007    14 – 29
2.30.7.12.3.008    7
2.30.7.12.3.009    4 – 11
2.30.7.12.3.010    0 – 4
2.30.7.12.4.001    21 – 36
2.30.7.12.4.004*   100
2.30.7.12.4.005    0 – 11
2.30.7.12.4.006    29 – 54
2.30.7.12.4.007    11 – 14
2.30.7.12.4.008    11

Figure labels: S. aureus spike; B. anthracis spike.

Page 27: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Can the chip detect more than one analyte?

Percent of probe-pairs scored positive for each probe set in the Staphylococcus Group (OTU table on Page 26). Hybridization results from spike-in experiment done in triplicate.

Sonya Murray

Aubree Hubbel

Page 28: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Application Example

• Does air filter sample processing affect detection?
  – Method 1
    • Wash particles from filter with SDS
    • Digest particles with lysozyme
    • Purify DNA using Qiagen kit
  – Method 2
    • Pulverize filter and particles with bead mill, SDS, P:C:ISA
    • Purify DNA using MoBio kit and Sephacryl column

Page 29: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Bead beating allowed greater diversity to be detected.

Page 30: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Quantitative Analysis

• Could the concentration of each amplicon in a sample be measured by fluorescence intensity?

• Experimental setup for 20-point Latin Square calibration:

SPIKE CONCENTRATION (pM in Hybridization Solution)

Experiment   Oc.oenos   Fer.nod   Sap.grand   M.neuro   H2O   Environmental amplicons*
1            5          13        31          74        No    Yes
2            13         31        74          143       No    Yes
3            31         74        143         5         No    Yes
4            74         143       5           13        No    Yes
5            143        5         13          31        No    Yes
6            0          0         0           0         Yes   Yes

* 18 µL of products from 30-cycle universal 16S PCR of gDNA extracted from U.K. air sample.

Sonya Murray

Carol Stone
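The spike layout in experiments 1 to 5 is a cyclic rotation of the five concentration levels across the four spiked organisms, which is what yields 20 calibration points (4 spikes × 5 levels). A small sketch that regenerates the design follows; the function name `latin_square` is hypothetical, and the cyclic construction is inferred from the table rather than stated in the slides.

```python
def latin_square(levels, n_columns):
    """Row i, column j receives levels[(i + j) % len(levels)] (cyclic rotation)."""
    k = len(levels)
    return [[levels[(i + j) % k] for j in range(n_columns)] for i in range(k)]


levels = [5, 13, 31, 74, 143]  # pM in hybridization solution
spikes = ["Oc.oenos", "Fer.nod", "Sap.grand", "M.neuro"]
design = latin_square(levels, len(spikes))

for experiment, row in enumerate(design, start=1):
    print(experiment, dict(zip(spikes, row)))
```

Because every spike passes through every level exactly once, no single organism or experiment is confounded with a particular concentration.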

Page 31: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Experiment   Oo             Fn            Sg            Mn
1            5 (5474)       13 (16069)    31 (31805)    74 (124732)
2            13 (7885)      31 (61185)    74 (81107)    143 (115237)
3            31 (58912)     74 (70317)    143 (98235)   5 (8759)
4            74 (101803)    143 (69529)   5 (7789)      13 (11530)
5            143 (149869)   5 (4534)      13 (16228)    31 (56103)
6            n.a.           n.a.          n.a.          n.a.

Final concentration of spike in hybridization in pM. Values in parentheses are the resulting hybridization signal in arbitrary units (a.u.) obtained from the Latin Square experiments. All spikes were added to 18 µL of products of 30-cycle universal SSU PCR of gDNA extracted from air samples using Method 2.
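The 20 (concentration, signal) pairs in the table are the input to the calibration that the deck describes as a log2 transform followed by linear least squares regression. The sketch below runs that fit in plain Python as a rough check; the function name `fit_log2` is hypothetical, and no printed values are asserted here since they depend on how the original fit was done.

```python
import math

# (spike concentration in pM, hybridization signal in a.u.), experiments 1 to 5
points = [
    (5, 5474), (13, 16069), (31, 31805), (74, 124732),
    (13, 7885), (31, 61185), (74, 81107), (143, 115237),
    (31, 58912), (74, 70317), (143, 98235), (5, 8759),
    (74, 101803), (143, 69529), (5, 7789), (13, 11530),
    (143, 149869), (5, 4534), (13, 16228), (31, 56103),
]


def fit_log2(points):
    """Least-squares line through (log2 conc, log2 signal).

    Returns slope, intercept, Pearson r and the residual standard error.
    """
    xs = [math.log2(conc) for conc, _ in points]
    ys = [math.log2(signal) for _, signal in points]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    rse = math.sqrt((syy - slope * sxy) / (n - 2))  # n - 2 = 18 d.f.
    return slope, intercept, r, rse


slope, intercept, r, rse = fit_log2(points)
print(f"slope={slope:.4f} intercept={intercept:.3f} r={r:.3f} RSE={rse:.2f}")
```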

Page 32: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Figure 2 - Calibration Plot

Log2 transformed

Linear Least Squares Regression

Pearson’s corr coeff was significant (df=18)

95% confidence intervals calculated according to: National Measurement System’s Valid Analytical Measurement Programme (VAM)

y = 0.9207x + 10.504
R = 0.974

[Figure: x-axis: log2 Concentration (pM), 0 – 8; y-axis: log2 HybScore, 9 – 19. Series: Spike-in rDNA, Environmental rDNA, 95% Confidence Limits, Spike-in Regression.]

Page 33: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Environmental community is measured with confidence intervals.

Figure 3 - Concentration of Environmental SSU Amplicons

[Figure: horizontal bar chart. x-axis: rDNA Concentration (pM), 0 – 140; y-axis: Taxa.]

Taxa shown:
  Clostridium thermobutyricum
  Streptococcus anginosus
  Bacillus racemilacticus
  Pseudomonas sp.
  symbiont of Solemya velum
  Clostridium limosum+
  Eurotiales (Aspergillus+)
  Bartonella+
  Staphylococcus delphini+
  Vibrio parahaemolyticus+
  Pasteurella sp.
  Heterotextus alpinus
  Streptomyces
  Staphylococcus cohnii+
  Propionibacterium lymphophilum
  Leucostoma persoonii

Conf Interval:

$$\mathrm{Conc} \pm t \cdot \frac{\mathrm{RSE}}{b} \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(Y - y)^2}{b^2 \,(n-1)\, s_x^2}}$$

b = slope from regression

Y = mean of 6 replicate measurements

m = number of repeat measurements = 6

n = number of points used for calibration = 20

y = mean of the HybScores for the 20 points used for calibration

t = critical value obtained from t-table for 18 d.f. for 95% = 1.734

RSE = residual standard error of calibration points = 0.56

sx = standard deviation of the conc. for the 20 points used for calibration
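As a worked illustration, the sketch below reads the confidence-interval expression above as the usual calibration-line form, t · (RSE / b) · sqrt(1/m + 1/n + (Y − y)² / (b² (n − 1) sx²)); the ± and the square root are an assumed reading of the slide. The values of t, RSE, b, m and n come from the slides, while `Y`, `y_bar` and `s_x` are hypothetical placeholders and the function name `conc_half_width` is invented for the example.

```python
import math


def conc_half_width(t, rse, b, m, n, Y, y_bar, s_x):
    """Half-width of the interval around a concentration read off the calibration line."""
    spread = 1 / m + 1 / n + (Y - y_bar) ** 2 / (b ** 2 * (n - 1) * s_x ** 2)
    return t * (rse / b) * math.sqrt(spread)


# From the slides: t = 1.734, RSE = 0.56, b = 0.9207, m = 6, n = 20.
# Hypothetical placeholders: Y (sample mean HybScore), y_bar, s_x.
half_width = conc_half_width(
    t=1.734, rse=0.56, b=0.9207, m=6, n=20, Y=14.0, y_bar=15.0, s_x=1.8
)
print(round(half_width, 3))
```

The interval is narrowest when the sample's mean HybScore equals the calibration mean and widens as the two move apart.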

Page 34: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Summary

The SSU microarray was able to rapidly quantify and taxonomically classify a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins.

Page 35: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray

Acknowledgements

• Gary Andersen – group leader

• Carol Stone – sample collection, hybridization

• Sonya Murray – hybridizations

Page 36: Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray