
Development and validation of V-chip, a DNA microarray for explorative analysis of the vaginal microbiome


Roxana Hickey (1,2), Sam Hunter (2), Matthew Settles (2), Bing Ma (3), Garry Myers (3), Jacques Ravel (3) and Larry Forney (1,2)

(1) Department of Biological Sciences and (2) Institute for Bioinformatics and Evolutionary Studies (IBEST), University of Idaho, Moscow, Idaho, USA

(3) Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, USA

why a microarray for microbiome analysis?

design & production of the V-chip

Raw data input

•  200 vaginal bacterial species (336 strains)
•  genome sequences in public databases
•  812,653 genes/ORFs
•  693,707 unique
•  <10 bp or >9,999 bp excluded (n=240)

Sequence clustering & alignment

•  CD-HIT clustering: 80% identity, 80% coverage
•  473,709 gene/ORF clusters
•  Representative sequences from multiple sequence alignment in MUSCLE

Probe design

•  3-plex 4.2M probe array
•  1.2M per subarray
•  2-5 60-mer probes per cluster
•  307,860/473,709 clusters represented
•  716 human immunity genes (3,580 probes)
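The length filter, de-duplication, and probe counts listed for the design pipeline can be illustrated with a minimal Python sketch. It assumes sequences are already held in memory as strings. The function names (`filter_and_deduplicate`, `candidate_probes`) and the evenly spaced probe placement are hypothetical illustrations, not the procedure actually used to design the V-chip, and the CD-HIT clustering and MUSCLE alignment steps are not reproduced.

```python
MIN_LEN, MAX_LEN = 10, 9999   # sequences <10 bp or >9,999 bp are excluded
PROBE_LEN = 60                # probes are 60-mers


def filter_and_deduplicate(seqs):
    """Drop out-of-range sequences, then keep one copy of each unique sequence.

    Upper-casing before comparison is an assumption made for this sketch.
    """
    kept = [s.upper() for s in seqs if MIN_LEN <= len(s) <= MAX_LEN]
    return list(dict.fromkeys(kept))  # preserves first-seen order


def candidate_probes(seq, n_probes=5):
    """Return up to n_probes evenly spaced 60-mers (hypothetical placement rule)."""
    if len(seq) < PROBE_LEN:
        return []
    last_start = len(seq) - PROBE_LEN
    n = min(n_probes, last_start + 1)
    if n == 1:
        starts = [0]
    else:
        starts = [i * last_start // (n - 1) for i in range(n)]
    return [seq[s:s + PROBE_LEN] for s in starts]


# Figures reported in the design summary
clusters_total = 473_709
clusters_represented = 307_860
fraction_represented = clusters_represented / clusters_total  # roughly 0.65
probes_per_human_gene = 3_580 / 716                           # 5 probes per gene

if __name__ == "__main__":
    print(f"clusters represented: {fraction_represented:.1%}")
    print(f"probes per human immunity gene: {probes_per_human_gene:.0f}")
```

On the reported numbers, about 65% of the gene/ORF clusters are represented on the array, and the human immunity genes carry five probes each.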

mock communities: bacterial + human genomic DNA

Table 1. Composition of mock communities tested on the V-chip microarray. Values are the proportion of genomic DNA in each mixture; "-" indicates the component was absent.

| Species | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| Anaerococcus hydrogenalis (vaginal isolate) | 0.200 | 0.100 | 0.010 | 0.010 | 0.001 | - | - | - | - |
| Anaerococcus tetradius (vaginal isolate) | 0.200 | 0.100 | 0.010 | 0.010 | 0.001 | - | - | - | - |
| Atopobium vaginae ATCC BAA-55 | 0.200 | 0.100 | 0.010 | 0.010 | 0.001 | - | - | - | - |
| Finegoldia magna (vaginal isolate) | 0.200 | 0.100 | 0.010 | 0.010 | 0.001 | - | - | - | - |
| Gardnerella vaginalis ATCC 14018 | 0.200 | 0.100 | 0.010 | 0.010 | 0.001 | - | - | - | - |
| Lactobacillus crispatus ATCC 33820 | - | 0.500 | 0.950 | 0.050 | 0.050 | 0.050 | 0.010 | 1.000 | - |
| Homo sapiens female genomic DNA | - | - | - | 0.900 | 0.945 | 0.950 | 0.990 | - | 1.000 |
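As a quick consistency check on Table 1, the proportions in each mock community should sum to 1. The sketch below transcribes the table (absent components entered as 0) and verifies this; the names `TABLE_1` and `community_totals` are introduced here only for illustration.

```python
import math

COMMUNITIES = "ABCDEFGHI"

# Proportions transcribed from Table 1; 0 marks a component absent from a mixture.
FIVE_SPECIES_MIX = [0.200, 0.100, 0.010, 0.010, 0.001, 0, 0, 0, 0]
TABLE_1 = {
    "Anaerococcus hydrogenalis": FIVE_SPECIES_MIX,
    "Anaerococcus tetradius": FIVE_SPECIES_MIX,
    "Atopobium vaginae": FIVE_SPECIES_MIX,
    "Finegoldia magna": FIVE_SPECIES_MIX,
    "Gardnerella vaginalis": FIVE_SPECIES_MIX,
    "Lactobacillus crispatus": [0, 0.500, 0.950, 0.050, 0.050, 0.050, 0.010, 1.000, 0],
    "Homo sapiens": [0, 0, 0, 0.900, 0.945, 0.950, 0.990, 0, 1.000],
}


def community_totals(table):
    """Sum the proportions of all components for each mock community."""
    return {
        name: sum(row[i] for row in table.values())
        for i, name in enumerate(COMMUNITIES)
    }


if __name__ == "__main__":
    for name, total in community_totals(TABLE_1).items():
        status = "ok" if math.isclose(total, 1.0) else "CHECK"
        print(f"community {name}: total = {total:.3f} ({status})")
```

Communities D through G place the bacterial DNA against a 90-99% human background, which is the regime where low-abundance detection is tested.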

vaginal swabs: V-chip vs. 16S rRNA pyrosequencing

Figure 2. Daily temporal dynamics of vaginal bacterial communities in two women over 10 weeks, along with associated subject metadata. Subject 1’s microbiota remained relatively stable throughout the study, while Subject 2 experienced a drastic alteration of community composition following vaginal intercourse in week 8. (Panel labels: Subject 1 (L. iners), Subject 2 (L. crispatus); sample labels: 1, 2A, 2B.)

Figure 3. Relative abundances of species detected on the V-chip compared to 16S rRNA V1-V2 pyrosequencing.

Table 2. Pearson correlation coefficients for V-chip vs. 16S rRNA species relative abundance

| Sample | V-chip DNA vs. 16S rRNA [1] | V-chip cDNA vs. 16S rRNA [1] |
|---|---|---|
| Sample 1 | 1.00 | 0.92 |
| Sample 2A | 0.53 | 0.59 |
| Sample 2B | 0.84 | 0.69 |
| Overall | 0.92 | 0.84 |

[1] Sequences of the hypervariable V1-V2 region of 16S rRNA genes were determined previously by pyrosequencing.
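The coefficients in Table 2 are Pearson correlations between two vectors of per-species relative abundance. A minimal sketch of that calculation follows; the abundance values are invented for illustration and are not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative abundances for four species (illustration only)
vchip_abundance = [0.70, 0.15, 0.10, 0.05]
sequencing_abundance = [0.85, 0.05, 0.07, 0.03]

if __name__ == "__main__":
    r = pearson(vchip_abundance, sequencing_abundance)
    print(f"Pearson r = {r:.2f}")
```

When one species dominates a community, that single point can drive r close to 1 even if the minor species agree poorly, so a high coefficient for a skewed sample may say little about agreement among low-abundance taxa.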

Table 3. Pearson correlation coefficients for V-chip vs. in silico mapping of Illumina RNA-Seq reads against probes. Genus-level coefficients are above the diagonal; species-level coefficients are below it.

| | Sample 1 cDNA | Sample 1 reads | Sample 2A cDNA | Sample 2A reads |
|---|---|---|---|---|
| Sample 1 cDNA | 1.00 | 0.71 | 0.37 | 0.31 |
| Sample 1 reads | 0.75 | 1.00 | 0.33 | 0.53 |
| Sample 2A cDNA | 0.36 | 0.31 | 1.00 | 0.79 |
| Sample 2A reads | 0.20 | 0.25 | 0.77 | 1.00 |

Figure 4. Number of species-specific gene clusters with a greater than two-fold change between time points in the metatranscriptome of samples 2A and 2B.
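Counting gene clusters with a greater than two-fold change, as summarized in Figure 4, can be sketched as follows. The sketch assumes signals are on a log2 scale (so a two-fold change is a difference of more than 1) and that each cluster is already assigned to one species; the input layout, function name, and example values are hypothetical.

```python
import math
from collections import defaultdict


def count_changed_clusters(log2_signal, fold=2.0):
    """Count clusters per species that change by more than `fold` between time points.

    log2_signal maps (species, cluster_id) -> (log2 signal at time 1, log2 signal at time 2).
    Species with no changed clusters are omitted from the result.
    """
    cutoff = math.log2(fold)
    counts = defaultdict(lambda: {"up": 0, "down": 0})
    for (species, _cluster_id), (t1, t2) in log2_signal.items():
        delta = t2 - t1
        if delta > cutoff:
            counts[species]["up"] += 1
        elif delta < -cutoff:
            counts[species]["down"] += 1
    return dict(counts)


# Hypothetical log2 signals for samples 2A and 2B (illustration only)
example = {
    ("L. crispatus", "c1"): (8.0, 10.5),   # up, more than two-fold
    ("L. crispatus", "c2"): (9.0, 9.5),    # below the cutoff
    ("G. vaginalis", "c3"): (11.0, 9.5),   # down, more than two-fold
    ("G. vaginalis", "c4"): (7.0, 8.0),    # exactly two-fold, not counted
}

if __name__ == "__main__":
    for species, n in count_changed_clusters(example).items():
        print(species, n)
```

Because the counts are grouped by species, a shift in community composition between the two time points will appear as many changed clusters for the species that gained or lost abundance.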

in silico hybridization: V-chip vs. Illumina RNA-Seq

using V-chip to measure transcriptional response

➢ High-throughput DNA sequencing technologies have drastically reduced costs and improved efficiency, but bioinformatics expertise and computational time remain barriers to rapid analysis and accessibility

➢ Microarrays are a faster, more straightforward alternative to sequencing that can more readily be utilized in smaller research studies and clinical settings

➢ V-chip microarray developed for explorative analysis of the vaginal microbiome

vaginal swabs: metagenomes & metatranscriptomes

conclusions

➢ V-chip shows high-level agreement with sequencing-based techniques

➢ Utility as a discovery tool with a variety of potential applications:

•  Identify genes or taxa that contribute to community ecological function or health outcomes of the host

•  Screen samples for interesting patterns to select a manageable subset for in-depth investigation

acknowledgments

RH is supported by a fellowship from the University of Idaho Bioinformatics and Computational Biology Graduate Program and grant U19 AI084044 from the National Institutes of Health. The research study was supported by grants UH2 AI083264 and U01 AI070921 from the National Institute of Allergy and Infectious Diseases, National Institutes of Health. The authors are grateful to Daniel New (IBEST Genomics Core), Renee Nuhn (Forney lab) and Li Fu (Ravel lab) for their technical assistance.

Figure 1. Relative abundances of species in mock community samples detected by species-specific probes on the V-chip (left) compared to expected (right). Relative abundance was determined from RMA-normalized hybridization signal intensity values.
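One way relative abundance could be derived from normalized signal intensities is sketched below. RMA output is conventionally on a log2 scale; the back-transformation, the per-species averaging over species-specific probes, and the function name are assumptions made for this sketch, since the exact calculation is not described here.

```python
def relative_abundance(log2_intensity_by_species):
    """Convert per-species log2 probe intensities into relative abundances.

    Each species' probes are back-transformed to linear scale and averaged,
    then the species means are scaled to sum to 1. Species without probes
    are skipped.
    """
    mean_linear = {}
    for species, log2_values in log2_intensity_by_species.items():
        if not log2_values:
            continue
        linear = [2.0 ** v for v in log2_values]
        mean_linear[species] = sum(linear) / len(linear)
    total = sum(mean_linear.values())
    if total == 0:
        return {}
    return {species: value / total for species, value in mean_linear.items()}


# Hypothetical log2 intensities for two species (illustration only)
example_signal = {
    "species A": [10.0, 10.0],
    "species B": [9.0, 9.0],
}

if __name__ == "__main__":
    for species, fraction in relative_abundance(example_signal).items():
        print(f"{species}: {fraction:.3f}")
```

Hybridization signal saturates at high target concentration and has a background floor at low concentration, which is one plausible reason signal-derived proportions would compress toward the middle relative to the expected values.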

Mock communities of genomic DNA (bacterial + human) tested to evaluate qualitative and quantitative capabilities of the V-chip

Metagenome (DNA) & metatranscriptome (mRNA → cDNA) of three vaginal swabs tested to evaluate utility of V-chip for vaginal microbial communities

High correlation of species relative abundance between cDNA hybridization on V-chip and in silico mapping of RNA-Seq reads

➢ Illumina RNA-Seq reads (100 bp) mapped to V-chip probes in silico using Bowtie (v0.12.9)

➢ ~2-3% of total reads mapped to V-chip probes
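The idea of in silico hybridization, asking which probes each read would hit and what fraction of reads hit any probe, can be illustrated with a deliberately naive sketch. It uses exact full-length matching of a 60-mer probe (either strand) inside a read as a stand-in for alignment; it is not Bowtie and does not reproduce Bowtie's behavior or settings. All function names and sequences are hypothetical.

```python
PROBE_LEN = 60
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_probe_index(probes):
    """Map each probe sequence and its reverse complement to the probe ID."""
    index = {}
    for probe_id, seq in probes.items():
        index[seq] = probe_id
        index[reverse_complement(seq)] = probe_id
    return index


def map_read(read, index):
    """Return the set of probe IDs with an exact full-length match in the read."""
    hits = set()
    for start in range(len(read) - PROBE_LEN + 1):
        probe_id = index.get(read[start:start + PROBE_LEN])
        if probe_id is not None:
            hits.add(probe_id)
    return hits


def fraction_mapped(reads, index):
    """Fraction of reads that match at least one probe."""
    if not reads:
        return 0.0
    mapped = sum(1 for read in reads if map_read(read, index))
    return mapped / len(reads)


# Hypothetical probe and 100 bp reads (illustration only)
probe = "A" * 30 + "C" * 30
probes = {"p1": probe}
reads = [
    "G" * 20 + probe + "T" * 20,                       # forward-strand match
    "T" * 20 + reverse_complement(probe) + "A" * 20,   # reverse-strand match
    "T" * 100,                                         # no match
]

if __name__ == "__main__":
    index = build_probe_index(probes)
    print(f"fraction of reads mapped: {fraction_mapped(reads, index):.2f}")
```

A low mapped fraction is expected in any scheme like this, because probes cover only a few 60-mer windows per gene cluster while reads are drawn from entire transcripts.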

V-chip sensitive and specific, but not quantitative, for detecting species-specific genes in mock communities

V-chip useful for performing qualitative comparisons of samples

As with mock communities, V-chip sensitive but not highly quantitative for detecting species-specific genes in vaginal microbial communities

➢ V-chip sensitive to low-abundance bacteria, but more useful as a qualitative rather than quantitative tool, particularly for highly skewed communities

➢ Comparison of relative abundance of 42 bacterial species in metagenome and metatranscriptome on V-chip with previous 16S rRNA V1-V2 pyrosequencing results

➢ Many species-specific genes differentially expressed, reflecting changes in community composition over time interval