Development and Evaluation of a Comprehensive Functional Gene Array for Environmental Studies
Zhili He1,2, C. W. Schadt2, T. Gentry2, J. Liebich3, S.C. Song2, X. Li4, and J. Zhou1,2
1The University of Oklahoma, Norman, OK; 2Oak Ridge National Laboratory, Oak Ridge, TN; 3Forschungszentrum Jülich GmbH, Jülich, Germany; 4PerkinElmer Life and Analytical Sciences, Boston, MA
ABSTRACT
Functional gene arrays (FGAs) are a promising and powerful tool for detecting and monitoring the functions of microbial organisms in their environments. In this study, we constructed a second-generation FGA, called FGA2.0, which contains 23,843 oligonucleotide (50-mer) probes and covers more than 10,000 sequences of genes involved in nitrogen, carbon and sulfur cycling and metabolism, metal reduction and resistance, and organic contaminant degradation. Several new strategies were implemented in probe design, array construction and data analysis. Gene sequences were retrieved automatically by keyword search. A newly developed oligonucleotide design program, CommOligo, was used to select gene-specific and group-specific probes, and multiple probes were designed for each gene sequence or each group of highly homologous sequences. All designed oligonucleotides were verified and output in a 96-well format so that oligonucleotide synthesis could be ordered directly. To verify its specificity, the array was systematically evaluated using different targets and environmental samples. The results demonstrate that such an array can provide specific analysis of microbial communities in a rapid, high-throughput and cost-effective fashion.
ACKNOWLEDGEMENTS
This research was funded by the U.S. Department of Energy (Office of Biological and Environmental Research, Office of Science) through grants from the Genomes To Life Program and the ERSP Program.
EXPERIMENTAL DESIGN
Fig. 1 workflow:
1. Download sequences from databases (GeneDownloader)
2. Design oligonucleotide probes (CommOligo): gene-specific probes and group-specific probes
3. Check designed probes (ProbeChecker)
4. Output desired probes (PlateProducer)
Result: a comprehensive FGA covering 10,511 sequences with 23,843 probes
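The last step of this workflow writes verified probes out in 96-well format for ordering. PlateProducer itself is a Perl script whose details are not given here, so the Python sketch below only illustrates the plate arithmetic. The row-major fill order (A1 to A12, then B1, and so on), the absence of reserved control wells, and the function names are all assumptions.

```python
from typing import Iterator

ROWS = "ABCDEFGH"        # 8 rows
COLUMNS = range(1, 13)   # 12 columns
WELLS_PER_PLATE = len(ROWS) * len(COLUMNS)  # 96


def well_names() -> Iterator[str]:
    """Yield well names in row-major order: A1, A2, ..., A12, B1, ..., H12."""
    for row in ROWS:
        for col in COLUMNS:
            yield f"{row}{col}"


def layout_probes(probe_ids: list[str]) -> list[tuple[int, str, str]]:
    """Hypothetical layout: assign each probe to (plate number, well, probe ID), filling plates in order."""
    wells = list(well_names())
    return [
        (i // WELLS_PER_PLATE + 1, wells[i % WELLS_PER_PLATE], probe_id)
        for i, probe_id in enumerate(probe_ids)
    ]


def plates_needed(n_probes: int) -> int:
    """Number of 96-well plates required (ceiling division)."""
    return -(-n_probes // WELLS_PER_PLATE)


if __name__ == "__main__":
    print(plates_needed(23843))  # 249
```

For the 23,843 probes on FGA2.0 this gives 249 plates: 248 full plates plus 35 probes on the last one.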
Table 1. Artificial oligonucleotide targets (Spec. = specificity of the primary probe; U = unique or gene-specific; G = group-specific)
Target-ID; Artificial target sequence (5'->3'); Dye; Primary probe; Spec.; Targeted gene/category
T1; GATCTCGTCGAAGGGCTCTTCGGCGGCGACGAAGGGAACCAGCAGAGAGC; Cy5; 902747_840; U; 902747/dsrAB
T2; CGGATGCATGACCATCTCGGGCACCGGCTCCAGACCGATCTGCTCCAGGA; Cy3; FW005228A_393; U; FW005228A/dsrAB
T3; CACTTGATGTCGCGGAAGCCGGCCTTCACGACCTCCTCCAGCTCCAGGTC; Cy5; DA-NIFH-E01_77; U; DA-NIFH-E01/nifH
T4; GTCGCCCAACACGTCATACGATACGTAGTCCGCGTTTTCGTACGCGCCGT; Cy3; ORE-NIFH-B09_207; U; ORE-NIFH-B09/nifH
T5; GGCCACGCATGGTCTTGACGACGTCGTCATAGGCTTCGCCGACGCTGTCA; Cy5; DA-NIRK-E07_125; U; DA-NIRK-E07/nirK
T6; CCTGAGAGTGGACGATCAGAACCGTTTCCCCGACTTTGGCCTTCATGGCG; Cy3; ORE-NIRK-C03_230; U; ORE-NIRK-C03/nirK
T7; CCCCGCATGACCTTCACGGTATCGTCGTAGTTGTCGCCTGCCGACTCGTA; Cy5; DE-NIRK-F11_123; U; DE-NIRK-F11/nirK
T8; GGCTTCACCGGGGCTGTCGTAGGTTTTGTATTTACCCTTCTCGTCCTTTG; Cy3; ORA-NIRK-C01_94; U; ORA-NIRK-C01/nirK
T9; GAGTTTTTCGAGTACAGAAGCACCGATACTCGCGTTCGGAAAGATTCGGA; Cy5; AmoAsite12-G10-072_176; U; Amo-G10-072/amoA
T10; CCTCCTCGAAGAAAATGTACGGGTTGGTCCGTGGATGCATCACCATCTGG; Cy3; 4:FW015046A_425; G; FW015046A/dsrAB
T11; TCTCGGCTCCTGTACGTGATGAGCAACGGGTTTGATTCCGGTTGCCGTTA; Cy5; 45:FW005298A_354_1; G; FW005298A/dsrAB
T12; CTCCTGTACGTGCTGAGCAACGGGTTTGATTCCGGTTGCCGCTAAGAGCT; Cy3; 45:FW015015A_363_2; G; FW015015A/dsrAB
T13; ATCTCGCAGGTCTTGGGCAACTCGTTGTGGTCGATGGCCGGCACTTTGGT; Cy5; 47:FW010025B_556; G; FW010025B/dsrAB
T14; GCGATTGGCCTCTGAATGAACGATCAAGACGTTCTCTCCAACCTTCGCTG; Cy3; 15:ORA-NIRK-F04_238; G; ORA-NIRK-F04/nirK
T15; ATTTATATAGAAGGACTCCGGGGTTTTGCCAGCAAATTGGGATGAGGAAG; Cy5; 8:AmoAsite12-B03-025_82; G; Amo-B03-025/amoA
T16; AACGTAATATATCCCCTTTTCAGCATCCCAGGCAAATGGCCCATTGATTG; Cy3; 12007366_1832; U; 12007366/cellulase
T17; GCTGTTCCGTTAGACCAACCGTTGCGATTGCCGGGGTAGTGAACGTAACC; Cy5; 11071709_1272; U; 11071709/merA
T18; GTTCTCCCCCACCGGGAGGCGACTCGCTAGGATGGAAAGCGAACCGACAA; Cy3; 15807792_275; U; 15807792/Ar reductase
T19; TAGCTCATCGCTGGAGGGACCGCTTCAGTCGTGGCAGCGGTCTGAGGACT; Cy5; 19352377_16; U; 19352377/Cr resistance protein
T20; GAACAGCATGCACCCTCCAGCGACAATCTCAAGGGATGACCTGACAGGGT; Cy3; 22003701_216; U; 22333701/mcrA
T21; GATCTTGAACACCGCTACCGACGCGGCGATCTCGGGTTCGGGGTTGAGCG; Cy5; 22252884_219; U; 22252884/nirS
T22; AACTCGGGGATCTTCTCTTCCTTCGCGACGTGCTGGTACCACTTGGAGTG; Cy3; 18369656_244; U; 18369656/bzdO
T23; TCGCGGCGCCCGCATTCAGTTCAATGTCCCCCTCGGCGGGGAAAATTTCT; Cy5; 12313640_13; U; 12313640/urease
T24; TCGAACAGCATGCATCCTCCGGCCACGATCTCAAGTGATGAGCGTACCGG; Cy3; 21:22003692_218; G; 22003692/mcrA
T25; CAGACCGGACCGAATTTTGCGTCAACAAAATTCGCGCCACGACCGGGATG; Cy5; 25:24421447_266; G; 24421447/nirS
Table 3. PCR-generated targets hybridized with their gene-specific probes on the array
Target-ID; Forward primer (5'->3'); Reverse primer (5'->3'); Template; Primary probe-IDs; Gene/category; Size (bp)
P1; TCACCACGAAGTCTTGAAGC; CTGGTCTGTTGCGATGATGT; Lab clone (dsrA-C11); FW010117B_436, FW010117B_613, FW010117B_671; FW010117B/dsrAB; 661
P2; GATGGTGGATGGAGTTGGTC; CCTTCTTCTGACGCTCTTCG; Lab clone (dsrA-C12); FW010245A_189, FW010245A_220, FW010245A_77; FW010245A/dsrAB; 262
P3; AAGCTGGGACTACCACGAGA; CTACGATTTCCACCGATTGC; Lab clone (dsrB-D5); FW015318B_180, FW015318B_188, FW015318B_200; FW015318B/dsrAB; 617
P4; GGGAAGTGGAAATACCACGA; GACTGGACACCTTCGACCAT; Lab clone (dsrB-D6); FW300088B_251, FW300088B_307, FW300088B_364; FW300088B/dsrAB; 626
P5; ACCACGTCGCAGAACACC; ATCTCCTGCGCCTTGTTCTC; Lab clone (nifH-E01); DA-NIFH-E01_77; DA-NIFH-E01/nifH; 392
P6; CCGAGATGGGACAGAACATT; TAGATTTCCTGCGCCTTGTT; Lab clone (nifH-B09); ORE-NIFH-B09_199, ORE-NIFH-B09_207; ORE-NIFH-B09/nifH; 361
P7; AAGGACGAAAAGGGCAAGC; GATCAGGTGCGGACGAGT; Lab clone (nirK-C03); ORE-NIRK-C03_124, ORE-NIRK-C03_230, ORE-NIRK-C03_252; ORE-NIRK-C03/nirK; 285
P8; GCCTCAAGGACGACAAGG; CGTGTCACGGTTGGCTTG; Lab clone (nirK-F11); DE-NIRK-F11_103, DE-NIRK-F11_123, DE-NIRK-F11_238; DE-NIRK-F11/nirK; 275
P9; CGGGTACAATCCCGAAAAG; GATGAGGTGGTGCGAGAACT; Genomic DNA (D. vulgaris); 902748_270, 902748_32, 902748_56; gi|902748/dsrAB; 1069
P11; GTTGCTGCGAGCGATTTATT; AAGTATTCCGCCATCTGCTG; Genomic DNA (S. oneidensis); 24372305_11, 24372305_97; gi|24372305/cytC; 279
P12; TTCTCGCAACAGAAAAAGGAA; TTCCTCCTTCTCCAACTTCG; Genomic DNA (M. maripludis); 5712689_430, 5712689_488, 5712689_759; gi|5712689/cellulase; 858
Oligonucleotide design and synthesis. The computer program CommOligo (Li et al., 2005) was used to design gene-specific and group-specific probes based on the following criteria (He et al., 2005a; Liebich et al., 2006): (i) gene-specific probes: ≤90% sequence identity, ≤20-base continuous stretch, and ≥-35 kcal/mol free energy with non-target sequences; (ii) group-specific probes: ≥96% sequence identity, ≥35-base continuous stretch, and ≤-60 kcal/mol free energy with group members. Each gene sequence or group of homologous sequences had up to three probes. All verified probes were synthesized without modification by MWG Biotech, Inc. (High Point, NC) in 96-well plates at a concentration of 100 pmol/µl.
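The numeric criteria above can be expressed as simple threshold checks. The Python sketch below is not CommOligo: it assumes the identity, stretch and free-energy values for each probe-sequence pair have already been computed, and the `Comparison` record and function names are hypothetical. It also assumes that a group-specific probe must additionally pass the gene-specific thresholds against sequences outside its group, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Hypothetical summary of one probe-versus-sequence comparison."""
    identity: float     # percent sequence identity
    stretch: int        # longest continuous matching stretch, in bases
    free_energy: float  # binding free energy, kcal/mol (more negative = more stable)


def is_gene_specific(non_targets: list[Comparison]) -> bool:
    """Every non-target must stay at or below 90% identity and 20 bases, and at or above -35 kcal/mol."""
    return all(
        c.identity <= 90 and c.stretch <= 20 and c.free_energy >= -35
        for c in non_targets
    )


def is_group_specific(members: list[Comparison], non_members: list[Comparison]) -> bool:
    """Every group member must reach >= 96% identity, >= 35 bases and <= -60 kcal/mol.

    Assumption: sequences outside the group are screened with the gene-specific thresholds.
    """
    if not members:
        return False
    binds_group = all(
        c.identity >= 96 and c.stretch >= 35 and c.free_energy <= -60
        for c in members
    )
    return binds_group and is_gene_specific(non_members)
```

Keeping the thresholds in one place like this makes it easy to tighten them later, for example if additional criteria turn out to be needed to exclude non-specific probes.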
Oligonucleotide target synthesis. Twenty-five oligonucleotides were synthesized as gene-specific and group-specific targets to evaluate FGA specificity (Table 1). For hybridization, 50 pg of each oligonucleotide was used, either as a single target or in a mixture of multiple targets.
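To put 50 pg per target in molar terms, a rough conversion is sketched below. It assumes an average mass of about 330 g/mol per single-stranded nucleotide, a common approximation that ignores base composition and end groups, so a 50-mer is roughly 16.5 kg/mol and 50 pg comes to about 3 fmol.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
AVG_NT_MASS = 330.0        # g/mol per ssDNA nucleotide (assumed approximation)


def oligo_moles(mass_g: float, length_nt: int, nt_mass: float = AVG_NT_MASS) -> float:
    """Approximate moles of a single-stranded oligonucleotide from its mass and length."""
    molar_mass = length_nt * nt_mass   # g/mol, ignoring end-group corrections
    return mass_g / molar_mass


moles = oligo_moles(50e-12, 50)        # 50 pg of a 50-mer target
fmol = moles * 1e15
molecules = moles * AVOGADRO
print(f"{fmol:.2f} fmol, {molecules:.2e} molecules")  # 3.03 fmol, 1.82e+09 molecules
```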
Preparation of PCR-generated targets. Seventeen target genes were selected, and their PCR products (PCR-amplicons) were obtained using gene-specific primers and standard PCR methods (Tables 2 and 3). Each PCR product was long enough to cover all of its corresponding probes on the array (one, two or three, depending on the gene).
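The requirement that each amplicon span all of its probes reduces to an interval check. In the sketch below, probe start positions are hypothetical 0-based coordinates on the target gene, intervals are half-open, the function names are illustrative, and primer design itself is not modeled.

```python
PROBE_LENGTH = 50  # all probes on the array are 50-mers


def minimal_amplicon(probe_starts: list[int], probe_len: int = PROBE_LENGTH) -> tuple[int, int]:
    """Smallest [start, end) interval on the gene that contains every probe."""
    return min(probe_starts), max(probe_starts) + probe_len


def covers_all_probes(amp_start: int, amp_end: int, probe_starts: list[int],
                      probe_len: int = PROBE_LENGTH) -> bool:
    """True if every probe [s, s + probe_len) lies inside the amplicon [amp_start, amp_end)."""
    return all(amp_start <= s and s + probe_len <= amp_end for s in probe_starts)


starts = [100, 250, 400]  # hypothetical probe start positions on one gene
print(minimal_amplicon(starts))             # (100, 450)
print(covers_all_probes(90, 460, starts))   # True
print(covers_all_probes(120, 460, starts))  # False
```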
DNA labeling and hybridization. The PCR-amplicons were fluorescently labeled by random priming using the Klenow fragment of DNA polymerase, as described previously (He et al., 2005b). Hybridization was carried out at 50 °C with 50% formamide.
Table 2. PCR-generated targets hybridized with their group-specific probes on the array
Target-ID; Forward primer (5'->3'); Reverse primer (5'->3'); Template; Targeted probe-IDs; Gene/category (size, bp)
G14-1; AGACGGCCTCACCGACGGCAA; CAGACATAGTCGCCATGACC; Lab clone (NirK-C08); 14:DA-NIRK-D09_20; NIRK-C08/nirK (318), NIRK-C09/nirK
G15-1; ACGCCCTTCATTACGACAAG; GTCGCGATTGGCCTCTGAAT; Lab clone (NirK-F04); 15:ORA-NIRK-F04_215, 15:ORA-NIRK-F04_229, 15:ORA-NIRK-F04_238; ORA-NIRK-F04/nirK (251), ORA-NIRK-F11/nirK (251)
G15-2; ACGCCCTTCATTACGACAAG; GTCGCGATTGGCCTCTGAAT; Lab clone (NirK-F11); 15:ORA-NIRK-F04_215, 15:ORA-NIRK-F04_229, 15:ORA-NIRK-F04_238; ORA-NIRK-F04/nirK (251), ORA-NIRK-F11/nirK (251)
G27-1; GACGGTCTCAAGGATGGCAGT; AGTGAATGATCAGCACGGTTT; Lab clone (NirK-C10); 27:DA-NIRK-C10-108, 27:DE-NIRK-G06-35, 27:DE-NIRK-G10-23; DA-NIRK-C10/nirK (259), DE-NIRK-G06/nirK (259), DE-NIRK-G10/nirK (258), DE-NIRK-B12/nirK
G27-2; GACGGTCTCAAGGATGGCAGT; AGTGAACGATCAGCACGGTTT; Lab clone (NirK-G06); 27:DA-NIRK-C10-108, 27:DE-NIRK-G06-35, 27:DE-NIRK-G10-23; DA-NIRK-C10/nirK (259), DE-NIRK-G06/nirK (259), DE-NIRK-G10/nirK (258), DE-NIRK-B12/nirK
G27-3; ACGGTCTCAAGGATGGCAGT; AGTGAATGATCAGCACGGTTT; Lab clone (NirK-G10); 27:DA-NIRK-C10-108, 27:DE-NIRK-G06-35, 27:DE-NIRK-G10-23; DA-NIRK-C10/nirK (259), DE-NIRK-G06/nirK (259), DE-NIRK-G10/nirK (258), DE-NIRK-B12/nirK
Fig. 1 Major steps in the construction of a comprehensive 50-mer oligonucleotide functional gene array. CommOligo is the core program for selecting gene-specific and group-specific oligonucleotide probes. GeneDownloader, ProbeChecker and PlateProducer are Perl scripts that pre-process gene sequences or post-process oligonucleotide probes.
RESULTS
For gene-specific probes, Fig. 2 shows the distributions of maximal sequence identity (Fig. 2A), maximal stretch length (Fig. 2B) and minimal free energy (Fig. 2C) with non-target sequences. Most probes (~70%) had maximal sequence identities of 72-84%, maximal stretch lengths of 12-15 bases, and free energies between 0 and -30 kcal/mol.
For group-specific probes, Fig. 3 shows the distributions of minimal sequence identity (Fig. 3A), minimal stretch length (Fig. 3B) and minimal free energy (Fig. 3C) with group members. Most probes (~92%) had minimal sequence identities of 100%, minimal stretch lengths of 45-50 bases, and free energies of -65 kcal/mol or lower.
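Figs. 2 and 3 summarize per-probe sequence identity and continuous stretch length against other sequences. As an illustration only, the sketch below computes both quantities with a simple ungapped sliding comparison. CommOligo's actual alignment procedure and its free-energy calculation (which requires nearest-neighbor thermodynamic parameters) are not reproduced, partial overlaps at sequence ends are ignored, and the function names are hypothetical.

```python
def window_stats(probe: str, window: str) -> tuple[float, int]:
    """Percent identity and longest run of matching bases for two equal-length sequences."""
    matches = best_run = run = 0
    for p, w in zip(probe, window):
        if p == w:
            matches += 1
            run += 1
            best_run = max(best_run, run)
        else:
            run = 0
    return 100.0 * matches / len(probe), best_run


def max_identity_and_stretch(probe: str, non_target: str) -> tuple[float, int]:
    """Slide the probe along the non-target without gaps; report the maxima over all full-length windows."""
    n = len(probe)
    best_identity, best_stretch = 0.0, 0
    for offset in range(len(non_target) - n + 1):
        identity, stretch = window_stats(probe, non_target[offset:offset + n])
        best_identity = max(best_identity, identity)
        best_stretch = max(best_stretch, stretch)
    return best_identity, best_stretch


print(max_identity_and_stretch("ACGTACGTAC", "TTACGTACGTACTT"))  # (100.0, 10)
print(max_identity_and_stretch("AAAAAAAAAA", "AAAAATAAAA"))      # (90.0, 5)
```

The two maxima are tracked independently, matching the separate identity and stretch distributions in Figs. 2A/B and 3A/B.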
Table 4. Numbers of unique (gene-specific) and group-specific probes by gene category on the constructed comprehensive FGA
Gene category; Unique probes; Group probes; Total
Nitrogen fixation; 1225; 0; 1225
Nitrification/N metabolism; 865; 902; 1767
Denitrification; 1805; 501; 2306
Sulfur reduction; 1286; 329; 1615
Methane reduction and oxidation; 437; 333; 770
Carbon fixation; 584; 215; 799
Carbon polymer degradation; 2532; 276; 2808
Metal reduction and resistance; 4039; 507; 4546
Organic contaminant degradation; 6920; 1087; 8007
Total; 19693; 4150; 23843
Fig. 4 The FGA was hybridized with a mixture of 15 synthesized oligonucleotide targets at 42 °C, 45 °C, 50 °C and 60 °C. Balancing probe sensitivity against specificity, we determined the optimal hybridization temperature to be 45-50 °C with 50% formamide, which is generally consistent with our previous results.
Signal intensities of probes B and C were normalized to that of probe A (set to 100%); there were 14 A probes, 12 B probes and 10 C probes (Tables S2 and S3).
The average relative signal intensities of probes A, B and C were 100%, 103.8% and 97.6%, and their average SNR values were 73.1, 67.2 and 65.3, respectively (Fig. 5).
These results suggest that the three probes performed similarly with known targets.
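The normalization described above can be sketched as follows. The input layout, the choice to normalize per target and then average across targets, and the function name are assumptions about how the averages were formed; the example intensities are invented for illustration. SNR is not computed because its definition is not given here.

```python
from statistics import mean


def mean_relative_intensity(signals: dict[str, dict[str, float]]) -> dict[str, float]:
    """signals[target][probe_label] holds raw intensities for probes labeled "A", "B", "C".

    Each probe is expressed as a percentage of probe A on the same target,
    then the percentages are averaged across targets for each label.
    """
    per_label: dict[str, list[float]] = {}
    for probes in signals.values():
        reference = probes.get("A")
        if not reference:
            continue  # no usable probe A on this target, so it cannot be normalized
        for label, value in probes.items():
            per_label.setdefault(label, []).append(100.0 * value / reference)
    return {label: mean(values) for label, values in sorted(per_label.items())}


example = {  # hypothetical intensities, not data from this study
    "target1": {"A": 1000.0, "B": 1100.0, "C": 900.0},
    "target2": {"A": 2000.0, "B": 1900.0},
}
print(mean_relative_intensity(example))  # {'A': 100.0, 'B': 102.5, 'C': 90.0}
```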
Table 5. Summary of dye-labeled targets (oligonucleotides or PCR-amplicons) hybridized with the FGA
Target; Oligonucleotide; PCR-amplicon
No. of targets; 25; 17
Expected no. of probes detected; 38; 35
No. of probes hybridized; 40; 39
Average signal intensity ± SD; 5678 ± 3372; 9265 ± 5270
Average SNR ± SD; 55.9 ± 27.56; 87.6 ± 38.72
No. of false positives; 3; 4
No. of false negatives; 2; 0
CONCLUSIONS
1. FGA2.0 has been constructed with more than 23,000 oligonucleotide probes covering more than 10,000 gene sequences. To our knowledge, this is the most comprehensive FGA for environmental studies.
2. To ensure array specificity, several new features have been implemented in probe design and array construction.
3. FGA2.0 has been systematically evaluated using oligonucleotide and PCR-amplicon targets, and the results demonstrate that it can be used as a powerful tool for rapid, high-throughput and cost-effective analysis of microbial communities.
4. The array can be used to profile differences among microbial communities, to address specific questions and/or hypotheses about microbial population dynamics, and to analyze functional gene expression in microbial communities.
FGA II design strategies:
1. Using multiple sequence alignment (MSA) to identify conserved regions for each functional gene.
2. Using experimentally established oligonucleotide design criteria and the novel software tool CommOligo.
3. Designing gene-specific and group-specific probes.
4. Designing multiple probes for each sequence or each group of sequences.
• 15.2% of probes target carbon metabolism genes
• 22.2% of probes target genes involved in nitrogen cycling
• 6.8% of probes target sulfur reduction genes
• 3.6% of probes target methane reduction and oxidation genes
• 19.0% of probes target genes involved in metal reduction and resistance
• 34.0% of probes target genes involved in degradation of organic compounds
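These category shares can be recomputed from the counts in Table 4. The sketch below assumes that "carbon metabolism" combines carbon fixation and carbon polymer degradation and that "nitrogen cycling" combines nitrogen fixation, nitrification/N metabolism and denitrification; the grouping names are illustrative. Under that grouping it gives 15.1%, 22.2%, 6.8%, 3.2%, 19.1% and 33.6%. The methane and organic-contaminant values differ from the list above (3.6% and 34.0%), so those figures may reflect a different tally.

```python
# Probe counts per gene category from Table 4: (unique probes, group probes)
TABLE_4 = {
    "Nitrogen fixation": (1225, 0),
    "Nitrification/N metabolism": (865, 902),
    "Denitrification": (1805, 501),
    "Sulfur reduction": (1286, 329),
    "Methane reduction and oxidation": (437, 333),
    "Carbon fixation": (584, 215),
    "Carbon polymer degradation": (2532, 276),
    "Metal reduction and resistance": (4039, 507),
    "Organic contaminant degradation": (6920, 1087),
}

# Assumed mapping from the summary groupings to Table 4 categories
GROUPS = {
    "carbon metabolism": ["Carbon fixation", "Carbon polymer degradation"],
    "nitrogen cycling": ["Nitrogen fixation", "Nitrification/N metabolism", "Denitrification"],
    "sulfur reduction": ["Sulfur reduction"],
    "methane reduction and oxidation": ["Methane reduction and oxidation"],
    "metal reduction and resistance": ["Metal reduction and resistance"],
    "organic contaminant degradation": ["Organic contaminant degradation"],
}


def category_shares() -> tuple[int, dict[str, tuple[int, float]]]:
    """Return the total probe count and, per group, (probe count, percent of all probes)."""
    total = sum(unique + group for unique, group in TABLE_4.values())
    shares = {}
    for name, categories in GROUPS.items():
        count = sum(sum(TABLE_4[c]) for c in categories)
        shares[name] = (count, 100.0 * count / total)
    return total, shares


total, shares = category_shares()
print(total)  # 23843
for name, (count, pct) in shares.items():
    print(f"{name}: {count} probes, {pct:.1f}%")
```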
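Fig. 2 Distributions of maximal sequence identity (A), maximal stretch length (B) and minimal free energy (C) of gene-specific probes with non-target sequences.
Fig. 3 Distributions of minimal sequence identity (A), minimal stretch length (B) and minimal free energy (C) of group-specific probes with their group members.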
For oligonucleotide targets, three false positives and two false negatives were observed; for PCR-amplicon targets, four false positives and no false negatives were observed (Table 5).
Possible reasons include: (i) the amounts of some oligonucleotides or PCR-amplicons applied to the array were too high or too low; (ii) the probe design criteria may not have been stringent enough to exclude all non-specific probes, so additional criteria may need to be considered; (iii) the hybridization conditions may not have been fully optimized for probe specificity; and (iv) the probe and/or gene sequences may contain errors.
To address the problem of false positives, relative comparisons among samples, rather than absolute detection calls, are needed.
Fig. 5 Relative signal intensities and SNR values detected by probes A, B and C for PCR-amplicon targets.
REFERENCES
He Z, Wu L, Li X, Fields MW and Zhou J (2005a). Appl. Environ. Microbiol. 71:3753-3760.
He Z, Wu L, Fields MW and Zhou J (2005b). Appl. Environ. Microbiol. 71:5154-5162.
Li X*, He Z* and Zhou J (2005). Nucleic Acids Res. 33:6114-6123 (*co-first authors).
Liebich J, Schadt CW, Chong SC, He Z, Rhee SK and Zhou J (2006). Appl. Environ. Microbiol. 72:1688-1691.
N125 http://ieg.ou.edu/