
Development and Evaluation of a Comprehensive Functional Gene Array for Environmental Studies

Zhili He 1,2, C. W. Schadt 2, T. Gentry 2, J. Liebich 3, S. C. Song 2, X. Li 4, and J. Zhou 1,2

ABSTRACT

Functional gene arrays (FGAs) are a promising and powerful tool for detecting and monitoring the functions of microorganisms in their environments. In this study, we constructed a second-generation FGA, called FGA2.0, which contains 23,843 oligonucleotide (50-mer) probes and covers more than 10,000 sequences of targeted genes involved in nitrogen, carbon, and sulfur cycling and metabolism, metal reduction and resistance, and organic contaminant degradation. Several new strategies were implemented in probe design, array construction, and data analysis. Gene sequences were automatically retrieved by keyword. A newly developed oligonucleotide design program, CommOligo, was used to select gene-specific and group-specific probes, and multiple probes were designed for each gene sequence or each group of highly homologous sequences. All designed oligonucleotides were verified and output in a 96-well format for direct ordering of oligonucleotide synthesis. To ensure array specificity, the array was systematically evaluated using different targets and environmental samples. The results demonstrate that such an array can provide specific analysis of microbial communities in a rapid, high-throughput, and cost-effective fashion.

EXPERIMENTAL DESIGN

1 The University of Oklahoma, Norman, OK; 2 Oak Ridge National Laboratory, Oak Ridge, TN; 3 Forschungszentrum Julich GmbH, Julich, Germany; 4 Perkin Elmer Life and Analytical Sciences, Boston, MA

RESULTS

CONCLUSIONS

ACKNOWLEDGEMENTS

This research was funded by U.S. Department of Energy (Office of Biological and Environmental Research, Office of Science) grants from the Genomes To Life Program and the ERSP Program.

Fig. 1 flowchart: Download sequences from databases (GeneDownloader) -> Oligonucleotide probe design (CommOligo; gene-specific probes and group-specific probes) -> Check designed probes (ProbeChecker) -> Output desired probes (PlateProducer) -> Comprehensive FGA covering 10,511 sequences with 23,843 probes

Table 1. Artificial oligonucleotide target information (U = unique or gene-specific; G = group-specific)

Target-ID | Artificial target sequence (5'->3') | Dye | Primary probe | Spec. | Targeted gene/category
T1 | GATCTCGTCGAAGGGCTCTTCGGCGGCGACGAAGGGAACCAGCAGAGAGC | Cy5 | 902747_840 | U | 902747/dsrAB
T2 | CGGATGCATGACCATCTCGGGCACCGGCTCCAGACCGATCTGCTCCAGGA | Cy3 | FW005228A_393 | U | FW005228A/dsrAB
T3 | CACTTGATGTCGCGGAAGCCGGCCTTCACGACCTCCTCCAGCTCCAGGTC | Cy5 | DA-NIFH-E01_77 | U | DA-NIFH-E01/nifH
T4 | GTCGCCCAACACGTCATACGATACGTAGTCCGCGTTTTCGTACGCGCCGT | Cy3 | ORE-NIFH-B09_207 | U | ORE-NIFH-B09/nifH
T5 | GGCCACGCATGGTCTTGACGACGTCGTCATAGGCTTCGCCGACGCTGTCA | Cy5 | DA-NIRK-E07_125 | U | DA-NIRK-E07/nirK
T6 | CCTGAGAGTGGACGATCAGAACCGTTTCCCCGACTTTGGCCTTCATGGCG | Cy3 | ORE-NIRK-C03_230 | U | ORE-NIRK-C03/nirK
T7 | CCCCGCATGACCTTCACGGTATCGTCGTAGTTGTCGCCTGCCGACTCGTA | Cy5 | DE-NIRK-F11_123 | U | DE-NIRK-F11/nirK
T8 | GGCTTCACCGGGGCTGTCGTAGGTTTTGTATTTACCCTTCTCGTCCTTTG | Cy3 | ORA-NIRK-C01_94 | U | ORA-NIRK-C01/nirK
T9 | GAGTTTTTCGAGTACAGAAGCACCGATACTCGCGTTCGGAAAGATTCGGA | Cy5 | AmoAsite12-G10-072_176 | U | Amo-G10-072/amoA
T10 | CCTCCTCGAAGAAAATGTACGGGTTGGTCCGTGGATGCATCACCATCTGG | Cy3 | 4:FW015046A_425 | G | FW015046A/dsrAB
T11 | TCTCGGCTCCTGTACGTGATGAGCAACGGGTTTGATTCCGGTTGCCGTTA | Cy5 | 45:FW005298A_354_1 | G | FW005298A/dsrAB
T12 | CTCCTGTACGTGCTGAGCAACGGGTTTGATTCCGGTTGCCGCTAAGAGCT | Cy3 | 45:FW015015A_363_2 | G | FW015015A/dsrAB
T13 | ATCTCGCAGGTCTTGGGCAACTCGTTGTGGTCGATGGCCGGCACTTTGGT | Cy5 | 47:FW010025B_556 | G | FW010025B/dsrAB
T14 | GCGATTGGCCTCTGAATGAACGATCAAGACGTTCTCTCCAACCTTCGCTG | Cy3 | 15:ORA-NIRK-F04_238 | G | ORA-NIRK-F04/nirK
T15 | ATTTATATAGAAGGACTCCGGGGTTTTGCCAGCAAATTGGGATGAGGAAG | Cy5 | 8:AmoAsite12-B03-025_82 | G | Amo-B03-025/amoA
T16 | AACGTAATATATCCCCTTTTCAGCATCCCAGGCAAATGGCCCATTGATTG | Cy3 | 12007366_1832 | U | 12007366/cellulase
T17 | GCTGTTCCGTTAGACCAACCGTTGCGATTGCCGGGGTAGTGAACGTAACC | Cy5 | 11071709_1272 | U | 11071709/merA
T18 | GTTCTCCCCCACCGGGAGGCGACTCGCTAGGATGGAAAGCGAACCGACAA | Cy3 | 15807792_275 | U | 15807792/Ar reductase
T19 | TAGCTCATCGCTGGAGGGACCGCTTCAGTCGTGGCAGCGGTCTGAGGACT | Cy5 | 19352377_16 | U | 19352377/Cr resistance protein
T20 | GAACAGCATGCACCCTCCAGCGACAATCTCAAGGGATGACCTGACAGGGT | Cy3 | 22003701_216 | U | 22333701/mcrA
T21 | GATCTTGAACACCGCTACCGACGCGGCGATCTCGGGTTCGGGGTTGAGCG | Cy5 | 22252884_219 | U | 22252884/nirS
T22 | AACTCGGGGATCTTCTCTTCCTTCGCGACGTGCTGGTACCACTTGGAGTG | Cy3 | 18369656_244 | U | 18369656/bzdO
T23 | TCGCGGCGCCCGCATTCAGTTCAATGTCCCCCTCGGCGGGGAAAATTTCT | Cy5 | 12313640_13 | U | 12313640/urease
T24 | TCGAACAGCATGCATCCTCCGGCCACGATCTCAAGTGATGAGCGTACCGG | Cy3 | 21:22003692_218 | G | 22003692/mcrA
T25 | CAGACCGGACCGAATTTTGCGTCAACAAAATTCGCGCCACGACCGGGATG | Cy5 | 25:24421447_266 | G | 24421447/nirS

Table 3. Information on PCR-generated targets hybridized with their gene-specific probes on the array

Target-ID | Forward primer (5'->3') | Reverse primer (5'->3') | Template | Primary probe-IDs | Gene/category | Size (bp)
P1 | TCACCACGAAGTCTTGAAGC | CTGGTCTGTTGCGATGATGT | Lab clone (dsrA-C11) | FW010117B_436, FW010117B_613, FW010117B_671 | FW010117B/dsrAB | 661
P2 | GATGGTGGATGGAGTTGGTC | CCTTCTTCTGACGCTCTTCG | Lab clone (dsrA-C12) | FW010245A_189, FW010245A_220, FW010245A_77 | FW010245A/dsrAB | 262
P3 | AAGCTGGGACTACCACGAGA | CTACGATTTCCACCGATTGC | Lab clone (dsrB-D5) | FW015318B_180, FW015318B_188, FW015318B_200 | FW015318B/dsrAB | 617
P4 | GGGAAGTGGAAATACCACGA | GACTGGACACCTTCGACCAT | Lab clone (dsrB-D6) | FW300088B_251, FW300088B_307, FW300088B_364 | FW300088B/dsrAB | 626
P5 | ACCACGTCGCAGAACACC | ATCTCCTGCGCCTTGTTCTC | Lab clone (nifH-E01) | DA-NIFH-E01_77 | DA-NIFH-E01/nifH | 392
P6 | CCGAGATGGGACAGAACATT | TAGATTTCCTGCGCCTTGTT | Lab clone (nifH-B09) | ORE-NIFH-B09_199, ORE-NIFH-B09_207 | ORE-NIFH-B09/nifH | 361
P7 | AAGGACGAAAAGGGCAAGC | GATCAGGTGCGGACGAGT | Lab clone (nirK-C03) | ORE-NIRK-C03_124, ORE-NIRK-C03_230, ORE-NIRK-C03_252 | ORE-NIRK-C03/nirK | 285
P8 | GCCTCAAGGACGACAAGG | CGTGTCACGGTTGGCTTG | Lab clone (nirK-F11) | DE-NIRK-F11_103, DE-NIRK-F11_123, DE-NIRK-F11_238 | DE-NIRK-F11/nirK | 275
P9 | CGGGTACAATCCCGAAAAG | GATGAGGTGGTGCGAGAACT | Genomic DNA (D. vulgaris) | 902748_270, 902748_32, 902748_56 | gi|902748/dsrAB | 1069
P11 | GTTGCTGCGAGCGATTTATT | AAGTATTCCGCCATCTGCTG | Genomic DNA (S. oneidensis) | 24372305_11, 24372305_97 | gi|24372305/cytC | 279
P12 | TTCTCGCAACAGAAAAAGGAA | TTCCTCCTTCTCCAACTTCG | Genomic DNA (M. maripaludis) | 5712689_430, 5712689_488, 5712689_759 | gi|5712689/cellulase | 858

Oligonucleotide design and synthesis. The program CommOligo (Li et al., 2005) was used to design gene-specific and group-specific probes based on the following criteria: (i) gene-specific probes: ≤90% sequence identity, ≤20-base continuous stretch, and ≥-35 kcal/mol free energy with non-targets; (ii) group-specific probes: ≥96% sequence identity, ≥35-base continuous stretch, and ≤-60 kcal/mol free energy with group members (He et al., 2005a; Liebich et al., 2006). Up to three probes were designed for each gene sequence or each group of homologous sequences. All verified probes were synthesized without modification by MWG Biotech, Inc. (High Point, NC) in a 96-well plate format at a concentration of 100 pmol/µl.
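The design thresholds above can be expressed as two simple filters. The following Python sketch is only an illustration of the stated criteria, not the CommOligo implementation; the `ProbeStats` record and both function names are hypothetical, and the identity, stretch, and free-energy values are assumed to have been computed beforehand by an alignment and thermodynamics step.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ProbeStats:
    """Similarity of one candidate probe to one comparison sequence (hypothetical record)."""
    identity: float      # percent sequence identity, 0-100
    stretch: int         # longest continuous matching stretch, in bases
    free_energy: float   # binding free energy in kcal/mol (more negative = more stable)


def is_gene_specific(non_targets: Iterable[ProbeStats]) -> bool:
    """True if every non-target stays within the gene-specific limits."""
    return all(
        s.identity <= 90.0 and s.stretch <= 20 and s.free_energy >= -35.0
        for s in non_targets
    )


def is_group_specific(group_members: Iterable[ProbeStats]) -> bool:
    """True if every member of a non-empty group meets the group-specific limits."""
    members = list(group_members)
    return bool(members) and all(
        s.identity >= 96.0 and s.stretch >= 35 and s.free_energy <= -60.0
        for s in members
    )
```

A candidate would be kept as a gene-specific probe only if `is_gene_specific` holds for its comparisons with all non-target sequences, and as a group-specific probe only if `is_group_specific` holds across all members of its group.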

Oligonucleotide target synthesis. Twenty-five oligonucleotides were synthesized as gene-specific and group-specific targets to evaluate FGA specificity (Table 1). For hybridizations with a single target or a mixture of multiple targets, 50 pg of each oligonucleotide was used.

Preparation of PCR-generated targets. Seventeen target genes were selected, and their PCR products (PCR-amplicons) were obtained using gene-specific primers and standard PCR methods (Table 2 and Table 3). Each PCR product was long enough to cover all available probes for its target on the array (one, two, or three, depending on the probes selected).

DNA labeling and hybridization. The PCR-amplicons were fluorescently labeled by random priming using the Klenow fragment of DNA polymerase I as described previously (He et al., 2005b). Hybridization was performed at 50°C with 50% formamide.

Table 2. Information on PCR-generated targets to be hybridized with their group-specific probes on the array

Target-ID | Forward primer (5'->3') | Reverse primer (5'->3') | Template | Targeted probe-IDs | Gene/category (size in bp)
G14-1 | AGACGGCCTCACCGACGGCAA | CAGACATAGTCGCCATGACC | Lab clone (NirK-C08) | 14:DA-NIRK-D09_20 | NIRK-C08/nirK (318), NIRK-C09/nirK
G15-1 | ACGCCCTTCATTACGACAAG | GTCGCGATTGGCCTCTGAAT | Lab clone (NirK-F04) | 15:ORA-NIRK-F04_215, 15:ORA-NIRK-F04_229, 15:ORA-NIRK-F04_238 | ORA-NIRK-F04/nirK (251), ORA-NIRK-F11/nirK (251)
G15-2 | ACGCCCTTCATTACGACAAG | GTCGCGATTGGCCTCTGAAT | Lab clone (NirK-F11) | 15:ORA-NIRK-F04_215, 15:ORA-NIRK-F04_229, 15:ORA-NIRK-F04_238 | ORA-NIRK-F04/nirK (251), ORA-NIRK-F11/nirK (251)
G27-1 | GACGGTCTCAAGGATGGCAGT | AGTGAATGATCAGCACGGTTT | Lab clone (NirK-C10) | 27:DA-NIRK-C10-108, 27:DE-NIRK-G06-35, 27:DE-NIRK-G10-23 | DA-NIRK-C10/nirK (259), DE-NIRK-G06/nirK (259), DE-NIRK-G10/nirK (258), DE-NIRK-B12/nirK
G27-2 | GACGGTCTCAAGGATGGCAGT | AGTGAACGATCAGCACGGTTT | Lab clone (NirK-G06) | 27:DA-NIRK-C10-108, 27:DE-NIRK-G06-35 | DA-NIRK-C10/nirK (259), DE-NIRK-G06/nirK (259), DE-NIRK-B12/nirK
G27-3 | ACGGTCTCAAGGATGGCAGT | AGTGAATGATCAGCACGGTTT | Lab clone (NirK-G10) | 27:DA-NIRK-C10-108, 27:DE-NIRK-G06-35 | DA-NIRK-C10/nirK (259), DE-NIRK-G06/nirK (259), DE-NIRK-B12/nirK

Fig. 1 Major steps in the construction of a comprehensive 50-mer oligonucleotide functional gene array. CommOligo is the core program for selecting gene-specific and group-specific oligonucleotide probes. GeneDownloader, ProbeChecker, and PlateProducer are Perl scripts that pre-process gene sequences or post-process oligonucleotide probes.
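The final step of the workflow outputs probes in a 96-well format. As a rough illustration of what such a layout involves, the Python sketch below maps a running probe index to a plate number and well label. It is not the PlateProducer script (which is written in Perl); the function names and the row-by-row fill order (A1 to A12, then B1, and so on) are assumptions made for this example.

```python
from typing import Tuple

ROWS = "ABCDEFGH"        # 8 rows of a standard 96-well plate
COLS_PER_ROW = 12        # 12 columns per row
WELLS_PER_PLATE = len(ROWS) * COLS_PER_ROW


def well_position(index: int) -> Tuple[int, str]:
    """Map a 0-based probe index to (1-based plate number, well label).

    Assumes wells are filled row by row: A1..A12, B1..B12, ..., H12.
    """
    if index < 0:
        raise ValueError("probe index must be non-negative")
    plate, offset = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(offset, COLS_PER_ROW)
    return plate + 1, f"{ROWS[row]}{col + 1}"


def plates_needed(n_probes: int) -> int:
    """Number of 96-well plates required for n_probes (ceiling division)."""
    return -(-n_probes // WELLS_PER_PLATE)
```

Under these assumptions, `plates_needed(23843)` gives the number of plates required for the full probe set, and `well_position` gives the location of any individual probe.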

For gene-specific probes, Fig. 2 shows the distributions of maximal sequence identity (Fig. 2A), maximal stretch length (Fig. 2B), and minimal free energy (Fig. 2C) with their non-targets. Most of the probes (~70%) had maximal sequence identities of 72-84%, maximal stretch lengths of 12-15 bases, and free energies between 0 and -30 kcal/mol.

For group-specific probes, Fig. 3 shows the distributions of minimal sequence identity (Fig. 3A), minimal stretch length (Fig. 3B), and minimal free energy (Fig. 3C) with their group members. Most of the probes (~92%) had sequence identities of 100%, stretch lengths of 45-50 bases, and free energies of -65 kcal/mol or lower.

Table 4. Summary of the numbers of probes by gene category for the constructed comprehensive FGA

Gene category | Unique probes | Group probes | Total
Nitrogen fixation | 1225 | 0 | 1225
Nitrification/N metabolism | 865 | 902 | 1767
Denitrification | 1805 | 501 | 2306
Sulfur reduction | 1286 | 329 | 1615
Methane reduction and oxidation | 437 | 333 | 770
Carbon fixation | 584 | 215 | 799
Carbon polymer degradation | 2532 | 276 | 2808
Metal reduction and resistance | 4039 | 507 | 4546
Organic contaminant degradation | 6920 | 1087 | 8007
Total | 19693 | 4150 | 23843
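As a quick consistency check on Table 4, the Python sketch below recomputes the column totals from the per-category counts and provides a helper for the share of probes in any set of categories. The counts are copied from the table; the names `TABLE4`, `totals`, and `share` are introduced here for illustration only.

```python
# Per-category probe counts from Table 4: (unique probes, group probes)
TABLE4 = {
    "Nitrogen fixation": (1225, 0),
    "Nitrification/N metabolism": (865, 902),
    "Denitrification": (1805, 501),
    "Sulfur reduction": (1286, 329),
    "Methane reduction and oxidation": (437, 333),
    "Carbon fixation": (584, 215),
    "Carbon polymer degradation": (2532, 276),
    "Metal reduction and resistance": (4039, 507),
    "Organic contaminant degradation": (6920, 1087),
}


def totals(table):
    """Return (unique total, group total, grand total) over all categories."""
    unique = sum(u for u, _ in table.values())
    group = sum(g for _, g in table.values())
    return unique, group, unique + group


def share(table, categories):
    """Percentage of all probes that fall in the given categories."""
    grand_total = totals(table)[2]
    selected = sum(sum(table[name]) for name in categories)
    return 100.0 * selected / grand_total


if __name__ == "__main__":
    print(totals(TABLE4))
    nitrogen = ["Nitrogen fixation", "Nitrification/N metabolism", "Denitrification"]
    print(round(share(TABLE4, nitrogen), 1))
```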

Fig. 4 The FGA was hybridized with a mixture of 15 synthesized oligonucleotide targets at 42°C, 45°C, 50°C, and 60°C. Balancing probe sensitivity and specificity, the optimal hybridization temperature was determined to be 45-50°C with 50% formamide, which is generally consistent with our previous results.

Signal intensities for probes B and C were normalized to that of probe A (100%); there were 14 A probes, 12 B probes, and 10 C probes (Table S2 and Table S3).

The average relative signal intensities for probes A, B, and C were 100%, 103.8%, and 97.6%, respectively, and the average SNR values were 73.1, 67.2, and 65.3, respectively (Fig. 5).

These results suggest that the three probes performed similarly with known targets.
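The normalization to probe A described above can be sketched as follows. This is a minimal Python illustration, assuming each target has a raw signal for probe A and optionally for probes B and C; the function names and the intensity values in the usage example are hypothetical and are not measurements from this study.

```python
from typing import Dict, List


def relative_intensities(signals: Dict[str, float], reference: str = "A") -> Dict[str, float]:
    """Express each probe's signal as a percentage of the reference probe's signal."""
    ref = signals[reference]
    if ref <= 0:
        raise ValueError("reference signal must be positive")
    return {probe: 100.0 * value / ref for probe, value in signals.items()}


def mean_relative(targets: List[Dict[str, float]], reference: str = "A") -> Dict[str, float]:
    """Average relative intensity per probe over the targets that carry that probe."""
    collected: Dict[str, List[float]] = {}
    for signals in targets:
        for probe, rel in relative_intensities(signals, reference).items():
            collected.setdefault(probe, []).append(rel)
    return {probe: sum(vals) / len(vals) for probe, vals in collected.items()}


if __name__ == "__main__":
    # Hypothetical raw intensities for two targets; the second has no probe C.
    example = [
        {"A": 8000.0, "B": 8400.0, "C": 7600.0},
        {"A": 5000.0, "B": 4500.0},
    ]
    print(mean_relative(example))
```

Averaging only over targets that carry a given probe mirrors the unequal numbers of A, B, and C probes reported above.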

Table 5. Summary of dye-labeled targets (oligonucleotides or PCR-amplicons) hybridized with the FGA

Target | Oligonucleotide | PCR-amplicon
No. of targets | 25 | 17
Expected no. of probes detected | 38 | 35
No. of probes hybridized | 40 | 39
Average signal intensity ± SD | 5678±3372 | 9265±5270
Average SNR ± SD | 55.9±27.56 | 87.6±38.72
No. of false positives | 3 | 4
No. of false negatives | 2 | 0
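The counts in Table 5 can be condensed into two simple rates. The Python sketch below is an illustrative calculation only: the two metrics (fraction of expected probes detected, and fraction of hybridized probes that were false positives) and the function names are assumptions introduced here, not quantities reported on the poster.

```python
def detection_rate(expected: int, false_negatives: int) -> float:
    """Fraction of expected probes that actually gave a signal."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return (expected - false_negatives) / expected


def false_positive_fraction(false_positives: int, hybridized: int) -> float:
    """Fraction of hybridized probes that were not expected to hybridize."""
    if hybridized <= 0:
        raise ValueError("hybridized must be positive")
    return false_positives / hybridized


if __name__ == "__main__":
    # Counts taken from Table 5.
    table5 = {
        "Oligonucleotide": {"expected": 38, "hybridized": 40, "fp": 3, "fn": 2},
        "PCR-amplicon": {"expected": 35, "hybridized": 39, "fp": 4, "fn": 0},
    }
    for name, row in table5.items():
        rate = detection_rate(row["expected"], row["fn"])
        fp_frac = false_positive_fraction(row["fp"], row["hybridized"])
        print(f"{name}: detected {rate:.1%}, false positives {fp_frac:.1%}")
```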

1. FGA2.0 has been constructed with more than 23,000 oligonucleotide probes covering more than 10,000 gene sequences. To our knowledge, this is the most comprehensive FGA for environmental studies.

2. To ensure array specificity, several new features have been implemented in probe design and array construction.

3. FGA2.0 has been systematically evaluated using oligonucleotide and PCR-amplicon targets, and the results demonstrate that it can be used as a powerful tool for rapid, high-throughput, and cost-effective analysis of microbial communities.

4. The array can be used to profile differences among microbial communities, to address specific questions and hypotheses related to microbial population dynamics, and to analyze functional gene expression in microbial communities.

FGA II design strategies:

1. Using multiple sequence alignment (MSA) to identify conserved regions for each functional gene.

2. Using experimentally established oligonucleotide design criteria and the novel software tool CommOligo.

3. Designing gene-specific and group-specific probes.

4. Multiple probes for each sequence or each group of sequences.

• 15.2% of probes target carbon metabolism genes

• 22.2% of probes target genes involved in nitrogen cycling

• 6.8% of probes target sulfur reduction genes

• 3.6% of probes target genes for methane reduction and oxidation

• 19.0% of probes target genes involved in metal reduction and resistance

• 34.0% of probes target genes involved in degradation of organic compounds

Fig. 2 Fig. 3

For oligonucleotide targets, three false positives and two false negatives were observed; for PCR-amplicon targets, four false positives and no false negatives were observed (Table 5).

Possible reasons include: (i) the amounts of some oligonucleotides or PCR-amplicons applied to the array were too high or too low; (ii) the probe design criteria were not stringent enough to exclude all non-specific probes, and additional criteria may need to be considered; (iii) the hybridization conditions may need further optimization to improve probe specificity; and (iv) there may be errors in probe and/or gene sequences.

To tackle the problem of false positives, relative comparisons are needed.

Fig. 5 Relative signal intensities and SNR values detected by probe A, B and C for PCR-amplicon targets.

REFERENCES

He Z, Wu L, Li X, Fields MW and Zhou J (2005a). Appl. Environ. Microbiol. 71:3753-3760.

He Z, Wu L, Fields MW and Zhou J (2005b). Appl. Environ. Microbiol. 71:5154-5162.

Li X*, He Z* and Zhou J (2005). Nucleic Acids Res. 33:6114-6123 (*Co-first authors).

Liebich J, Schadt CW, Chong SC, He Z, Rhee SK and Zhou J (2006). Appl. Environ. Microbiol. 72:1688-1691.

N125 http://ieg.ou.edu/
