Chapter 53

Functional Viral Metagenomics and the Development of New Enzymes for DNA and RNA Amplification and Sequencing

Thomas W. Schoenfeld, Nick Hermersmann, Mike Moser, Darby Renneckar, Vinay Dhodda, and David Mead

Handbook of Molecular Microbial Ecology, Volume II: Metagenomics in Different Habitats, First Edition. Edited by Frans J. de Bruijn. © 2011 Wiley-Blackwell. Published 2011 by John Wiley & Sons, Inc.

53.1 INTRODUCTION

The goal of this work is to use metagenomics to better understand viral molecular biology and to advance molecular analytic capabilities by providing enzymes with improved capabilities. Gene products of phages and other viruses (collectively referred to here as viruses) have historically provided many of the enzymatic tools for molecular biology. However, most of the commonly used viral enzymes are derived from a very limited number of cultivated viruses, primarily phages T4, T7, lambda, SP6, and phi29, and the retroviruses Moloney murine leukemia virus (Mo-MLV) and avian myeloblastosis virus (AMV). Our program to study hot springs virology in Yellowstone National Park (YNP), California, and Nevada has provided insight into viral ecology [Breitbart et al., 2004; Schoenfeld et al., 2008; Heidelberg et al., 2009; see also Chapters 4–6, Vol. II] and has revealed a nearly unlimited source of diversity for our search for new enzymes [Schoenfeld et al., 2010]. However, current approaches to functional analysis of viral metagenomes, while informative, are limited by their reliance on sequence similarity to infer gene function. Improvements in our ability to functionally characterize viral metagenomes are necessary to advance the field (see Chapters 2–8, Vol. II).

Understanding replication-related proteins, especially thermostable DNA polymerases (Pols), has been a major research focus due, in part, to the wide use of these enzymes in molecular detection and analysis. DNA polymerases are essential for PCR [Saiki et al., 1988] and other target-specific [Notomi et al., 2000; Vincent et al., 2004] and whole-genome amplification methods [Dean et al., 2002] and are also essential components of all the major DNA sequencing platforms. Sanger (dideoxy chain termination) DNA sequencing was the first major sequencing method to use DNA polymerases and was advanced by thermostable Pols [Reeve and Fuller, 1995]. All of the leading next-generation sequencing-by-synthesis platforms (e.g., Roche/454 FLX, Illumina Genome Analyzer, Helicos HeliScope, Pacific Biosciences SMRT, ABI SOLiD) [Mardis, 2008; Shendure and Ji, 2008; see also Chapter 18, Vol. I] use at least one DNA polymerase for base discrimination and/or template preparation. DNA polymerase-based methods are driving discovery in research labs and, increasingly, in the clinic [Bustin and Mueller, 2005; see also Chapters 18, 20, 21, Vol. II] as methods for nucleic-acid-based detection of infectious agents, cancer, and genetic variation advance next-generation diagnostics and personalized medicine. Progress in improving all these methods depends in part on more suitable DNA polymerases.

Viruses are rich sources of diverse new DNA polymerases. Compared to their cellular hosts, these intracellular parasites use a wide array of strategies to
replicate their genomes, and viral genomes adopt nearly every conceivable form, including double-stranded and both positive and negative single-stranded RNA and DNA forms, with linear, circular, and multipartite topologies ranging in size from 1.2 Mb (Mimivirus) down to 3.2 kb (hepatitis B virus) [Galibert et al., 1979; Suzan-Monti et al., 2006]. While many of these replicative strategies rely on host enzymes, a substantial subset of viral families supply their own replication proteins. There is speculation that viruses may have played a key role in the evolution of replication strategies used by cellular life [Koonin, 2006].

As replicases, viral polymerases are functionally distinct from the bacterial and archaeal enzymes currently used in molecular biology. During prokaryotic cellular replication, processive leading-strand synthesis depends on a multisubunit complex including Pol III holoenzyme, helicases, and primases. Escherichia coli Pol III holoenzyme is a 791-kDa protein composed of nine subunits (reviewed in Johnson and O’Donnell [2005]). Because of this complexity, no Pol III derivative has been developed as a molecular biology reagent. Cell-derived reagent Pols (e.g., Taq, Pfu, or E. coli DNA polymerases) are all bacterial Pol I or archaeal Pol II derivatives that are mainly responsible in vivo for lagging-strand and repair synthesis, neither of which requires strand separation or processive synthesis of long sequences. Viral Pols are functionally more like the leading-strand replicases and, accordingly, exhibit higher fidelity, rates of synthesis, and processivity [Johnson and O’Donnell, 2005]. Phage T7 Pol, for example, incorporates 300 nt per second, 6 times faster than Escherichia coli Pol I; T4 phage replicates DNA 10 times faster than its E. coli host [Kornberg and Baker, 1992]. Phi29 Pol has a processivity of >70,000 nucleotides [Blanco et al., 1989] (i.e., it incorporates over 70,000 nucleotides before dissociating), far greater than that of Taq Pol I, which has a processivity of between 50 and 80 nucleotides [Merkens et al., 1995]. Phi29 also has a strong strand displacement capability that, together with its processivity, makes it the polymerase of choice for whole-genome amplification by multiple displacement amplification (MDA) [Dean et al., 2001]. T7 phage Pol holoenzyme has a processivity of 1000 nucleotides [Tabor et al., 1987] and efficiently incorporates chain-terminating nucleotide analogues, which facilitated Sanger sequencing until it was displaced by Thermosequenase, a Taq Pol derivative that was engineered based on the nucleotide variation in T7 DNA Pol that conferred efficient incorporation of dideoxynucleotides [Tabor and Richardson, 1995]. T5 Pol has both high processivity and a potent strand displacement activity that are independent of additional host or viral proteins [Andraos et al., 2004]. T4 DNA Pol has a high proofreading activity that is commonly exploited for generating blunt ends, especially in physically sheared DNA [Karam and Konigsberg, 2000]. Retroviral replicases (i.e., reverse transcriptases), especially Mo-MLV and AMV, are indispensable for detection, analysis, and cloning of transcripts and RNA viruses [Morin et al., 2008; Wang et al., 2008]. Together, these qualities make viral Pols attractive targets for development as reagents.
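To make the rate and processivity comparisons above concrete, the following is a minimal arithmetic sketch in Python. It uses only the figures quoted in the text; the function names are illustrative, and treating processivity as a fixed number of nucleotides added per binding event is a simplifying assumption rather than a kinetic model.

```python
# Figures quoted in the text; helper names are illustrative only.
T7_RATE_NT_PER_S = 300        # T7 Pol incorporation rate
T7_FOLD_OVER_POL_I = 6        # "6 times faster than E. coli Pol I"
PHI29_PROCESSIVITY = 70_000   # lower bound: ">70,000 nucleotides"
TAQ_PROCESSIVITY = (50, 80)   # range quoted for Taq Pol I


def implied_pol_i_rate(t7_rate, fold):
    """Pol I rate implied by the T7 rate and the stated fold difference."""
    return t7_rate / fold


def binding_events(template_nt, processivity):
    """Binding events needed to copy a template, assuming (as a simplification)
    that each event adds exactly `processivity` nucleotides."""
    return -(-template_nt // processivity)  # ceiling division


pol_i_rate = implied_pol_i_rate(T7_RATE_NT_PER_S, T7_FOLD_OVER_POL_I)
taq_events = tuple(binding_events(PHI29_PROCESSIVITY, p) for p in TAQ_PROCESSIVITY)
phi29_events = binding_events(PHI29_PROCESSIVITY, PHI29_PROCESSIVITY)

print(f"Implied Pol I rate: {pol_i_rate:.0f} nt/s")
print(f"Binding events to copy 70,000 nt: Taq {taq_events}, phi29 {phi29_events}")
```

Under these assumptions, a template that phi29 Pol could copy in a single binding event would require on the order of a thousand separate binding events by Taq Pol I.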

While our emphasis has been DNA polymerases, viruses encode other useful enzymes. RNA polymerases, for example, are key components of a number of in vitro and in vivo transcription and translation systems, as well as several transcription-mediated amplification methods [Guatelli et al., 1990; Compton, 1991]. Virtually all ligation methods used for cloning and linker attachment depend on T4 DNA ligase due to its high activity on 5′- and 3′-extended and blunt DNA. The integrases and recombinases of various phages (e.g., lambda red and P1 cre/lox) have been used to integrate genes into bacterial and eukaryotic genomes. Resolvases (e.g., T4 endonuclease VII and T7 endonuclease I) have been used to detect single nucleotide polymorphisms (SNPs) [Babon et al., 2003]. It is likely that these and many other methods that rely on viral enzymes can be further improved by novel enzyme activities. Functional metagenomic-based enzyme discovery and development should benefit a wide range of applications.

The enzymes that have been isolated by cultivation over the years demonstrate the potential of viruses as a source of new enzymes, but greatly underrepresent the richness of this resource. The extreme global abundance and diversity of viruses is well documented [Breitbart et al., 2002; Angly et al., 2006; Dinsdale et al., 2008; McDaniel et al., 2008; Schoenfeld et al., 2008; see also Chapter 6, Vol. II]. A liter of ocean water contains as many viruses as there are humans on the planet and much more genetic diversity [Suttle, 2007]. In fact, the bulk of the world’s genetic diversity is probably encoded in viral genomes. Despite the richness of the global virosphere as a source of diverse replicative proteins, standard approaches to discovering new enzymes by cultivating the viruses have proven extremely inefficient, and few new viral enzymes have been commercialized in the past decades. Notably, despite their widespread potential applications and notwithstanding substantial effort, thermostable viral Pols have completely eluded discovery by cultivation. There are now 34 fully sequenced genomes from thermophilic viruses in the NCBI database (February 2010): 27 archaeal viruses and 7 bacteriophages. Neither these genomes nor broad screens of hundreds of cultivated Thermus phages [Yu et al., 2006] have produced a thermostable DNA polymerase. Extensive analysis of cultivated crenarchaeal viral genomes from high-temperature environments reveals few recognizable features other than a small number of methylases, helicases, glycosyl transferases, and several unknown but shared genes [Prangishvili and Garrett, 2005]. At least one presumptive DNA polymerase has been identified in an archaeal viral genome [Peng et al., 2007], but not, to our knowledge, expressed in the lab. At least five Pols have been expressed from thermophilic bacteriophage genomes [Hjorleifsdottir et al., 2003; Naryshkina et al., 2006; T. Schoenfeld, unpublished results]; however, for unknown reasons these enzymes are only moderately thermostable and incapable of surviving thermocycling in PCR or sequencing, despite the thermostability of their host Pols. In order to identify useful thermostable Pols, more efficient approaches are needed.

One of the main barriers to discovery of new viral enzymes is the technical difficulty of cultivation. It is widely noted that cultivation in the lab selects against the great majority of Bacteria and Archaea. Cultivation of new viruses introduces another extreme level of selection against the vast majority of natural populations because cultivation requires the investigator to choose a host that can be grown in the lab, which severely limits the comprehensiveness of the screens. When examining extreme environments like thermal springs, which are dominated by autotrophic microbes, this host selection is even more limiting. Most of these cultivation efforts have focused on viruses that infect heterotrophic Bacteria, especially Thermus [Sakaki and Oshima, 1975; Oshima et al., 1976; Pederson et al., 2001; Blondal et al., 2003; Naryshkina et al., 2006; Yu et al., 2006; Jaatinen et al., 2008; Matsushita and Yanase, 2009] or a small number of thermoacidophilic Archaea, particularly Sulfolobus and Acidianus (reviewed in Prangishvili and Garrett [2005]), due to the relative ease of cultivating these hosts. Metagenomics promises to overcome these barriers and provide a largely unbiased sampling of viral populations.

In some respects, viral metagenomes are especially well suited for discovery of enzymes for use in molecular analysis. Viral genomes are highly diverse and dense with genes associated with nucleic acid metabolism [Leplae et al., 2004]. For example, a typical bacterial genome of 2 Mb contains 3–5 DNA polymerase genes, only one of which, polA, encodes enzymes that have been used as reagents. In contrast, a comparable 2 Mb of viral metagenome can yield up to 40 pol genes [Schoenfeld et al., 2008]. However, the promise of using this diversity to advance our understanding of global ecology and to develop useful enzymes from viral metagenomes is tempered by the challenge of assigning function to the genes. The gigabases of viral metagenomic sequence data that have been generated over the past decade have provided only inferential insight into function or biochemistry of the viral genes and, consequently, few new molecular tools. Efforts to glean insight from metagenomes are hampered by the nearly complete reliance on sequence similarity coupled with the extreme viral genomic diversity and the dearth of annotated sequences. Depending on the environment, 40–90% of viral metagenomic sequences are unknown, novel sequences [Angly et al., 2006; Dinsdale et al., 2008; Bench, 2007; Srinivasiah, 2008; Schoenfeld et al., 2008]. All the next-generation platforms generate shorter reads that are even more difficult to assemble or align to sequences in GenBank, resulting in artificially low BLASTx homologies or, conversely, artificially high numbers of “unique” sequences [Wommack, 2008; see also Chapter 4, Vol. II]. The Virome database (virome.dbi.udel.edu) has cataloged 201 Mb of predicted open reading frames (ORFs) from long-read sequence data (Feb 2010), the vast majority of which are novel and functionally uncharacterized.

Functional characterization of viral metagenomes has lagged far behind our ability to collect sequence data (see Section 1, Vol. II). Essentially none of the millions of gene functions inferred by sequence similarity has been proven biochemically by expression and analysis of the gene products. More importantly, the mere description of sequence similarity does little to further our understanding of viral biology or to identify useful new enzymes. Furthermore, sequence similarity screens only identify genes with an annotated counterpart in a database. The relative scarcity of functionally annotated viral genes in GenBank has likely prevented discovery of truly novel enzyme families, which should be the strength of viral metagenomics.

Finally, a conceptual barrier associated with our definition of related viral types has prevented assembly of viral genomes and, consequently, inferences into function that are based on gene position. Phage genes of related function, especially replication-related genes, often occur in proximity within operons [Desiere et al., 1997]. Assembly of sequence reads should allow reconstruction of operons; however, standard approaches relying on nucleotide identities of greater than 95% are ineffective in assembly of viral metagenomes, and only a few very small, abundant phage genomes have been reconstructed from metagenomic data [Angly et al., 2006]. Because even the relatively long Sanger reads are almost always too short to include more than one complete gene, these associations are generally missed. Since traditional shotgun sequencing, used in some of the work described below, involved the construction of clone libraries, we have had some success in identifying adjacent genes by sequencing entire inserts from archived clones, but even this approach is limited by the sizes of inserts in the libraries, generally less than 5 kb. Since none of the next-generation sequencing methods uses clone libraries, this approach is impossible for most of the ongoing viral metagenomic projects. The fundamental problem is that viral populations are too molecularly diverse to accommodate the 95% identity criterion. Among cultivated viruses, closely related phages are up to 50% divergent at the nucleotide level [Lucchini et al., 1999; Hatfull et al., 2006]. When assembly criteria are reduced to as low as 50%, much larger assembled contigs are generated [Schoenfeld et al., 2008]. This approach has proven effective in generating contigs that contain identifiable operons that not only allow isolation of genes of related function but also allow mapping of diversity onto the protein structure. These sequence variations correspond to biochemical differences in the gene products and provide a guide to enzyme engineering. In the work described below, we have pursued a tripartite approach to functional analysis of viral metagenomes, including (1) expression and biochemical characterization of the “BLASTx hits,” (2) functional screens to identify enzymes too dissimilar to known genes to be detected by sequence similarity, and (3) assembly of operons to infer gene function based on position in the genome.
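The effect of the assembly identity criterion discussed above can be illustrated with a toy Python sketch. This is not the algorithm used by any assembler named in this chapter: it assumes the overlap between two reads is already aligned without gaps, and the two sequences are made up for illustration.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length, pre-aligned overlaps."""
    if len(a) != len(b) or not a:
        raise ValueError("overlaps must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def would_merge(a, b, min_identity):
    """True if the overlap meets the minimum identity required to join two reads."""
    return percent_identity(a, b) >= min_identity


# Hypothetical 20-nt overlaps from two related phage reads (invented sequences).
read_end = "ATGGCTAGCTAGGATCCGTA"
read_start = "ATGGCAAGTTAGGACCCGTT"

identity = percent_identity(read_end, read_start)
print(f"identity = {identity:.0f}%")
print("merge at 95% criterion:", would_merge(read_end, read_start, 95))
print("merge at 50% criterion:", would_merge(read_end, read_start, 50))
```

Reads from related but divergent viruses, like the pair in this example, stay unassembled under a 95% criterion and are joined only when the threshold is relaxed. The cost of relaxing it is that the resulting contigs are composites of related genomes rather than the sequence of any single virus.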

53.2 METHODS

53.2.1 Sampling, Library Construction, and Sequencing

Sampling, library construction, and sequencing of the YNP samples have been described [Schoenfeld et al., 2008]. The Great Boiling Spring samples were collected as described and amplified using the Repli-g kit (GE Healthcare). DNA was sheared and inserted into pETite vector (Lucigen), and the library was used to transform E. coli HI-Control BL21(DE3) cells (Lucigen). Individual clones from both libraries were sequenced in their entirety using standard chemistry (Life Technologies).

53.2.2 Bioinformatics

Sequence assemblies were performed using Sequencher (GeneCodes) or SeqMan (DNAStar). Clustal W analysis was performed as described [Thompson et al., 1994].

53.2.3 Functional Screens

The clones from the Great Boiling Spring samples were grown on Luria broth, pelleted, and resuspended in buffer containing lysozyme. Lysates were incubated for 10 min at 70°C and centrifuged, and the supernatants were tested for DNA polymerase activity using the standard assay. Positive clones were cultivated at 50 ml in LB and retested. The inserts of clones with activity were sequenced in their entirety.

53.2.4 Cloning, Expression, Purification, and Mutagenesis

DNA polymerase genes that were further characterized were expressed at higher levels by insertion into pET28 vector and expression in E. cloni EXPRESS BL21(DE3) (Lucigen). DNA polymerase was purified by heat treatment and standard chromatography methods. Mutagenesis was performed using the QuikChange II Site-Directed Mutagenesis Kits (Agilent).

53.2.5 Biochemical Analysis and Applications Development

Biochemical assays were performed using standard methods [Perler et al., 1996; Hogrefe et al., 2001].

53.3 RESULTS AND DISCUSSION

53.3.1 Sequence-Based and Functional Discovery of New DNA Polymerases

In a recent study of viral metagenomes from Yellowstone hot springs, more than 28,000 Sanger-based long sequence reads (nearly 30 Mb of sequence) were determined [Schoenfeld et al., 2008; see also Chapter 6, Vol. II]. BLASTx alignment to the nonredundant protein database indicated that 156 ORFs had similarity to known pol genes. Fifty-nine appeared to be complete genes and were tested for DNA polymerase activity. Ten showed activity, and seven of these were sequenced in their entirety. Although highly divergent from known viral and cellular genes, four loosely grouped with Family A polymerases and three grouped with Family B polymerases. We refer to these pol genes as “PyroPhage” followed by an identifying number. The Family A Pols detected by this screen were too divergent to be grouped, but the Family B Pols are referred to below as “PyroPhage 4110-like Pols” in reference to the first one discovered.

The degree of sequence conservation among pol genes in these libraries, while relatively low, was higher than that of most sequences found in viral metagenomes. The discovery of 156 partial genes among roughly 600 viral genome equivalents suggests that sequence-based screens were relatively efficient in identifying pol genes. Nonetheless, there are important disadvantages to this approach. One is that the diversity of viral pol genes is likely to be high enough that interesting new enzymes are missed. Another problem is that a gene must be situated in the random clone so that an identifiable portion of it is within the read length of the sequencing method (>1000 nucleotides by Sanger, much less by newer sequencing approaches), and the gene must not extend beyond the boundaries of the random insert, which would leave it incomplete. It is unknown how many genes failed to fulfill the first criterion and were within the insert, but not within the sequence range. Of the 156 identified candidate pol genes, only 38% fulfilled the second criterion and appeared complete. Finally, the identification of a gene does not mean that the gene will express efficiently in E. coli. For unknown reasons, among the 59 likely complete genes, 83% failed to express at detectable levels.
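The percentages quoted for the sequence-based screen can be checked against the counts given earlier in this section. The short Python sketch below does that arithmetic; it assumes that the 83% of complete genes that "failed to express at detectable levels" corresponds to the 49 of 59 tested genes that showed no polymerase activity, which the text implies but does not state outright.

```python
# Counts reported for the Yellowstone sequence-based screen.
orfs_with_pol_similarity = 156
complete_genes = 59
active_clones = 10

frac_complete = complete_genes / orfs_with_pol_similarity
# Assumption: "failed to express" is taken to mean "no detectable activity".
frac_no_activity = (complete_genes - active_clones) / complete_genes

print(f"{frac_complete:.0%} of candidate ORFs appeared complete")
print(f"{frac_no_activity:.0%} of complete genes showed no detectable activity")
```

Both values round to the figures given in the text (38% and 83%), so the stated percentages are consistent with the reported counts under that assumption.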

Functional screens address many limitations of sequence-similarity screens and can often detect completely novel activities regardless of divergence from known genes or position in the insert, as long as the complete gene is present. By their nature, functional screens only detect complete, expression-competent genes. Viral metagenomic DNA from the Great Boiling Spring, Gerlach, Nevada, kindly provided by Brian Hedlund and Jeremy Dodsworth (University of Nevada, Las Vegas), was used to construct a library that was screened for expression of thermostable pol activity. Screening of 2800 clones resulted in the discovery of 12 that were positive for primer extension activity. Eleven of these were more than 97% identical to each other and are referred to as the “PyroPhage 74-like polymerases” in reference to the first member discovered. These pol genes share up to 45% identity with the other polA-type genes from Yellowstone (PyroPhage 3173 and 967) and 56% identity to PyroPhage 488, a pol gene isolated 8 years earlier in a sequence-based screen of a metagenome from Little Hot Creek, Long Valley, California, which is 400 km from Gerlach, Nevada, but still in the Great Basin. The final clone identified in the functional screen, PyroPhage 347, had no significant similarity to any known pol gene. In fact, the closest match to any known gene, with a barely significant E value of 0.750, was to an open reading frame of unknown function in a crenarchaeal virus. Due to this lack of similarity to genes of known function, this gene would never have been identified by sequence similarity.

The pol genes discovered by both screens were aligned by Clustal W to each other and to representative cellular and viral pol genes to construct a neighbor-joining tree (Fig. 53.1). Viral genes from these screens, as well as those retrieved from GenBank, were noticeably more diverse than cellular genes. Most PyroPhage pol genes are highly divergent from known cellular or viral pol genes. The exception is PyroPhage 3063, which is related to several polA genes of the Aquificales, which are known to be quite divergent from other bacterial polA genes [Griffiths and Gupta, 2004].

Figure 53.1 Polymerase phylogenetic tree. Full-length viral metagenomic DNA polymerase amino acid sequences were compared by Clustal W to representative viral, microbial, and eukaryotic Pols and displayed in a neighbor-joining tree.

Since the libraries were constructed from different hot spring populations, direct comparisons are difficult. However, while the overall rate of discovery of apparent DNA polymerase genes was comparable for the sequence-based and functional screens (156 pol genes from 28,000 clones compared to 12 from 2800 clones, respectively), the rate of discovery of functional thermostable enzymes was much lower for the sequence screens than the functional screens (10 of 28,000 versus 12 of 2800). The diversity of the enzymes in the Great Boiling Spring (GBS) library was much lower than that of the enzymes from Yellowstone springs, presumably reflecting a lower overall population diversity.
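The comparison of discovery rates above reduces to simple per-clone ratios. The Python sketch below works them out exactly from the counts quoted in the text; the function name is illustrative, and the calculation deliberately ignores the caveat that the two libraries came from different hot spring populations.

```python
from fractions import Fraction


def per_clone_rate(hits, clones):
    """Exact fraction of screened clones that yielded a hit."""
    return Fraction(hits, clones)


# Apparent pol genes per clone screened.
seq_candidates = per_clone_rate(156, 28_000)   # sequence-based screen
func_candidates = per_clone_rate(12, 2_800)    # functional screen

# Functional thermostable enzymes per clone screened.
seq_active = per_clone_rate(10, 28_000)
func_active = per_clone_rate(12, 2_800)

print(f"candidate genes: {float(seq_candidates):.2%} vs {float(func_candidates):.2%}")
print(f"active enzymes:  {float(seq_active):.3%} vs {float(func_active):.3%}")
print(f"fold advantage of functional screen: {func_active / seq_active}")
```

On these counts the two screens find candidate genes at similar rates (within a factor of 1.3), while the functional screen recovers active thermostable enzymes 12 times more often per clone.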

53.3.2 Biochemical Characteristics and Directed Engineering Improve Use of PyroPhage Pols in PCR and Sanger Sequencing

PyroPhage 3173 and 347 Pols proved to be the most thermostable of the newly discovered polymerases. In fact, these are the first viral Pols with adequate thermostability for PCR. PyroPhage 3173 Pol, which has been studied in greatest detail (Table 53.1), has adequate thermostability for thermocycling, inherent reverse transcriptase activity, and high fidelity, which together enable a number of applications for this enzyme. The proofreading activity proved highly beneficial for high-fidelity PCR amplification (Fig. 53.2). However, many applications benefit from the absence of proofreading activity. Alignment of the PyroPhage 3173 pol gene to E. coli polA [Beese and Steitz, 1991] identified codons for two acidic residues, either of which could be mutated to eliminate exonuclease activity. This mutation reduces fidelity to nearly that of Taq Pol, but simplifies use of the enzyme in PCR and other amplification methods. Like most Family A Pols, 3173 discriminates strongly against dideoxynucleotides, which made it less effective in Sanger sequencing. Based on alignment to known proteins [Tabor and Richardson, 1995], mutation F418Y (Fig. 53.3A) reduced discrimination against chain terminators to nearly zero, making the enzyme very effective for dye terminator cycle sequencing (Fig. 53.3B).

Table 53.1 Biochemical Characteristics of PyroPhage 3173 DNAP

Property                      Value
3′–5′ exonuclease             Strong
5′–3′ exonuclease             None
Strand displacement           Strong
Extension from nicks          Strong
Half-life (T1/2) at 95°C      10 min
Km dNTPs                      40 μM
Km DNA                        5.3 nM
Processivity                  42
Fidelity                      8 × 10^4
3′ ends of amplicons          Blunt
Template                      DNA or RNA

Figure 53.2 Fidelity of PyroPhage 3173 Pol and its exo minus derivative. Fidelity of PCR amplification by PyroPhage 3173 wt and exonuclease minus Pols was compared to that of commercial thermostable Pols in the lacI forward mutation assay [Lundberg et al., 1991].
[Bar chart titled “Fidelity of PCR Enzymes”; vertical axis from 0 to 9 × 10^4; enzymes shown: PyroPhage, Taq, Vent, KOD, Phusion, PlatTaq HF, PyroPhage exo minus.]

Figure 53.3 Directed engineering of 3173 Pol to improve Sanger sequencing. (A) Shown is the increased incorporation of dideoxy- and acyclo-nucleotides by the F418Y mutant of PyroPhage 3173 Pol, as indicated by increased inhibition of Pol activity by chain-terminating nucleotides. (B) The F418Y mutant was used as a direct substitute for Thermosequenase in a BigDye® (ABI) sequencing reaction.

53.3.3 Single-Enzyme RT PCR with 3173 DNA Polymerase

The thermostability and reverse transcriptase activity seen in PyroPhage 3173 Pol allow efficient RT PCR amplification (Fig. 53.4) that should benefit research into transcription, gene expression, RNA viruses, and other important areas involving amplification of RNA targets. Currently, almost all RT PCR depends on retroviral RTs (i.e., Mo-MLV and AMV RTs), which, despite wide use, have well-documented deficiencies that compromise RT PCR. Side activities in retroviral reverse transcriptases, including RNase H and terminal transferase, lead to mismatch extension artifacts [Perrino et al., 1989; Pulsinelli and Temin, 1991; Creighton et al., 1992; Bakhanashvili and Hizi, 1993; Taube et al., 1997; Halvas et al., 2000; Brincat et al., 2002; Mbisa et al., 2005]. Primer-dependent bias in extension efficiency [Ando et al., 1997] and fidelity [Ricchetti and Buc, 1990] likely accounts for the documented inaccuracy of RT PCR quantification [Liu and Graber, 2006], poor correlation between tests [Konnick et al., 2005], and/or complete amplification failure, depending on the RT and the abundance of transcript [Levesque-Sergerie et al., 2007]. Inherently low synthesis fidelity (up to one error per 500 nt, 20× higher than Taq Pol) results in misincorporations, frameshifts, and deletions [Pathak and Temin, 1990; Pulsinelli and Temin, 1991; Burns and Temin, 1994]. Strand-switching [Temin, 1993] probably causes the inter- and intramolecular rearrangement artifacts [Bowman et al., 1998] that can be preferentially extended [Roberts et al., 1989] and result in recombination or insertion/deletion (indel) artifacts in cDNA synthesis [Weaver et al., 1981; Kulpa et al., 1997]. A consequence of two-enzyme RT PCR is that the RT step can interfere with subsequent PCR [Sellner et al., 1992; Fehlmann et al., 1993; Chumakov, 1994; Chandler et al., 1998; Liss, 2002; Suslov and Steindler, 2005], which compromises quantification of low-abundance targets. Efforts to ameliorate these deficiencies include mutagenesis to disable or remove the RNase H domain [Kotewicz et al., 1988]. These mutations reduce rearrangements, but lead to increased substitution errors and bias [Halvas et al., 2000; Svarovskaia et al., 2000; Brincat et al., 2002]. Other enzymes have been explored as alternatives to retroviral RTs (e.g., Tth Pol [Myers and Gelfand, 1991]), but none has proven a satisfactory replacement for most methods that rely on reverse transcription of RNA. PyroPhage 3173 is the most efficient Pol for single-enzyme RT PCR and, as such, an alternative to the retroviral RT-dependent methods.

53.3.4 Assembly of Composite Contigs from Viral Metagenomes

One anticipated drawback of using metagenomics as an enzyme discovery tool was the fragmentary nature of the reads, which was expected to hamper efforts to associate subunits of multisubunit enzymes. Many proteins, replicases in particular, function as multiple subunits. Indeed, the replicases of phages T4, T7, and Phi29 and of viruses Mo-MLV, vaccinia, and herpes all function in vivo as multigene replication complexes with a number of subunits (for example, helicases, primases, processivity factors, and clamp loaders) [Blanco et al., 1994; Lehman and Boehmer, 1999; Trakselis et al., 2001; Stanitsa et al., 2006; Hamdan and Richardson, 2009]. While, in most cases, the polymerase subunits function independently in vitro, their utility may be improved by additional subunits. For example, T7 Pol apoenzyme, by itself, has low processivity and was not very effective in Sanger sequencing without its host-derived processivity factor, thioredoxin [Tabor et al., 1987; Tabor and Richardson, 1987]. Because proteins in replication complexes often have highly specific contacts with one another [Hamdan and Richardson, 2009], it is important that subunits be derived from the same viral genome and not from unrelated viruses.

Because these functionally related genes are often adjacent in operons, it is theoretically possible to identify them given long enough contiguous sequence. Experience shows that operons are almost always too large to be found in the relatively small insert clones seen in typical metagenomic libraries and that without modified assembly rules they are missed. With deep sequencing, these fragments could theoretically be assembled to recover complete viral genomes. In practice, the high degree of sequence polymorphism that characterizes viral metapopulations confounds assembly of related genes, and only very limited assembly has been possible by standard protocols.

To accommodate this natural population diversity, we experimented with lowering assembly stringency from the standard 95% identity to as low as 50%. Assembly of the YNP Bear Paw (74°C) and Octopus (93°C) metagenomes at 50% identity allowed recovery of composite contigs as large as 35 kb. Fully 7.04 Mb (33%) of the Octopus reads assembled at this identity into 17 contigs of greater than 10 kb [Schoenfeld et al., 2008]. These assemblies appear very reliable in associating orthologous sequences. Particularly in the Octopus library, the sequence reads are evenly distributed throughout the contigs with minimal stacking or other anomalies that would suggest amplification or


Figure 53.4 Reverse transcription PCR using PyroPhage 3173 Pol. (A) RT-PCR of human transcripts. Total human liver RNA (1 μg, Promega) was reverse transcribed by M-MLV RT or PyroPhage 3173 Pol, then PCR amplified using Lucigen EconoTaq® PLUS Master Mix. Shown are targets of 144 bp (β-actin), 246 bp (β2-microglobulin), and 298 bp (cyclophilin), each with M-MLV, PyroPhage RT, and no-RT lanes alongside a 100-bp ladder. (B) RT-PCR of MS2 phage RNA. Single-enzyme RT-PCR amplification by PyroPhage 3173 Pol and Tth (Epicentre) was compared using a 160-bp MS2 phage RNA target over a 10^2- to 10^8-fold dilution series. Shown are real-time and post-reaction melt data (top) and corresponding end-point RT-PCR agarose gel (bottom). Tth polymerase was used with Mn2+ as directed. Arrows show correct melt Tm (top) and amplicon (bottom).


cloning artifacts. The high numbers of reads on both strands, evenly distributed throughout the contigs, suggest these contigs represent independent clones of closely related genomes. Using the lower stringency assemblies, SNPs can be identified and mapped to the coding sequences. As additional biochemical and structural data become available, molecular diversity may be correlated with variations in function and structure.
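The effect of relaxing the identity threshold can be illustrated with a minimal sketch. It assumes two reads that are already aligned over their overlap; the function names and the toy sequences are hypothetical and do not represent the assembler used in this work.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns (gap columns excluded) where two reads agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def would_join(a: str, b: str, min_identity: float) -> bool:
    """True if the overlap meets the assembly identity threshold (in percent)."""
    return percent_identity(a, b) >= min_identity


# Toy overlap (illustrative only): 20 columns with 4 mismatches.
read_1 = "ATGGCTAAAGGTCTGACCGA"
read_2 = "ATGGCAAAAGGCCTGACGGT"

identity = percent_identity(read_1, read_2)
joined_at_95 = would_join(read_1, read_2, 95.0)  # standard stringency
joined_at_50 = would_join(read_1, read_2, 50.0)  # relaxed stringency
print(identity, joined_at_95, joined_at_50)
```

In this toy case the two reads remain separate at the standard 95% threshold but fall into the same composite contig at 50%, which is the behavior the lower-stringency assemblies rely on.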

53.3.5 Assembly of a Replication Operon from a Viral Metagenome

One of these contigs provided a unique opportunity to identify potential replicase subunits and associate population diversity of an assembled metagenome with the biochemistry of the gene products (Fig. 53.5). This 16.5-kb contig, assembled at 50% identity, includes 187 reads (average coverage of 11 reads per nucleotide position). GeneMark [Besemer and Borodovsky, 2005] predicted 26 ORFs of greater than 100 nucleotides, which, when translated and annotated by BLASTp, appear to include at least a partial replication operon. The genes with the strongest similarity to four of these ORFs encode two primase subunits, uracil DNA glycosylase, a Family B DNA polymerase, and nucleotide excision repair nuclease (dnaG, udg, polB, and ERCC4 genes, respectively). Homologues of these ORFs belong to crenarchaeal DNA replication–repair complexes [Roberts et al., 2003; Dionne and Bell, 2005; Barry and Bell, 2006]. The predicted polB gene showed 28% identity to Pyrobaculum islandicum polB2 [Kahler and Antranikian, 2000]. Three of the discrete clones that include the polB gene in this contig (PyroPhage 4110, 2783, and 2323 Pols, Fig. 53.1) have been expressed in E. coli to produce a functional thermostable DNA polymerase (data not shown). This contig also contains apparent homologues to a zinc finger-like protein and a transposon-like integrase/resolvase (tnp). Another ORF with highest similarity to the CRISPR-associated sequence cas4 [Haft et al., 2005] is more likely a separate member of the cas4 COG, presumably a recB-like exonuclease gene.
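As a rough consistency check on the coverage figure quoted for this contig, the arithmetic can be sketched as follows. The contig length of 16,542 bp is taken from Fig. 53.5; the assumption that every read contributes its full length to the contig is ours, so the implied read length is only an estimate.

```python
def mean_coverage(n_reads: int, mean_read_length: float, contig_length: int) -> float:
    """Average number of reads covering each position of a contig."""
    return n_reads * mean_read_length / contig_length


contig_length_bp = 16_542   # contig length shown in Fig. 53.5
n_reads = 187               # reads assembled into the contig
reported_coverage = 11      # average reads per nucleotide position

# Solve coverage = n_reads * read_length / contig_length for the read length.
implied_read_length = reported_coverage * contig_length_bp / n_reads
print(round(implied_read_length))
```

Under that assumption the reported numbers imply reads of roughly 1 kb.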

To correlate the level of sequence divergence with predicted gene function, SNP frequency was calculated and overlaid onto the 50% assembly consensus sequence of the contig (Fig. 53.5). Overall distribution of SNPs in the contig was 0.705 per 10 bp. Replication-associated genes showed noticeably lower molecular diversity than the other ORFs. SNP distribution in the dnaG, udg, polB, and ERCC4 homologues was 0.565, 0.617, 0.569, and 0.548 per 10 bp, respectively, while the distribution in the Zn finger, cas4, and thyA homologues was 0.979, 1.31, and 0.728 per 10 bp, respectively. Finer mapping of this diversity is being used to understand the functional differences in the enzymes encoded by the constituent clones of this contig.
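The comparison can be made explicit by expressing each reported SNP density relative to the contig-wide value. The short sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
overall = 0.705  # SNPs per 10 bp across the whole contig

snps_per_10bp = {
    # replication-associated homologues
    "dnaG": 0.565, "udg": 0.617, "polB": 0.569, "ERCC4": 0.548,
    # other ORFs
    "Zn finger": 0.979, "cas4": 1.31, "thyA": 0.728,
}

# Density of each ORF relative to the contig average (1.0 = average).
relative = {gene: round(v / overall, 2) for gene, v in snps_per_10bp.items()}

# ORFs that are less polymorphic than the contig as a whole.
below_average = sorted(g for g, v in snps_per_10bp.items() if v < overall)

for gene, ratio in relative.items():
    print(f"{gene:10s} {ratio:.2f}")
print(below_average)
```

All four replication-associated homologues fall below the contig average, whereas the Zn finger, cas4, and thyA homologues lie above it.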

53.3.6 Identification of a Replicase Polyprotein from the Great Boiling Spring Metagenome

Based on the large number of highly similar isolates (<3% amino acid divergence), the PyroPhage 74-like family of polA-like genes from the Great Boiling Spring in Nevada (Fig. 53.1) appears to be derived from abundant viruses with limited molecular diversity. Unlike the previously described pol genes, these were identified by functional screening, precluding the assembly of large contigs. However, this group of pol genes proved particularly useful for dissecting the molecular biology of a different replicase. The various polymerase-positive clones contain the carboxy-terminal half of an apparent polyprotein, but vary in the amount of coding sequence for the amino-terminal half (Fig. 53.6A), implying that the carboxy-terminal half of the polyprotein is sufficient for polymerase activity. The polymerase gene appears to be part of an open reading frame that would encode a polyprotein of at least 100 kDa. After expression in E. coli, this polyprotein is processed, either in vivo or in vitro, to produce a protein of about 55 kDa (Fig. 53.6B). The amino-terminal half of this apparent polyprotein has no known function and no significant sequence similarity to known proteins, but is likely to be associated with replication and is, therefore, the target of ongoing investigation.

Polyproteins are a common element used by RNA viruses [Lloyd, 2006]. The retroviral reverse transcriptases, for example, are all expressed as polyproteins that are proteolytically processed [Goff, 1990]. Heterologous viral polyproteins from hepatitis C have been shown to be active and properly processed in E. coli [Komoda et al., 1994]. However, replicases expressed as polyproteins are much rarer in DNA viruses. To our knowledge, the PyroPhage 74-family Pol described here and the PyroPhage 3173 Pol described below are the first documented examples of thermophilic phage polyproteins that are actively processed in E. coli.

53.3.7 Molecular Biology of the PyroPhage 3173 Replicase Operon

Expression of PyroPhage 3173 Pol, described above, illustrates another challenge in metagenomics-based enzyme discovery. Since, as with all metagenomes, the intact virus has never been cultivated and the sequence data are fragmentary, delineation of the open reading frame of the pol gene was unclear. For production and study of the 3173 Pol, expression was initiated at an ATG codon that appeared to be the most probable start site based on alignment to bacterial pol genes. Despite the success in using this 55-kDa expression product in


[Figure panels: read coverage map (187 reads; 87% two reads per strand); SNPs per 10 bp (axis 0 to 4); ORF map along the 16,542-bp contig (scale 0 to 16,000 bp) with ORFs labeled dnaG, Zn finger, thyA, udg, polB, ERCC4, tnp, and cas4.]

Figure 53.5 Assembly of a 16.5-kb viral metagenome consensus contig from Octopus hot spring showing single nucleotide polymorphism heterogeneity. A 16.5-kb contig was assembled at 50% identity from the YNP Octopus hot spring library. Sequence coverage is shown on the top, with each line representing a separate read. Single nucleotide polymorphisms per 10 base pairs were normalized to the number of reads covering the respective nucleotide and are aligned with predicted open reading frames from the consensus sequence in the contig and the gene name of the strongest BLASTx similarity. Direction of transcription is shown by the arrows. Similarities to known genes were identified by BLASTp. Reprinted from Schoenfeld et al. [2008], with permission.

Figure 53.6 Putative polyprotein from Great Boiling Spring viral metagenome. The PyroPhage 74-like pol genes are aligned to the consensus sequence (A). All of the clones contain the C-terminal half of a 100-kDa ORF, but vary in the amount of N-terminal sequence. Despite differences in the sizes of open reading frames of the inserts, all PyroPhage 74-like clones express a thermostable protein of about 55 kDa (B). The 347 clone, in contrast, produces a 35-kDa thermostable protein.

RT-PCR and other applications (see above), anomalies were apparent in the open reading frame that was used for expression of this enzyme. First, there was no obvious adjacent ribosome binding site or transcriptional promoter. Second, there was no homologous ATG codon in the related 488 and 967 clones (Fig. 53.1), despite overall alignment with the 3173 gene. Finally, an open reading frame extended upstream from the putative start codon to the insertion site of the viral sequence in the cloning vector.

Low-identity assembly of the 3173 clone proved useful in dissecting the molecular biology of this gene and allowed production of the complete enzyme corresponding to the likely in vivo product. In contrast to the 4110-like and 74-like polymerase families, the 3173 clone was derived from a highly divergent, less abundant virus, since reads from this clone failed to assemble at 95% identity with any other read in the library. Assembly at 75% identity resulted in a 7299-nt contig (Fig. 53.7A), composed of four reads. This assembly was confirmed by PCR amplification of nearly the entire contig from viral DNA isolated from the same hot spring 4 years later to produce a product of the predicted size (Fig. 53.7B). This amplification also suggests that the 3173-encoding virus is more persistent in the environment than other viral families, none of which was detectable in the later samples. This contig encodes four open reading frames of greater than 100 nt. The largest of these encodes a protein of 1608 amino acids (170 kDa), the carboxy-terminal portion of which includes the 55-kDa PyroPhage 3173 DNA polymerase. The amino-terminal portion contains a coding sequence with only weak similarity to known genes. The other open reading frames encode putative helicases and a cas4/recB endonuclease protein.

The amplification product of the entire 1608-amino-acid ORF expressed in E. coli produced an 80-kDa


Figure 53.7 Analysis and PCR amplification of a 7.3-kb contig from 75% NIAID assembly. A 7.3-kb contig was assembled from four clones in the hot springs viral metagenome. GeneMark identified four open reading frames of greater than 100 amino acids, the sizes of which (144, 229, 202, and 1608 amino acids) are indicated (A). These genes had BLASTx similarity to helicases, cas4 (recB), and DNA polymerases, with the indicated E values. Primers derived from the assembly are indicated by arrows, and their positions on the contig are indicated by the associated numbers. These primers were used to amplify viral DNA isolated 4 years after the original collection (B). An amplicon covering the 1608-amino-acid ORF (B, lane 2) was inserted into an expression system. An apparent truncation product of ∼80 kDa, indicated by the arrow (C), co-purified with the Pol activity.

protein (Fig. 53.7C) that co-purified with thermostable DNA polymerase activity. The simplest explanation is that the 1608-amino-acid protein (expected MW of 170 kDa) is processed in vivo or in vitro to generate the 80-kDa product and that the original 55-kDa PyroPhage 3173 Pol was a cloning anomaly. Supporting this interpretation, amino acids 884–894 form the motif AYIYLGSIFVE, which was predicted by cleavage site analysis to be both labile to autolytic cleavage and accessible on the surface [Blom et al., 1996]. Cleavage between G and S would result in a 704-amino-acid (80-kDa) protein. The amino-terminal amino acids of the 80-kDa protein align with the 5′–3′ exonuclease domains of T. aquaticus and E. coli. The amino acids involved in nucleotide binding are conserved, but not the amino acids required for hydrolysis. Although the 55-kDa protein has shown great utility, it is possible that addition of this 25-kDa amino-terminal sequence, or a portion thereof, would improve its function for certain applications. In addition to the 80-kDa Pol protein, the other ORFs are being expressed to reconstitute the presumptive replicase holoenzyme.

This work highlights an important caveat of enzyme discovery by metagenomics. The fragmentary sequences can result in recovery of partial genes. Assembly of sequences can be the only means of verifying ORFs. In this case, the partial gene proved highly useful, but in many cases a functional protein could easily be missed by recovery of partial sequences.

53.3.8 Sequence Variants of PyroPhage 3173 DNA Polymerase Isolated from the Viral Metagenome

Metagenomics has proven quite useful for new enzyme discovery. The utility of viral metagenomes is greatly expanded when they are used to guide engineering. One approach to improving DNA polymerases is directed evolution [Ghadessy et al., 2001] based on random mutagenesis. While effective, this approach is daunting because of the sheer number of mutants that must be screened to approach saturation. For an enzyme the size of Taq Pol (832 amino acids), this would require 20^832 clones to completely saturate the entire gene with mutagenized codons and test all the possible amino acids at each position. Even a fraction of this number overwhelms any current or conceivable screening capability. To limit the search, algorithms have been developed to target mutagenesis to specific domains [Voigt et al., 2001].
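The scale of that number is easier to appreciate when written out. The following sketch evaluates it exactly with Python integers; the contrast with single-site saturation (19 substitutions at each position, one position at a time) is added here only for illustration and is not a method described in this chapter.

```python
import math

n_residues = 832      # length of Taq Pol cited in the text
n_amino_acids = 20    # standard amino acids

# Every possible sequence of this length: 20 choices at each of 832 positions.
library_size = n_amino_acids ** n_residues
n_digits = len(str(library_size))
log10_size = n_residues * math.log10(n_amino_acids)

# Illustrative contrast: changing one position at a time to each of the
# 19 alternative amino acids.
single_site_variants = 19 * n_residues

print(n_digits, round(log10_size, 1), single_site_variants)
```

The full combinatorial library is a number with more than a thousand digits, which is why sources of pre-selected diversity become attractive.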

Metagenomic libraries are an alternative to random degenerate libraries as a source of molecular diversity. Since, in native populations, nature selects for active proteins, activities of variants in the libraries may differ, but they should all retain function. To study sequence variants, the 55-kDa version of PyroPhage 3173 amplified from viral DNA collected at Octopus hot spring (Fig. 53.7B) was cloned in an expression vector. Eleven clones were used to express DNA polymerase activity and the inserts were sequenced. The variants were 93% identical to the original 3173 isolate and at least 97% identical to one


[Figure panels: (A) Thermostability of the 3173 variants: activity remaining (0% to 120%) versus incubation temperature (25, 85, 90, 95, and 98) for var 1, var 2, var 3, var 4, var 5, var 10, var 11, and 3173. (B) Partial amino acid alignment of 3173 variants:]

    var 11    PQDLLNYQIQGGGAELFKKAIILLKEAKPDLKIVNLVHI
    var 1     PQDLLNYRIQGSGAELFKKAIVLLKEAKPDLKIVNLVHI
    3173 pol  PQDLLNYQIQGSGAELFKKAIVLLKETKPDLKIVNLVHI
              *******:***:*********:****:************

Figure 53.8 Thermostability of PyroPhage 3173 Pol variants. The amplification product from Figure 53.7B, lane 2, was cloned and expressed to produce thermostable protein. The clones grouped into at least four families that were 97% identical to one another and 93% identical to the original clone. The expressed Pol activity was purified and tested for thermostability by incubating for 10 min at the indicated temperature and assaying using the standard DNA polymerase assay (A). Shown are amino acid alignments of a portion of the Q helix from the prototype PyroPhage 3173 and the two least thermostable sequence variants (variants 1 and 11) (B). These thermolabile variants had one or two unique amino acids, respectively, that mapped to this region.

another. When the polymerases were partially purified and tested, they had a significant range of thermostability (Fig. 53.8). The two most labile enzymes had only one or two unique nucleotide polymorphisms each. Two of these independent sequence polymorphisms map within four codons of each other. No three-dimensional structure is available for PyroPhage 3173 Pol, but, based on sequence alignment to Taq DNA polymerase and its known protein structure [Kim et al., 1995], the polymorphisms associated with reduced thermostability likely map to the same alpha helix (the Q-helix) within one of the “fingers” of the Pol structure. If so, the two affected amino acids are at the proper spacing to be adjacent on the alpha helix (four amino acids apart) and likely interact to stabilize or destabilize the alpha helix and thereby alter thermostability.
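The variable positions in the partial alignment of Fig. 53.8B can be located with a few lines of code. This is a minimal sketch: the three sequences are copied from the figure, the assignment of labels to rows is inferred from the flattened figure and should be treated as an assumption, and the column numbers are relative to the excerpt shown, not to the full-length protein.

```python
# Partial Q-helix region from Fig. 53.8B (label-to-row assignment assumed).
seqs = {
    "var 11":   "PQDLLNYQIQGGGAELFKKAIILLKEAKPDLKIVNLVHI",
    "var 1":    "PQDLLNYRIQGSGAELFKKAIVLLKEAKPDLKIVNLVHI",
    "3173 pol": "PQDLLNYQIQGSGAELFKKAIVLLKETKPDLKIVNLVHI",
}


def variable_columns(alignment):
    """Return {1-based column: {name: residue}} for non-identical columns."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(s) != length for s in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    out = {}
    for i in range(length):
        column = {name: alignment[name][i] for name in names}
        if len(set(column.values())) > 1:
            out[i + 1] = column
    return out


diffs = variable_columns(seqs)
for pos, column in diffs.items():
    print(pos, column)
```

Under the assumed labeling, the substitution unique to variant 1 and the nearest substitution unique to variant 11 lie four columns apart in this excerpt, consistent with the spacing described above.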

While a goal of screening hot springs viromes was to find the most thermostable enzymes possible, the lower thermostability variants have value. Isothermal amplification methods such as LAMP [Notomi et al., 2000] use intermediate temperatures (i.e., 50°C to 70°C) and do not require extreme thermostability. Less thermostable enzymes will likely have higher activity at these intermediate temperatures [Giver et al., 1998]. Equally important, amino acids that reduce thermostability map to regions that can be targeted to increase thermostability [Bae and Phillips, 2004] and are attractive targets for mutagenesis.

53.4 PROSPECTS

The focus of our efforts has been discovering and improving thermostable DNA polymerases. Metagenomics is playing a role in both the discovery and development phases of this project. Viral metagenomics has revealed new replicase operons, thermophilic polyproteins, and entirely new classes of Pols with novel and useful activities for a number of methods of DNA and RNA detection and analysis. In the near future it may be possible to assemble a complete genome from uncultivated viruses from thermal environments and recover intact replicase operons using the appropriate combination of sequencing strategy, assembly paradigm, and genome walking techniques. We are just beginning to use the information encoded in the viral metagenomes to direct our enzyme improvement program. Additional applications can likely be improved by the discovery of enzymes other than Pols. In many cases, viral metagenomes are excellent sources of diversity for these discovery programs, and presumably any biochemical characteristic that can be measured can be further improved by application of the knowledge gained through metagenomics.

Acknowledgments

Samples from YNP were collected under Research Permit YELL-05240. This work was supported by NSF grants 0109756 and 0215988, NIH-NHGRI grant R43 HG002714-01, and NIH-NIAID grant R43 AI081467-01 to TWS and DOE grant DE-FG02-02ER83484 to DM.

REFERENCES

Ando T, Monroe SS, Noel JS, Glass RI. 1997. A one-tube method of reverse transcription-PCR to efficiently amplify a 3-kilobase region from the RNA polymerase gene to the poly(A) tail of small round-structured viruses (Norwalk-like viruses). J. Clin. Microbiol. 35:570–577.


Andraos N, Tabor S, Richardson CC. 2004. The highly processive DNA polymerase of bacteriophage T5. Role of the unique N and C termini. J. Biol. Chem. 279:50609–50618.

Angly FE, Felts B, Breitbart M, Salamon P, Edwards RA, et al. 2006. The marine viromes of four oceanic regions. PLoS Biol. 4:e368.

Babon JJ, McKenzie M, Cotton RG. 2003. The use of resolvases T4 endonuclease VII and T7 endonuclease I in mutation detection. Mol. Biotechnol. 23:73–81.

Bae E, Phillips GN Jr. 2004. Structures and analysis of highly homologous psychrophilic, mesophilic, and thermophilic adenylate kinases. J. Biol. Chem. 279:28202–28208.

Bakhanashvili M, Hizi A. 1993. The fidelity of the reverse transcriptases of human immunodeficiency viruses and murine leukemia virus, exhibited by the mispair extension frequencies, is sequence dependent and enzyme related. FEBS Lett. 319:201–205.

Barry ER, Bell SD. 2006. DNA replication in the archaea. Microbiol. Mol. Biol. Rev. 70:876–887.

Beese LS, Steitz TA. 1991. Structural basis for the 3′–5′ exonuclease activity of Escherichia coli DNA polymerase I: a two metal ion mechanism. EMBO J. 10:25–33.

Bench SR, Hanson TE, Williamson KE, Ghosh D, Radosovich M, Wang K, Wommack KE. 2007. Metagenomic characterization of Chesapeake Bay virioplankton. Appl. Environ. Microbiol. 73:7629–7641.

Besemer J, Borodovsky M. 2005. GeneMark: Web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res. 33:W451–W454.

Blanco L, Bernad A, Lazaro JM, Martin G, Garmendia C, et al. 1989. Highly efficient DNA synthesis by the phage phi 29 DNA polymerase. Symmetrical mode of DNA replication. J. Biol. Chem. 264:8935–8940.

Blanco L, Lazaro JM, de Vega M, Bonnin A, Salas M. 1994. Terminal protein-primed DNA amplification. Proc. Natl. Acad. Sci. USA 91:12198–12202.

Blom N, Hansen J, Blaas D, Brunak S. 1996. Cleavage site analysis in picornaviral polyproteins: discovering cellular targets by neural networks. Protein Sci. 5:2203–2216.

Blondal T, Hjorleifsdottir SH, Fridjonsson OF, Aevarsson A, Skirnisdottir S, et al. 2003. Discovery and characterization of a thermostable bacteriophage RNA ligase homologous to T4 RNA ligase 1. Nucleic Acids Res. 31:7247–7254.

Bowman RR, Hu WS, Pathak VK. 1998. Relative rates of retroviral reverse transcriptase template switching during RNA- and DNA-dependent DNA synthesis. J. Virol. 72:5198–5206.

Breitbart M, Salamon P, Andresen B, Mahaffy JM, Segall AM, et al. 2002. Genomic analysis of uncultured marine viral communities. Proc. Natl. Acad. Sci. USA 99:14250–14255.

Breitbart M, Wegley L, Leeds S, Schoenfeld T, Rohwer F. 2004. Phage community dynamics in hot springs. Appl. Environ. Microbiol. 70:1633–1640.

Brincat JL, Pfeiffer JK, Telesnitsky A. 2002. RNase H activity is required for high-frequency repeat deletion during Moloney murine leukemia virus replication. J. Virol. 76:88–95.

Burns DP, Temin HM. 1994. High rates of frameshift mutations within homo-oligomeric runs during a single cycle of retroviral replication. J. Virol. 68:4196–4203.

Bustin SA, Mueller R. 2005. Real-time reverse transcription PCR (qRT-PCR) and its potential use in clinical diagnosis. Clin. Sci. (Lond.) 109:365–379.

Chandler DP, Wagnon CA, Bolton H Jr. 1998. Reverse transcriptase (RT) inhibition of PCR at low concentrations of template and its implications for quantitative RT-PCR. Appl. Environ. Microbiol. 64:669–677.

Chumakov KM. 1994. Reverse transcriptase can inhibit PCR and stimulate primer–dimer formation. PCR Methods Appl. 4:62–64.

Compton J. 1991. Nucleic acid sequence-based amplification. Nature 350:91–92.

Creighton S, Huang MM, Cai H, Arnheim N, Goodman MF. 1992. Base mispair extension kinetics. Binding of avian myeloblastosis reverse transcriptase to matched and mismatched base pair termini. J. Biol. Chem. 267:2633–2639.

Dean FB, Nelson JR, Giesler TL, Lasken RS. 2001. Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 11:1095–1099.

Dean FB, Hosono S, Fang L, Wu X, Faruqi AF, et al. 2002. Comprehensive human genome amplification using multiple displacement amplification. Proc. Natl. Acad. Sci. USA 99:5261–5266.

Desiere F, Lucchini S, Bruttin A, Zwahlen M, Brussow H. 1997. A highly conserved DNA replication module from Streptococcus thermophilus phages is similar in sequence and topology to a module from Lactococcus lactis phages. Virology 234:372–382.

Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, et al. 2008. Functional metagenomic profiling of nine biomes. Nature 452:629–632.

Dionne I, Bell SD. 2005. Characterization of an archaeal family 4 uracil DNA glycosylase and its interaction with PCNA and chromatin proteins. Biochem. J. 387:859–863.

Fehlmann C, Krapf R, Solioz M. 1993. Reverse transcriptase can block polymerase chain reaction. Clin. Chem. 39:368–369.

Galibert F, Mandart E, Fitoussi F, Tiollais P, Charnay P. 1979. Nucleotide sequence of the hepatitis B virus genome (subtype ayw) cloned in E. coli. Nature 281:646–650.

Ghadessy FJ, Ong JL, Holliger P. 2001. Directed evolution of polymerase function by compartmentalized self-replication. Proc. Natl. Acad. Sci. USA 98:4552–4557.

Giver L, Gershenson A, Freskgard PO, Arnold FH. 1998. Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. USA 95:12809–12813.

Goff SP. 1990. Retroviral reverse transcriptase: Synthesis, structure, and function. J. Acquir. Immune. Defic. Syndr. 3:817–831.

Griffiths E, Gupta RS. 2004. Signature sequences in diverse proteins provide evidence for the late divergence of the order Aquificales. Int. Microbiol. 7:41–52.

Guatelli JC, Whitfield KM, Kwoh DY, Barringer KJ, Richman DD, et al. 1990. Isothermal, in vitro amplification of nucleic acids by a multienzyme reaction modeled after retroviral replication. Proc. Natl. Acad. Sci. USA 87:7797.

Haft DH, Selengut J, Mongodin EF, Nelson KE. 2005. A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput. Biol. 1:e60.

Halvas EK, Svarovskaia ES, Pathak VK. 2000. Development of an in vivo assay to identify structural determinants in murine leukemia virus reverse transcriptase important for fidelity. J. Virol. 74:312–319.

Hamdan SM, Richardson CC. 2009. Motors, switches, and contacts in the replisome. Annu. Rev. Biochem. 78:205–243.

Hatfull GF, Pedulla ML, Jacobs-Sera D, Cichon PM, Foley A, et al. 2006. Exploring the mycobacteriophage metaproteome: phage genomics as an educational platform. PLoS Genet. 2:e92.

Heidelberg JF, Nelson WC, Schoenfeld T, Bhaya D. 2009. Germ warfare in a microbial mat community: CRISPRs provide insights into the co-evolution of host and viral genomes. PLoS ONE 4:e4169.

Hjorleifsdottir S, Hreggvidsson GO, Fridjonsson OH, Aevarsson A, Kristjansson JK. 2003. Nucleic acid encoding DNA polymerase of bacteriophage RM 378 U PTO.


Hogrefe HH, Cline J, Lovejoy AE, Nielson KB. 2001. DNA polymerases from hyperthermophiles. Methods Enzymol. 334:91–116.

Jaatinen ST, Happonen LJ, Laurinmaki P, Butcher SJ, Bamford DH. 2008. Biochemical and structural characterisation of membrane-containing icosahedral dsDNA bacteriophages infecting thermophilic Thermus thermophilus. Virology 379:10–19.

Johnson A, O’Donnell M. 2005. Cellular DNA replicases: Components and dynamics at the replication fork. Annu. Rev. Biochem. 74:283–315.

Kahler M, Antranikian G. 2000. Cloning and characterization of a family B DNA polymerase from the hyperthermophilic crenarchaeon Pyrobaculum islandicum. J. Bacteriol. 182:655–663.

Karam JD, Konigsberg WH. 2000. DNA polymerase of the T4-related bacteriophages. Prog. Nucleic Acid. Res. Mol. Biol. 64:65–96.

Kim Y, Eom SH, Wang J, Lee DS, Suh SW, et al. 1995. Crystal structure of Thermus aquaticus DNA polymerase. Nature 376:612–616.

Komoda Y, Hijikata M, Tanji Y, Hirowatari Y, Mizushima H, et al. 1994. Processing of hepatitis C viral polyprotein in Escherichia coli. Gene 145:221–226.

Konnick EQ, Williams SM, Ashwood ER, Hillyard DR. 2005. Evaluation of the COBAS hepatitis C virus (HCV) TaqMan analyte-specific reagent assay and comparison to the COBAS Amplicor HCV Monitor V2.0 and Versant HCV bDNA 3.0 assays. J. Clin. Microbiol. 43:2133–2140.

Koonin EV. 2006. Temporal order of evolution of DNA replication systems inferred by comparison of cellular and viral DNA polymerases. Biol. Direct 1:39.

Kornberg A, Baker TA. 1992. DNA Replication 2. 187–189.

Kotewicz ML, Sampson CM, D’Alessio JM, Gerard GF. 1988. Isolation of cloned Moloney murine leukemia virus reverse transcriptase lacking ribonuclease H activity. Nucleic Acids Res. 16:265–277.

Kulpa D, Topping R, Telesnitsky A. 1997. Determination of the site of first strand transfer during Moloney murine leukemia virus reverse transcription and identification of strand transfer-associated reverse transcriptase errors. EMBO J. 16:856–865.

Lehman IR, Boehmer PE. 1999. Replication of herpes simplex virus DNA. J. Biol. Chem. 274:28059–28062.

Leplae R, Hebrant A, Wodak SJ, Toussaint A. 2004. ACLAME:A CLAssification of Mobile genetic Elements. Nucleic Acids Res .32:D45–D49.

Levesque-Sergerie JP, Duquette M, Thibault C, Delbecchi L,Bissonnette N. 2007. Detection limits of several commercial reversetranscriptase enzymes: Impact on the low- and high-abundance tran-script levels assessed by quantitative RT-PCR. BMC Mol. Biol . 8:93.

Liss B. 2002. Improved quantitative real-time RT-PCR for expressionprofiling of individual cells. Nucleic Acids Res . 30:e89.

Liu D, Graber JH. 2006. Quantitative comparison of EST libraries requires compensation for systematic biases in cDNA generation. BMC Bioinformatics 7:77.

Lloyd RE. 2006. Translational control by viral proteinases. Virus Res. 119:76–88.

Lucchini S, Desiere F, Brussow H. 1999. Comparative genomics of Streptococcus thermophilus phage species supports a modular evolution theory. J. Virol. 73:8647–8656.

Lundberg KS, Shoemaker DD, Adams MW, Short JM, Sorge JA, et al. 1991. High-fidelity amplification using a thermostable DNA polymerase isolated from Pyrococcus furiosus. Gene 108:1–6.

Mardis ER. 2008. Next-generation DNA sequencing methods. Annu. Rev. Genomics Hum. Genet. 9:387–402.

Matsushita I, Yanase H. 2009. The genomic structure of Thermus bacteriophage φIN93. J. Biochem. 146:775–785.

Mbisa JL, Nikolenko GN, Pathak VK. 2005. Mutations in the RNase H primer grip domain of murine leukemia virus reverse transcriptase decrease efficiency and accuracy of plus-strand DNA transfer. J. Virol. 79:419–427.

McDaniel L, Breitbart M, Mobberley J, Long A, Haynes M, et al. 2008. Metagenomic analysis of lysogeny in Tampa Bay: Implications for prophage gene expression. PLoS ONE 3:e3263.

Merkens LS, Bryan SK, Moses RE. 1995. Inactivation of the 5′–3′ exonuclease of Thermus aquaticus DNA polymerase. Biochim. Biophys. Acta 1264:243–248.

Morin RD, Aksay G, Dolgosheina E, Ebhardt HA, Magrini V, et al. 2008. Comparative analysis of the small RNA transcriptomes of Pinus contorta and Oryza sativa. Genome Res. 18:571–584.

Myers TW, Gelfand DH. 1991. Reverse transcription and DNA amplification by a Thermus thermophilus DNA polymerase. Biochemistry 30:7661–7666.

Naryshkina T, Liu J, Florens L, Swanson SK, Pavlov AR, et al. 2006. Thermus thermophilus bacteriophage phiYS40 genome and proteomic characterization of virions. J. Mol. Biol. 364:667–677.

Notomi T, Okayama H, Masubuchi H, Yonekawa T, Watanabe K, et al. 2000. Loop-mediated isothermal amplification of DNA. Nucleic Acids Res. 28:E63.

Oshima T, Sakaki Y, Wakayama N, Watanabe K, Ohashi Z. 1976. Biochemical studies on an extreme thermophile Thermus thermophilus: Thermal stabilities of cell constituents and a bacteriophage. Experientia Suppl. 26:317–331.

Pathak VK, Temin HM. 1990. Broad spectrum of in vivo forward mutations, hypermutations, and mutational hotspots in a retroviral shuttle vector after a single replication cycle: Substitutions, frameshifts, and hypermutations. Proc. Natl. Acad. Sci. USA 87:6019–6023.

Pederson DM, Welsh LC, Marvin DA, Sampson M, Perham RN, et al. 2001. The protein capsid of filamentous bacteriophage PH75 from Thermus thermophilus. J. Mol. Biol. 309:401–421.

Peng X, Basta T, Haring M, Garrett RA, Prangishvili D. 2007. Genome of the Acidianus bottle-shaped virus and insights into the replication and packaging mechanisms. Virology 364:237–243.

Perler FB, Kumar S, Kong H. 1996. Thermostable DNA polymerases. Adv. Protein Chem. 48:377–435.

Perrino FW, Preston BD, Sandell LL, Loeb LA. 1989. Extension of mismatched 3′ termini of DNA is a major determinant of the infidelity of human immunodeficiency virus type 1 reverse transcriptase. Proc. Natl. Acad. Sci. USA 86:8343–8347.

Prangishvili D, Garrett RA. 2005. Viruses of hyperthermophilic Crenarchaea. Trends Microbiol. 13:535–542.

Pulsinelli GA, Temin HM. 1991. Characterization of large deletions occurring during a single round of retrovirus vector replication: Novel deletion mechanism involving errors in strand transfer. J. Virol. 65:4786–4797.

Reeve MA, Fuller CW. 1995. A novel thermostable polymerase for DNA sequencing. Nature 376:796–797.

Ricchetti M, Buc H. 1990. Reverse transcriptases and genomic variability: The accuracy of DNA replication is enzyme specific and sequence dependent. EMBO J. 9:1583–1593.

Roberts JA, Bell SD, White MF. 2003. An archaeal XPF repair endonuclease dependent on a heterotrimeric PCNA. Mol. Microbiol. 48:361–371.

Roberts JD, Preston BD, Johnston LA, Soni A, Loeb LA, et al. 1989. Fidelity of two retroviral reverse transcriptases during DNA-dependent DNA synthesis in vitro. Mol. Cell Biol. 9:469–476.

Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, et al. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487–491.

Sakaki Y, Oshima T. 1975. Isolation and characterization of a bacteriophage infectious to an extreme thermophile, Thermus thermophilus HB8. J. Virol. 15:1449–1453.

Schoenfeld T. Unpublished results.

Schoenfeld T, Patterson M, Richardson PM, Wommack KE, Young M, et al. 2008. Assembly of viral metagenomes from Yellowstone hot springs. Appl. Environ. Microbiol. 74:4164–4174.

Schoenfeld T, Liles M, Wommack KE, Polson SW, Godiska R, et al. 2010. Functional viral metagenomics and the next generation of molecular tools. Trends Microbiol. 18:20–29.

Sellner LN, Coelen RJ, Mackenzie JS. 1992. Reverse transcriptase inhibits Taq polymerase activity. Nucleic Acids Res. 20:1487–1490.

Shendure J, Ji H. 2008. Next-generation DNA sequencing. Nat. Biotechnol. 26:1135–1145.

Srinivasiah S, Bhavsar J, Thapar K, Liles M, Schoenfeld T, Wommack KE. 2008. Phages across the biosphere: Contrasts of viruses in soil and aquatic environments. Res. Microbiol. 159(5):349–357.

Stanitsa ES, Arps L, Traktman P. 2006. Vaccinia virus uracil DNA glycosylase interacts with the A20 protein to form a heterodimeric processivity factor for the viral DNA polymerase. J. Biol. Chem. 281:3439–3451.

Suslov O, Steindler DA. 2005. PCR inhibition by reverse transcriptase leads to an overestimation of amplification efficiency. Nucleic Acids Res. 33:e181.

Suttle CA. 2007. Marine viruses: Major players in the global ecosystem. Nat. Rev. Microbiol. 5:801–812.

Suzan-Monti M, La Scola B, Raoult D. 2006. Genomic and evolutionary aspects of Mimivirus. Virus Res. 117:145–155.

Svarovskaia ES, Delviks KA, Hwang CK, Pathak VK. 2000. Structural determinants of murine leukemia virus reverse transcriptase that affect the frequency of template switching. J. Virol. 74:7171–7178.

Tabor S, Richardson CC. 1987. DNA sequence analysis with a modified bacteriophage T7 DNA polymerase. Proc. Natl. Acad. Sci. USA 84:4767–4771.

Tabor S, Richardson CC. 1995. A single residue in DNA polymerases of the Escherichia coli DNA polymerase I family is critical for distinguishing between deoxy- and dideoxyribonucleotides. Proc. Natl. Acad. Sci. USA 92:6339–6343.

Tabor S, Huber HE, Richardson CC. 1987. Escherichia coli thioredoxin confers processivity on the DNA polymerase activity of the gene 5 protein of bacteriophage T7. J. Biol. Chem. 262:16212–16223.

Taube R, Avidan O, Hizi A. 1997. The fidelity of misinsertion and mispair extension throughout DNA synthesis exhibited by mutants of the reverse transcriptase of human immunodeficiency virus type 2 resistant to nucleoside analogs. Eur. J. Biochem. 250:106–114.

Temin HM. 1993. Retrovirus variation and reverse transcription: Abnormal strand transfers result in retrovirus genetic variation. Proc. Natl. Acad. Sci. USA 90:6900–6903.

Thompson JD, Higgins DG, Gibson TJ. 1994. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22:4673–4680.

Trakselis MA, Mayer MU, Ishmael FT, Roccasecca RM, Benkovic SJ. 2001. Dynamic protein interactions in the bacteriophage T4 replisome. Trends Biochem. Sci. 26:566–572.

Vincent M, Xu Y, Kong H. 2004. Helicase-dependent isothermal DNA amplification. EMBO Rep. 5:795–800.

Voigt CA, Mayo SL, Arnold FH, Wang ZG. 2001. Computationally focusing the directed evolution of proteins. J. Cell Biochem. Suppl. 37:58–63.

Wang X, Sun Q, McGrath SD, Mardis ER, Soloway PD, et al. 2008. Transcriptome-wide identification of novel imprinted genes in neonatal mouse brain. PLoS ONE 3:e3839.

Weaver CA, Gordon DF, Kemper B. 1981. Introduction by molecular cloning of artifactual inverted sequences at the 5′ terminus of the sense strand of bovine parathyroid hormone cDNA. Proc. Natl. Acad. Sci. USA 78:4073–4077.

Wommack KE, Bhavsar J, Ravel J. 2008. Metagenomics: Read length matters. Appl. Environ. Microbiol. 74(5):1453–1463.

Yu MX, Slater MR, Ackermann HW. 2006. Isolation and characterization of Thermus bacteriophages. Arch. Virol. 151:663–679.