
Chapter 8

Direct Metagenomic Detection of Viral Pathogens in Human Specimens Using an Unbiased High-Throughput Sequencing Approach

Takaaki Nakaya, Shota Nakamura, Yoshiko Okamoto, Yoshiyuki Nagai, Jun Kawai, Yoshihide Hayashizaki, Tetsuya Iida, and Toshihiro Horii

8.1 INTRODUCTION

Nucleic acid amplification tests (NATs) are increasingly being used to diagnose viral infections. The most familiar formats use DNA or RNA target amplification methods, such as reverse transcription (RT) PCR, and have sensitivities that are greater than those of culture- or antigen-based procedures [Fox, 2007]. Loop-mediated isothermal amplification is more convenient and sensitive than PCR in amplifying DNA targets, and it can be combined successfully with an RT step for RNA respiratory viruses. However, the wide variety of potential pathogens that elicit similar clinical symptoms and diseases makes the application of individual DNA- or RNA-based diagnostic assays both complex and expensive. Even multiplex PCRs are limited to 20–30 candidate pathogens, and they may be confounded if viral evolution results in mutations at the primer binding sites [Quan et al., 2007]. DNA microarrays offer unprecedented opportunities for multiplexing; however, they are not widely implemented in clinical microbiology laboratories because of problems with sensitivity, throughput, and validation [Quan et al., 2007]. In addition, these microarrays are unavailable for unknown and/or unexpected microbes, because they require exact genetic information for each tested pathogen.

Handbook of Molecular Microbial Ecology, Volume II: Metagenomics in Different Habitats, First Edition. Edited by Frans J. de Bruijn. © 2011 Wiley-Blackwell. Published 2011 by John Wiley & Sons, Inc.

Newly developed “next-generation” sequencing technologies, such as 454 (Roche), Solexa (Illumina), and SOLiD (ABI), allow researchers to obtain millions of sequences in a single round of operation, all in an unbiased manner [Fan et al., 2008]. Among these sequencing technologies, 454 currently offers the longest read length, ∼500 bp, on the Genome Sequencer (GS) FLX Titanium platform [Meyer et al., 2008; see also Chapter 18, Vol. I]. Sequencing error levels are low (<1%) (see Chapter 19, Vol. I) and arise primarily from homopolymer runs [Margulies et al., 2005], but they tend to be resolved where there is sufficient coverage depth to allow the assembly of overlapping reads [Vera et al., 2008].

Many studies have used 454 pyrosequencing to analyze PCR amplicons, bacterial artificial chromosomes, and genomic, mitochondrial, and plastid DNAs, as well as to perform expression profiling [Bainbridge et al., 2007; Goldberg et al., 2006; Moore et al., 2006; Poinar et al., 2006; Torres et al., 2008; Wicker et al., 2006]. The 454 platform is also a powerful tool for pathogen discovery [MacConaill and Meyerson, 2008] that has been used to identify new arenaviruses transmitted through solid-organ transplantation [Palacios et al., 2008], a new hemorrhagic fever-associated arenavirus in southern Africa [Briese et al., 2009], and a new polyomavirus in Merkel cell skin carcinoma samples [Feng et al., 2008]. The 454 sequencing technique was also used to implicate Israeli acute paralysis virus as a significant marker for colony collapse disorder in honey bees [Cox-Foster et al., 2007]. Another group used 454 to analyze the whole genome of Gallid herpesvirus [Spatz and Rue, 2008].

We previously demonstrated the direct detection of a bacterial pathogen from a patient sample using 454 high-throughput DNA sequencing [Nakamura et al., 2008]. We also reported the design and diagnostic validation of an unbiased high-throughput sequencing method for the direct diagnosis of viral infections in clinical specimens (e.g., nasopharyngeal and fecal samples) [Nakamura et al., 2009]. In this chapter, we present additional experimental results using this method [Nakamura et al., 2009].

8.2 MATERIALS AND METHODS

8.2.1 RNA Isolation from Clinical Samples

We analyzed unlinked, anonymous samples at the Osaka Prefectural Institute of Public Health. These samples were nasopharyngeal aspirates and stools (n = 3 and 5, respectively) collected between 2005 and 2007 in Osaka, Japan. Seasonal influenza A virus (flu) in nasopharyngeal aspirates from 3- to 7-year-old children was detected by a rapid diagnostic kit based on immunochromatography using flu-specific antibodies. In 2006/2007, a large-scale norovirus outbreak occurred in Osaka; diagnosis of norovirus infection (#N1–#N5) was made based on RT-PCR [Sakon et al., 2007]. The collected stool from patients was suspended in an equal amount of PBS and centrifuged at 15,000 rpm for 10 min. The supernatants (0.25 ml) were used for RNA isolation.

We also analyzed unlinked, anonymous human plasma samples that were derived from blood donations between 2000 and 2005 at Benesis Corporation. Four specimens (#H1 through #H4) were chosen. #H1 and #H2 were previously diagnosed as hepatitis C virus (HCV) RNA-positive by RT-PCR. #H3 and #H4 were previously found to have high alanine aminotransferase (ALT), but no pathogen(s) were detected. For each specimen, we used 0.25–0.5 ml of the plasma for RNA isolation.

These studies were approved by the ethical review committees of the RIMD, Osaka University.

8.2.2 Quantitative RT-PCR of Norovirus

RNA extraction was performed using a Magtration-MagaZorb RNA Common kit (Precision System Science). The viral copy number of norovirus was estimated with real-time RT-PCR [Sakon et al., 2007] using a One-Step real-time PCR reagent kit (Toyobo). A plasmid containing the target sequence was used as a control.

8.2.3 Random RT-PCR Amplification

Total RNA was extracted from specimens with TRI-LS (Sigma-Aldrich), and it was reverse-transcribed with the Transplex whole transcriptome amplification (WTA1) kit (Sigma-Aldrich) [Sakai et al., 2007; Watanabe et al., 2008] using a quasi-random primer, according to the manufacturer’s protocol. PCR amplification for the preparation of template DNA for pyrosequencing was performed using AmpliTaq Gold DNA Polymerase LD (Applied Biosystems) [Watanabe et al., 2008]. Norovirus-specific PCR was performed as described above [Sakon et al., 2007], and flu-specific PCR was performed using the FluA M gene-specific primer set (M30F: 5′-TTCTAACCGAGGTCGAAACG-3′ and M264R2: 5′-ACAAAGCGTCTACGCTGCAG-3′).
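As a quick sanity check on the flu-specific primer pair quoted above, the following minimal sketch computes length, GC content, a rough melting temperature, and the reverse complement of each primer. The Wallace rule (2 °C per A/T, 4 °C per G/C) is used here only as a simple approximation chosen for illustration; it is not stated in the text, and the function names are hypothetical.

```python
# Primer sequences (5'->3') as quoted in Section 8.2.3.
PRIMERS = {
    "M30F": "TTCTAACCGAGGTCGAAACG",
    "M264R2": "ACAAAGCGTCTACGCTGCAG",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees C (Wallace rule; an
    approximation assumed here, not taken from the chapter)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def summarize(primers):
    """Collect basic properties for each named primer."""
    return {
        name: {
            "length": len(seq),
            "gc_percent": gc_percent(seq),
            "wallace_tm_c": wallace_tm(seq),
            "reverse_complement": reverse_complement(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, props in summarize(PRIMERS).items():
        print(name, props)
```

Both primers are 20 nt long with similar GC content, so under this simple rule their estimated melting temperatures differ by only a few degrees.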

8.2.4 Diagnosis with RT-PCR

Viral RNA was extracted from nasopharyngeal and fecal specimens with a QIAamp Viral RNA Mini Kit (QIAGEN), and cDNA was synthesized using SuperScript III reverse transcriptase (Invitrogen) with a random hexamer, as described previously [Watanabe et al., 2008]. The generated cDNA was subjected to PCR using the Expand High FidelityPLUS PCR System (Roche) with primer sets specific to viruses, such as human coronaviruses [Lau et al., 2006], WU polyomavirus (WUV) [Gaynor et al., 2007], and pepper mild mottle virus (PMMV) [Zhang et al., 2006]. The PCR detection of GB virus C (GBV-C) was performed according to Ruiz et al. [2006].

8.2.5 Pyrosequencing and Data Analysis

Amplified cDNA was used as a template for GS FLX analysis (454 Life Sciences). A 70 × 75 PicoTiterPlate device (gasket for 16 regions) was divided into two regions for each of eight samples from nasopharyngeal aspirates and stools. Another plate was used for four plasma samples. The obtained data were then subjected to a data analysis pipeline. Data analysis was performed on each read sequence by computational tools, as constructed previously [Nakamura et al., 2008] with some modifications. The analysis steps were: (i) remove tag sequences; (ii) execute a BLASTN search by Hi-per BLAST (Fujitsu); (iii) identify the scientific name for each read based on the NCBI taxonomy database; and (iv) extract viral reads and perform mapping to reference data by SSEARCH. This analysis pipeline was constructed by utilizing BioRuby [Goto et al., 2003], BioPerl [Stajich et al., 2002], and MySQL. After classification, particular human and bacterial reads were further analyzed as follows. Human genome mapping was performed by a MEGABLAST search against the human genome, Homo_sapiens.NCBI36.49, using a threshold of 1 × 10^-40. Bacterial rRNA typing was performed by a BLASTN search against the comprehensive rRNA database “silva” release 94 [Pruesse et al., 2007; see also Chapter 45, Vol. I] using a threshold of 80% match per read.
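Steps (ii) to (iv) above can be illustrated with a minimal sketch. It is not the authors' pipeline (which was built with BioRuby, BioPerl, and MySQL); it assumes BLAST results in the standard 12-column tabular layout, picks the best hit per read under an E-value cutoff, and assigns a broad category through a lookup table. The subject IDs, the `EXAMPLE_CATEGORIES` table (a stand-in for the NCBI taxonomy lookup), and the default cutoff are all hypothetical choices for illustration.

```python
from collections import Counter

# Hypothetical subject-ID -> category table standing in for the
# NCBI taxonomy lookup of step (iii).
EXAMPLE_CATEGORIES = {
    "subjV": "RNA viruses",
    "subjH": "Eukaryotes",
    "subjB": "Bacteria",
}

# Hypothetical hits in 12-column tabular order: query, subject, %identity,
# length, mismatches, gap opens, qstart, qend, sstart, send, evalue, bitscore.
EXAMPLE_HITS = [
    "\t".join(["r1", "subjV", "98.0", "200", "4", "0", "1", "200", "1", "200", "1e-50", "350"]),
    "\t".join(["r1", "subjH", "90.0", "200", "20", "0", "1", "200", "1", "200", "1e-20", "150"]),
    "\t".join(["r2", "subjH", "99.0", "250", "2", "0", "1", "250", "1", "250", "1e-80", "450"]),
    "\t".join(["r3", "subjB", "80.0", "100", "20", "0", "1", "100", "1", "100", "1e-3", "50"]),
]


def best_hits(tabular_lines, max_evalue=1e-5):
    """Keep, per read, the hit with the lowest E-value below the cutoff
    (ties broken by the higher bit score)."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue
        current = best.get(read_id)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[read_id] = (subject_id, evalue, bitscore)
    return best


def classify(best, subject_to_category):
    """Map each read's best-hit subject to a category ('Others' if unknown)."""
    return {read: subject_to_category.get(hit[0], "Others")
            for read, hit in best.items()}


def category_fractions(categories):
    """Fraction of classified reads falling in each category."""
    counts = Counter(categories.values())
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


if __name__ == "__main__":
    hits = best_hits(EXAMPLE_HITS)
    cats = classify(hits, EXAMPLE_CATEGORIES)
    print(category_fractions(cats))
    print([r for r, c in cats.items() if c == "RNA viruses"])  # viral reads
```

Calling `best_hits` a second time with a stricter cutoff (for example `max_evalue=1e-40`) would select only high-confidence matches from the same hit table.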

8.3 RESULTS

8.3.1 Random RT-PCR Amplification by the Transplex WTA Kit

Total RNA isolated from each clinical sample (0.1–0.5 ml) could not be quantified with an ND-1000 spectrophotometer (NanoDrop Technologies). Therefore, we performed quasi-random RT-PCR amplification using a whole transcriptome amplification (WTA1) kit, according to the manufacturer’s protocol with modifications (i.e., 70 cycles of PCR) [Watanabe et al., 2008]. After random RT-PCR amplification, 10–13 μg of cDNA was obtained from each sample (Fig. 8.1). Because almost all of the amplified cDNA was within the 200- to 1000-bp range (Fig. 8.1), the PCR products were directly used as templates for emulsion PCR in the GS FLX pyrosequencing.

8.3.2 Pyrosequencing Using the GS FLX Platform

The GS FLX system produces several million bases in one 7.5-h run [Nakamura et al., 2008]. The PicoTiterPlate device, physically divided into 16 regions, was used in this study, with 8 (nasal and fecal) or 4 (plasma) samples being loaded into two regions each. A single run yielded 15,298–32,335 (average 24,738) and 37,280–57,358 (average 46,520) reads per sample, respectively. Data analysis essentially followed our previously reported protocol [Nakamura et al., 2008]. However, because the template cDNA for the high-throughput sequencing was prepared with random RT-PCR, an extra step was added to remove tag sequences before the sequence data were used for BLASTN search analyses. Figure 8.2 shows the fraction of organisms (from which the sequences in the database were derived) that showed the best hits for the query sequences (E-value < 10^-5). To identify viral species, including subtype, higher-confidence matches (E-value < 10^-40) for the sequence reads were selected (Table 8.1).
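The tag-removal step mentioned above can be sketched as follows. This is a minimal illustration only: the tag sequence `EXAMPLE_TAG` is hypothetical (the primer tag of the WTA kit is not given in the text), matching is exact, and the minimum-length filter is an assumed parameter. Production trimmers usually tolerate mismatches and partial tags.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical tag; the real amplification-primer tag is not given in the text.
EXAMPLE_TAG = "GATTACAGG"


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def trim_tag(read, tag, min_length=30):
    """Strip an exact copy of `tag` from the 5' end and its reverse
    complement from the 3' end. Returns the trimmed read, or None if the
    remainder is shorter than `min_length`."""
    if not tag:
        raise ValueError("tag must be a non-empty sequence")
    read = read.upper()
    tag = tag.upper()
    tag_rc = reverse_complement(tag)
    if read.startswith(tag):
        read = read[len(tag):]
    if read.endswith(tag_rc):
        read = read[:-len(tag_rc)]
    return read if len(read) >= min_length else None


if __name__ == "__main__":
    insert = "ACGT" * 10
    tagged = EXAMPLE_TAG + insert + reverse_complement(EXAMPLE_TAG)
    print(trim_tag(tagged, EXAMPLE_TAG))
```

Reads that become too short after trimming are dropped here because very short queries rarely give informative BLASTN hits; the 30-nt default is an arbitrary illustrative value.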

8.3.3 Nasal Samples

The flu sequence was detected in all three samples, among 21,858–30,958 (average 25,978) reads per sample. Only part of the genome was covered in these samples, and the cover rate ranged from 8.1% to 58.3%. One major reason for the partial coverage might be the large amount of host-derived sequences (90.0–94.6%), which resulted from our direct RNA isolation from nasopharyngeal aspirates without the elimination of host or bacterial cells. Nevertheless, 20–460 reads were flu-derived, which was sufficient for subtype identification (H1N1 in sample #F2 and H3N2 in samples #F1 and #F3) from these sequences (data not shown).

In addition to the flu sequence, a WUV-derived sequence was detected in one specimen (#F3). Because the detected sequences were located in a single gene (VP1), the presence of a second gene (VP2) was confirmed with PCR (data not shown). WUV and another novel human polyomavirus, KI, were cloned from respiratory tract specimens in 2007 [Allander et al., 2007; Bialasiewicz et al., 2007; Gaynor et al., 2007; Nakamura et al., 2008]. Although their etiological role in childhood respiratory disease has been proposed [Bialasiewicz et al., 2007; Gaynor et al., 2007], inconsistent epidemiological results have been reported [Norja et al., 2007]. In this study, the WUV-positive patient was a kindergarten student who was co-infected with flu, consistent with the report by Norja et al. [2007]. Partial sequences of the human endogenous retrovirus HCML-ARV were detected in sample #F1 (Table 8.1), although the pathogenesis of this virus is unknown.

Figure 8.1 Random RT-PCR amplification of cDNA from clinical specimens. The samples were nasopharyngeal aspirates (left panel: #F1–#F3), stools (middle panel: #N1–#N5), and plasma (right panel: #H1–#H4). RNA extracted from clinical specimens was reverse-transcribed and random PCR-amplified to prepare template DNA for pyrosequencing. One microgram of amplified PCR products in each sample was loaded onto a 1% agarose gel. M1 and M2 indicate 100-bp and 1-kbp DNA ladders (NEB), respectively.

Figure 8.2 Pyrosequencing using the GS FLX system. Amplified cDNA was used as a template for GS FLX analysis. A 70 × 75 PicoTiterPlate device (gasket for 16 regions) was divided into two regions for each sample. Obtained data were then subjected to a data analysis pipeline, as described in Section 8.2. A comparison of the organisms that comprise the best matches for the sequences is shown. [Panels: #F1–#F3, #N1–#N5, #H1–#H4. Categories: Eukaryotes, Bacteria, RNA viruses, Others.]

Table 8.1 Summary of Detected Viruses

Sample  Age (years)  Reads    Virus
#F1     3            460      Influenza A virus (H3N2)
                     3        Human endogenous retrovirus HCML-ARV
#F2     7            20       Influenza A virus (H1N1)
#F3     5            107      Influenza A virus (H3N2)
                     7        WU polyomavirus
#N1     62(a)        7        Norovirus (GII/4)
#N2     82(b)        7,304    Norovirus (GII/4)
#N3     92(b)        15,272   Norovirus (GII/4)
                     813      Kyuri green mottle mosaic virus
                     7        Citrus tristeza virus
                     3        Enterobacteria phage phiK
#N4     3(c)         484      Norovirus (GII/4)
                     14       Human coronavirus HKU1
                     3        Phage phiV10
                     3        Human endogenous retrovirus K
#N5     44(d)        762      Pepper mild mottle virus
                     611      Norovirus (GII/4)
                     17       Crucifer tobamovirus
                     2        Tobacco mosaic virus
#H1     Unknown      13,742   Hepatitis C virus
                     2        Human picobirnavirus
                     2        Enterobacteria phage T7
#H2     Unknown      5,629    Hepatitis C virus
                     2        Enterobacteria phage T7
#H3     Unknown      7,582    GB virus C
                     1        Enterobacteria phage T7
#H4     Unknown      5,068    GB virus C
                     2        Enterobacteria phage T7

(a) Hospitalized patient. (b) Patients in welfare facilities. (c) Kindergarten student. (d) Putative food (raw oyster)-associated.

8.3.4 Fecal Samples

To remove bacterial and human cells present in the feces, 15,000-rpm centrifugation was performed and the supernatants were used for RNA isolation (see also Chapter 62, Vol. I). The norovirus sequence was detected in all five samples, among 15,298–32,335 (average 23,994) reads per sample. In contrast to influenza virus, almost the whole genome was covered in the #N2 (7302 reads) and #N3 (15,260 reads) samples, with average cover depths of 141.5 and 258.7, respectively. More than 75% of the genome was covered in the #N4 (484 reads) and #N5 (611 reads) samples. A BLAST search of each sequence strongly indicated that these four patients were infected with a similar genotype, GII.4 (Table 8.1), consistent with previous diagnostic results [Sakon et al., 2007]. In contrast, only 7 reads were detected in sample #N1 (Table 8.1), which was undetectable with a single round of PCR (data not shown), suggesting that the diagnostic method using high-throughput pyrosequencing is more sensitive than conventional PCR analysis.
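The cover rate and average cover depth quoted above can be derived from read-to-reference alignments. The sketch below assumes that each mapped read is summarized as a 1-based inclusive (start, end) interval on the reference genome, that cover rate is the percentage of reference positions with depth of at least 1, and that average depth is the total number of aligned bases divided by genome length. These definitions are assumptions made for illustration; the text does not spell them out.

```python
def coverage_stats(intervals, genome_length):
    """Compute cover rate (%) and average depth from mapped-read intervals.

    intervals: iterable of (start, end), 1-based inclusive reference
               coordinates (an assumed representation of mapped reads).
    genome_length: length of the reference genome in bases.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in intervals:
        s = max(start, 1)
        e = min(end, genome_length)
        if s > e:
            continue  # interval lies entirely outside the reference
        delta[s - 1] += 1
        delta[e] -= 1
    covered = 0
    total_bases = 0
    depth = 0
    for i in range(genome_length):
        depth += delta[i]  # running sum gives the depth at position i
        if depth > 0:
            covered += 1
        total_bases += depth
    return {
        "cover_rate_percent": 100.0 * covered / genome_length,
        "average_depth": total_bases / genome_length,
    }


if __name__ == "__main__":
    # Two hypothetical 50-base reads overlapping by 25 bases on a 100-base genome.
    print(coverage_stats([(1, 50), (26, 75)], 100))
```

The difference-array approach keeps the computation linear in the number of reads plus the genome length, which is comfortable for viral genomes of a few kilobases and tens of thousands of reads.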

Human coronavirus HKU1 (HCoV-HKU1), which was recently identified as the fifth human coronavirus [Woo et al., 2005], was detected in one specimen (#N4). Epidemiological studies have reported that HCoV-HKU1 could be detected in respiratory and stool samples from children and adults. There is also a report of a 9-month-old patient who was co-infected with HCoV-HKU1 and influenza C virus [Vabret et al., 2006]. We showed here that a 5-year-old child was co-infected with HCoV-HKU1 and norovirus, although the relationship between these two viruses and/or the relationship between pathogenesis (enteric tract illness) and co-infection with these two viruses remains unknown. Human endogenous retrovirus K (HERV-K)-derived sequences were also detected in patient #N4, who was 3 years old (Table 8.1). HERV-K is the name given to an approximately 30-million-year-old family of endogenous retroviruses present at >50 copies per haploid human genome [Yang et al., 2000].

In addition to these human viruses, several plant-virus-derived sequences were also detected in the fecal samples (Table 8.1). Almost all of the detected viruses, except for the Closterovirus group citrus tristeza virus, belong to the Tobamovirus group (Table 8.1). The most abundantly detected virus, PMMV (data not shown), was also previously found in healthy humans [Zhang et al., 2006]. Although this previous paper indicated that there is little evidence to support the active replication of PMMV in human feces, further investigations regarding plant virus replication in the human gut (epithelial cells) seem necessary.

8.3.5 Blood Samples

The HCV sequence was detected in two samples (#H1 and #H2; Table 8.1). Almost the whole genome was covered in these samples, and the cover rate ranged from 95.4% to 97.0%. In addition, the GBV-C sequence was detected in another two samples (#H3 and #H4; cover rate: 82.5–91.4%), which had previously shown high ALT but in which no virus had been detected. GBV-C and hepatitis G virus (HGV) were independently discovered, but it was determined that they corresponded to two isolates of the same virus [Leary et al., 1996]. GBV-C is the human virus most closely related to HCV, although its hepatotropism is still controversial [Ruiz et al., 2006]. The copy numbers of HCV (#H1 and #H2) in plasma were 2.5 × 10^7 and 1.2 × 10^7, respectively, which are comparable to those of norovirus (1 × 10^6 and 5 × 10^7, respectively). Furthermore, specific reads for HCV or GBV-C comprised 10.4–17.4% (average 11.5%) of the total reads, similar to norovirus (0.1–59.8% of total reads; average 17.4%) in fecal samples.

8.4 DISCUSSION

In this study, we used a “next-generation” parallel sequencing platform for viral detection in human nasopharyngeal, fecal, and blood samples. Random RT-PCR was performed to amplify undetectable amounts of RNA extracted from 0.1–0.5 ml of nasopharyngeal aspirate (N = 3), fecal (N = 5), and plasma (N = 4) specimens, resulting in the synthesis of >10 μg of cDNA. Although whole-genome analysis of nasopharyngeal aspirate samples was not possible because most of the reads (>90%) were host genome-derived, we detected 20–460 flu reads, which was sufficient for subtype identification. In fecal samples, bacterial and host cells were removed by centrifugation, resulting in a gain of 484–15,260 reads of norovirus sequences (78–98% of the whole genome was covered), except for one specimen that was undetectable by RT-PCR. In the four plasma samples from blood donors, viral sequences of HCV or GBV-C were detected in two specimens each. In addition, possibly pathogenic viruses, such as WUV and HCoV-HKU1, were detected in the nasopharyngeal aspirate and fecal samples, respectively, suggesting that this system (Fig. 8.3) is useful for novel virus identification and viral genome analysis. WUV, a DNA virus, was detected from the isolated RNA, suggesting that the WUV genome and/or its transcripts present in infected cells were detected. Indeed, a novel human polyomavirus (Merkel cell polyomavirus) isolated from skin carcinoma was detected from the mRNA [Feng et al., 2008].

Figure 8.3 Process diagram for the diagnosis of RNA viruses in nasopharyngeal aspirate, fecal, and plasma samples. [Flow diagram spanning Day 1 to Day 4: nasopharyngeal aspirate sample (0.1 ml); fecal sample, suspension and centrifugation, supernatant (0.25 ml); blood sample, centrifugation, plasma (0.25–0.50 ml); RNA isolation by TRI-LS (RNA: less than 10 ng); random RT-PCR by Transplex WTA (DNA: about 10 μg); fragmentation, adaptor ligation, and emulsion PCR; pyrosequencing (GS FLX sequencing platform); data processing.]

Taken together, these results indicate that whole RNA isolation, including that from host cells and tissues, followed by the suitable elimination of host-derived genes, could be an effective method for identifying pathogenic viruses in clinical samples. Recently, we subjected stool-sample-extracted DNAs to 454 pyrosequencing and found that nearly 20% of the reads had best hits that matched currently reported bacterial DNA sequences [Nakamura et al., 2008]. These previous results, together with the findings presented here, indicate that two protocols (direct DNA extraction for bacteria, and cell/bacterial removal by centrifugation followed by RNA/DNA extraction for virus) could be used to comprehensively identify pathogenic microbes in clinical samples.

When several pathogens are found in a single sample, careful interpretation is necessary to decide which pathogen(s) is the true cause of a specific disease. Although the most abundant pathogen might generally be considered the best candidate, cooperative interactions between multiple pathogens cannot be excluded as an important factor in pathogenesis. To address this question, suitable control samples from healthy persons and/or paired specimens isolated after recovery might be required. Another possible problem with this viral genome analysis is biased cDNA synthesis by quasi-random RT-PCR with the WTA kit. A significant bias was found, and its pattern was identical in all samples (data not shown). TG (CA)-rich regions were selectively amplified with the WTA kit, probably because of the nucleotide sequences of the quasi-random primer. Amplification by random RT-PCR with the WTA kit was at least one log higher than that with the conventional random hexamer (data not shown). This suggests that further improvement is required for whole viral genome analysis, although our system is suitable for the comprehensive detection of viral genes. In addition, the TG (CA)-rich bias was observed within the viral genome; therefore, it seems unlikely that the bias leads to quantitative differences of the detected sequences with respect to the original population.
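One simple way to look for the TG (CA)-rich amplification bias described above is to measure how often the dinucleotides TG and CA occur along a sequence. The following sketch is an illustration under stated assumptions, not the analysis used in the study: the window and step sizes are arbitrary, and the function names are hypothetical.

```python
def tg_ca_fraction(seq):
    """Fraction of overlapping dinucleotides in `seq` that are TG or CA."""
    seq = seq.upper()
    n_pairs = len(seq) - 1
    if n_pairs < 1:
        return 0.0
    hits = sum(1 for i in range(n_pairs) if seq[i:i + 2] in ("TG", "CA"))
    return hits / n_pairs


def windowed_tg_ca(seq, window=50, step=25):
    """TG/CA fraction in sliding windows along a sequence.

    Returns a list of (0-based window start, fraction) pairs. Window and
    step sizes are illustrative defaults, not values from the chapter.
    """
    last_start = max(len(seq) - window, 0)
    return [(start, tg_ca_fraction(seq[start:start + window]))
            for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    # Hypothetical sequence: a TG-rich stretch followed by an A-only stretch.
    example = "TG" * 50 + "A" * 100
    for start, frac in windowed_tg_ca(example, window=50, step=50):
        print(start, round(frac, 3))
```

Comparing such window profiles with the per-window read depth on the same reference would show whether heavily covered regions coincide with TG/CA-rich stretches.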

Almost all diagnostic NATs require exact viral genome information, and thus they cannot be used to detect novel or unexpected viral infections. In this study, we showed that a diagnostic system based on parallel high-throughput sequencing is useful for the direct detection of unknown and/or small numbers of viruses, as well as for the genetic characterization of major pathogenic viruses in clinical specimens. We plan to share this system (entitled the robotics-assisted pathogen identification system, or RAPID) domestically as well as with the Asian epidemic network (The Program of Founding Research Centers for Emerging and Reemerging Infectious Diseases) to enable the earlier identification of unknown pathogens in a novel outbreak or bioterrorism event.


The cost of this approach will be a key concern for its adoption by the research community. Microbe-derived DNA/RNA enrichment [Lin et al., 2006], with suitable elimination of host-derived genes as described above, could reduce the required number of reads per sample. In addition, parallel tagged sequencing [Meyer et al., 2008] using sample-specific barcoding adaptors with 5′-nucleotide tagged PCR primers [Binladen et al., 2007] would enable the analysis of multiple samples in a single sequencing region. A lower-cost and more compact bench-top pyrosequencer (GS Junior; Roche-454 Life Sciences) will soon be available. If these methods and machinery were combined, the operating costs (i.e., $2000 per sample) could be reduced by 90% or more.
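The idea behind parallel tagged sequencing is that reads from several samples pooled in one sequencing region can be separated afterward by their sample-specific 5′ tags. The sketch below shows that sorting step in its simplest form. The barcode sequences and sample names are hypothetical, matching is exact, and all barcodes are assumed to have the same length so that none is a prefix of another.

```python
# Hypothetical sample-specific 5' tags (equal length, mutually distinct).
EXAMPLE_BARCODES = {
    "sample_1": "ACGTAC",
    "sample_2": "TGCATG",
}


def demultiplex(reads, barcodes):
    """Assign each read to the sample whose tag it starts with.

    Returns a dict mapping sample name -> list of reads with the tag
    removed; reads matching no tag are kept unmodified under 'unassigned'.
    """
    bins = {sample: [] for sample in barcodes}
    bins["unassigned"] = []
    for read in reads:
        read = read.upper()
        for sample, tag in barcodes.items():
            if read.startswith(tag.upper()):
                bins[sample].append(read[len(tag):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return bins


if __name__ == "__main__":
    pooled = ["ACGTACGGGG", "TGCATGCCCC", "TTTTTTTTTT"]
    for sample, seqs in demultiplex(pooled, EXAMPLE_BARCODES).items():
        print(sample, seqs)
```

Exact matching discards reads with a sequencing error in the tag; barcode sets designed with a minimum edit distance allow such reads to be rescued, at the cost of a more involved matching step.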

This system, which can produce >0.4 million clones per run within a half-day, could also be very useful for the rapid identification of important mutation(s) by direct comparison of wild-type and mutant viruses, including “pandemic flu” [Ilyushina et al., 2008], more virulent noroviruses [Lindesmith et al., 2008; Lopman et al., 2008], and hepatitis viruses.

INTERNET RESOURCES

NCBI blast (http://blast.ncbi.nlm.nih.gov/)

BioRuby (http://bioruby.org/)

SILVA, comprehensive ribosomal RNA databases (http://www.arb-silva.de/)

Research Institute for Microbial Diseases (RIMD), Osaka University (http://www.biken.osaka-u.ac.jp/e/)

The Program of Founding Research Centers for Emerging and Reemerging Infectious Disease (CRNID), RIKEN (http://www.crnid.riken.jp/english/)

Omics Science Center (OSC), RIKEN (http://www.osc.riken.jp/english/)

Acknowledgments

This chapter is derived from the original publication (Nakamura S, et al. 2009. PLoS One 4(1):e4219) with some modifications and additional experimental data. We thank Drs. Norihiro Maeda, Michihira Tagami, Ryoji Morita, Hiromi Sano, Kengo Hayashida (Omics Science Center, RIKEN), Kazuo Takahashi, Naomi Sakon (Osaka Prefectural Institute of Public Health), Tetsuya Mizutani (National Institute of Infectious Diseases), Kazuyoshi Ikuta, Teruo Yasunaga, Takahiro Tougan, Naohisa Goto, Akifumi Yamashita, Mayo Ueda, Cheng-Song Yang (RIMD, Osaka University), Mikihiro Yunoki (Benesis Corporation), and other project members who participated in the development of the RAPID for 454 sequencing analysis, as well as for helpful discussions.

REFERENCES

Allander T, Andreasson K, Gupta S, Bjerkner A, BogdanovicG, et al. 2007. Identification of a third human polyomavirus. J. Virol .81:4130–4136.

Bainbridge MN, Warren RL, He A, Bilenky M, Robertson AG,Jones SJ. 2007. THOR: targeted high-throughput ortholog reconstruc-tor. Bioinformatics 23:2622–2624.

Bialasiewicz S, Whiley DM, Lambert SB, Wang D, Nissen MD,Sloots TP. 2007. A newly reported human polyomavirus, KI virus,is present in the respiratory tract of Australian children. J. Clin. Virol .40:15–18.

Binladen J, Gilbert MT, Bollback JP, Panitz F, Bendixen C,et al. 2007. The use of coded PCR primers enables high-throughputsequencing of multiple homolog amplification products by 454 par-allel sequencing. PLoS ONE 2:e197.

Briese T, Paweska JT, McMullan LK, Hutchison SK, Street C,et al. 2009. Genetic detection and characterization of Lujo virus, anew hemorrhagic fever-associated arenavirus from southern Africa.PLoS Pathog . 5:e1000455.

Cox-Foster DL, Conlan S, Holmes EC, Palacios G, Evans JD,et al. 2007. A metagenomic survey of microbes in honey bee colonycollapse disorder. Science 318:283–287.

Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, QuakeSr. 2008. Noninvasive diagnosis of fetal aneuploidy by shotgunsequencing DNA from maternal blood. Proc. Natl. Acad. Sci. USA105:16266–16271.

Feng H, Shuda M, Chang Y, Moore PS. 2008. Clonal integra-tion of a polyomavirus in human Merkel cell carcinoma. Science319:1096–1100.

Fox JD. 2007. Nucleic acid amplification tests for detection of respira-tory viruses. J. Clin. Virol . 40(Suppl. 1): S15–S23.

Gaynor AM, Nissen MD, Whiley DM, Mackay IM, Lambert SB,et al. 2007. Identification of a novel polyomavirus from patients withacute respiratory tract infections. PLoS Pathog . 3:e64.

Goldberg SM, Johnson J, Busam D, Feldblyum T, Ferriera S, et al.2006. A Sanger/pyrosequencing hybrid approach for the generationof high-quality draft assemblies of marine microbial genomes. Proc.Natl. Acad. Sci. USA 103:11240–11245.

Goto N, Nakao MC, Kawashima S, Katayama T, Kanehisa M.2003. BioRuby: Open-Source Bioinformatics Library. Genome Infor-matics 14:629–630.

Ilyushina NA, Govorkova EA, Gray TE, Bovin NV, WebsterRG. 2008. Human-like receptor specificity does not affect theneuraminidase-inhibitor susceptibility of H5N1 influenza viruses.PLoS Pathog . 4:e1000043.

Lau SK, Woo PC, Yip CC, Tse H, Tsoi HW, et al. 2006. Coron-avirus HKU1 and other coronavirus infections in Hong Kong. J. Clin.Microbiol . 44:2063–2071.

Leary TP, Muerhoff AS, Simons JN, Pilot-Matias TJ, Erker JC,et al. 1996. Sequence and genomic organization of GBV-C: A novelmember of the flaviviridae associated with human non-A-E hepatitis.J. Med. Virol . 48:60–67.

Lin B, Wang Z, Vora GJ, Thornton JA, Schnur JM, et al. 2006. Broad-spectrum respiratory tract pathogen identification using resequencing DNA microarrays. Genome Res. 16:527–535.

Lindesmith LC, Donaldson EF, Lobue AD, Cannon JL, Zheng DP, et al. 2008. Mechanisms of GII.4 norovirus persistence in human populations. PLoS Med. 5:e31.

Lopman B, Zambon M, Brown DW. 2008. The evolution of norovirus, the “gastric flu”. PLoS Med. 5:e42.

MacConaill L, Meyerson M. 2008. Adding pathogens by genomic subtraction. Nat. Genet. 40:380–382.

Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, et al. 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376–380.

Meyer M, Stenzel U, Hofreiter M. 2008. Parallel tagged sequencing on the 454 platform. Nat. Protoc. 3:267–278.

Moore MJ, Dhingra A, Soltis PS, Shaw R, Farmerie WG, et al. 2006. Rapid and accurate pyrosequencing of angiosperm plastid genomes. BMC Plant Biol. 6:17.

Nakamura S, Maeda N, Miron IM, Yoh M, Izutsu K, et al. 2008. Metagenomic diagnosis of bacterial infections. Emerg. Infect. Dis. 14:1784–1786.

Nakamura S, Yang CS, Sakon N, Ueda M, Tougan T, et al. 2009. Direct metagenomic detection of viral pathogens in nasal and fecal specimens using an unbiased high-throughput sequencing approach. PLoS ONE 4:e4219.

Norja P, Ubillos I, Templeton K, Simmonds P. 2007. No evidence for an association between infections with WU and KI polyomaviruses and respiratory disease. J. Clin. Virol. 40:307–311.

Palacios G, Druce J, Du L, Tran T, Birch C, et al. 2008. A new arenavirus in a cluster of fatal transplant-associated diseases. N. Engl. J. Med. 358:991–998.

Poinar HN, Schwarz C, Qi J, Shapiro B, Macphee RD, et al. 2006. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311:392–394.

Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, et al. 2007. SILVA: A comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35:7188–7196.

Quan PL, Palacios G, Jabado OJ, Conlan S, Hirschberg DL, et al. 2007. Detection of respiratory viruses and subtype identification of influenza A viruses by GreeneChipResp oligonucleotide microarray. J. Clin. Microbiol. 45:2359–2364.

Ruiz V, Espinola L, Mathet VL, Perandones CE, Oubina JR. 2006. Design, development and evaluation of a competitive RT-PCR for quantitation of GBV-C RNA. J. Virol. Methods 136:58–64.

Sakai K, Mizutani T, Fukushi S, Saijo M, Endoh D, et al. 2007. An improved procedure for rapid determination of viral RNA sequences of avian RNA viruses. Arch. Virol. 152:1763–1765.

Sakon N, Yamazaki K, Yoda T, Tsukamoto T, Kase T, et al. 2007. Norovirus storm in Osaka, Japan, last winter (2006/2007). Jpn. J. Infect. Dis. 60:409–410.

Spatz SJ, Rue CA. 2008. Sequence determination of a mildly virulent strain (CU-2) of Gallid herpesvirus type 2 using 454 pyrosequencing. Virus Genes 36:479–489.

Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12:1611–1618.

Torres TT, Metta M, Ottenwalder B, Schlotterer C. 2008. Gene expression profiling by massively parallel sequencing. Genome Res. 18:172–177.

Vabret A, Dina J, Gouarin S, Petitjean J, Corbet S, Freymuth F. 2006. Detection of the new human coronavirus HKU1: A report of 6 cases. Clin. Infect. Dis. 42:634–639.

Vera JC, Wheat CW, Fescemyer HW, Frilander MJ, Crawford DL, et al. 2008. Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol. Ecol. 17:1636–1647.

Watanabe S, Mizutani T, Sakai K, Kato K, Tohya Y, et al. 2008. Ligation-mediated amplification for effective rapid determination of viral RNA sequences (RDV). J. Clin. Virol. 43:56–59.

Wicker T, Schlagenhauf E, Graner A, Close TJ, Keller B, Stein N. 2006. 454 sequencing put to the test using the complex genome of barley. BMC Genomics 7:275.

Woo PC, Lau SK, Chu CM, Chan KH, Tsoi HW, et al. 2005. Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia. J. Virol. 79:884–895.

Yang J, Bogerd H, Le SY, Cullen BR. 2000. The human endogenous retrovirus K Rev response element coincides with a predicted RNA folding region. RNA 6:1551–1564.

Zhang T, Breitbart M, Lee WH, Run JQ, Wei CL, et al. 2006. RNA viral community in human feces: Prevalence of plant pathogenic viruses. PLoS Biol. 4:e3.