34
1 Potential applications of human viral metagenomics and reference materials: 1 considerations for current and future viruses 2 Tasha M. Santiago-Rodriguez 1 and Emily B. Hollister 1 3 1 Diversigen, Inc., Houston, TX 77021 4 ABSTRACT 5 Viruses are ubiquitous particles comprised of genetic material that can infect bacteria, archaea, 6 fungi, as well as human and other animal cells. Given that determining virus composition and 7 function in association with states of human health and disease is of increasing interest, we 8 anticipate that the field of viral metagenomics will continue to expand and be applied in a variety 9 of areas ranging from surveillance to discovery, and will rely heavily upon the continued 10 development of reference materials and databases. Information regarding viral composition and 11 function readily translate into biological and clinical applications, including the rapid sequence 12 identification of pathogenic viruses in various sample types. However, viral metagenomic 13 approaches often lack appropriate standards and reference materials to enable cross-study 14 comparisons and assess potential biases which can be introduced at the various stages of 15 collection, storage, processing, and sequence analysis. In addition, implementation of appropriate 16 viral reference materials can aid in the benchmarking of current and development of novel assays 17 for virus identification, discovery, and surveillance. As the field of viral metagenomics expands 18 and standardizes, results will continue to translate into diverse applications. 19 20 Keywords: microbiome, mock communities, reference materials, viral metagenomics, virome 21 Corresponding authors email: [email protected]; [email protected]m 22 Address: 2450 Holcombe Blvd. Suite BCMA, Houston, TX, USA 77021 23 AEM Accepted Manuscript Posted Online 11 September 2020 Appl. Environ. Microbiol. doi:10.1128/AEM.01794-20 Copyright © 2020 Santiago-Rodriguez and Hollister. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license. on May 21, 2021 by guest http://aem.asm.org/ Downloaded from

Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

1

Potential applications of human viral metagenomics and reference materials: 1

considerations for current and future viruses 2

Tasha M. Santiago-Rodriguez1 and Emily B. Hollister

1 3

1Diversigen, Inc., Houston, TX 77021 4

ABSTRACT 5

Viruses are ubiquitous particles comprised of genetic material that can infect bacteria, archaea, 6

fungi, as well as human and other animal cells. Given that determining virus composition and 7

function in association with states of human health and disease is of increasing interest, we 8

anticipate that the field of viral metagenomics will continue to expand and be applied in a variety 9

of areas ranging from surveillance to discovery, and will rely heavily upon the continued 10

development of reference materials and databases. Information regarding viral composition and 11

function readily translate into biological and clinical applications, including the rapid sequence 12

identification of pathogenic viruses in various sample types. However, viral metagenomic 13

approaches often lack appropriate standards and reference materials to enable cross-study 14

comparisons and assess potential biases which can be introduced at the various stages of 15

collection, storage, processing, and sequence analysis. In addition, implementation of appropriate 16

viral reference materials can aid in the benchmarking of current and development of novel assays 17

for virus identification, discovery, and surveillance. As the field of viral metagenomics expands 18

and standardizes, results will continue to translate into diverse applications. 19

20

Keywords: microbiome, mock communities, reference materials, viral metagenomics, virome 21

Corresponding authors email: [email protected]; [email protected] 22

Address: 2450 Holcombe Blvd. Suite BCMA, Houston, TX, USA 77021 23

AEM Accepted Manuscript Posted Online 11 September 2020Appl. Environ. Microbiol. doi:10.1128/AEM.01794-20Copyright © 2020 Santiago-Rodriguez and Hollister.This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 2: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

2

Ongoing potential of viral metagenomics 24

Studies targeting the bacterial fraction of the human microbiome have increased in both 25

number and scope during the last decade. By way of example, a recent PubMed search 26

performed using the keyword “microbiome” demonstrates the exponential growth of microbiome 27

studies since 2000 (Figure 1A). However, the number of studies assessing the viral fraction, or 28

the virome, of various sample types has lagged substantially behind (Figure 1B). Reasons for the 29

dramatically lower number of virome studies include: (i) intrinsic challenges of studying viruses, 30

including their smaller structural and genome sizes relative to bacteria and other 31

microorganisms; (ii) high diversity in genome structure and composition (i.e. dsDNA, ssDNA, 32

dsRNA, and ssRNA); (iii) lack of a universal gene amenable to amplicon sequencing, as in the 33

case of the 16S rRNA gene for bacteria and archaea; and (iv) biases in sequence databases which 34

emphasize viral pathogens or very well-known bacteriophage (i.e., viruses that infect bacteria), 35

many of which have previously been cultured. 36

37

Despite these challenges, viruses are important members of the human microbiome. 38

Viruses are present across the body, at sites including the gut (1), the skin (2), and the oral cavity 39

(3), and they can inhabit body sites and samples types previously thought to be sterile including 40

the bladder (4), blood (5), and cerebrospinal fluid (6). Certain viruses can be acquired through 41

birth and continue to be seeded by the maternal bond (7) and shaped by dietary habits (8), as well 42

as intimate contact (9). Viral communities are associated with disease phenotypes including 43

periodontal disease (10), and Inflammatory Bowel Disease (IBD) (11), and can respond to 44

antibiotic treatment as an indirect response associated with changes in bacterial community 45

composition (12). Viruses, specifically bacteriophages, are now being revisited as a tool to treat 46

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 3: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

3

antimicrobial-resistant infections and various disease phenotypes associated with toxin-47

producing bacteria. This was demonstrated in a recent study where bacteriophages infecting 48

cytolysin-producing Enterococcus faecalis strains could ameliorate liver disease severity in 49

patients with alcoholic hepatitis (13). Importantly, a number of phage therapy clinical trials have 50

been conducted. For instance, the PhagoBurn study was established to evaluate the treatment of 51

Escherichia coli and Pseudomonas aeruginosa burn wound infections using bacteriophages. It 52

was also the first prospective multicentric, randomized, single blind and controlled phage therapy 53

clinical trial according to both Good Manufacturing (GMP) and Good Clinical Practices (GCP 54

(https://globalclinicaltrialdata.com/trial/GCT022014-000714-65). Additional clinical trials which 55

seek to identify phage cocktails to treat burn wound infections are underway 56

(https://globalclinicaltrialdata.com/trial/GCT0104323475). Other clinical trials have evaluated 57

bacteriophages as prebiotics, which are defined as indigestible dietary components that promote 58

specific beneficial bacterial species (https://globalclinicaltrialdata.com/trial/GCT0103269617). 59

In addition, viruses can also predict disease risk (14). A recent study characterizing the stool 60

viromes of children at increased risk for type 1 diabetes showed that enterovirus B (EV-B) was 61

one of the most prevalent viruses in the children’s stool samples. The study also found that 62

children with prolonged shedding of the same EV-B serotype had higher odds of developing islet 63

autoimmunity compared to children negative for EV-B (14). These findings demonstrate the 64

potential applications of viruses for both disease treatment and prediction. 65

66

A number of the discoveries highlighted above have been facilitated through the field of 67

viral metagenomics. Studies demonstrating the current and future potential for viral 68

metagenomics revolve around key focus areas: (i) understanding the natural history of viruses, 69

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 4: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

4

particularly virus presence/absence and potential integration/lysis mechanisms; (ii) identifying 70

viral associations with disease, including unexplained illnesses; (iii) identifying potential novel 71

viral relationships with health and disease; (iv) identifying viral associations as risk factors for 72

disease; and (v) using viral metagenomics as a surveillance tool for animal, community, and 73

global health. 74

75

Biases in viral metagenomics 76

Although there are a number of applications for viral metagenomics, biases can be 77

introduced at many points along the process. From sample collection and processing, to data 78

generation and analysis, many choices and processes can impact data quality, interpretation, and 79

comparison. A viral metagenomics pipeline usually includes sample collection, sample 80

processing, sequencing, and bioinformatic analyses, similar to a microbiome pipeline (Figure 2). 81

In the following sections, several of the potential biases introduced at the various steps are 82

discussed. 83

84

Sample collection and storage 85

Sample collection practices are often dependent on sample type. For instance, collection 86

methods for samples such as saliva and urine might differ from those of stool, due to the nature 87

of the sample matrix and the amount of material needed for downstream processing. Once a 88

sample has been appropriately collected, a major factor that can challenge the recovery of viral 89

nucleic acid and viral community information is storage temperature. Although this has not been 90

widely assessed in the context of viral metagenomics, microbiome studies have shown a 91

significant impact of storage temperature on bacterial diversity and membership (15). Studies 92

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 5: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

5

assessing individual virus titers can also provide a sense of the impact of storage temperature on 93

virus viability for downstream applications. For instance, a study assessing the viability of 94

bacteriophage MS2 in wastewater found that only 20% of the initial titer was inactivated at 4°C, 95

compared to a reduction of 57% when storing the sample at -80°C after approximately 8 days 96

(16). Tailed phages, like those infecting Staphylococcus aureus, can be stored at -80°C for long-97

term preservation as long as a stabilizer is added (17). This usually ensures viability by 98

protecting phage tails. Other viruses, like Hepatitis C, can be inactivated after 5 days at room 99

temperature, after 6 months at 4°C, and can show a decrease of 15.6% after 5 days at −20°C 100

(18). While each virus demonstrates differing inactivation rates depending on the storage 101

conditions used, storage at -80°C is typically considered to be the gold standard for long-term 102

preservation of samples intended for microbiome analyses, and warmer storage temperatures can 103

significantly influence results (15). 104

105

Certain studies, particularly those where samples are collected in remote areas or samples 106

associated with outbreaks and pandemics, may require preservation buffers if there is no 107

immediate access to a -80°C freezer. For example, RNAlater (Ambion Inc, Austin, TX, USA), 108

has shown to preserve tissue samples in a similar manner as stored snap-frozen samples (19). 109

More specific to viruses, OMNIgene.GUT (DNA Genotek, Ottawa, Canada) and its preservation 110

buffer are known to preserve dsDNA viruses, including CrAssphage and bacteriophage T5, in a 111

manner similar to storing at -80°C (20). Experiments have shown a significant decrease in 112

bacteriophage T5, but not in CrAssphage, at room temperature after 14 days without preservation 113

buffers or other stabilization agents (20). Similar experiments have shown the ability of 114

OMNIgene.GUT to preserve both bacteriophage T5 and CrAssphage under different conditions 115

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 6: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

6

(20). These results indicate that, although some viruses may be persistent across different sample 116

storage conditions, as in the case of CrAssphage, other viruses, like bacteriophage T5, may be 117

more sensitive to storage temperature, duration, and other storage conditions. It is possible that 118

these findings may also translate to other viruses, including RNA viruses (e.g. severe acute 119

respiratory syndrome coronavirus 2 (SARS-CoV2), hepatitis C, and influenza), indicating that it 120

is important to select the most suitable stabilization conditions depending on one’s study and its 121

goals. References materials, for instance, may provide insights into the expected outcomes. In 122

addition, future studies are needed to further assess the effect of storage conditions on viruses 123

associated with urine, blood and cerebrospinal fluid. Importantly, addition of a preservation 124

buffer to maintain virome profiles may not always be necessary when storing samples at -80°C. 125

This may be the case for stool samples, where the majority of the viruses are temperate 126

bacteriophages and have shown to be resistant to profile changes after long-term preservation at -127

80°C (21). This may also be explained by the stability in bacterial community composition after 128

long-term storage at -80°C (22). 129

130

Increasing numbers of freeze/thaw cycles are known to affect the detection and 131

distribution of members within microbial communities, particularly bacteria and viruses (23). 132

Freeze/thaw cycles are common when samples of interest are utilized multiple times without 133

prior aliquoting and homogenization. Freeze/thaw can also happen during freezer failures, or 134

insufficient maintenance of cold chain during the shipping of the samples to a laboratory facility. 135

This may result in, but is not limited to changes in viral titer and increased cellular debris, which 136

may in turn affect downstream purification steps including host cell nucleic acid removal (24). 137

Although certain DNA and RNA viruses, including Hepatitis B and C, HIV, and SARS-CoV, 138

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 7: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

7

have shown to be stable after multiple freeze/thaw cycles involving freezing at -70°C or -80°C, 139

followed by thawing in a water bath at 25°C for 1h (Hepatitis B and C) or room temperature 140

(HIV) (25–27), the sensitivity of bacteria to freeze/thaw cycles may have the potential to impact 141

the recovery, diversity, and composition of bacteriophages, particularly prophages (21), and 142

additional work is needed in this space. Certain viruses are directly affected by freeze/thaw 143

cycles, while others are relatively stable. For example, intact influenza virus is dramatically 144

impacted by a single freeze–thaw cycle, reducing the concentration of filaments by almost half 145

(28). In contrast, extracted RNA from Influenza H1N1, demonstrates stability for up to 56 days 146

at −80°C or −20°C, or up to 9 freeze/thaw cycles (29). Similar experiments should be performed 147

on one’s virus(es) of interest to assess viability and nucleic acid stability preceding molecular 148

analyses. Likewise, the effects of storage conditions and freeze/thaw cycles need to be assessed 149

in novel and emerging viruses. 150

151

Sample processing 152

Sample pre-processing for viral metagenomics is crucial. Viral metagenomic protocols 153

may include a filtration/concentration step, where samples are centrifuged to remove any debris. 154

Centrifugation may be followed by serial filtrations using 0.45 and/or 0.22µm membranes, which 155

aim to remove cells larger than the pore size selected. In addition, filtration may be followed by a 156

concentration step using protein columns, as well as a purification step using cesium chloride 157

gradient ultracentrifugation, for example (9). Other methods include dithiothreitol treatment 158

(DTT) prior filtration and nucleic acid extraction, as well as CsCl gradient ultracentrifugation 159

after filtration and DNAse treatment. Results have shown that, while the CsCl gradient 160

ultracentrifugation method outperforms the DTT methods in removing host DNA, DTT methods 161

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 8: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

8

discriminate less against specific phage species and can yield more DNA (30). Application of 162

one or several of these practices can significantly influence results. For instance, a previous study 163

testing one or several treatments for virus isolation and purification in saliva samples showed 164

that recovery of the vaccinia virus is reduced after centrifugation, as well as when applying 165

filtration through 0.45µm membranes (31). One possible reason for the loss of the vaccinia WR 166

virus when using filtration is the large size of the virus, which has dimensions of approximately 167

360 × 270 × 250nm. Using 0.22µm and 0.45 µm membranes may have resulted in virus retention 168

due to the large size of the virus. Studies have suggested pre-treatments of the filters that include 169

passing an appropriate buffer, 10% fetal calf serum, veal infusion broth, or bovine albumin prior 170

filtering to decrease virus retention when using an appropriate sized filter (32). In the case of 171

bacteriophages, it is essential to remove any bacterial cells using centrifugation and filtration, 172

and to apply DNAse and RNAse treatments before nucleic acid extraction to remove 173

extracellular nucleic acids that may be host-associated. This is particularly important because 174

phage genes share a degree of homology to bacterial genes and, thus, may affect downstream 175

data interpretation. 176

177

The application of one or several of these steps may be useful, particularly when working 178

with samples that originate from sites with high-level of host contamination. Saliva, blood, 179

biopsies, skin swabs, and cerebrospinal fluid are several sample types known to have a high 180

degree of host burden. In such cases, host cell removal may aid in obtaining accurate profiles and 181

increase the viral signal. Alternatively, enrichment methods for specific viruses have shown 182

promising results. For instance, ViroCap was designed to enrich nucleic acids from 34 viral 183

families infecting vertebrate hosts (33). Other examples include VirCapSeq-VERT, targeting 184

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 9: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

9

over 200 viral taxa (34), hybrid-capture target enrichment using PCR-generated capture probes 185

(35), and, more recently, multiplex amplicon- and hybrid capture-based sequencing with ultra-186

high-throughput metatranscriptomics for SARS-CoV-2 studies (36). However, in cases of low-187

volume samples, as in the case of skin swabs, applying several of these steps may result in 188

sample loss. In such cases, deeper sequencing and post-sequencing host sequence removal may 189

provide actionable results with the caveat of lower evenness (e.g., Shannon diversity index), as in 190

the case of skin swab samples (2). In other cases, deeper sequencing may not be required due to 191

the low-diversity of the samples, but may be useful to increase genome coverage (37). Another 192

study evaluating 16 different concentration, extraction, and purification protocols showed that 193

tangential flow filtration (TFF), pyrophosphate in combination with sonication, and 194

ultracentrifugation in a sucrose gradient yielded significantly greater numbers of virus-like 195

particles (38). While there is no method or combination of methods known to provide the most 196

optimal results, sample pre-processing would need to be evaluated as needed for the particular 197

virus(es) of interest, although this may add additional biases. 198

199

Nucleic acid extraction and amplification of viral nucleic acids 200

Historically, viral nucleic acid extraction methods have been developed according to the 201

sample type and the virus(es) being targeted (39, 40). Commercially available methods now 202

facilitate viral nucleic acid extraction, and a number of them are intended for specific sample 203

types. For instance, a study evaluating the nucleic acid extraction efficiency from HeLa cells 204

spiked with four viruses, including the double-stranded DNA Epstein-Barr virus, double-205

stranded RNA Reovirus 3, single-stranded RNA Feline leukemia virus, and respiratory syncytial 206

virus, using eleven commercially available extraction kits based on silica membrane column, 207

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 10: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

10

magnetic beads, and precipitation-based extractions found that dual extraction methods may 208

provide improved sensitivity for recovering nucleic acids from viruses with specific biochemical 209

and biophysical characteristics (41). In another study testing three different nucleic acid 210

extraction methods in biofilms, including chloroform, tetrasodium pyrophosphate in combination 211

with sonication, and dithiothreitol (DTT), showed varying DNA yields that may be explained by 212

the mode of action of the chemicals used. For instance, chloroform may denature the lipid 213

envelopes surrounding viral capsids, as in the case of viruses from the Phycodnaviridae family, 214

including, for instance, MpoVs, which infect Micromonas polaris (38, 42). Results indicate that 215

nucleic acid extraction methods would need to be tested independently, ideally using reference 216

materials, which may help understand the biases that could be introduced at this stage. 217

218

Viral nucleic acid extraction may result in low to moderate yields, depending on the 219

method(s) used. For this reason, whole amplification of the nucleic acids may be considered. 220

One method, known as multiple displacement amplification (MDA) can result in significant 221

higher nucleic acid yields as it utilizes a high-fidelity enzyme (43). Other amplification methods 222

include sequence-independent single primer amplification (SISPA), which is a primer initiated 223

technique that requires target sequence modification preceding the logarithmic amplification of 224

the DNA (44). However, random amplification methods may result in the overrepresentation of 225

certain viruses including single-stranded (ss) DNA viruses. For instance, a study assessing the 226

effect of MDA found the overrepresentation of bacteriophage M13 in a mock community 227

composed of seven DNA viruses including bacteriophage lambda, vaccinia virus, phage phi29, 228

adenovirus, bacteriophage M13, mice minute virus p, and a porcine circovirus (31). This 229

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 11: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

11

suggests that, while amplification methods aid in obtaining higher nucleic acid yields, results 230

should be carefully interpreted, particularly in terms of abundances. 231

232

Sequencing technology 233

Although Illumina sequencing has dominated the field of metagenomics, other sequencing 234

platforms are also currently being used (e.g., Ion semiconductor sequencing platform). Yet, 235

combining outputs of various sequencing platforms may not be ideal when performing certain 236

meta-analyses (i.e. analyses combining datasets from different studies). In the case of viromes, 237

however, utilizing Illumina vs. Ion Torrent sequencing may not affect the diversity output. This 238

has been noted for cerebrospinal fluid, where viral alpha- and beta-diversity were not 239

significantly altered by the sequencing platform (6). Additional studies are needed to understand 240

the effect of the sequencing platform on more complex samples (e.g. stool and soil samples). 241

High-throughput sequencing is also increasingly being applied to clinical specimens aiming to 242

identify specific pathogenic viruses. In a study assessing the effect of the sequencing platform 243

and the sequencing kit using clinical specimens positive for various enteroviruses and 244

polioviruses showed that the number of viral reads will depend on the type of virus, as well as 245

the sequencing platform and sequencing kit. For instance, the number of enterovirus reads did 246

not seem to be affected by the sequencing platform or sequencing kit, but the number of 247

poliovirus reads was affected (45). The virus coverage was also affected by the sequencing 248

platform and sequencing kit, where the genome coverage was highest when using the Illumina 249

MiSeq 500 v2 kit compared to the Ion Torrent PGM kits (45). This seemed to be dependent on 250

the higher number of reads generated by the Illumina MiSeq 500 v2 kit compared to the Ion 251

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 12: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

12

Torrent PGM kits. Ideally, further studies assessing the effect of the sequencing platform and 252

sequencing kit may include viral reference materials. 253

254

Bioinformatic analyses - Host removal and assembly 255

Bioinformatic analyses in viral metagenomics may include several steps preceding 256

annotation. As with many microbiome studies, viral metagenomics research is accompanied by 257

the in-silico removal of host sequence (i.e. human, animal, or plant) post-sequencing. This helps 258

to ensure that the analysis consists primarily of viral sequences, which in turn may significantly 259

reduce the amount of computer power and time to process results. Following host sequence 260

removal, viral metagenomic sequence analysis approaches may include viral sequence assembly, 261

and, depending on the annotation tool used, differing results can be acquired as a function of the 262

assembler used and the degree of success achieved in the assembly process. This makes 263

metagenomic assembly particularly challenging for virome data, as it may result in fragmented 264

assemblies and, consequently, poor annotations. A previous study tested the ability of 16 265

different tools for sequence assembly in several sample types, including a viral mock community 266

comprised of 12 viral genomes, 10 of which were at equal abundance (9.82% relative 267

abundance/virus) and 2 were ssDNA genomes (0.92% relative abundance/virus) (46) (Table 1). 268

For this mock community, specifically, particular assemblers, including CLC, Geneious, SPAdes 269

and VICUNA, were able to detect all 12 genomes. However, a number of false positives (i.e., 270

alignment to a number of reference genomes) were also identified. Other assemblers, 271

particularly, Velvet and MetaVelvet, generated no false positives, but failed to assemble three 272

genomes. In contrast, ABySS generated a large number of false positives and failed to assemble 273

four to six genomes, depending on the k-mer setting used. The assemblers IDBA UD and Ray 274

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 13: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

13

Meta outperformed the other assemblers with an equal number of contigs to genomes, followed 275

by MEGAHIT and SPAdes (46). This illustrates the biases introduced when assembling viral 276

sequences and the ability of reference materials to determine the performance of assembly tools. 277

Reference materials may be used in parallel to the sample of interest to determine the effect of 278

assemblers in detecting the expected viruses, as well as the specificity, including the number of 279

false positives and the number of false negatives. This may also need to be applied as novel 280

assemblers become available. 281

282

Bioinformatic analyses - Annotation tools 283

Annotation tools can also impact the outcomes of a viral metagenomics study. A number 284

of annotation tools specific for viruses are available, which can be alignment- or k-mer-based 285

(47). Since a number of annotation tools will continue to be available, it is important to evaluate 286

the most suitable annotation tool based on specific research needs. For instance, a study 287

assessing the viral content of benthic deep-sea samples using BLAST, MG-RAST, NBC, 288

VMGAP, MetaVir, and VIROME showed that the BLAST tools, followed by MetaVir and 289

VMGAP, provided the most reliable results. In addition, while tBLASTx, MetaVir, VMGAP and 290

VIROME showed a similar efficiency of sequence annotation, MetaVir and tBLASTx identified 291

a higher number of viral strains (48). Another tool, known as VirMAP employs multiple methods 292

that include a combination of de-novo assembly and mapping-based strategies to taxonomically 293

classify sequences (49). While these tools possess a number of advantages, most are database-294

reliant and not necessarily suitable for novel virus discovery. For this reason, another more 295

recent tool known as VIBRANT (Virus Identification By iteRative ANnoTation) enables both 296

virus identification and discovery utilizing machine learning and protein similarity approaches. 297

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 14: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

14

Importantly, VIBRANT highlights viral auxiliary metabolic genes and metabolic pathways, 298

which is usually not covered by other annotation tools (50). Another tool known as 299

DeepVirFinder is a reference and alignment-independent machine learning method for 300

identifying viral sequences in metagenomics using deep learning (51). Performance of current 301

and future viral taxonomic and functional annotation tools can be assessed using reference 302

materials. 303

304

Assessing biases in viral metagenomics 305

Although several approaches exist for assessing the biases introduced at one or more of 306

the steps described above, references materials represent a standardized manner to benchmark 307

pipelines, reagents, and bioinformatic analyses. Reference materials also represent an approach 308

to assess biases that could be introduced even after standardization (i.e. technician variation, 309

shipping issues, power shortages, etc.). However, the practice of including reference materials 310

arose relatively recently as an approach to address the reliability of Illumina sequencing as a 311

substitute of 454 sequencing (52). Specifically, the term “mock community” arose to refer to a 312

mix of bacterial DNA (52), and is currently applied to refer to a mix of bacterial or fungal cells 313

or DNA, as well as viral particles or nucleic acids in specific concentrations. A number of these 314

reference materials include mock communities composed of organisms with diverse structural 315

(e.g., cell-wall and morphology) and genomic characteristics (e.g., GC content and genome size), 316

mock communities mimicking various body sites, organisms intended to be spiked into a sample, 317

as well as homogenized stool samples. 318

319

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 15: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

15

Although the importance of reference materials is increasingly being acknowledged, 320

there is room for improvement in the field. For instance, only 30% of articles published in 2018 321

in two broadly-read and well-cited microbiome and microbiology journals reported a negative 322

control, and only 10% reported results from positive controls (53). This suggests that there is the 323

need to implement and report the results of both negative and positive controls in microbiome 324

studies. Although such controls may include a variety of materials, reference materials, 325

specifically, can be used as positive controls and negative controls; yet, reference materials, 326

including mock communities, are not implemented on a regular basis in microbiome studies even 327

though they are available through research and academic institutions, as well as commercial 328

facilities. Most reference materials available are intended for microbiome analysis targeting the 329

bacterial fraction, and these have shown to be helpful in benchmarking nucleic acid extraction 330

methods (54), sequencing platforms and sequencing kits (55), assemblers, as well taxonomic and 331

functional classification tools (56, 57). A limited number of reference materials have been 332

developed for viral metagenomics research, and it is anticipated that additional materials may be 333

developed as viral metagenomic applications experience broader adoption. 334

335

Viral reference materials, as with bacterial and fungal reference materials, aid in 336

assessing biases introduced into a virome pipeline, as described above. Commercially available 337

reference materials, particularly viral mock communities from the American Type Culture 338

Collection (ATCC, Manassas, VA, USA) can be used to identify and potentially quantify biases 339

introduced at particular stages of a viral metagenomics pipeline, as highlighted in the previous 340

sections. ATCC’s viral mock communities are composed of equal concentrations of either 341

nucleic acids from five different viruses, or whole viruses. The mock communities possess 342

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 16: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

16

double-stranded (ds) DNA viruses, including enveloped viruses (herpesvirus), and unenveloped 343

viruses (adenovirus). The mock communities also possess RNA viruses, including a positive 344

sense ssRNA virus (Zika virus), negative sense ssRNA virus (Influenza B virus and Human 345

respiratory syncytial virus), and a dsRNA virus (Reovirus 3) (58). 346

347

Potential future applications of viral metagenomics and areas for growth 348

Although viral metagenomics has been widely used for virome characterization in 349

various sample types, the approach can be leveraged in a number of different ways to address 350

current and future virus discovery and tracking. Some of these include, but are not limited to: (i) 351

expansion of viral databases through virus discovery efforts; (ii) surveillance activities, 352

particularly among wildlife reservoirs or in the context of SARS-COV-2 in wastewater; (iii) 353

identification of emerging and re-emerging pathogens in veterinary medicine; (iv) identification 354

of novel viral relationships in health and disease, including unexplained illnesses; (v) 355

understanding the potential effects of novel treatments (e.g. fecal transplantation and phage 356

therapy) on the virome and microbiome; (vi) layering virome information onto metagenomic 357

studies to provide new biological and clinical insights; (vii) continued improvements related to 358

sample collection, stabilization, extraction and detection assays; (viii) testing new reagents and 359

approaches; and (ix) development of reference materials for viral metagenomics and clinical 360

applications (Figure 3). 361

362

The numerous potential (future and ongoing) applications of viral metagenomics are 363

increasingly being realized. In the areas of discovery, for instance, viral metagenomics has 364

shown increasing value. In a study evaluating the use of viral metagenomics for the identification 365

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 17: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

17

of viruses in bats showed that, while only <1.0% of the reads are viral, 83% are bacteriophage, 366

insect-borne viruses, and plant viruses, and 14% represent mammalian viruses (59). In addition, 367

while genome segments within a number of the mammalian viruses shared a high degree of 368

homology to known viruses, a number of these segments differed, suggesting that bats possess a 369

number of viruses that remain to be characterized (59). This approach may also translate to other 370

animals, particularly those harboring potential zoonotic viruses. More recent evidence has also 371

demonstrated the value of viral metagenomics in tracing potential recombination hosts for 372

SARS-CoV-2. Although it has been demonstrated that bats may have been the original reservoirs 373

of the novel coronavirus, genomic evidence has also shown that the pangolin may have served as 374

an intermediate host (60). This was also demonstrated using viral metagenomics, where a 375

coronavirus strain, similar to the human strain was identified from metagenomic datasets (61). 376

377

In addition to the application of viral metagenomics for the discovery and identification 378

of viruses, including SARS-CoV-2, the potential for surveillance activities remains largely 379

unexplored. Viral surveillance activities in various sample types rely largely on PCR- or panel-380

based techniques, which exhibit a number of advantages, including specificity and speed. Yet, 381

one drawback is that surveillance activities based on PCR- or panel-based techniques may be 382

limited to the virus(es) of interest. Viral metagenomics, on the other hand, exhibits the additional 383

advantage of the identification of a number of different viruses simultaneously, including novel 384

viruses. This suggests that viral metagenomics may be added to the toolbox of current methods 385

for virus surveillance in clinical and environmental samples (62). Similarly, viral metagenomics 386

has an increasing potential in virus surveillance in novel treatment methods. For instance, fecal 387

transplantation has shown to be effective in treating Clostridium difficile infections, but the 388

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 18: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

18

transmission of viruses from donor to recipient remains largely unexplored. In a study assessing 389

the viral contents after fecal transplantation from a donor to three pediatric recipients found that 390

most of the viruses that were transmitted were bacteriophages (63), which is in agreement with a 391

previous study assessing the viral contents of chemostat systems (64). This approach may also 392

apply to the identification and surveillance of other potentially pathogenic viruses in donor and 393

recipient stool samples. Surveillance activities using sequencing approaches may also be applied 394

in under-characterized environments, such as the built environment (e.g. hospitals and schools). 395

Most studies characterizing the built environment focus on the bacterial fraction (65). Thus, viral 396

metagenomics studies of the built environment in developed and developing countries, and in 397

both rural and urbanized regions are still needed. Results may provide information regarding risk 398

of infection of viral pathogens, as well as transmission dynamics and routes of transmission (e.g. 399

airborne, fomite, and water routes) in various built environments (65). In addition, a number of 400

these built environments possess a high prevalence of multidrug-resistant bacteria, which 401

represent a risk to public health. Bacteriophages have, therefore, been suggested as a means to 402

control multidrug-resistant bacteria in built environments (65). In this case, sequencing 403

approaches may aid in identifying any prophages or prophage remnants in the multidrug-resistant 404

bacterial genomes, which often result in superinfection resistance. 405

406

Another area where viral metagenomics may have potential future applications and the 407

potential to further develop is in identifying novel and broad viral relationships in health and 408

disease, including potentially unexplained illnesses. For instance, a study applying high-409

throughput sequencing in low-biomass samples, including cerebrospinal fluid, blood and throat 410

swabs found that viruses were identified in 32% of the samples (66). In parallel, conventional 411

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 19: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

19

virus diagnostic tests were performed and in multiple cases, the identified viruses were not 412

included in the selected routine diagnostic tests (66). Interestingly, application of viral 413

metagenomics resulted in the adjustment of a subject’s treatment after exclusion of a viral 414

infection (66). 415

416

Another potential application of viral metagenomics is the continued development and 417

refinement of viral reference materials. Development of novel viral reference materials may aid 418

to determine biases that can be introduced when testing current and novel reagents for viral 419

nucleic acid extraction, or conversion or RNA to cDNA, for instance. In addition, development 420

of novel viral reference materials can be used to validate current and novel surveillance assays 421

(e.g., qPCR assays for the detection of SARS-CoV2 in sewage and environmental samples). 422

Novel viral reference materials can also be used to validate a metagenomic assay, starting from 423

sample collection down to data analysis. In many cases when viral reference materials represent 424

the original virus strain, it could represent an opportunity to track virus evolution. Moving 425

forward, we anticipate that a number of viral reference materials will continue to be added to the 426

collection of those currently available. 427

428

References 429

1. Mukhopadhya I, Segal JP, Carding SR, Hart AL, Hold GL. 2019. The gut virome: the 430

‘missing link’between gut bacteria and host immunity? Therap Adv Gastroenterol 431

12:1756284819836620. https://doi.org/10.1177/1756284819836620. 432

2. Hannigan GD, Meisel JS, Tyldsley AS, Zheng Q, Hodkinson BP, SanMiguel AJ, Minot S, 433

Bushman FD, Grice EA. 2015. The Human Skin Double-Stranded DNA Virome: 434

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 20: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

20

Topographical and Temporal Diversity, Genetic Enrichment, and Dynamic Associations 435

with the Host Microbiome. MBio 6:e01578-15. DOI:10.1128/mBio.01578-15. 436

3. Abeles SR, Robles-Sikisaka R, Ly M, Lum AG, Salzman J, Boehm TK, Pride DT. 2014. 437

Human oral viruses are personal, persistent and gender-consistent. ISME J 8:1753–1767. 438

4. Santiago-Rodriguez TM, Ly M, Bonilla N, Pride DT. 2015. The human urine virome in 439

association with urinary tract infections. Front Microbiol 6:14. 440

https://doi.org/10.3389/fmicb.2015.00014. 441

5. Moustafa A, Xie C, Kirkness E, Biggs W, Wong E, Turpaz Y, Bloom K, Delwart E, 442

Nelson KE, Venter JC, Telenti A. 2017. The blood DNA virome in 8,000 humans. PLoS 443

Pathog 13: e1006292. https://doi.org/10.1371/journal.ppat.1006292. 444

6. Ghose C, Ly M, Schwanemann LK, Shin JH, Atab K, Barr JJ, Little M, Schooley RT, 445

Chopyk J, Pride DT. 2019. The Virome of Cerebrospinal Fluid: Viruses Where We Once 446

Thought There Were None. Front Microbiol 10:2061. 447

https://doi.org/10.3389/fmicb.2019.02061 448

7. Maqsood R, Rodgers R, Rodriguez C, Handley SA, Ndao IM, Tarr PI, Warner BB, Lim 449

ES, Holtz LR. 2019. Discordant transmission of bacteria and viruses from mothers to 450

babies at birth. Microbiome 7:156. https://doi.org/10.1186/s40168-019-0766-7. 451

8. Schulfer A, Santiago-Rodriguez TM, Ly M, Borin JM, Chopyk J, Blaser MJ, Pride DT. 452

2020. Fecal Viral Community Responses to High-Fat Diet in Mice. mSphere 5:1. 453

DOI: 10.1128/mSphere.00833-19 454

9. Ly M, Jones MB, Abeles SR, Santiago-Rodriguez TM, Gao J, Chan IC, Ghose C, Pride 455

DT. 2016. Transmission of viruses via our microbiomes. Microbiome 4:1. DOI 456

10.1186/s40168-016-0212-z 457

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 21: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

21

10. Ly M, Abeles SR, Boehm TK, Robles-Sikisaka R, Naidu M, Santiago-Rodriguez T, Pride 458

DT. 2014. Altered Oral Viral Ecology in Association with Periodontal Disease. MBio 5:3. 459

DOI: 10.1128/mBio.01133-14. 460

11. Lopetuso LR, Ianiro G, Scaldaferri F, Cammarota G, Gasbarrini A. 2016. Gut Virome and 461

Inflammatory Bowel Disease. Inflamm Bowel Dis 22:1708-1712. 462

https://doi.org/10.1097/MIB.0000000000000807. 463

12. Abeles SR, Ly M, Santiago-Rodriguez TM, Pride DT. 2015. Effects of long term 464

antibiotic therapy on human oral and fecal viromes. PLoS One 10: e0134941. 465

doi:10.1371/journal.pone.0134941. 466

13. Duan Y, Llorente C, Lang S, Brandl K, Chu H, Jiang L, White RC, Clarke TH, Nguyen K, 467

Torralba M, Shao Y, Liu J, Hernandez-Morales A, Lessor L, Rahman IR, Miyamoto Y, 468

Ly M, Gao B, Sun W, Kiesel R, Hutmacher F, Lee S, Ventura-Cots M, Bosques-Padilla F, 469

Verna EC, Abraldes JG, Brown RS, Vargas V, Altamirano J, Caballería J, Shawcross DL, 470

Ho SB, Louvet A, Lucey MR, Mathurin P, Garcia-Tsao G, Bataller R, Tu XM, Eckmann 471

L, van der Donk WA, Young R, Lawley TD, Stärkel P, Pride D, Fouts DE, Schnabl B. 472

2019. Bacteriophage targeting of gut bacterium attenuates alcoholic liver disease. Nature 473

575:505-511. https://doi.org/10.1038/s41586-019-1742-x. 474

14. Vehik K, Lynch KF, Wong MC, Tian X, Ross MC, Gibbs RA, Ajami NJ, Petrosino JF, 475

Rewers M, Toppari J, Ziegler AG, She JX, Lernmark A, Akolkar B, Hagopian WA, 476

Schatz DA, Krischer JP, Hyöty H, Lloyd RE. 2019. Prospective virome analyses in young 477

children at increased genetic risk for type 1 diabetes. Nat Med 25: 1865-1872. 478

https://doi.org/10.1038/s41591-019-0667-0. 479

15. van Zyl KN, Whitelaw AC, Newton-Foot M. 2020. The effect of storage conditions on 480

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 22: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

22

microbial communities in stool. PLoS One 15: e0227486. 481

https://doi.org/10.1371/journal.pone.0227486 482

16. Olson MR, Axler RP, Hicks RE. 2004. Effects of freezing and storage temperature on 483

MS2 viability. J Virol Methods 122: 147-152. 484

https://doi.org/10.1016/j.jviromet.2004.08.010. 485

17. Gonzalez-Menendez E, Fernandez L, Gutierrez D, Rodríguez A, Martínez B, GarcíaI P. 486

2018. Comparative analysis of different preservation techniques for the storage of 487

Staphylococcus phages aimed for the industrial development of phage-based antimicrobial 488

products. PLoS One 13: e0205728. https://doi.org/10.1371/journal.pone.0205728. 489

18. Halfon P, Khiri H, Gerolami V, Bourliere M, Feryn JM, Reynier P, Gauthier A, Cartouzou 490

G. 1996. Impact of various handling and storage conditions on quantitative detection of 491

hepatitis C virus RNA. J Hepatol 25: 307-311. https://doi.org/10.1016/S0168-492

8278(96)80116-4. 493

19. Kohl C, Wegener M, Nitsche A, Kurth A. 2017. Use of RNALater® preservation for 494

virome sequencing in outbreak settings. Front Microbiol 8:1888. 495

https://doi.org/10.3389/fmicb.2017.01888. 496

20. Le François B, Doukhanine E. 2019. OMNIgene®• GUT accurately captures and 497

stabilizes the human fecal dsDNA virome. White paper. 498

21. Shkoporov AN, Ryan FJ, Draper LA, Forde A, Stockdale SR, Daly KM, McDonnell SA, 499

Nolan JA, Sutton TDS, Dalmasso M, McCann A, Ross RP, Hill C. 2018. Reproducible 500

protocols for metagenomic analysis of human faecal phageomes. Microbiome 6:68. 501

https://doi.org/10.1186/s40168-018-0446-z. 502

22. Dorsaz S, Charretier Y, Girard M, Gaïa N, Leo S, Schrenzel J, Harbarth S, Huttner B, 503

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 23: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

23

Lazarevic V. 2020. Changes in Microbiota Profiles After Prolonged Frozen Storage of 504

Stool Suspensions. Front Cell Infect Microbiol 10:77. 505

https://doi.org/10.3389/fcimb.2020.00077. 506

23. Song SJ, Amir A, Metcalf JL, Amato KR, Xu ZZ, Humphrey G, Knight R. 2016. 507

Preservation Methods Differ in Fecal Microbiome Stability, Affecting Suitability for Field 508

Studies. mSystems 1:3. DOI: 10.1128/mSystems.00021-16. 509

24. Khan AS, Ng SHS, Vandeputte O, Aljanahi A, Deyati A, Cassart J-P, Charlebois RL, 510

Taliaferro LP. 2017. A Multicenter Study To Evaluate the Performance of High-511

Throughput Sequencing for Virus Detection. mSphere 2:5. DOI: 10.1128/mSphere.00307-512

17. 513

25. Krajden M, Minor JM, Rifkin O, Comanor L. 1999. Effect of multiple freeze-thaw cycles 514

on hepatitis B virus DNA and hepatitis C virus RNA quantification as measured with 515

branched-DNA technology. J Clin Microbiol 37: 1683-1686. 516

DOI: 10.1128/JCM.37.6.1683-1686.1999 517

26. Griffith BP, Rigsby MO, Garner RB, Gordon MM, Chacko TM. 1997. Comparison of the 518

Amplicor HIV-1 Monitor test and the nucleic acid sequence-based amplification assay for 519

quantitation of human immunodeficiency virus RNA in plasma, serum, and plasma 520

subjected to freeze-thaw cycles. J Clin Microbiol 35: 3288-3291. 521

27. Chan KH, Poon LLLM, Cheng VCC, Guan Y, Hung IFN, Kong J, Yam LYC, Seto WH, 522

Yuen KY, Peiris JSM. 2004. Detection of SARS Coronavirus in Patients with Suspected 523

SARS. Emerg Infect Dis 10: 294. DOI: 10.3201/eid1002.030610 524

28. Hirst JC, Hutchinson EC. 2019. Single-particle measurements of filamentous influenza 525

virions reveal damage induced by freezing. J Gen Virol 100:1631. DOI: 526

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 24: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

24

10.1099/jgv.0.001330 527

29. Granados A, Petrich A, McGeer A, Gubbay JB. 2017. Measuring influenza RNA quantity 528

after prolonged storage or multiple freeze/thaw cycles. J Virol Methods 247:45-50. 529

https://doi.org/10.1016/j.jviromet.2017.05.018 530

30. Kleiner M, Hooper L V., Duerkop BA. 2015. Evaluation of methods to purify virus-like 531

particles for metagenomic sequencing of intestinal viromes. BMC Genomics 16:7. DOI 532

10.1186/s12864-014-1207-4. 533

31. Parras-Moltó M, Rodríguez-Galet A, Suárez-Rodríguez P, López-Bueno A. 2018. 534

Evaluation of bias induced by viral enrichment and random amplification protocols in 535

metagenomic surveys of saliva DNA viruses. Microbiome 6:119. 536

https://doi.org/10.1186/s40168-018-0507-3. 537

32. Ver BA, Melnick JL, Wallis C. 1968. Efficient Filtration and Sizing of Viruses with 538

Membrane Filters. J Virol 2:21-25. 539

33. Wylie TN, Wylie KM, Herter BN, Storch GA. 2015. Enhanced virome sequencing using 540

targeted sequence capture. Genome Res 25:1910-1920. DOI: 10.1101/gr.191049.115 541

34. Briese T, Kapoor A, Mishra N, Jain K, Kumar A, Jabado OJ, Ian Lipkina W. 2015. 542

Virome capture sequencing enables sensitive viral diagnosis and comprehensive virome 543

analysis. MBio 6:5. DOI: 10.1128/mBio.01491-15. 544

35. Duncavage EJ, Magrini V, Becker N, Armstrong JR, Demeter RT, Wylie T, Abel HJ, 545

Pfeifer JD. 2011. Hybrid capture and next-generation sequencing identify viral integration 546

sites from formalin-fixed, paraffin-embedded tissue. J Mol Diagnostics 13:325-333. 547

https://doi.org/10.1016/j.jmoldx.2011.01.006. 548

36. Xiao, M., Liu, X., Ji, J., Li, M., Li, J., Yang, L., Sun, W., Ren, P., Yang, G., Zhao, J. 2020. 549

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 25: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

25

Multiple approaches for massively parallel sequencing of SARS-CoV-2 genomes directly 550

from clinical samples. Genome Med 12:57. 551

37. Illingworth CJR, Roy S, Beale MA, Tutill H, Williams R, Breuer J. 2017. On the effective 552

depth of viral sequence data. Virus Evol 3:2. https://doi.org/10.1093/ve/vex030. 553

38. Bekliz M, Brandani J, Bourquin M, Battin TJ, Peter H. 2019. Benchmarking protocols for 554

the metagenomic analysis of stream biofilm viromes. PeerJ 7: e8187. 555

https://doi.org/10.7717/peerj.8187. 556

39. Bergallo M, Costa C, Gribaudo G, Tarallo S, Baro S, Ponzi AN, Cavallo R. 2006. 557

Evaluation of six methods for extraction and purification of viral DNA from urine and 558

serum samples. New Microbiol 29:111-119. 559

40. Espy MJ, Patel R, Paya C V., Smith TF. 1995. Comparison of three methods for 560

extraction of viral nucleic acids from blood cultures. J Clin Microbiol 33:41-44. 561

41. Sathiamoorthy S, Malott RJ, Gisonni-Lex L, Ng SHS. 2018. Selection and evaluation of 562

an efficient method for the recovery of viral nucleic acid extraction from complex 563

biologicals. npj Vaccines 3:1-6. https://doi.org/10.1038/s41541-018-0067-3. 564

42. Van Etten JL, Dunigan DD, Nagasaki K, Schroeder DC, Grimsley N, Brussaard CPD, 565

Nissimov JI. 2020. Phycodnaviruses (Phycodnaviridae)☆. DOI:10.1016/B978-0-12-566

809633-8.21291-0. 567

43. Shoaib M, Baconnais S, Mechold U, Le Cam E, Lipinski M, Ogryzko V. 2008. Multiple 568

displacement amplification for complex mixtures of DNA fragments. BMC Genomics 569

9:1-14. https://doi.org/10.1186/1471-2164-9-415. 570

44. Reyes GR, Kim JP. 1991. Sequence-independent, single-primer amplification (SISPA) of 571

complex DNA populations. Mol Cell Probes 5:473-481. https://doi.org/10.1016/S0890-572

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 26: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

26

8508(05)80020-9. 573

45. Marine RL, Magaña LC, Castro CJ, Zhao K, Montmayeur AM, Schmidt A, Diez-Valcarce 574

M, Ng TFF, Vinjé J, Burns CC. 2020. Comparison of Illumina MiSeq and the Ion Torrent 575

PGM and S5 platforms for whole-genome sequencing of picornaviruses and caliciviruses. 576

J Virol Methods 113865. https://doi.org/10.1016/j.jviromet.2020.113865. 577

46. Sutton TDS, Clooney AG, Ryan FJ, Ross RP, Hill C. 2019. Choice of assembly software 578

has a critical impact on virome characterisation. Microbiome 7:12. 579

https://doi.org/10.1186/s40168-019-0626-5. 580

47. Santiago-Rodriguez TM, Hollister EB. 2019. Human Virome and Disease: High-581

Throughput Sequencing for Virus Discovery, Identification of Phage-Bacteria Dysbiosis 582

and Development of Therapeutic Approaches with Emphasis on the Human Gut. Viruses 583

11:656. https://doi.org/10.3390/v11070656. 584

48. Tangherlini M, Dell’Anno A, Zeigler Allen L, Riccioni G, Corinaldesi C. 2016. Assessing 585

viral taxonomic composition in benthic marine ecosystems: Reliability and efficiency of 586

different bioinformatic tools for viral metagenomic analyses. Sci Rep 6:28428. 587

https://doi.org/10.1038/srep28428. 588

49. Ajami NJ, Wong MC, Ross MC, Lloyd RE, Petrosino JF. 2018. Maximal viral 589

information recovery from sequence data using VirMAP. Nat Commun 9:1-9. 590

https://doi.org/10.1038/s41467-018-05658-8. 591

50. Anantharaman K, Kieft K, Zhou Z. 2019. VIBRANT: Automated recovery, annotation 592

and curation of microbial viruses, and evaluation of virome function from genomic 593

sequences. bioRxiv, 855387. DOI: https://doi.org/10.1101/855387. 594

51. Ren J, Song K, Deng C, Ahlgren NA, Fuhrman JA, Li Y, Xie X, Poplin R, Sun F. 2020. 595

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 27: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

27

Identifying viruses from metagenomic data using deep learning. Quant Biol 8:64-77. 596

https://doi.org/10.1007/s40484-019-0187-4. 597

52. Kozich JJ, Westcott SL, Baxter NT, Highlander SK, Schloss PD. 2013. Development of a 598

dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence 599

data on the miseq illumina sequencing platform. Appl Environ Microbiol 79: 5112-5120. 600

DOI: 10.1128/AEM.01043-13. 601

53. Hornung BVH, Zwittink RD, Kuijper EJ. 2019. Issues and current standards of controls in 602

microbiome research. FEMS Microbiol Ecol 95: fiz045. 603

https://doi.org/10.1093/femsec/fiz045. 604

54. Leigh Greathouse K, Sinha R, Vogtmann E. 2019. DNA extraction for human microbiome 605

studies: The issue of standardization. Genome Biol 20:212. 606

https://doi.org/10.1186/s13059-019-1843-8. 607

55. Onywera H, Meiring TL. 2020. Comparative analyses of Ion Torrent V4 and Illumina V3-608

V4 16S rRNA gene metabarcoding methods for characterization of cervical microbiota: 609

taxonomic and functional profiling. Sci African: e00278. 610

https://doi.org/10.1016/j.sciaf.2020.e00278. 611

56. Ye SH, Siddle KJ, Park DJ, Sabeti PC. 2019. Benchmarking Metagenomics Tools for 612

Taxonomic Classification. Cell 178: 779-794. https://doi.org/10.1016/j.cell.2019.07.010. 613

57. Pollock J, Glendinning L, Wisedchanwet T, Watson M. 2018. The Madness of 614

Microbiome: Attempting To Find Consensus “Best Practice” for 16S Microbiome Studies. 615

Appl Environ Microbiol 84:7. DOI: 10.1128/AEM.02627-17. 616

58. Lopera, Juan; Brenton, Briana; Sohn, Jung-Woo; King, Stephen; Santiago-Rodriguez, 617

Tasha M; Hollister, Emily B; Wong, Matthew C; Ajami, Nadim; Wilder C. 2019. 618

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 28: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

28

Development and Evaluation of Next-Generation Sequencing Standards for Virome 619

Research. Manassas, VA. White paper. 620

59. Mishra N, Fagbo SF, Alagaili AN, Nitido A, Williams SH, Ng J, Lee B, Durosinlorun A, 621

Garcia JA, Jain K, Kapoor V, Epstein JH, Briese T, Memish ZA, Olival KJ, Ian Lipkin W. 622

2019. A viral metagenomic survey identifies known and novel mammalian viruses in bats 623

from Saudi Arabia. PLoS One 14: e0214227. 624

https://doi.org/10.1371/journal.pone.0214227. 625

60. Zhang T, Wu Q, Zhang Z. 2020. Probable Pangolin Origin of SARS-CoV-2 Associated 626

with the COVID-19 Outbreak. Curr Biol 30: 1346-1351.e2. 627

https://doi.org/10.1016/j.cub.2020.03.022. 628

61. Wong MC, Cregeen SJJ, Ajami NJ, Petrosino JF. 2020. Evidence of recombination in 629

coronaviruses implicating pangolin origins of nCoV-2019. bioRxiv. 630

https://doi.org/10.1101/2020.02.07.939207. 631

62. Kumar R, Nagpal S, Kaushik S, Mendiratta S. 2020. COVID-19 diagnostic approaches: 632

different roads to the same destination. VirusDisease 31, 97–105. 633

https://doi.org/10.1007/s13337-020-00599-7. 634

63. Chehoud C, Dryga A, Hwang Y, Nagy-Szakal D, Hollister EB, Luna RA, Versalovic J, 635

Kellermayer R, Bushman FD. 2016. Transfer of viral communities between human 636

individuals during fecal microbiota transplantation. MBio 7:2. DOI: 10.1128/mBio.00322-637

16. 638

64. Santiago-Rodriguez TM, Ly M, Daigneault MC, Brown IHL, McDonald JAK, Bonilla N, 639

Vercoe EA, Pride DT. 2015. Chemostat culture systems support diverse bacteriophage 640

communities from human feces. Microbiome 3:1-16. https://doi.org/10.1186/s40168-015-641

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 29: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

29

0124-3. 642

65. Prussin AJ, Belser JA, Bischoff W, Kelley ST, Lin K, Lindsley WG, Nshimyimana JP, 643

Schuit M, Wu Z, Bibby K, Marr LC. 2020. Viruses in the Built Environment (VIBE) 644

meeting report. Microbiome 8:1. https://doi.org/10.1186/s40168-019-0777-4. 645

66. Kufner V, Plate A, Schmutz S, Braun DL, Günthard HF, Capaul R, Zbinden A, Mueller 646

NJ, Trkola A, Huber M. 2019. Two years of viral metagenomics in a tertiary diagnostics 647

unit: Evaluation of the first 105 cases. Genes (Basel)10:661. 648

https://doi.org/10.3390/genes10090661. 649

650

Figure legends 651

Figure 1. Number of microbiome and virome publications. Panel A shows the number of 652

microbiome publications since 2000 (blue line) and virome publications since 2006 (green line). 653

Panel B shows an expanded graph of the number of viral metagenomic studies since 2006. 654

Figure 2. Description of a viral metagenomics pipeline. A viral metagenomics pipeline may 655

include sample collection, processing, sequencing and bioinformatics. Figure shows several of 656

the steps that may add biases to the results. 657

Figure 3. Potential applications of viral metagenomics in ongoing and future fields related to 658

database expansion, surveillance, identification of viral relationships with health and disease, 659

reagents development, among others. 660

661

Authors Bios: 662

Emily Hollister received her Ph.D. in Molecular and Environmental Plant Sciences from Texas 663

A&M University and completed her postdoctoral training in Molecular Microbial Ecology at 664

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 30: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

30

Texas A&M as well. She currently serves as the Director, R&D, Computational Biology for 665

Diversigen, where she and her team focus on the development and implementation of novel 666

solutions for 'omics data analysis. She has pursued a variety of research topics related to the 667

microbiome and human health over the course of the last 10 years. 668

669

Tasha M. Santiago-Rodriguez received her Ph.D. in Biology with emphasis on Public Health 670

Water Microbiology from the University of Puerto Rico characterizing bacteriophages as 671

markers of human fecal contamination. She completed her postdoctoral training in virome and 672

ancient microbiomes at the University of California, San Diego, and California Polytechnic State 673

University, San Luis Obispo, respectively. She is currently an R&D Bioinformatician at 674

Diversigen, and has been in the microbiome field for over 7 years. 675

676

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 31: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 32: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 33: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from

Page 34: Downloaded from //aem.asm.org/content/aem/early/2020/09/09/AEM... · 2020. 9. 9. · 1 1 Potential applications of human viral metagenomics and reference materials : 2 considerations

Table 1. Assemblers performance. Performance was previously evaluated using, among many factors, the number of false positives,

false negatives, true positives, number of contigs and sensitivity. Links to the assembler sources are also shown. Modified from [46].

Assembler

False

positives

False

negatives

True

positive

Number of

contigs Sensitivity Source

ABySS (v2.0.2) (k-mer 63) 52 4 8 61 66.67 http://www.bcgsc.ca/downloads/abyss/

ABySS (v2.0.2) (k-mer 127) 50 6 6 56 50 http://www.bcgsc.ca/downloads/abyss/

CLC (v5.0.5) 1143 0 12 1299 100 https://www.qiagenbioinformatics.com/products/clc-assembly-cell/

Geneious (v11.0.3) 53 0 12 65 100 https://www.geneious.com/features/assembly-mapping/

IDBA UD (v1.1.1) 0 0 12 12 100 https://i.cs.hku.hk/~alse/hkubrg/projects/idba_ud

MEGAHIT (v1.1.1-2) 0 0 12 13 100 https://github.com/voutcn/megahit

MetaVelvet (v1.2.02) 0 3 9 26 75 https://metavelvet.dna.bio.keio.ac.jp/

MIRA (v4.0.2) 0 0 12 89 100 http://www.chevreux.org/mira_downloads.html

Ray Meta (v2.3.0) 0 0 12 12 100 http://denovoassembler.sourceforge.net/

SOAPdenovo2 (v2.04) 2 0 12 23 100 http://soap.genomics.org.cn/soapdenovo.html

SPAdes(v3.10.0) 0 0 12 14 100 http://cab.spbu.ru/software/spades/

SPAdes meta (v3.10.0) 0 0 12 14 100 http://cab.spbu.ru/software/spades/ (variation of SPAdes applied with flag)

SPAdes sc 1513 0 12 1527 100 http://cab.spbu.ru/software/spades/ (variation of SPAdes applied with flag)

SPAdes sc careful 0 0 12 15 100 http://cab.spbu.ru/software/spades/ (variation of SPAdes applied with flag)

Velvet (v1.2.10) 0 3 9 26 75 https://www.ebi.ac.uk/~zerbino/velvet/

VICUNA (v1.3) 4969 0 12 5385 100 https://github.com/broadinstitute/mvicuna

on May 21, 2021 by guest

http://aem.asm

.org/D

ownloaded from