Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Yutin and Koonin Virology Journal 2012, 9:161http://www.virologyj.com/content/9/1/161
RESEARCH Open Access
Hidden evolutionary complexity ofNucleo-Cytoplasmic Large DNA viruses ofeukaryotesNatalya Yutin and Eugene V Koonin*
Abstract
Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic groupthat consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive genomecomparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryoticviruses.
Results: We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrainedtree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the coregenes have been independently acquired from different sources by different NCLDV lineages whereas for themajority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDVwas demonstrated.
Conclusions: A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexityand diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherencebetween the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a commonancestral virus although the set of ancestral genes might be smaller than previously inferred from patterns of genepresence-absence.
BackgroundViruses are ubiquitous, obligate, intracellular parasites ofall cellular life forms that rely on the host cell translationsystem, metabolism and, in many cases, the replicationand transcription systems, for their reproduction. Thereis no evidence that all viruses have a monophyletic ori-gin, at least not under the traditional concept of mono-phyly. Indeed, not a single gene is conserved in thegenomes of all known viruses although a small group of“viral hallmark genes” encoding some of the key proteinsinvolved in genome replication and virion structure for-mation are shared by large, diverse subsets of viruses[1,2]. However, several large groups of viruses infectingdiverse hosts do appear to share common ancestry inthe strict sense, that is, to have evolved from the sameancestral virus, as indicated by the conservation of sets
* Correspondence: [email protected] Center for Biotechnology Information, National Library of Medicine,National Institutes of Health, Bethesda, MD 20894, USA
© 2012 Yutin and Koonin; licensee BioMed CeCreative Commons Attribution License (http:/distribution, and reproduction in any medium
of genes encoding proteins responsible for many func-tions essential for virus reproduction.One of the largest viral divisions that seem to be
monophyletic includes 6 recognized families and a 7th
candidate family of viruses with large DNA genomesthat infect diverse eukaryotes and are collectively knownas Nucleo-Cytoplasmic Large DNA Viruses (NCLDV)[3-6]. The formally recognized NCLDV families are Pox-viridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycod-naviridae, and Mimiviridae; in addition, the recentlydiscovered Marseillevirus and the related Lausanneviruscould not be assigned to any of the 6 families, and arelikely to become founding members of a new family[7,8]. Hereinafter we speak of 7 NCLDV families for thesake of simplicity.By far the most thoroughly studied group of the
NCLDV are the Poxviridae, the family of animal virusesthat include a major human pathogen, the smallpoxvirus, important animal pathogens, such as rabbit
ntral Ltd. This is an Open Access article distributed under the terms of the/creativecommons.org/licenses/by/2.0), which permits unrestricted use,, provided the original work is properly cited.
Yutin and Koonin Virology Journal 2012, 9:161 Page 2 of 18http://www.virologyj.com/content/9/1/161
myxoma virus, as well as vaccinia virus (VACV), one ofthe best characterized models of molecular biology [9].Another family of the NCLDV that has recently becomea major focus of attention is the Mimiviridae thatincludes giant viruses infecting amoeba and probablyalgae [10-13]. The genome of the prototype virus of thisfamily, Acanthamoeba polyphaga Mimivirus [14],slightly exceeds one megabase (Mb), and other relatedviruses possess even larger genomes [15,16], so theMimiviridae are undisputed genome size record holdersin the virosphere. Indeed, in terms of genome size andcomplexity, the NCLDV eclipse numerous parasitic bac-teria, and approach the simplest free-living prokaryotes.The NCLDV infect animals and diverse unicellular
eukaryotes, and either replicate exclusively in the cyto-plasm of the host cells, or encompass both cytoplasmicand nuclear stages in their life cycle. Most of theNCLDV do not strongly depend on the host replicationor transcription systems for completing their replication[9,17]. The autonomous life style of the NCLDV is sup-ported by a set of conserved proteins that are encodedin the viral genomes and mediate most of the processesessential for viral reproduction. These essential, con-served proteins include DNA polymerases, helicases,and primases responsible for DNA replication, RNApolymerase subunits and transcription factors that func-tion in transcription initiation and elongation, Hollidayjunction resolvases and topoisomerases involved in gen-ome DNA processing and maturation, ATPase pumpsmediating DNA packaging, molecular chaperonesinvolved in capsid assembly and capsid proteins them-selves [3-5]. Although several viral hallmark genes areshared by NCLDV and other large DNA viruses, such asherpesviruses, baculoviruses and some bacteriophages[2], the conservation of the large set of core genes clearlydemarcates the NCLDV as a distinct, most likely mono-phyletic class of viruses [4,6]. More specifically, recon-structions of the ancestral NCLDV genome compositionusing maximum parsimony and maximum likelihoodmethods have delineated a set of approximately 50 genesthat are inferred to have been responsible for the keyfunctions in the reproduction of the last common ances-tor of the NCLDV [5].The core, presumably ancestral set of the NCLDV
genes was delineated using sequence similarity-basedmethods that have been previously employed for identi-fication of clusters of orthologous genes in diverse cellu-lar life forms. The comparative genomic analysisunderlying these reconstructions was deliberately limitedto the NCLDV genomes to simplify the analysis and tofacilitate detection of distant relationships between viralproteins. Indeed, some of the core NCLDV proteins,such as for example the packaging ATPases and the di-sulfide chaperones, show only weak sequence similarity
between the viral families. Moreover, for some of theNCLDV genes with important functions in virusreproduction, indications of complex evolutionary his-tories have been obtained. The showcase for such evolu-tionary complexity is the viral DNA ligase which isrepresented by two distantly related forms across theNCLDV families. A maximum likelihood reconstructionbased on the presence-absence of conserved genes inthe viral genomes has implied that one of the two forms,the ATP-dependent DNA ligase, was the ancestral formthat was present in the genome of the prototypeNCLDV but was replaced by the distantly relatedNAD-dependent ligase in several viral lineages. How-ever, when the reconstruction was supplemented byphylogenetic analysis of the two forms of DNA ligase,the opposite conclusion has been reached, namely thatthe NAD –dependent ligase was the ancestral form inNCLDV that was displaced by the ATP-dependent lig-ase on several independent occasions [18]. This changein perspective occurred because phylogenetic analysisindicated that the ATP-dependent ligases from differentlineages of the NCLDV clustered with distinct groups ofeukaryotic homologs whereas the NAD–dependentligases of the NCLDV appeared to be monophyletic. Inthe same vein, complex phylogenies suggestive of mul-tiple horizontal gene transfer have been observed forseveral core NCLDV genes such as thymidine kinase orthe two subunits of ribonucleotide reductase [19].Taken together, these findings imply that some of the
apparently conserved genes of the NCLDV might actu-ally have complex histories which could include inde-pendent (convergent) acquisition of these genes fromdifferent cellular organisms as well as displacement ofancestral viral gene by homologs of cellular provenance.This line of reasoning prompted us to perform a com-prehensive phylogenetic analysis of the set of the puta-tive ancestral NCLDV genes. Here we present the resultsof this analysis which suggest that, although the exist-ence of a common ancestor of the NCLDV is beyondreasonable doubt, most of the conserved NCLDV genesindeed had complex evolutionary histories.
ResultsApproach and rationaleMaximum likelihood reconstruction of the gene reper-toire of the putative ancestral NCLDV has revealed acore consisting of approximately 50 viral genes (Table 1).Most of these genes are present in various subsets of theNCLDV genomes rather than all genomes (Additionalfile 1) but high likelihoods of ancestral provenance havebeen estimated for each of them, with the absences ac-cordingly attributed to lineage-specific gene loss. Weconstructed multiple alignments and position-specificscoring matrices for each of the ancestral NCLDV genes
Table 1 Evolutionary scenarios for the ancestral NCLDV genes
NCVOG [3,4] Vacciniavirus genenumbera
Functionalcategory
# of viralfamilies/genomes
NCVOG annotation Evolutionary scenario from constrainedtree analysis
NCVOG0038 E9L DNA replication,recombinationand repair
6/45 DNA polymerase elongationsubunit family B
Monophyletic NCLDV, common origin withbaculoviruses and possibly herpesviruses
NCVOG0023 D5R DNA replication,recombinationand repair
6/45 D5-like helicase-primase Monophyletic NCLDV but displaced by abacteriophage homolog in phycodnaviruses.
NCVOG1060 G5R DNA replication,recombinationand repair
5/35 FLAP-like endonucleaseXPG (cd00128)
Possibly monophyletic NCLDV althoughdisplacement by a phage homolog inpoxviruses cannot be ruled out; loss inphycodnaviruses and subsequent acquisition ofa eukaryotic homolog by E. huxlei virus
NCVOG0036 H6R DNA replication,recombinationand repair
3/23 DNA topoisomerase IB Possibly monophyletic NCLDV
NCVOG0037 None DNA replication,recombinationand repair
6/15 DNA topoisomerase II Polyphyletic NCLDV, gene acquiredfrom different eukaryotes
NCVOG0035 None DNA replication,recombinationand repair
3/7 NAD+dependent DNAligase (smart00532)
Monophyletic NCLDV
NCVOG0034 A50R DNA replication,recombinationand repair
4/19 ATP-dependentDNA ligase(pfam01068, PRK01109)
Polyphyletic NCLDV, gene acquired fromdifferent eukaryotes
NCVOG0278 A22R DNA replication,recombinationand repair
5/36 RuvC, Holliday junctionresolvases (HJRs); cl00243.Extended Pox_A22,Poxvirus A22 family(pfam04848). Marseillevirus protein lacks C-termconserved positions.
Insufficient sequenceconservation for reliable phylogenetic analysis
NCVOG0004_APa None DNA replication,recombinationand repair
4/6 AP (apurinic)endonuclease family 2
Strongly supported monophyly of NCLDV
NCVOG0004_NTa None DNA replication,recombinationand repair
3/4 Nucleotidyltransferase(DNA polymerasebeta family)
Polyphyletic NCLDV, gene acquired fromdifferent eukaryotes
NCVOG1192 None DNA replication,recombinationand repair
4/13 YqaJ viral recombinasefamily (pfam09588)
Possibly monophyletic NCLDV
NCVOG0024 None DNA replication,recombinationand repair
3/4 Superfamily II helicaserelated to herpesvirusreplicative helicase(origin-binding proteinUL9), pfam03121
Limited sequence similarity between proteinsfrom different NCLDV; probable polyphyleticorigin; possibly not an ancestral NCLDV gene
NCVOG1115 D4R DNA replication,recombinationand repair
3/23 Uracil-DNA glycosylase Present in only a few NCLDV; uncertain,monophyly of the NCLDV cannot be ruled out
NCVOG0274 J6R Transcriptionand RNAprocessing
6/36 DNA-directed RNApolymerase subunitalpha
Displacement of ancestral NCLDV gene inmimivirus and ASFV with eukaryotic RNAP2 andRNAP1 subunit genes, respectively; loss in mostphycodnaviruses. Ancestral NCLDV gene possiblyderived from eukaryote RNAP I
NCVOG0271 A24R Transcription andRNA processing
6/36 DNA-directed RNApolymerasesubunit beta
Possibly polyphyletic NCLDV, with one subsetderived from RNAP1. and another from RNAP2;likely more recent displacement with RNAP2 inmimivirus; however, monophyly of NCLDV cannotbe ruled out; loss in most phycodnaviruses.
Yutin and Koonin Virology Journal 2012, 9:161 Page 3 of 18http://www.virologyj.com/content/9/1/161
Table 1 Evolutionary scenarios for the ancestral NCLDV genes (Continued)
NCVOG0273 G5.5R Transcription andRNA processing
5/15 DNA-directed RNApolymerase subunit 5
Uncertain, not enough sequence conservationfor reliable phylogenetic analysis
NCVOG1164 A1L Transcription andRNA processing
6/44 A1L transcriptionfactor/late transcriptionfactor VLTF-2
Uncertain, not enough sequence conservationfor reliable phylogenetic analysis
NCVOG0262 A2L Transcription andRNA processing
6/45 Poxvirus LateTranscription FactorVLTF3 like
No obvious homologs outside NCLDV(monophyly of NCLDV by default).
NCVOG0261 A7L Transcription andRNA processing
5/35 Poxvirus earlytranscription factor(VETF), large subunit(pfam04441)
No obvious homologs outside NCLDV(monophyly of NCLDV by default).
NCVOG0272 E4L Transcription andRNA processing
6/39 Transcription factor S-II(TFIIS)-domain-containing protein
Uncertain, not enough sequence conservationfor reliable phylogenetic analysis
NCVOG1127 One Transcription andRNA processing
4/11 transcription initiationfactor IIB
Strongly supported monophyly of NCLDV
NCVOG0076 A18R Transcription andRNA processing
6/38 DNA or RNA helicases ofsuperfamily II (COG1061)
Monophyletic NCLDV except for displacementwith a eukaryotic homolog in one phycodnavirusand acquisition of a distinct eukaryotic paralogin ASFV
NCVOG0267 I8R Transcription andRNA processing
3/23 RNA-helicase DExH-NPH-II Monophyletic NCLDV; independent losses inseveral NCLDV lineages
NCVOG1117 D1R Transcription andRNA processing
6/33 mRNA capping enzymelarge subunit
Complex, variable domain architecture; apparentmonophyly of NCLDV for the conservedmethyltransferase domain; guanylyltransferases ofapparent distinct eukaryotic origin in a singleiridovirus and a single phycodnavirus
NCVOG0236 D9R, D10R Transcription andRNA processing
6/29 Nudix hydrolase Uncertain, not enough sequence conservation forreliable phylogenetic analysis
NCVOG1088 None Transcription andRNA processing
3/13 RNA ligase Monophyletic NCLDV; common origin withbaculovirus and possibly bacteriophage homologs
- J3R Transcription andRNA processing
2/16 PolyA polymerase,regulatory subunit
Present only in poxviruses and one mimivirus;however, monophyly strongly supported; possibleancestral gene
NCVOG0276 F4L Nucleotidemetabolism
6/29 Ribonucleotide reductasesmall subunit
Polyphyletic NCLDV, complex evolutionaryscenario; ancestral status uncertain; trees similarfor the large and small subunits
NCVOG1353 I4L Nucleotidemetabolism
6/24 ribonucleoside diphosphatereductase, large subunit
Polyphyletic NCLDV, complex evolutionaryscenario; ancestral status uncertain; trees similarfor the large and small subunits
NCVOG0319 J2R Nucleotidemetabolism
5/20 Thymidine kinase Most likely polyphyletic NCLDV althoughmonophyly could not be statistically rejected
NCVOG0320 A48R Nucleotidemetabolism
4/21 Thymidylate kinase Most likely polyphyletic NCLDV althoughmonophyly could not be statistically rejected
NCVOG1068 F2L Nucleotidemetabolism
4/30 dUTPase (cl00493) Polyphyletic NCLDV, monophyly rejected
NCVOG0022 D13L Virion structure andmorphogenesis
6/45 NCLDV major capsidprotein
No homologs outside NCLDV – monophyleticNCLDV by default
NCVOG0249 A32L Virion structure andmorphogenesis
6/45 A32-like packaging ATPase Only distant homologs outside NCLDV –monophyletic NCLDV by default
NCVOG0211 F9L, L1R Virion structure andmorphogenesis
4/34 myristylated envelopeprotein
No homologs outside NCLDV – monophyleticNCLDV by default
NCVOG1122 G9R, J5L,A16L
Virion structure andmorphogenesis
3/31 Myristylated protein;pfam03003, DUF230
No homologs outside NCLDV – monophyleticNCLDV by default
NCVOG0052 E10R Virion structure andmorphogenesis
6/44 disulfide (thiol)oxidoreductase/isomerase;Erv1 / Alr family (pfam04777)
Monophyletic NCLDV
Yutin and Koonin Virology Journal 2012, 9:161 Page 4 of 18http://www.virologyj.com/content/9/1/161
Table 1 Evolutionary scenarios for the ancestral NCLDV genes (Continued)
NCVOG0256 H3L Virion structure andmorphogenesis
2/22 envelope protein,glycosyltransferase
Apparent monophyletic NCLDV but representedonly in poxviruses and mimiviruses; independentacquisition cannot be ruled out
NCVOG0040 H1L Signal transduction,regulation
4/30 cd00127, DSPc, Dualspecificity phosphatases(DSP); Ser/Thr and Tyrprotein phosphatases
Most likely independent acquisitions bydifferent NCLDV
NCVOG0330 VACWR012,VACWR207
Signal transduction,regulation
5/26 RING-finger-containing E3ubiquitin ligase(COG5432: RAD18)
Most likely independent acquisition by differentgroups of the NCLDV; probably not an ancestralgene
NCVOG0329 Signal transduction,regulation
2/3 UBCc, Ubiquitin-conjugatingenzyme E2 (cd00195)
Present only in mimivirus and ASFV; independentacquisitions from eukaryotes, NCLDV monophylyrejected; not an ancestral gene
NCVOG0246 Signal transduction,regulation
3/4 Ulp1-like protease/deubiquitinating enzyme
Uncertain, highly diverged NCLDV sequences
NCVOG0009 Virus-hostinteractions
3/4 pfam00653: BIR (BaculovirusInhibitor of apoptosisprotein Repeat) domain
Most likely independent acquisition bydifferent groups of the NCLDV; probably notan ancestral gene
NCVOG0012 A33R, A44R,A40R
Virus-hostinteractions
3/20 C-type lectin: smart00034,cd03594, cd03593, pfam00059,cd00037, pfam05966
Poorly conserved sequence, most likelyindependent acquisitions by different groupsof NCLDV; probably not an ancestral gene
NCVOG1361 Uncharacterized 6/11 T5orf172 domain (pfam10544) Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene
NCVOG1360 VACWR011,VACWR208
Uncharacterized 3/18 KilA domain (pfam04383);always present at proteinN-termini except formimiviruses. In some proteinsfused to the a RING-fingerdomain.
Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene
NCVOG1424 Uncharacterized 3/6 Uncharacterized domain;found downstream of KilA,BRO, and MSV199 domains.Also found in somebaculoviruses.
Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene
NCVOG0010 Uncharacterized 4/11 pfam02498: Bro-N; BROfamily, N-terminal domain:This family includes theN-terminus of baculovirusBRO and ALI motif proteins.
Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene
NCVOG0059 Uncharacterized 2/3 FtsJ-like methyltransferasefamily proteins (pfam01728)
Present only in mimivirus and ASFV; monophylycannot be ruled out
aThe systematic gene names are for the Copenhagen strain of Vaccinia virus except for the names starting with VACWR which are from the WR strain because therespective genes are missing/disrupted in the Copenhagen strain.
Yutin and Koonin Virology Journal 2012, 9:161 Page 5 of 18http://www.virologyj.com/content/9/1/161
and used these to search for homologs from cellularorganisms and other viruses. The resulting sequence setswere clustered to include all NCLDV proteins along withrepresentatives of as many major lineages of cellularorganisms as possible but also to trim the sets down to asize amenable to detailed phylogenetic analysis (seeMethods for the details). For the sequence sets thusselected, ML phylogenetic trees were constructed andexamined using the constrained tree approach (seeMethods). Specifically, for all trees in which the NCLDVdid not appear as a single strongly supported clade, thelikelihood of the original tree was compared with thelikelihood of a tree constrained for monophyly of
NCLDV. When justified, additional constrained trees,with topologies corresponding to plausible evolutionaryscenarios, were analyzed for individual genes. Below wepresent the results of the constrained phylogenetic treeanalysis of the inferred ancestral NCLDV genes.
Genes involved in genome replication, recombination andrepairThe DNA replication of the NCLDV is largely independ-ent of the host replication, and accordingly, genes en-coding protein components of the complex replicationmachinery represent a major part of the NCLDV genecore (Table 1). This is the largest group in the ancestral
Yutin and Koonin Virology Journal 2012, 9:161 Page 6 of 18http://www.virologyj.com/content/9/1/161
NCLDV gene set that includes 13 genes two of which,the DNA polymerase (DNAP) and the primase-helicase,are central and indispensable to replication and areshared by all viruses of this class (it has been claimedthat the primase-helicase was missing in some phycod-naviruses [20]; however, in the process of NCVOG con-struction [5], this gene was identified in all completeNCLDV genomes, the high sequence divergence in someof the viruses notwithstanding). The B family of DNAPs,to which the NCLDV polymerases belong, includes themain replicative polymerases of all archaea and eukar-yotes as well as many bacterial, archaeal and eukaryoticviruses. The unconstrained phylogenetic tree for theDNAP failed to recover an NCLDV clade. Instead, themajority of the NCLDV DNAPs along with the DNAPsof herpesviruses formed a clade with eukaryotic DNAPdelta and zeta; a sister group to this large branch was aclade that consisted of poxvirus, asfarvirus and baculo-virus DNAPs (Figure 1A). However, constrained trees inwhich monophyly of the NCLDV was enforced couldnot be rejected by statistical tests (Additional file 2).Moreover, among the tested trees, the tree in which theNCLDV clade also included baculoviruses (as the sistergroup to asfarviruses) and herpesviruses as the sistergroup to the composite NCLDV-baculovirus clade, hadthe highest associated likelihood although none of theanalyzed tree topologies could be statistically rejected(Figure 1B and Additional file 2). Thus, the results of theconstrained tree analysis are compatible with the mono-phyly of the DNAPs of all NCLDV and moreover of alllarge DNA viruses of eukaryotes. This conclusion con-trasts the results of several previous phylogenetic ana-lyses that failed to recover an NCLDV clade but did notreport attempts to analyze constrained trees [21-23]. Itseems likely that the apparent non-monophyly of theNCLDV in previous studies was caused by branch lengtheffects, in particular acceleration of evolution in some ofthe viruses resulting in long branch attraction [24] (notethe branch length, in particular for Poxviridae inFigure 1A,B) and possibly other artifacts of phylogeneticanalysis.Phylogenetic analysis of the second hallmark viral rep-
lication gene, the primase-helicase, revealed a stronglysupported clade that included 6 of the 7 NCLDV fam-ilies: phycodnaviruses belonged to a different, also well-supported branch together with diverse bacteriophageand bacterial (most likely, prophage) homologs(Figure 1C). In this case, the constrained tree topologythat included an NCLDV clade was rejected at a statisti-cally significant level (Additional file 2). Thus, the resultof the phylogenetic analysis of the NCLDV primase-helicase appears to be best compatible with the presenceof this gene in the ancestral NCLDV followed bydisplacement with a homologous and functionally
analogous protein from a bacteriophage in the ancestorof phycodnaviruses. This appears to be a typical case ofxenologous gene displacement [25]).The third gene involved in replication that is present
in a large majority of the NCLDV encodes a FLAP nu-clease, an enzyme that is also ubiquitous in cellular lifeforms and removes single-stranded overhangs from rep-lication and recombination intermediates. Nucleases ofthis family are essential for replication and in particularrecombination in poxviruses [26] but can also assumeother functions such as mRNA degradation as is thecase in herpesviruses [27]. The phylogenetic tree ofFLAP nucleases did not include an NCLDV clade. In-stead, poxviruses clustered with bacterial homologs, theonly representative in phycodnaviruses (Emiliana huxleivirus) placed within the eukaryotic branch, and the restof the NCLDV formed a clade with the herpesvirushomologs (Figure 1D). However, a constrained tree withan NCLDV clade (excluding the Emiliana huxlei virus)could not be statistically rejected (Additional file 2). Theplacement of E. huxlei virus within the eukaryotic sub-tree was supported by high boostrap values, and mono-phyly of this virus with the rest of the NCLDV wasweakly supported (AU value <0.1) even if not firmlyrejected (Additional file 2). At present we cannot confi-dently conclude whether the poxvirus FLAP nuclease ismonophyletic with those of other NCLDV or evolvedthrough displacement of the ancestral form with a phagehomolog. However, even the simplest evolutionary sce-nario for this gene involves loss of the ancestral gene inphycodnaviruses followed by regain of the eukaryotichomolog by the E. huxlei virus.The remaining genes in this group are less common in
NCLDV although the ML reconstruction mapped themto the last common ancestor; the implication of the in-ferred ancestral status of these genes is that they havebeen lost on multiple occasions during the evolution ofthe NCLDV. Topoisomerase IB is represented in pox-viruses, one phycodnavirus and mimiviruses. In thephylogenetic tree, poxviruses form a clade with the phy-codnavirus but the position of the mimivirus is uncertain(Figure 2A). The monophyly of the NCLDV could notbe statistically rejected, so for this gene there is noconvincing indication of displacement in any of theNCLDV.A greater number of NCLDV encode Topoisomerase
II that is unrelated to Topoisomerase IB. The phylogen-etic tree of Topo II (Figure 2B) did not include anNCLDV clade, and monophyly of the NCLDV could bestatistically rejected, with the implication of independentacquisition of the Topo II gene from eukaryotes inmimiviruses and from a prokaryotic source in crocodilepoxvirus, with the provenance of this gene in the rest ofthe NCLDV remaining uncertain (Additional file 2). In
Archaea B1
Archaea B3
Eukaryotic alpha
Eukaryotic delta
Eukaryotic zeta
IridoviridaeMarseillevirus
Mimiviridae
Phycodnaviridae
Phycodnaviridae (Phaeovirus)
Herpesviridae
100
82100
100
100
10077
72
86
84
Poxviridae
Baculoviridae
Asfarviridae100
9457
100
100
86
72
6080
10054
78
100
100
Ascoviridae100
94
0.2
A B
Bacteria
Bacteria/phages
Bf Lacre148543897
Bacteria/phages
Bp Desps51244031
Bp Dessa242279615
Irido/Ascoviridae
Mimiviridae
Asfarviridae
Poxviridae
88
81
10091
59
77
69
52
100
99
89
69
88
94
100
92
96
0.5
Marseillevirus
Phycodnaviridae
C D
Archaea B1
Eukaryotic alpha
Eukaryotic delta
Eukaryotic zeta
IridoviridaeMarseillevirus
Mimiviridae
Phycodnaviridae
Phycodnaviridae
Herpesviridae
100
Archaea B389
99
100
91
100 Poxviridae
Baculoviridae
Asfarviridae100
10052
93
70
100
10089
91
98
10060
5861
96
76
67
100
0.2
Ascoviridae100
94
Ascoviridae
IridoviridaeMarseillevirusenvironmental sequences
environmental sequences
Mimivirusenvironmental sequences
52
ArchaeaBacteria
Poxviridae
99
100
10090
58
93
0.2
q2 Emihu73852488Ea Entdi167391034
El Enccu19173066El Macmu109105914
E9 Arath42570539E8 Phatr219125938
Ew Triva123335114Ec Tetth146171182
Ek Leibr154339662100
99
61
87
78
65 63
97
54
85
DNA polymerase DNA polymerase, constrained
Primase-helicase FLAP endonuclease
Herpesviridae
Figure 1 Phylogenetic trees of ancestral NCLDV genes involved in DNA replication. (A). DNA polymerase, the original ML tree. (B). DNApolymerase, constrained tree, monophyly of NCLDV, baculoviruses and herpesviruses enforced. (C). Primase-helicase. (D). FLAP endonuclease.Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identificationnumbers are indicated. Species abbreviations: Arath, Arabidopsis thaliana; Desps, Desulfotalea psychrophila LSv54; Dessa, Desulfovibrio salexigensDSM 2638; Emihu, Emiliania huxleyi virus 86; Enccu, Encephalitozoon cuniculi GB-M1; Entdi, Entamoeba dispar SAW760; Lacre, Lactobacillus reuteriDSM 20016; Leibr, Leishmania braziliensis MHOM/BR/75/M2904; Macmu, Macaca mulatta; Phatr, Phaeodactylum tricornutum CCAP 1055/1; Tetth,Tetrahymena thermophila; Triva, Trichomonas vaginalis G3; Bf, Firmicutes; Bp, Proteobacteria; E8, Stramenopiles; E9, Viridiplantae; Ea, Amoebozoa; Ec,Alveolata; Ek, Euglenozoa; El, Opisthokonta; Ew, Parabasalidea; q2, Coccolithovirus.
Yutin and Koonin Virology Journal 2012, 9:161 Page 7 of 18http://www.virologyj.com/content/9/1/161
Bacteria
MimivirusPoxviridae
q6 Ostta290343645env136436120
environmental sequences
Eukaryotes
At Nitma161528830
9554
100
9885
57
50
10045
0.2
Eukaryores
Mimivirusenv142094046q6 Ostvi163955216
env142904322q1 Parbu157953897
env144065368l1 Invir109287964
l2 Invir15078758Marseillevirus
q2 Emihu73852918c1 Afrsw9628219
u1 Crovi115531754
Archaea, Bacteria100
87
8888
100100
50
50
89
10096
80
71
0.2
u2 Melsa9631347u2 Amsmo9964524
c1 Afrsw9628240env134945261
env139705312env142958695Mimivirus
env143874461env136260467Marseillevirus
env136708238
Bacteria, Archaea, Eukaryotes
10052
68
51
78
68
5491
100
0.2
Mimivirusenv143530885
env138728848
Eukaryotes
env134864780env143744333
environmental sequencesu2 Amsmo9964524
u2 Melsa9631347env136814837
env136279503c1 Afrsw9628204
Archaea, Bacteria9751
98
8574
53
98
76
92
75
10082
0.2
A B
C
D
E9 Arath30697546E9 Sorbi242073512
E9 Phypa168007250Bp Ruesp99080624
E9 Micsp255088687E9 Ostlu145350267
q1 Parbu9631735env142361341env143316185q6 Ostvi163955119
c1 Afrsw9628216environmental sequences
environmental sequencesMimivirus
Eukaryotes
Bacteria, phahes100
env135373769u2 Amsmo9964554
env144017490q1 Parbu9632034
q3 Ectsi13242536
7688
99
55
56100
64
9983
67
6549
55
70
85
99
79
74
6087
70
E
Topoisomerase IB DNA Topoisomerase II
AP endonuclease 2
Nucleotidyltransferase
YqaJ-like recombinase
Figure 2 (See legend on next page.)
Yutin and Koonin Virology Journal 2012, 9:161 Page 8 of 18http://www.virologyj.com/content/9/1/161
(See figure on previous page.)Figure 2 Phylogenetic trees of ancestral NCLDV genes involved in DNA replication, recombination and repair. (A). Topoisomerase IB.(B). DNA Topoisomerase II. (C). AP endonuclease 2. (D). Nucleotidyltransferase. (E). YqaJ-like recombinase. Branches with bootstrap support lessthan 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated; env stands forsequences retrieved from env_nr database. Species abbreviations: Invir, Invertebrate iridescent virus; Crovi, Crocodilepox virus; Afrsw, African swinefever virus; Amsmo, Amsacta moorei entomopoxvirus 'L'; Arath, Arabidopsis thaliana; Ectsi, Ectocarpus siliculosus virus 1; Emihu, Emiliania huxleyivirus 86; Ostlu, Ostreococcus lucimarinus CCE9901; Ostta, Ostreococcus tauri virus 1; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursariaChlorella virus FR483; Phypa, Physcomitrella patens subsp. patens; Melsa, Melanoplus sanguinipes entomopoxvirus; Micsp, Micromonas sp. RCC299;Nitma, Nitrosopumilus maritimus SCM1; Ruesp, Ruegeria sp. TM1040; Sorbi, Sorghum bicolor; At, Thaumarchaeota; Bp, Proteobacteria; E9,Viridiplantae; c1, Asfarviridae; l1, Chloriridovirus; l2, Iridovirus; q1, Chlorovirus; q2, Coccolithovirus; q3, Phaeovirus; q6, unclassified Phycodnaviridae;u1, Chordopoxvirinae; u2, Entomopoxvirinae.
Yutin and Koonin Virology Journal 2012, 9:161 Page 9 of 18http://www.virologyj.com/content/9/1/161
principle, the opposite direction of gene transfer, fromspecific groups of NCLDV to the respective cellular lifeforms, cannot be ruled out. However, it appears excep-tionally unlikely that, for example, a diverse assortmentof bacteria and archaea received the Topo II gene specif-ically from the crocodile poxvirus lineage.Phylogenetic analysis of apurinic endonuclease 2
(AP2), a repair enzyme represented in a diverse subsetof NCLDV, shows unequivocal support for the mono-phyly of the NCLDV, assuming that the numerous homolo-gous environmental sequences belong to uncharacterizedmembers of the respective NCLDV groups (Figure 2C). Inentomopoxviruses, AP2 is fused with a nucleotidyltransfer-ase (a member of the DNA polymerase beta family)whereas ASFV and mimivirus possess a distinct gene en-coding a nucleotidyltransferase. However, the phylogenetictree of nucleotidyltransferases strongly suggests independ-ent acquisitions of this gene by several NCLDV lineages(Figure 2D).The phylogenetic tree of the YqaJ family recombinase,
an enzyme whose role in NCLDV replication and/or re-pair remains uncertain, is best compatible with multipleacquisitions from bacteriophages although monophyly ofthe NCLDV could not be ruled out (Figure 2E andAdditional file 2).The unusual case of two distinct DNA ligases was ana-
lyzed in detail previously [18]. Surprisingly, the ATP-dependent ligase that is most common among theNCLDV shows clear signs of polyphyletic origin withmultiple, independent acquisitions by several viruslineages whereas the less common NAD-dependent lig-ase appears to be monophyletic, and probably ancestral.A re-analysis performed in the course of this work usingan up to date set of sequences provided strong supportfor this conclusion (Additional file 3).
Genes involved in transcription and mRNA maturationAll the NCLDV, with the exception of the majority ofphycodnaviruses, encode two large subunits of theDNA-dependent RNA polymerase (RNAP) that are alsouniversally conserved in all cellular life forms. It hasbeen reported that in phylogenetic trees the NCLDVRNAP subunits come across as polyphyletic [28,29],
and this conclusion is supported by our present ana-lysis (Figure 3A,B). However, for the alpha subunit ofthe RNAP, the constrained tree in which the NCLDVform a clade, with the exception of the mimivirusand ASFV, actually had the highest likelihood(Figure 3C; Additional file 2). Notably, in this treethe mimivirus gene clustered with the eukaryoticRNAP 2 whereas ASFV clustered with RNAP 1(Figure 3C), suggestive of two independent displace-ments of the ancestral NCLDV gene. Similar resultswere obtained for the RNAP beta subunit: althoughin this case the original tree with polyphyleticNCLDV had the greatest likelihood, the tree with anNCLDV clade excluding mimivirus and ASFV wasnearly as well supported (Additional file 2). As in thecase of the alpha subunit, in the RNAP beta subunittree the mimivirus gene clustered with the eukaryoticRNAP 2 whereas ASFV clustered with RNAP 1(Figure 3B).Three additional RNAP subunits/transcription factors
were too divergent to reconstruct reliable phylogenetictrees (Table 1). However, for Transcription factor TFIIB,monophyly of the NCLDV was strongly supported(Figure 3D).Phylogenetic analysis of Superfamily 2 helicases hom-
ologous to Vaccinia A18 protein (a helicase involved inlate transcription [30]) revealed a strongly supportedNCLDV clade, with two NCLDV genes placed in otherparts of the tree (Figure 4A). These two anomalies in-clude the E. huxlei phycodnavirus which fell within abacteriophage cluster and one of the two members ofthis helicase family encoded by ASFV. Interestingly, thisASFV protein clustered in the tree and shared domainorganization with homologs from Kinetoplastida, sug-gesting the possibility of gene acquisition from unicellu-lar eukaryotes by an ancestral asfarvirus. Monophyly ofthis Asfarvirus protein with other NCLDV was stronglyrejected (Additional file 2). Thus, two Asfarvirus pro-teins of this family likely have different origins: one isbona fide NCLDV protein whereas another was acquiredfrom a host.Another family of helicases implicated in transcription
includes homologs of Vaccinia D6 and D11 which are
A B
C D
Eukaryotes II
El Hydma221110135Mimivirus
Asco/IridoviridaeMarseillevirus
q2 Emihu73852534
Eukaryotes III
Archaea
PoxviridaeBacteria
Eukaryotes I
c1 Afrsw9628206
100
74
100
10069
81
100
98
10058
92
52
94
100
10071
0.2
Eukaryotes II91
Eukaryotic I
Poxviridae
Bacteria
Archaea
Marseillevirus
Eukaryotes III
q2 Emihu73852908
c1 Afrsw9628161
100
10053
100
77
100
65
84
57
82
100100
92
Asco/Iridoviridae
El Hydma221110129Mimivirus
env144053945
79
10093
0.1
Eukaryotes II
El Hydma221110135
Mimivirus
Asco/Iridoviridae
PoxviridaeMarseillevirus
q2 Emihu73852534
c1 Afrsw9628206
Eukaryotes I
Eukaryotes III
Archaea
Bacteria
10098
100
85
74
100
68
84
100
99
67
76
100
83
92
0.2
q1 Acatu155371663q1 Parbu155370233
q1 ParNY157952458c1 Afrsw9628176
env135257836Marseillevirus
env143819106q6 Ostvi163955150
environmental sequencesMimivirus
environmental sequences
environmental sequences
Eukaryotes
env143237987Bb Psyto91218585
Archaea74
51
100
79100
87
100
73
100
79
86100
53
90
100
94
0.2
RNA polymerase alpha subunit RNA polymerase beta subunit
RNA polymerase alpha subunit,constrained
Transcription factor TFIIB
Figure 3 Phylogenetic trees of ancestral NCLDV genes involved in transcription. (A). RNA polymerase alpha subunit. (B). RNA polymerasebeta subunit. (C). RNA polymerase alpha subunit, constrained tree with an NCLDV clade excluding mimivirus and ASFV. (D). Transcription factorTFIIB. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the geneidentification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations: Acatu, Acanthocystisturfacea Chlorella virus 1; Afrsw, African swine fever virus; Emihu, Emiliania huxleyi virus 86; Hydma, Hydra magnipapillata; Ostvi, Ostreococcus virusOsV5; Parbu, Paramecium bursaria Chlorella virus FR483; ParNY, Paramecium bursaria Chlorella virus NY2A; Psyto, Psychroflexus torquis ATCC 700755;Bb, Bacteroidetes; El, Opisthokonta; c1, Asfarviridae; q1, Chlorovirus; q2, Coccolithovirus; q6, unclassified Phycodnaviridae.
Yutin and Koonin Virology Journal 2012, 9:161 Page 10 of 18http://www.virologyj.com/content/9/1/161
present in ASFV and in mimivirus. Thus, the presenceof an ancestral gene of this family in the NCLDV ances-tor implies losses in several groups of viruses. Phylogen-etic analysis of this gene strongly supports themonophyly of the NCLDV (Figure 4B).The capping enzyme of the NCLDV is a complex,
three-domain protein. The N-terminal phosphatase do-main is too divergent for phylogenetic analysis. Phylogen-etic analysis of the other two domains, guanylyltransferaseand methyltransferase, identifies an NCLDV clade, to the
exclusion of the single representative in iridoviruses thatappears to be an independent acquisition of a eukaryotichomolog subsequent to the loss of the ancestral NCLDVgene in iridoviruses (Figure 4C,D).The RNA ligase is an RNA processing enzyme [31]
that is represented in by conserved orthologs irido-viruses, ascoviruses, and Marseillevirus and by a moredistant homolog in ASFV; the precise role of this en-zyme in the NCLDV reproduction remains unclear. Inthe phylogenetic tree, the NCLDV ligases of the NCLDV
A B
env134898567env143049744
env136380008Mimivirus
env136133804env144077465
q6 Ostvi163955040q3 Ectsi13242538
q1 Acatu155371590q1 Parbu9631722
PoxviridaeIridoviridae
Marseillevirusc1 Afrsw9628229
environmental sequencesEk Leiin146094415
c1 Afrsw9628148Ek Trycr71407144
El Phano169616097Ec Plavi156081817
Ea Entdi167392006q2 Emihu73852574
BacteriaArchaea
BacteriaBacteria
Bacteria
9097
95
7765
82100
98
62100
100
99
94
53
96
71
96
81
75
100
93
81
9183
10098
10052
99
0.2
Eukaryotes
Bacteria
Bacteria
Eukaryotes
100
96
7493
53
96
96
10093
0.2
c1 Afrsw9628180
Poxviridae
Mimivirus
Mimivirus
E9 Phypa168007236env135440874
c1 Afrsw9628208env135039429
MimivirusMarseillevirus
environmental sequencesPoxviridae
Ea Dicdi66805201Ea Entdi167392533El Enccu85014195
El Entbi169806196q2 Emihu73852927
env136191089q6 Ostvi290343628
env143320606env136743363
q1 Acatu155371666q1 Parbu9631672
l4 Infsp19881469El Caeel71980521
El Bruma170577180El Anoga158292641
El Trica91083171El Strpu72005733
EukaryotesEukaryotes
Eukaryotes
El Monbr167535971Eukaryotes
Eukaryotes
96
71
9654
100
92
99
60
10053
8376
68
99
67
100
96
7484
53
100
99
91
100
0.2
C
env135954485env136245388
Mimivirus
PoxviridaeMarseillevirus
env136185156env136595843
c1 Afrsw9628208
q6 Ostvi163955066
99
67
58
58100
100
83
66
env142418996env142908014env136139714
Eukaryotes93
80
99
100
0.2
Asco/Iridoviridae
env144204233c1 Afrsw9628169
Bacteria/phages
Baculoviridaeb1 Trini116326771
env144139321env140405543
90
99
54
98
89
54
8274
91
62
70
0.2
environmental sequencesMarseillevirus
environmental sequences
D
E
A18-like helicase D6/D11-like helicase
Capping enzyme (guanylyltransferase)
Capping enzyme (methyltransferase)
RNA ligase
Figure 4 (See legend on next page.)
Yutin and Koonin Virology Journal 2012, 9:161 Page 11 of 18http://www.virologyj.com/content/9/1/161
(See figure on previous page.)Figure 4 Phylogenetic trees of ancestral NCLDV genes involved in mRNA processing/maturation. (A). Superfamily 2 helicases homologousto Vaccinia A18. (B). Superfamily 2 helicases homologous to Vaccinia D6 and D11. (C). Capping enzyme (guanylyltransferase domain). (D). Cappingenzyme (methyltransferase domain). (E). RNA ligase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the speciesname abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Speciesabbreviations: Phypa, Physcomitrella patens subsp. patens; Dicdi, Dictyostelium discoideum AX4; Entdi, Entamoeba dispar SAW760; Plavi, Plasmodiumvivax SaI-1; Leiin, Leishmania infantum JPCM5; Trycr, Trypanosoma cruzi strain CL Brener; Anoga, Anopheles gambiae str. PEST; Bruma, Brugia malayi;Caeel, Caenorhabditis elegans; Enccu, Encephalitozoon cuniculi GB-M1; Entbi, Enterocytozoon bieneusi H348; Monbr, Monosiga brevicollis MX1; Phano,Phaeosphaeria nodorum SN15; Strpu, Strongylocentrotus purpuratus; Trica, Tribolium castaneum; Trini, Trichoplusia ni ascovirus 2c; Afrsw, Africanswine fever virus; Infsp, Infectious spleen and kidney necrosis virus; Acatu, Acanthocystis turfacea Chlorella virus 1; Parbu, Paramecium bursariaChlorella virus FR483; Emihu, Emiliania huxleyi virus 86; Ectsi, Ectocarpus siliculosus virus 1; Ostvi, Ostreococcus virus OsV5; E9, Viridiplantae; Ea,Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; c1, Asfarviridae; l4, Megalocytivirus; q1, Chlorovirus; q2, Coccolithovirus; q3, Phaeovirus;q6, unclassified Phycodnaviridae.
Yutin and Koonin Virology Journal 2012, 9:161 Page 12 of 18http://www.virologyj.com/content/9/1/161
form a well-supported clade (along with many uncharac-terized environmental sequences), with the exception ofthe Trichoplusia ni ascovirus which belonged to a cladewith baculovirus and diverse bacteriophages (Figure 4E).The constrained tree with an NCLDV clade was confi-dently statistically rejected (Additional file 2) indicatingthat in the Trichoplusia ni ascovirus the RNA ligasegenes was displaced by a homolog from a baculovirus.Finally, the previously investigated case of the small,
regulatory subunit of polyA polymerase is unusual inthat this gene is present only in poxviruses which alsoencode the large, catalytic subunit, and in a single mimi-virus strain which lacks the catalytic subunit. Despitethis sparse representation, the NCLDV small subunitsseem to form a strongly supported clade, suggestive ofthe possibility that this gene was present in the ancestralNCLDV [15].
Genes for enzymes of nucleotide metabolismMost of the NCLDV encode varying sets of enzymesinvolved in metabolism of deoxyribonucleotides. Thesegenes are not strictly essential for virus reproduction,given that knockouts are typically viable in cell cultures,but tend to be important in vivo. The most common en-zyme in this group is ribonucleotide reductase (RR)which consists of a large and a small subunits encodedby two distinct genes. The phylogenetic trees of both thelarge and the small subunits show complex topologiesthat are incompatible with monophyly of the NCLDV(Figure 5A,B). Moreover, for both subunits, poxvirusesand iridoviruses show clear, distinct affinities, witheukaryotic and bacteriophage homologs, respectively.The provenance of the RR in the other NCLDV is lessclear. On the whole, it seems to be a safe conclusion thatthe RR of the NCLDV are not monophyletic but ratherhave evolved in a complex evolutionary scenario thatinvolved multiple acquisitions, losses and displacement.Therefore, the results of the ML reconstruction notwith-standing, it is difficult to ascertain whether the last com-mon ancestor of the NCLDV encoded RR.
Thymidine kinase (TK) is another major enzyme ofdNTP biosynthesis that is present in a large subset ofthe NCLDV. Phylogenetic analysis reveals three distinctclusters of viral TKs (Figure 6A); a constrained tree con-taining an NCLDV clade could not be statisticallyrejected but nevertheless had a lower likelihood than theoriginal tree (Additional file 2). Qualitatively similarresults were obtained for Thymidylate kinase (TMPK), thesecond enzyme of thymidylate biosynthesis (Figure 6B).The dUTPase, an enzyme that functions at the interface ofnucleotide metabolism and repair, is present in poxviruses,iridoviruses and many phycodnaviruses, but showed clearsigns of polyphyletic origin, including statistically solid re-jection of NCLDV monophyly (Figure 6C, Additionalfile 2). Similar results have been reported from a pre-vious phylogenetic analysis for some of the genes en-coding enzymes of nucleotide metabolism [19].The majority of the sequences of putative ancestral
virion proteins and proteins involved in virus mor-phogenesis and virus-host interactions were too diver-gent to construct reliable phylogenetic trees. The onlyexception was the disulfide isomerase enzyme forwhich monophyly of the NCLDV (including numer-ous environmental sequences that presumably belongto uncharacterized viruses) could be demonstrated(Figure 7).
DiscussionEvolutionary reconstruction using patterns of genepresence-absence in viral genomes has led to the conclu-sion that the NCLDV represent a monophyletic group ofviruses that evolved from a common ancestor which wasa virus with genomic complexity comparable to that ofthe extant NCLDV. Approximately 50 genes that areconserved in different subsets of the NCLDV have beenassigned to the ancestral virus genome (Table 1), and itappears likely that the ancestral virus additionallyencompassed many lineage-specific genes. In this workwe went beyond the phyletic patterns by analyzing phy-logenies of the inferred ancestral NCLDV genes alongwith their homologs from cellular organisms. In almost
A B
Bp Vibha269963258q3 Ectsi13242650
El Lodel149238952E8 Thaps224001624
env144060058q6 Ostvi163955126env144051596
Ek Leibr154340341Ek Trybr71755889
env144142045Ec Thepa71027825
Ec Parte145497945Ec Crymu209878422
PoxviridaeEl Acypi193608391
env143433140Mimivirus
El Ustma71020483E8 Phatr219110871
q3 Felsp197322445q2 Emihu73852902
Baculoviridaez3 Helze22788802
vi Musdo187903102Marseillevirus
k1 Cyphe131840167k1 Anghe282174153
o1 Shrwh17158276q1 Acatu155371048
q1 ParFR155370864q1 Parbu9632159
c1 Afrsw9628153Bp Frano118497573env136094176
env136607526env142360592env142151739l3 Lymch48843724
l3 Lymdi13358415l5 Ambti45686076l5 Frovi49237335l5 Singr56692701
env143333574l1 Aedta109287943
l2 Invir15078798env142551493
env143288148env144017272Bacteria
Bacteria67
99
100
78
92
56
100
100100
100
100
76
100
87
90
99
100
98 61100
85
97
7155
65
10087
92
100
52
8682
9659
10093
97
57
9898
83
10069
84
59
78
85
0.1
q3 Felsp197322443q3 Ectsi13242599
env135287498
Eukaryotes
EukaryotesEa Dicdi66822029
Ec Parte145537171
q2 Emihu73852496Ek Leibr154337607
Ek Trybr74025548
Bacteriaenv144044197env143932091
q6 Ostvi163955147env142408079
env142452833env136160125
q1 Parbu9632043q1 Acatu155370982
environmental sequences
MimivirusEukaryotes
Baculoviridae
Eukaryotes
Eukaryotes
u1 Canvi40556174z3 Helze22788780
l4 Infsp19881429k1 Anghe282174133
env135039280Bp Phoda269105083
c1 Afrsw9628152Marseillevirus
l5 Ambti45686047l5 Frovi49237365
l5 Singr56692684l3 Lymdi13358428
l3 Lymdi51869996Bp Frano118497575
l1 Invir109287926l2 Invir15079087
Bacteria
92
86
10098
79
89
9584
91
51
61
96
78
71
100
91
100
89
54
97
61
93
9794
100
96
98
100
61
75
8447
85
88
9572
98
76
84
66
80
93
99
0.1
100
100
Poxviridae
Ribonucleotide reductase, large subunit
Ribonucleotide reductase, small subunit
Figure 5 Phylogenetic trees of the ancestral NCLDV genes for ribonucleotide reductase subunits. (A). Ribonucleotide reductase, largesubunit. (B). Ribonucleotide reductase, small subunit. Branches with bootstrap support less than 50 were collapsed. For each sequence, thespecies name abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Speciesabbreviations: Acatu, Acanthocystis turfacea Chlorella virus 1; Acypi, Acyrthosiphon pisum; Aedta, Invertebrate iridescent virus 3; Afrsw, Africanswine fever virus; Ambti, Ambystoma tigrinum virus; Anghe, Anguillid herpesvirus 1; Canvi, Canarypox virus; Crymu, Cryptosporidium muris RN66;Cyphe, Cyprinid herpesvirus 3; Dicdi, Dictyostelium discoideum AX4; Ectsi, Ectocarpus siliculosus virus 1; Emihu, Emiliania huxleyi virus 86; Felsp,Feldmannia species virus; Frano, Francisella novicida U112; Frovi, Frog virus 3; Helze, Heliothis zea virus 1; Infsp, Infectious spleen and kidneynecrosis virus; l1_Invir, Invertebrate iridescent virus 3; l2_Invir, Invertebrate iridescent virus 6; Leibr, Leishmania braziliensis MHOM/BR/75/M2904;Lodel, Lodderomyces elongisporus NRRL YB-4239; Lymch, Lymphocystis disease virus - isolate China; Lymdi, Lymphocystis disease virus 1; Musdo,Musca domestica salivary gland hypertrophy virus; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus NY2A; ParFR,Paramecium bursaria Chlorella virus FR483; Parte, Paramecium tetraurelia strain d4-2; Phatr, Phaeodactylum tricornutum CCAP 1055/1; Phoda,Photobacterium damselae subsp. damselae CIP 102761; Shrwh, Shrimp white spot syndrome virus; Singr, Singapore grouper iridovirus; Thaps,Thalassiosira pseudonana CCMP1335; Thepa, Theileria parva strain Muguga; Trybr, Trypanosoma brucei; Ustma, Ustilago maydis 521; Vibha, Vibrioharveyi 1DA3; Bp, Proteobacteria; E8, Stramenopiles; Ea, Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; c1, Asfarviridae; k1, Herpesvirales;l1, Chloriridovirus; l2, Iridovirus; l3, Lymphocystivirus; l4, Megalocytivirus; l5, Ranavirus; o1, Nimaviridae; q1, Chlorovirus; q2, Coccolithovirus; q3,Phaeovirus; q6, unclassified Phycodnaviridae; u1, Chordopoxvirinae; vi, Nudivirus; z3, unclassified dsDNA viruses.
Yutin and Koonin Virology Journal 2012, 9:161 Page 13 of 18http://www.virologyj.com/content/9/1/161
A BEukaryotes
PoxviridaeEukaryotes
EukaryotesPoxviridae
q6 Ostvi163955199env136422265env142443153env144065395
env143975708env144021532
env142459479env136229246Marseillevirus
env142470004env136062509
Eukaryotes (plants)c1 Afrsw9628158
o1 Shrwh17158497Mimivirus
u2 Amsmo9964330Bacteria98
95
57
100
9771
9188
83
95
79
87
82
84
69
60
9880
6076
100
52
72
100
0.1Asco/Iridoviridae Phycodnaviridae environmental sequences
environmental sequencesenv142414951
k1 Icthe9626826l4 Infsp19881437
vi Musdo187903106Ec Parte145528967
Ec Tetth118389890u1 Canvi40556137
u1 Fowvi9634729u1 Fowvi9634821
env138940489u1 Canvi40556151
Eukaryotes
Bacteria/phagesEukaryotes
68
El Lacth255712129El Cryne58265708
l2 Invir15078963Ec Tetth118351574
El Brafl260796149u1 Canvi40556108
El Homsa42544174E9 Arath145334853
Ea Dicdi66804303E9 Ostlu145348882
u1 Ectvi22164753u1 Varvi9627681u1 Vacvi66275971
Bp Psesy289676128Bb Chipi256419682Bf Desha89894606
Bf Clobo226947280q2 Emihu73852905
Ek Trycr71423920Ek Leima157865253
c1 Afrsw9628143El Caebr268573866
54
100
8493
50
62
8364
97
55
8780
72
70
100
98
51
97
51
72
86
82
5574
85
97
99
94
72
100
817481
0.2
l2 Invir15079149
E9 Phypa167999518Ea Dicdi66800487
El Apime48097825El Nasvi156541544
El Cioin198419421
l5 Singr56692686El Homsa70906444
a2 Fowad9628836a2 Fowad9633174
l5 Ambti45686051l5 Frovi49237361
d2 Spoli148368872z3 Helze22788776
u2 Amsmo9964316Poxviruses
El Caeel71988561d2 Agrse46309437
Ea Enthi67470091env138142700
env143826648env142423093
env143987160
q6 Ostta290343671env143982771
env134673310
BacteriaBp Rhosp146280038
env143370438
q2 Emihu73852871q1 Acatu155371447
q1 Parbu157953045Eukaryotes
100
99
6972
68
96
88
99
53
8685
85
74
9551
54
97
71
77
61
93
100
86
70
75
67
88
70
94
67
0.1
C
Thymidine kinase Thymidylate kinase
dUTPase
Figure 6 (See legend on next page.)
Yutin and Koonin Virology Journal 2012, 9:161 Page 14 of 18http://www.virologyj.com/content/9/1/161
(See figure on previous page.)Figure 6 Phylogenetic trees of ancestral NCLDV genes encoding enzymes of nucleotide metabolism. (A). Thymidine kinase.(B). Thymidylate kinase. (C). dUTPase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species nameabbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations:Acatu, Acanthocystis turfacea Chlorella virus 1; Afrsw, African swine fever virus; Agrse, Agrotis segetum granulovirus; Ambti, Ambystoma tigrinumvirus; Amsmo, Amsacta moorei entomopoxvirus 'L'; Apime, Apis mellifera; Arath, Arabidopsis thaliana; Brafl, Branchiostoma floridae; Caebr,Caenorhabditis briggsae; Caeel, Caenorhabditis elegans; Canvi, Canarypox virus; Chipi, Chitinophaga pinensis DSM 2588; Cioin, Ciona intestinalis;Clobo, Clostridium botulinum A2 str. Kyoto; Cryne, Cryptococcus neoformans var. neoformans JEC21; Desha, Desulfitobacterium hafniense Y51; Dicdi,Dictyostelium discoideum AX4; Ectvi, Ectromelia virus; Emihu, Emiliania huxleyi virus 86; Enthi, Entamoeba histolytica HM-1:IMSS; Fowad, Fowladenovirus A; Fowvi, Fowlpox virus; Frovi, Frog virus 3; Helze, Heliothis zea virus 1; Homsa, Homo sapiens; Icthe, Ictalurid herpesvirus 1; Infsp,Infectious spleen and kidney necrosis virus; l2_Invir, Invertebrate iridescent virus 6; Lacth, Lachancea thermotolerans; Leima, Leishmania major;Musdo, Musca domestica salivary gland hypertrophy virus; Nasvi, Nasonia vitripennis; Ostlu, Ostreococcus lucimarinus CCE9901; Ostta, Ostreococcustauri virus 1; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus NY2A; Parte, Paramecium tetraurelia strain d4-2; Phypa,Physcomitrella patens subsp. patens; Psesy, Pseudomonas syringae pv. syringae FF5; Rhosp, Rhodobacter sphaeroides ATCC 17025; Shrwh, Shrimpwhite spot syndrome virus; Singr, Singapore grouper iridovirus; Spoli, Spodoptera litura granulovirus; Tetth, Tetrahymena thermophila; Trybr,Trypanosoma brucei; Vacvi, Vaccinia virus; Varvi, Variola virus; Bb, Bacteroidetes; Bf, Firmicutes; Bp, Proteobacteria; E9, Viridiplantae; Ea, Amoebozoa; Ec,Alveolata; Ek, Euglenozoa; El, Opisthokonta; a2, Adenoviridae; c1, Asfarviridae; d2, Baculoviridae; k1, Herpesvirales; l2, Iridovirus; l4, Megalocytivirus; l5,Ranavirus; o1, Nimaviridae; q1, Chlorovirus; q2, Coccolithovirus; q6, unclassified Phycodnaviridae; u1, Chordopoxvirinae; u2, Entomopoxvirinae; vi,Nudivirus; z3, unclassified dsDNA viruses.
Yutin and Koonin Virology Journal 2012, 9:161 Page 15 of 18http://www.virologyj.com/content/9/1/161
all cases where the information content of the respectivemultiple sequence alignments was sufficient for phylo-genetic analysis, deviations from the simple pattern ofvertical evolution were observed (summarized inTable 1). Strikingly, phylogenetic trees for most of theconserved NCLDV genes failed to show an NCLDVclade although it was not always possible to reject themonophyly of the NCLDV at a statistically significantlevel.The results of phylogenetic analysis of viral genes have
to be interpreted with caution given the generally fastevolution of viral genomes and the ensuing possibility ofartifacts such as long branch attraction. Therefore, weconsidered the monophyly of the NCLDV to be the mostappropriate null hypothesis and generally made conclu-sions on more complex evolutionary scenarios onlywhen this hypothesis could be rejected using conserva-tive statistical tests of tree topology. Nevertheless, evenwith this conservative approach, phylogenomic analysissuggests that evolution of the core NCLDV genes, withonly a few likely exceptions (DNA polymerase, disulfideisomerase and several other genes; see Table 1), includednot only multiple gene losses that were apparent alreadyfrom the examination of the phyletic patterns, but alsomultiple cases of xenologous gene displacement [25], i.e.displacement of the ancestral gene by a homologous(and functionally analogous) gene from a differentsource such as bacteriophage or eukaryote. On someoccasions, when a gene is missing in a large group ofviruses (such as phycodnaviruses) except for a minorityof members of that group, in which it is phylogeneticallydistinct from homologs in other NCLDV, the sequenceof events leading to displacement can be inferred as lossfollowed by regain. In other cases, such as RR and TK,the course of evolution seems to have been too complexto reconstruct the specific scenario.
Thus, the present analysis reveals layers of hiddencomplexity in the history of the conserved gene core ofthe NCLDV that are not apparent from the analysis ofpatterns of gene presence-absence alone [5]. Althoughdeviations from simple vertical evolution probably oc-curred in the history of almost all core genes, theresults do not invalidate the conclusion on the evolu-tion of all known NCLDV from a single ancestral virus.The high prevalence of gene loss and xenologous genedisplacement notwithstanding, for most of the coregenes, these events affected the evolution of only a fewlineages, consistent with the origin of all NCLDV froma common ancestor, followed by isolated events compli-cating the evolutionary scenarios. However, for severalgenes that have been included in the core set on thebasis of gene presence-absence patterns, such as theenzymes of DNA precursor metabolism, multiplesources are apparent, so that the ancestral status ofthese genes becomes uncertain.Phylogenetic analysis of the core NCDLV genes reveals
multiple affinities with genes from eukaryotes, bacteriaand bacteriophages. The acquisition of genes fromeukaryotic hosts (that might not be the same for ances-tral and extant viruses) is not surprising. However, genetransfer to NCLDV, in particular those infecting unicel-lular eukaryotes, from bacteria and bacteriophages isplausible as well given that diverse parasites and sym-bionts often coexist within the same eukaryotic host. In-deed, in amoebas, with their large cells and phagocyticlife style, such coexistence is the norm, making theseorganisms veritable ‘melting pots’ of virus evolution[7,32].The general trend seems to be that bona fide essential
genes, are rarely displaced because for these, intermedi-ates lacking the gene are most like non-viable, so ifintermediate forms existed, they should have encoded
Protein disulfide isomeraseMarseillevirus
env136543793Iridoviridaeenvironmental sequences
env140203035env136808647
q1 ParAR157953760env136606018
env136840138Mimivirus
environmental sequencesMimivirus
env136256796
environmental sequencesq3 Felsp197322406
env141253368env134921190
q3 Ectsi13242631env134459082
environmental sequencesenv139731803l2 Invir15079059
l1 Aedta109287974Ascoviridae
c1 Afrsw9628181environmental sequences
Poxviridae
Eukaryotes
99
q2 Emihu73852597env141323596
80
69
7588
67
72
67
99
5568
84
55
81
72
87
9181
95
60
9455
9365
10060
78
9659
74
0.2
Figure 7 Phylogenetic tree of an ancestral NCLDV geneencoding an enzyme involved in virion morphogenesis: proteindisulfide isomerase. Branches with bootstrap support less than 50were collapsed. For each sequence, the species name abbreviationand the gene identification numbers are indicated; env stands forsequences retrieved from env_nr database. Species abbreviations:Afrsw, African swine fever virus; Aedta, Invertebrate iridescent virus 3;Invir, Invertebrate iridescent virus; ParAR, Paramecium bursariaChlorella virus AR158; Emihu, Emiliania huxleyi virus 86; Ectsi,Ectocarpus siliculosus virus 1; Felsp, Feldmannia species virus; c1,Asfarviridae; l1, Chloriridovirus; l2, Iridovirus; q1, Chlorovirus; q2,Coccolithovirus; q3, Phaeovirus.
Yutin and Koonin Virology Journal 2012, 9:161 Page 16 of 18http://www.virologyj.com/content/9/1/161
two forms of the respective genes, with the original genesubsequently lost and the xenologous gene retained.This evolutionary scenario might be rare due to the con-straints imposed by the requirements for the formationof multisubunit complexes (under the complexity hy-pothesis of Lake and colleagues [33]), e.g. the replisome.However, the clear-cut xenologous displacement of theprimase-helicase gene in phycodnaviruses shows thatthese obstacles are not insurmountable. In contrast, forgenes that are not strictly essential but are beneficial inmost virus-host systems, such as the precursor biosyn-thesis enzymes, parallel loss and regain in multiple
lineages seems to be the rule rather than an exception.This pattern can be linked to the viability of evolutionaryintermediates lacking the respective genes, at least in theshort term.In addition to cases of xenologous gene displacement,
phylogenies of a few core genes point to evolutionarylinks with other large DNA viruses, such as herpes-viruses and baculoviruses, as well as bacteriophages.These observations are compatible with the virus worldconcept under which viruses are linked through complexnetworks of evolutionary connections at the level of in-dividual genes and in some cases gene modules, and arealso involved in extensive gene exchange with cellularlife forms.
MethodsThe NCLDV protein sequences were extracted from theRefSeq database (NCBI, NIH, Bethesda) [34]. The mostrecently sequenced NCLDV genomes, namely the Cafe-teria roenbergensis virus [35], the megavirus [16], both ofthe Mimiviridae family, and Lausannevirus [8], a closerelative of Marseille virus, were not included. For eachcluster of orthologous NCLDV genes (NCVOG) that hasbeen mapped to the last common ancestor of theNCLDV [5], the following procedure was applied. A rep-resentative NCVOG sequence set was constructed byclustering the complete collection of the respectiveprotein sequences using the Blastclust program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) and select-ing a representative from each cluster of closely relatedsequences). Two BLASTP runs, one against the Refseqdatabase and the other one against the environmental(env_nr) database at the NCBI, were performed foreach representative sequence with the e-value cutoff of0.1. This liberal cutoff was used in order to incorpor-ate highly diverged homologs. To eliminate potentialfalse positives, all alignments were examined case bycase for the conservation of domain architecture andpresence of diagnostic motifs. All the sequences fromthe given NCVOG, the top 20–30 Refseq hits fromeach domain of the NCBI Taxonomy (Eukaryota, Bac-teria, Archaea, and Viruses), and top 10 environmentalhits for each query were combined, and nearly identi-cal sequences were eliminated using Blastclust. Theresulting sequences were aligned using MUSCLE [36];gapped columns (more than 30% of gaps) and columnswith low information content were removed from thealignment [37]. A preliminary tree was constructedusing PhyML [38], with the following parameters:WAG substitution matrix; four relative substitutionrate categories; the fraction of invariable sites andthe alpha parameter of the gamma distribution ofsite-specific evolution rates) were automaticallyselected by PhyML. From this preliminary tree, a
Yutin and Koonin Virology Journal 2012, 9:161 Page 17 of 18http://www.virologyj.com/content/9/1/161
subset of sequences representing each clade wereselected and re-aligned. In the next step, 8 PhyMLruns were performed for each alignment, with 8 sub-stitution models (Blosum62, Dayhoff, JTT, DCMut,RtREV, CpREV, VT, and WAG). The best topologywas chosen by the maximum log likelihood amongthe 8 trees. Bootstrap branch supports (expressed inExpected-Likelihood Weights, ELW) were calculatedusing TreeFinder [39] for the best PhyML tree usingthe best matrix. For detailed phylogenetic analysis,whenever appropriate, alternative (constrained) top-ology ML trees were constructed using TreeFinder[39], with the substitution model found to be thebest for a given alignment in the first-round analysis.Tree topologies were compared with TreeFinderusing the approximately unbiased (AU) test P value[40].
Additional files
Additional file 1: The phyletic patterns of absence-presence of theNCLDV core genes in the sequenced NCLDV genomes.
Additional file 2: Results of statistical analysis of constrained treesfor the core NCLDV genes.
Additional file 3: Phylogenetic trees for the DNA ligases of theNCLDV. A. ATP-dependent ligases. B. NAD-dependent ligases.
Competing interestsThe authors declare that they have no competing interests.
Authors’ contributionsEVK initiated and designed the study; NY collected and analyzed data; EVKwrote the manuscript that was read, edited and approved by both authors.
AcknowledgementsThe authors’ research is supported by the DHHS intramural program (NIH,national Library of Medicine).
Received: 10 February 2012 Accepted: 8 August 2012Published: 14 August 2012
References1. Forterre P: The origin of viruses and their possible roles in major
evolutionary transitions. Virus Res 2006, 117(1):5–16.2. Koonin EV, Senkevich TG, Dolja VV: The ancient virus world and evolution
of cells. Biol Direct 2006, 1(1):29.3. Iyer LM, Aravind L, Koonin EV: Common origin of four diverse families of
large eukaryotic DNA viruses. J Virol 2001, 75(23):11720–11734.4. Iyer LM, Balaji S, Koonin EV, Aravind L: Evolutionary genomics of
nucleo-cytoplasmic large DNA viruses. Virus Res 2006, 117(1):156–184.5. Yutin N, Wolf YI, Raoult D, Koonin EV: Eukaryotic large nucleo-cytoplasmic
DNA viruses: clusters of orthologous genes and reconstruction of viralgenome evolution. Virol J 2009, 6:223.
6. Koonin EV, Yutin N: Origin and evolution of eukaryotic largenucleo-cytoplasmic DNA viruses. Intervirology 2010, 53(5):284–292.
7. Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C,Azza S, Sun S, Rossmann MG, et al: Giant Marseillevirus highlights the roleof amoebae as a melting pot in emergence of chimeric microorganisms.Proc Natl Acad Sci U S A 2009, 106(51):21848–21853.
8. Thomas V, Bertelli C, Collyn F, Casson N, Telenti A, Goesmann A, Croxatto A,Greub G: Lausannevirus, a giant amoebal virus encoding histonedoublets. Environ Microbiol 2011, 13(6):1454–1466.
9. Moss B: Poxviridae: the viruses and their replication. In Fields Virology,Volume 2. Edited by Knipe DM, Howley PM. Philadelphia: LippincottWilliams & Wilkins; 2007:2905–2946.
10. Claverie JM, Abergel C, Ogata H: Mimivirus. Curr Top Microbiol Immunol2009, 328:89–121.
11. Raoult D, Boyer M: Amoebae as genitors and reservoirs of giant viruses.Intervirology 2010, 53(5):321–329.
12. Colson P, Raoult D: Gene repertoire of amoeba-associated giant viruses.Intervirology 2010, 53(5):330–343.
13. Van Etten JL, Lane LC, Dunigan DD: DNA viruses: the really big ones(giruses). Annu Rev Microbiol 2010, 64:83–99.
14. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B,Suzan M, Claverie JM: The 1.2-megabase genome sequence of Mimivirus.Science 2004, 306(5700):1344–1350.
15. Colson P, Yutin N, Shabalina SA, Robert C, Fournous G, La Scola B, Raoult D,Koonin EV: Viruses with more than 1,000 genes: Mamavirus, a newAcanthamoeba polyphaga mimivirus strain, and reannotation ofMimivirus genes. Genome Biol Evol 2011, 3:737–742.
16. Arslan D, Legendre M, Seltzer V, Abergel C, Claverie JM: Distant Mimivirusrelative with a larger genome highlights the fundamental features ofMegaviridae. Proc Natl Acad Sci U S A 2011, 108(42):17486–17491.
17. Van Etten JL: Unusual life style of giant chlorella viruses. Annu Rev Genet2003, 37:153–195.
18. Yutin N, Koonin EV: Evolution of DNA ligases of nucleo-cytoplasmic largeDNA viruses of eukaryotes: a case of hidden complexity. Biol Direct 2009, 4:51.
19. Filee J, Pouget N, Chandler M: Phylogenetic evidence for extensive lateralacquisition of cellular genes by Nucleocytoplasmic large DNA viruses.BMC Evol Biol 2008, 8:320.
20. Weynberg KD, Allen MJ, Ashelford K, Scanlan DJ, Wilson WH: From smallhosts come big viruses: the complete genome of a second Ostreococcustauri virus, OtV-1. Environ Microbiol 2009, 11(11):2821–2839.
21. Filee J, Forterre P, Laurent J: The role played by viruses in the evolution oftheir hosts: a view based on informational protein phylogenies. ResMicrobiol 2003, 154(4):237–243.
22. Monier A, Claverie JM, Ogata H: Taxonomic distribution of large DNAviruses in the sea. Genome Biol 2008, 9(7):R106.
23. Ogata H, Toyoda K, Tomaru Y, Nakayama N, Shirai Y, Claverie JM, Nagasaki K:Remarkable sequence similarity between the dinoflagellate-infectingmarine girus and the terrestrial pathogen African swine fever virus. VirolJ 2009, 6:178.
24. Felsenstein J: Inferring Phylogenies. Sunderland, MA: Sinauer Associates; 2004.25. Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in
prokaryotes: quantification and classification. Annu Rev Microbiol 2001,55:709–742.
26. Senkevich TG, Koonin EV, Moss B: Predicted poxvirus FEN1-like nucleaserequired for homologous recombination, double-strand break repair andfull-size genome formation. Proc Natl Acad Sci U S A 2009,106(42):17921–17926.
27. Everly DN Jr, Feng P, Mian IS, Read GS: mRNA degradation by the virionhost shutoff (Vhs) protein of herpes simplex virus: genetic andbiochemical evidence that Vhs is a nuclease. J Virol 2002,76(17):8560–8571.
28. Sonntag KC, Darai G: Evolution of viral DNA-dependent RNA polymerases.Virus Genes 1995, 11(2–3):271–284.
29. Lane WJ, Darst SA: Molecular evolution of multisubunit RNA polymerases:sequence analysis. J Mol Biol 2010, 395(4):671–685.
30. Lackner CA, Condit RC: Vaccinia virus gene A18R DNA helicase is atranscript release factor. J Biol Chem 2000, 275(2):1485–1494.
31. Pascal JM: DNA and RNA ligases: structural variations and sharedmechanisms. Curr Opin Struct Biol 2008, 18(1):96–105.
32. Moliner C, Fournier PE, Raoult D: Genome analysis of microorganismsliving in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev2010, 34:281–294.
33. Jain R, Rivera MC, Lake JA: Horizontal gene transfer among genomes: thecomplexity hypothesis. Proc Natl Acad Sci U S A 1999, 96(7):3801–3806.
34. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K,Chetvernin V, Church DM, Dicuccio M, Federhen S, et al: Databaseresources of the National Center for Biotechnology Information. NucleicAcids Res 2012, 40(Database issue):D13–D25.
35. Fischer MG, Allen MJ, Wilson WH, Suttle CA: Giant virus with a remarkablecomplement of genes infects marine zooplankton. Proc Natl Acad SciU S A 2010, 107(45):19508–19513.
Yutin and Koonin Virology Journal 2012, 9:161 Page 18 of 18http://www.virologyj.com/content/9/1/161
36. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy andhigh throughput. Nucleic Acids Res 2004, 32(5):1792–1797.
37. Yutin N, Makarova KS, Mekhedov SL, Wolf YI, Koonin EV: The deep archaealroots of eukaryotes. Mol Biol Evol 2008, 25(8):1619–1630.
38. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimatelarge phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696–704.
39. Jobb G, von Haeseler A, Strimmer K: TREEFINDER: a powerful graphicalanalysis environment for molecular phylogenetics. BMC Evol Biol 2004,4:18.
40. Shimodaira H: An approximately unbiased test of phylogenetic treeselection. Syst Biol 2002, 51(3):492–508.
doi:10.1186/1743-422X-9-161Cite this article as: Yutin and Koonin: Hidden evolutionary complexity ofNucleo-Cytoplasmic Large DNA viruses of eukaryotes. Virology Journal2012 9:161.
Submit your next manuscript to BioMed Centraland take full advantage of:
• Convenient online submission
• Thorough peer review
• No space constraints or color figure charges
• Immediate publication on acceptance
• Inclusion in PubMed, CAS, Scopus and Google Scholar
• Research which is freely available for redistribution
Submit your manuscript at www.biomedcentral.com/submit