18
RESEARCH Open Access Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes Natalya Yutin and Eugene V Koonin * Abstract Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic group that consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive genome comparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50 conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryotic viruses. Results: We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrained tree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the core genes have been independently acquired from different sources by different NCLDV lineages whereas for the majority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDV was demonstrated. Conclusions: A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexity and diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherence between the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a common ancestral virus although the set of ancestral genes might be smaller than previously inferred from patterns of gene presence-absence. Background Viruses are ubiquitous, obligate, intracellular parasites of all cellular life forms that rely on the host cell translation system, metabolism and, in many cases, the replication and transcription systems, for their reproduction. There is no evidence that all viruses have a monophyletic ori- gin, at least not under the traditional concept of mono- phyly. Indeed, not a single gene is conserved in the genomes of all known viruses although a small group of viral hallmark genesencoding some of the key proteins involved in genome replication and virion structure for- mation are shared by large, diverse subsets of viruses [1,2]. However, several large groups of viruses infecting diverse hosts do appear to share common ancestry in the strict sense, that is, to have evolved from the same ancestral virus, as indicated by the conservation of sets of genes encoding proteins responsible for many func- tions essential for virus reproduction. One of the largest viral divisions that seem to be monophyletic includes 6 recognized families and a 7 th candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) [3-6]. The formally recognized NCLDV families are Pox- viridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycod- naviridae, and Mimiviridae; in addition, the recently discovered Marseillevirus and the related Lausannevirus could not be assigned to any of the 6 families, and are likely to become founding members of a new family [7,8]. Hereinafter we speak of 7 NCLDV families for the sake of simplicity. By far the most thoroughly studied group of the NCLDV are the Poxviridae, the family of animal viruses that include a major human pathogen, the smallpox virus, important animal pathogens, such as rabbit * Correspondence: [email protected] National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA © 2012 Yutin and Koonin; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Yutin and Koonin Virology Journal 2012, 9:161 http://www.virologyj.com/content/9/1/161

RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Yutin and Koonin Virology Journal 2012, 9:161http://www.virologyj.com/content/9/1/161

RESEARCH Open Access

Hidden evolutionary complexity ofNucleo-Cytoplasmic Large DNA viruses ofeukaryotesNatalya Yutin and Eugene V Koonin*

Abstract

Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic groupthat consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive genomecomparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryoticviruses.

Results: We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrainedtree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the coregenes have been independently acquired from different sources by different NCLDV lineages whereas for themajority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDVwas demonstrated.

Conclusions: A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexityand diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherencebetween the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a commonancestral virus although the set of ancestral genes might be smaller than previously inferred from patterns of genepresence-absence.

BackgroundViruses are ubiquitous, obligate, intracellular parasites ofall cellular life forms that rely on the host cell translationsystem, metabolism and, in many cases, the replicationand transcription systems, for their reproduction. Thereis no evidence that all viruses have a monophyletic ori-gin, at least not under the traditional concept of mono-phyly. Indeed, not a single gene is conserved in thegenomes of all known viruses although a small group of“viral hallmark genes” encoding some of the key proteinsinvolved in genome replication and virion structure for-mation are shared by large, diverse subsets of viruses[1,2]. However, several large groups of viruses infectingdiverse hosts do appear to share common ancestry inthe strict sense, that is, to have evolved from the sameancestral virus, as indicated by the conservation of sets

* Correspondence: [email protected] Center for Biotechnology Information, National Library of Medicine,National Institutes of Health, Bethesda, MD 20894, USA

© 2012 Yutin and Koonin; licensee BioMed CeCreative Commons Attribution License (http:/distribution, and reproduction in any medium

of genes encoding proteins responsible for many func-tions essential for virus reproduction.One of the largest viral divisions that seem to be

monophyletic includes 6 recognized families and a 7th

candidate family of viruses with large DNA genomesthat infect diverse eukaryotes and are collectively knownas Nucleo-Cytoplasmic Large DNA Viruses (NCLDV)[3-6]. The formally recognized NCLDV families are Pox-viridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycod-naviridae, and Mimiviridae; in addition, the recentlydiscovered Marseillevirus and the related Lausanneviruscould not be assigned to any of the 6 families, and arelikely to become founding members of a new family[7,8]. Hereinafter we speak of 7 NCLDV families for thesake of simplicity.By far the most thoroughly studied group of the

NCLDV are the Poxviridae, the family of animal virusesthat include a major human pathogen, the smallpoxvirus, important animal pathogens, such as rabbit

ntral Ltd. This is an Open Access article distributed under the terms of the/creativecommons.org/licenses/by/2.0), which permits unrestricted use,, provided the original work is properly cited.

Page 2: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Yutin and Koonin Virology Journal 2012, 9:161 Page 2 of 18http://www.virologyj.com/content/9/1/161

myxoma virus, as well as vaccinia virus (VACV), one ofthe best characterized models of molecular biology [9].Another family of the NCLDV that has recently becomea major focus of attention is the Mimiviridae thatincludes giant viruses infecting amoeba and probablyalgae [10-13]. The genome of the prototype virus of thisfamily, Acanthamoeba polyphaga Mimivirus [14],slightly exceeds one megabase (Mb), and other relatedviruses possess even larger genomes [15,16], so theMimiviridae are undisputed genome size record holdersin the virosphere. Indeed, in terms of genome size andcomplexity, the NCLDV eclipse numerous parasitic bac-teria, and approach the simplest free-living prokaryotes.The NCLDV infect animals and diverse unicellular

eukaryotes, and either replicate exclusively in the cyto-plasm of the host cells, or encompass both cytoplasmicand nuclear stages in their life cycle. Most of theNCLDV do not strongly depend on the host replicationor transcription systems for completing their replication[9,17]. The autonomous life style of the NCLDV is sup-ported by a set of conserved proteins that are encodedin the viral genomes and mediate most of the processesessential for viral reproduction. These essential, con-served proteins include DNA polymerases, helicases,and primases responsible for DNA replication, RNApolymerase subunits and transcription factors that func-tion in transcription initiation and elongation, Hollidayjunction resolvases and topoisomerases involved in gen-ome DNA processing and maturation, ATPase pumpsmediating DNA packaging, molecular chaperonesinvolved in capsid assembly and capsid proteins them-selves [3-5]. Although several viral hallmark genes areshared by NCLDV and other large DNA viruses, such asherpesviruses, baculoviruses and some bacteriophages[2], the conservation of the large set of core genes clearlydemarcates the NCLDV as a distinct, most likely mono-phyletic class of viruses [4,6]. More specifically, recon-structions of the ancestral NCLDV genome compositionusing maximum parsimony and maximum likelihoodmethods have delineated a set of approximately 50 genesthat are inferred to have been responsible for the keyfunctions in the reproduction of the last common ances-tor of the NCLDV [5].The core, presumably ancestral set of the NCLDV

genes was delineated using sequence similarity-basedmethods that have been previously employed for identi-fication of clusters of orthologous genes in diverse cellu-lar life forms. The comparative genomic analysisunderlying these reconstructions was deliberately limitedto the NCLDV genomes to simplify the analysis and tofacilitate detection of distant relationships between viralproteins. Indeed, some of the core NCLDV proteins,such as for example the packaging ATPases and the di-sulfide chaperones, show only weak sequence similarity

between the viral families. Moreover, for some of theNCLDV genes with important functions in virusreproduction, indications of complex evolutionary his-tories have been obtained. The showcase for such evolu-tionary complexity is the viral DNA ligase which isrepresented by two distantly related forms across theNCLDV families. A maximum likelihood reconstructionbased on the presence-absence of conserved genes inthe viral genomes has implied that one of the two forms,the ATP-dependent DNA ligase, was the ancestral formthat was present in the genome of the prototypeNCLDV but was replaced by the distantly relatedNAD-dependent ligase in several viral lineages. How-ever, when the reconstruction was supplemented byphylogenetic analysis of the two forms of DNA ligase,the opposite conclusion has been reached, namely thatthe NAD –dependent ligase was the ancestral form inNCLDV that was displaced by the ATP-dependent lig-ase on several independent occasions [18]. This changein perspective occurred because phylogenetic analysisindicated that the ATP-dependent ligases from differentlineages of the NCLDV clustered with distinct groups ofeukaryotic homologs whereas the NAD–dependentligases of the NCLDV appeared to be monophyletic. Inthe same vein, complex phylogenies suggestive of mul-tiple horizontal gene transfer have been observed forseveral core NCLDV genes such as thymidine kinase orthe two subunits of ribonucleotide reductase [19].Taken together, these findings imply that some of the

apparently conserved genes of the NCLDV might actu-ally have complex histories which could include inde-pendent (convergent) acquisition of these genes fromdifferent cellular organisms as well as displacement ofancestral viral gene by homologs of cellular provenance.This line of reasoning prompted us to perform a com-prehensive phylogenetic analysis of the set of the puta-tive ancestral NCLDV genes. Here we present the resultsof this analysis which suggest that, although the exist-ence of a common ancestor of the NCLDV is beyondreasonable doubt, most of the conserved NCLDV genesindeed had complex evolutionary histories.

ResultsApproach and rationaleMaximum likelihood reconstruction of the gene reper-toire of the putative ancestral NCLDV has revealed acore consisting of approximately 50 viral genes (Table 1).Most of these genes are present in various subsets of theNCLDV genomes rather than all genomes (Additionalfile 1) but high likelihoods of ancestral provenance havebeen estimated for each of them, with the absences ac-cordingly attributed to lineage-specific gene loss. Weconstructed multiple alignments and position-specificscoring matrices for each of the ancestral NCLDV genes

Page 3: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Table 1 Evolutionary scenarios for the ancestral NCLDV genes

NCVOG [3,4] Vacciniavirus genenumbera

Functionalcategory

# of viralfamilies/genomes

NCVOG annotation Evolutionary scenario from constrainedtree analysis

NCVOG0038 E9L DNA replication,recombinationand repair

6/45 DNA polymerase elongationsubunit family B

Monophyletic NCLDV, common origin withbaculoviruses and possibly herpesviruses

NCVOG0023 D5R DNA replication,recombinationand repair

6/45 D5-like helicase-primase Monophyletic NCLDV but displaced by abacteriophage homolog in phycodnaviruses.

NCVOG1060 G5R DNA replication,recombinationand repair

5/35 FLAP-like endonucleaseXPG (cd00128)

Possibly monophyletic NCLDV althoughdisplacement by a phage homolog inpoxviruses cannot be ruled out; loss inphycodnaviruses and subsequent acquisition ofa eukaryotic homolog by E. huxlei virus

NCVOG0036 H6R DNA replication,recombinationand repair

3/23 DNA topoisomerase IB Possibly monophyletic NCLDV

NCVOG0037 None DNA replication,recombinationand repair

6/15 DNA topoisomerase II Polyphyletic NCLDV, gene acquiredfrom different eukaryotes

NCVOG0035 None DNA replication,recombinationand repair

3/7 NAD+dependent DNAligase (smart00532)

Monophyletic NCLDV

NCVOG0034 A50R DNA replication,recombinationand repair

4/19 ATP-dependentDNA ligase(pfam01068, PRK01109)

Polyphyletic NCLDV, gene acquired fromdifferent eukaryotes

NCVOG0278 A22R DNA replication,recombinationand repair

5/36 RuvC, Holliday junctionresolvases (HJRs); cl00243.Extended Pox_A22,Poxvirus A22 family(pfam04848). Marseillevirus protein lacks C-termconserved positions.

Insufficient sequenceconservation for reliable phylogenetic analysis

NCVOG0004_APa None DNA replication,recombinationand repair

4/6 AP (apurinic)endonuclease family 2

Strongly supported monophyly of NCLDV

NCVOG0004_NTa None DNA replication,recombinationand repair

3/4 Nucleotidyltransferase(DNA polymerasebeta family)

Polyphyletic NCLDV, gene acquired fromdifferent eukaryotes

NCVOG1192 None DNA replication,recombinationand repair

4/13 YqaJ viral recombinasefamily (pfam09588)

Possibly monophyletic NCLDV

NCVOG0024 None DNA replication,recombinationand repair

3/4 Superfamily II helicaserelated to herpesvirusreplicative helicase(origin-binding proteinUL9), pfam03121

Limited sequence similarity between proteinsfrom different NCLDV; probable polyphyleticorigin; possibly not an ancestral NCLDV gene

NCVOG1115 D4R DNA replication,recombinationand repair

3/23 Uracil-DNA glycosylase Present in only a few NCLDV; uncertain,monophyly of the NCLDV cannot be ruled out

NCVOG0274 J6R Transcriptionand RNAprocessing

6/36 DNA-directed RNApolymerase subunitalpha

Displacement of ancestral NCLDV gene inmimivirus and ASFV with eukaryotic RNAP2 andRNAP1 subunit genes, respectively; loss in mostphycodnaviruses. Ancestral NCLDV gene possiblyderived from eukaryote RNAP I

NCVOG0271 A24R Transcription andRNA processing

6/36 DNA-directed RNApolymerasesubunit beta

Possibly polyphyletic NCLDV, with one subsetderived from RNAP1. and another from RNAP2;likely more recent displacement with RNAP2 inmimivirus; however, monophyly of NCLDV cannotbe ruled out; loss in most phycodnaviruses.

Yutin and Koonin Virology Journal 2012, 9:161 Page 3 of 18http://www.virologyj.com/content/9/1/161

Page 4: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Table 1 Evolutionary scenarios for the ancestral NCLDV genes (Continued)

NCVOG0273 G5.5R Transcription andRNA processing

5/15 DNA-directed RNApolymerase subunit 5

Uncertain, not enough sequence conservationfor reliable phylogenetic analysis

NCVOG1164 A1L Transcription andRNA processing

6/44 A1L transcriptionfactor/late transcriptionfactor VLTF-2

Uncertain, not enough sequence conservationfor reliable phylogenetic analysis

NCVOG0262 A2L Transcription andRNA processing

6/45 Poxvirus LateTranscription FactorVLTF3 like

No obvious homologs outside NCLDV(monophyly of NCLDV by default).

NCVOG0261 A7L Transcription andRNA processing

5/35 Poxvirus earlytranscription factor(VETF), large subunit(pfam04441)

No obvious homologs outside NCLDV(monophyly of NCLDV by default).

NCVOG0272 E4L Transcription andRNA processing

6/39 Transcription factor S-II(TFIIS)-domain-containing protein

Uncertain, not enough sequence conservationfor reliable phylogenetic analysis

NCVOG1127 One Transcription andRNA processing

4/11 transcription initiationfactor IIB

Strongly supported monophyly of NCLDV

NCVOG0076 A18R Transcription andRNA processing

6/38 DNA or RNA helicases ofsuperfamily II (COG1061)

Monophyletic NCLDV except for displacementwith a eukaryotic homolog in one phycodnavirusand acquisition of a distinct eukaryotic paralogin ASFV

NCVOG0267 I8R Transcription andRNA processing

3/23 RNA-helicase DExH-NPH-II Monophyletic NCLDV; independent losses inseveral NCLDV lineages

NCVOG1117 D1R Transcription andRNA processing

6/33 mRNA capping enzymelarge subunit

Complex, variable domain architecture; apparentmonophyly of NCLDV for the conservedmethyltransferase domain; guanylyltransferases ofapparent distinct eukaryotic origin in a singleiridovirus and a single phycodnavirus

NCVOG0236 D9R, D10R Transcription andRNA processing

6/29 Nudix hydrolase Uncertain, not enough sequence conservation forreliable phylogenetic analysis

NCVOG1088 None Transcription andRNA processing

3/13 RNA ligase Monophyletic NCLDV; common origin withbaculovirus and possibly bacteriophage homologs

- J3R Transcription andRNA processing

2/16 PolyA polymerase,regulatory subunit

Present only in poxviruses and one mimivirus;however, monophyly strongly supported; possibleancestral gene

NCVOG0276 F4L Nucleotidemetabolism

6/29 Ribonucleotide reductasesmall subunit

Polyphyletic NCLDV, complex evolutionaryscenario; ancestral status uncertain; trees similarfor the large and small subunits

NCVOG1353 I4L Nucleotidemetabolism

6/24 ribonucleoside diphosphatereductase, large subunit

Polyphyletic NCLDV, complex evolutionaryscenario; ancestral status uncertain; trees similarfor the large and small subunits

NCVOG0319 J2R Nucleotidemetabolism

5/20 Thymidine kinase Most likely polyphyletic NCLDV althoughmonophyly could not be statistically rejected

NCVOG0320 A48R Nucleotidemetabolism

4/21 Thymidylate kinase Most likely polyphyletic NCLDV althoughmonophyly could not be statistically rejected

NCVOG1068 F2L Nucleotidemetabolism

4/30 dUTPase (cl00493) Polyphyletic NCLDV, monophyly rejected

NCVOG0022 D13L Virion structure andmorphogenesis

6/45 NCLDV major capsidprotein

No homologs outside NCLDV – monophyleticNCLDV by default

NCVOG0249 A32L Virion structure andmorphogenesis

6/45 A32-like packaging ATPase Only distant homologs outside NCLDV –monophyletic NCLDV by default

NCVOG0211 F9L, L1R Virion structure andmorphogenesis

4/34 myristylated envelopeprotein

No homologs outside NCLDV – monophyleticNCLDV by default

NCVOG1122 G9R, J5L,A16L

Virion structure andmorphogenesis

3/31 Myristylated protein;pfam03003, DUF230

No homologs outside NCLDV – monophyleticNCLDV by default

NCVOG0052 E10R Virion structure andmorphogenesis

6/44 disulfide (thiol)oxidoreductase/isomerase;Erv1 / Alr family (pfam04777)

Monophyletic NCLDV

Yutin and Koonin Virology Journal 2012, 9:161 Page 4 of 18http://www.virologyj.com/content/9/1/161

Page 5: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Table 1 Evolutionary scenarios for the ancestral NCLDV genes (Continued)

NCVOG0256 H3L Virion structure andmorphogenesis

2/22 envelope protein,glycosyltransferase

Apparent monophyletic NCLDV but representedonly in poxviruses and mimiviruses; independentacquisition cannot be ruled out

NCVOG0040 H1L Signal transduction,regulation

4/30 cd00127, DSPc, Dualspecificity phosphatases(DSP); Ser/Thr and Tyrprotein phosphatases

Most likely independent acquisitions bydifferent NCLDV

NCVOG0330 VACWR012,VACWR207

Signal transduction,regulation

5/26 RING-finger-containing E3ubiquitin ligase(COG5432: RAD18)

Most likely independent acquisition by differentgroups of the NCLDV; probably not an ancestralgene

NCVOG0329 Signal transduction,regulation

2/3 UBCc, Ubiquitin-conjugatingenzyme E2 (cd00195)

Present only in mimivirus and ASFV; independentacquisitions from eukaryotes, NCLDV monophylyrejected; not an ancestral gene

NCVOG0246 Signal transduction,regulation

3/4 Ulp1-like protease/deubiquitinating enzyme

Uncertain, highly diverged NCLDV sequences

NCVOG0009 Virus-hostinteractions

3/4 pfam00653: BIR (BaculovirusInhibitor of apoptosisprotein Repeat) domain

Most likely independent acquisition bydifferent groups of the NCLDV; probably notan ancestral gene

NCVOG0012 A33R, A44R,A40R

Virus-hostinteractions

3/20 C-type lectin: smart00034,cd03594, cd03593, pfam00059,cd00037, pfam05966

Poorly conserved sequence, most likelyindependent acquisitions by different groupsof NCLDV; probably not an ancestral gene

NCVOG1361 Uncharacterized 6/11 T5orf172 domain (pfam10544) Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene

NCVOG1360 VACWR011,VACWR208

Uncharacterized 3/18 KilA domain (pfam04383);always present at proteinN-termini except formimiviruses. In some proteinsfused to the a RING-fingerdomain.

Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene

NCVOG1424 Uncharacterized 3/6 Uncharacterized domain;found downstream of KilA,BRO, and MSV199 domains.Also found in somebaculoviruses.

Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene

NCVOG0010 Uncharacterized 4/11 pfam02498: Bro-N; BROfamily, N-terminal domain:This family includes theN-terminus of baculovirusBRO and ALI motif proteins.

Different domain architectures; most likelyindependent acquisition by different groupsof the NCLDV; probably not an ancestral gene

NCVOG0059 Uncharacterized 2/3 FtsJ-like methyltransferasefamily proteins (pfam01728)

Present only in mimivirus and ASFV; monophylycannot be ruled out

aThe systematic gene names are for the Copenhagen strain of Vaccinia virus except for the names starting with VACWR which are from the WR strain because therespective genes are missing/disrupted in the Copenhagen strain.

Yutin and Koonin Virology Journal 2012, 9:161 Page 5 of 18http://www.virologyj.com/content/9/1/161

and used these to search for homologs from cellularorganisms and other viruses. The resulting sequence setswere clustered to include all NCLDV proteins along withrepresentatives of as many major lineages of cellularorganisms as possible but also to trim the sets down to asize amenable to detailed phylogenetic analysis (seeMethods for the details). For the sequence sets thusselected, ML phylogenetic trees were constructed andexamined using the constrained tree approach (seeMethods). Specifically, for all trees in which the NCLDVdid not appear as a single strongly supported clade, thelikelihood of the original tree was compared with thelikelihood of a tree constrained for monophyly of

NCLDV. When justified, additional constrained trees,with topologies corresponding to plausible evolutionaryscenarios, were analyzed for individual genes. Below wepresent the results of the constrained phylogenetic treeanalysis of the inferred ancestral NCLDV genes.

Genes involved in genome replication, recombination andrepairThe DNA replication of the NCLDV is largely independ-ent of the host replication, and accordingly, genes en-coding protein components of the complex replicationmachinery represent a major part of the NCLDV genecore (Table 1). This is the largest group in the ancestral

Page 6: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Yutin and Koonin Virology Journal 2012, 9:161 Page 6 of 18http://www.virologyj.com/content/9/1/161

NCLDV gene set that includes 13 genes two of which,the DNA polymerase (DNAP) and the primase-helicase,are central and indispensable to replication and areshared by all viruses of this class (it has been claimedthat the primase-helicase was missing in some phycod-naviruses [20]; however, in the process of NCVOG con-struction [5], this gene was identified in all completeNCLDV genomes, the high sequence divergence in someof the viruses notwithstanding). The B family of DNAPs,to which the NCLDV polymerases belong, includes themain replicative polymerases of all archaea and eukar-yotes as well as many bacterial, archaeal and eukaryoticviruses. The unconstrained phylogenetic tree for theDNAP failed to recover an NCLDV clade. Instead, themajority of the NCLDV DNAPs along with the DNAPsof herpesviruses formed a clade with eukaryotic DNAPdelta and zeta; a sister group to this large branch was aclade that consisted of poxvirus, asfarvirus and baculo-virus DNAPs (Figure 1A). However, constrained trees inwhich monophyly of the NCLDV was enforced couldnot be rejected by statistical tests (Additional file 2).Moreover, among the tested trees, the tree in which theNCLDV clade also included baculoviruses (as the sistergroup to asfarviruses) and herpesviruses as the sistergroup to the composite NCLDV-baculovirus clade, hadthe highest associated likelihood although none of theanalyzed tree topologies could be statistically rejected(Figure 1B and Additional file 2). Thus, the results of theconstrained tree analysis are compatible with the mono-phyly of the DNAPs of all NCLDV and moreover of alllarge DNA viruses of eukaryotes. This conclusion con-trasts the results of several previous phylogenetic ana-lyses that failed to recover an NCLDV clade but did notreport attempts to analyze constrained trees [21-23]. Itseems likely that the apparent non-monophyly of theNCLDV in previous studies was caused by branch lengtheffects, in particular acceleration of evolution in some ofthe viruses resulting in long branch attraction [24] (notethe branch length, in particular for Poxviridae inFigure 1A,B) and possibly other artifacts of phylogeneticanalysis.Phylogenetic analysis of the second hallmark viral rep-

lication gene, the primase-helicase, revealed a stronglysupported clade that included 6 of the 7 NCLDV fam-ilies: phycodnaviruses belonged to a different, also well-supported branch together with diverse bacteriophageand bacterial (most likely, prophage) homologs(Figure 1C). In this case, the constrained tree topologythat included an NCLDV clade was rejected at a statisti-cally significant level (Additional file 2). Thus, the resultof the phylogenetic analysis of the NCLDV primase-helicase appears to be best compatible with the presenceof this gene in the ancestral NCLDV followed bydisplacement with a homologous and functionally

analogous protein from a bacteriophage in the ancestorof phycodnaviruses. This appears to be a typical case ofxenologous gene displacement [25]).The third gene involved in replication that is present

in a large majority of the NCLDV encodes a FLAP nu-clease, an enzyme that is also ubiquitous in cellular lifeforms and removes single-stranded overhangs from rep-lication and recombination intermediates. Nucleases ofthis family are essential for replication and in particularrecombination in poxviruses [26] but can also assumeother functions such as mRNA degradation as is thecase in herpesviruses [27]. The phylogenetic tree ofFLAP nucleases did not include an NCLDV clade. In-stead, poxviruses clustered with bacterial homologs, theonly representative in phycodnaviruses (Emiliana huxleivirus) placed within the eukaryotic branch, and the restof the NCLDV formed a clade with the herpesvirushomologs (Figure 1D). However, a constrained tree withan NCLDV clade (excluding the Emiliana huxlei virus)could not be statistically rejected (Additional file 2). Theplacement of E. huxlei virus within the eukaryotic sub-tree was supported by high boostrap values, and mono-phyly of this virus with the rest of the NCLDV wasweakly supported (AU value <0.1) even if not firmlyrejected (Additional file 2). At present we cannot confi-dently conclude whether the poxvirus FLAP nuclease ismonophyletic with those of other NCLDV or evolvedthrough displacement of the ancestral form with a phagehomolog. However, even the simplest evolutionary sce-nario for this gene involves loss of the ancestral gene inphycodnaviruses followed by regain of the eukaryotichomolog by the E. huxlei virus.The remaining genes in this group are less common in

NCLDV although the ML reconstruction mapped themto the last common ancestor; the implication of the in-ferred ancestral status of these genes is that they havebeen lost on multiple occasions during the evolution ofthe NCLDV. Topoisomerase IB is represented in pox-viruses, one phycodnavirus and mimiviruses. In thephylogenetic tree, poxviruses form a clade with the phy-codnavirus but the position of the mimivirus is uncertain(Figure 2A). The monophyly of the NCLDV could notbe statistically rejected, so for this gene there is noconvincing indication of displacement in any of theNCLDV.A greater number of NCLDV encode Topoisomerase

II that is unrelated to Topoisomerase IB. The phylogen-etic tree of Topo II (Figure 2B) did not include anNCLDV clade, and monophyly of the NCLDV could bestatistically rejected, with the implication of independentacquisition of the Topo II gene from eukaryotes inmimiviruses and from a prokaryotic source in crocodilepoxvirus, with the provenance of this gene in the rest ofthe NCLDV remaining uncertain (Additional file 2). In

Page 7: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Archaea B1

Archaea B3

Eukaryotic alpha

Eukaryotic delta

Eukaryotic zeta

IridoviridaeMarseillevirus

Mimiviridae

Phycodnaviridae

Phycodnaviridae (Phaeovirus)

Herpesviridae

100

82100

100

100

10077

72

86

84

Poxviridae

Baculoviridae

Asfarviridae100

9457

100

100

86

72

6080

10054

78

100

100

Ascoviridae100

94

0.2

A B

Bacteria

Bacteria/phages

Bf Lacre148543897

Bacteria/phages

Bp Desps51244031

Bp Dessa242279615

Irido/Ascoviridae

Mimiviridae

Asfarviridae

Poxviridae

88

81

10091

59

77

69

52

100

99

89

69

88

94

100

92

96

0.5

Marseillevirus

Phycodnaviridae

C D

Archaea B1

Eukaryotic alpha

Eukaryotic delta

Eukaryotic zeta

IridoviridaeMarseillevirus

Mimiviridae

Phycodnaviridae

Phycodnaviridae

Herpesviridae

100

Archaea B389

99

100

91

100 Poxviridae

Baculoviridae

Asfarviridae100

10052

93

70

100

10089

91

98

10060

5861

96

76

67

100

0.2

Ascoviridae100

94

Ascoviridae

IridoviridaeMarseillevirusenvironmental sequences

environmental sequences

Mimivirusenvironmental sequences

52

ArchaeaBacteria

Poxviridae

99

100

10090

58

93

0.2

q2 Emihu73852488Ea Entdi167391034

El Enccu19173066El Macmu109105914

E9 Arath42570539E8 Phatr219125938

Ew Triva123335114Ec Tetth146171182

Ek Leibr154339662100

99

61

87

78

65 63

97

54

85

DNA polymerase DNA polymerase, constrained

Primase-helicase FLAP endonuclease

Herpesviridae

Figure 1 Phylogenetic trees of ancestral NCLDV genes involved in DNA replication. (A). DNA polymerase, the original ML tree. (B). DNApolymerase, constrained tree, monophyly of NCLDV, baculoviruses and herpesviruses enforced. (C). Primase-helicase. (D). FLAP endonuclease.Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the gene identificationnumbers are indicated. Species abbreviations: Arath, Arabidopsis thaliana; Desps, Desulfotalea psychrophila LSv54; Dessa, Desulfovibrio salexigensDSM 2638; Emihu, Emiliania huxleyi virus 86; Enccu, Encephalitozoon cuniculi GB-M1; Entdi, Entamoeba dispar SAW760; Lacre, Lactobacillus reuteriDSM 20016; Leibr, Leishmania braziliensis MHOM/BR/75/M2904; Macmu, Macaca mulatta; Phatr, Phaeodactylum tricornutum CCAP 1055/1; Tetth,Tetrahymena thermophila; Triva, Trichomonas vaginalis G3; Bf, Firmicutes; Bp, Proteobacteria; E8, Stramenopiles; E9, Viridiplantae; Ea, Amoebozoa; Ec,Alveolata; Ek, Euglenozoa; El, Opisthokonta; Ew, Parabasalidea; q2, Coccolithovirus.

Yutin and Koonin Virology Journal 2012, 9:161 Page 7 of 18http://www.virologyj.com/content/9/1/161

Page 8: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Bacteria

MimivirusPoxviridae

q6 Ostta290343645env136436120

environmental sequences

Eukaryotes

At Nitma161528830

9554

100

9885

57

50

10045

0.2

Eukaryores

Mimivirusenv142094046q6 Ostvi163955216

env142904322q1 Parbu157953897

env144065368l1 Invir109287964

l2 Invir15078758Marseillevirus

q2 Emihu73852918c1 Afrsw9628219

u1 Crovi115531754

Archaea, Bacteria100

87

8888

100100

50

50

89

10096

80

71

0.2

u2 Melsa9631347u2 Amsmo9964524

c1 Afrsw9628240env134945261

env139705312env142958695Mimivirus

env143874461env136260467Marseillevirus

env136708238

Bacteria, Archaea, Eukaryotes

10052

68

51

78

68

5491

100

0.2

Mimivirusenv143530885

env138728848

Eukaryotes

env134864780env143744333

environmental sequencesu2 Amsmo9964524

u2 Melsa9631347env136814837

env136279503c1 Afrsw9628204

Archaea, Bacteria9751

98

8574

53

98

76

92

75

10082

0.2

A B

C

D

E9 Arath30697546E9 Sorbi242073512

E9 Phypa168007250Bp Ruesp99080624

E9 Micsp255088687E9 Ostlu145350267

q1 Parbu9631735env142361341env143316185q6 Ostvi163955119

c1 Afrsw9628216environmental sequences

environmental sequencesMimivirus

Eukaryotes

Bacteria, phahes100

env135373769u2 Amsmo9964554

env144017490q1 Parbu9632034

q3 Ectsi13242536

7688

99

55

56100

64

9983

67

6549

55

70

85

99

79

74

6087

70

E

Topoisomerase IB DNA Topoisomerase II

AP endonuclease 2

Nucleotidyltransferase

YqaJ-like recombinase

Figure 2 (See legend on next page.)

Yutin and Koonin Virology Journal 2012, 9:161 Page 8 of 18http://www.virologyj.com/content/9/1/161

Page 9: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

(See figure on previous page.)Figure 2 Phylogenetic trees of ancestral NCLDV genes involved in DNA replication, recombination and repair. (A). Topoisomerase IB.(B). DNA Topoisomerase II. (C). AP endonuclease 2. (D). Nucleotidyltransferase. (E). YqaJ-like recombinase. Branches with bootstrap support lessthan 50 were collapsed. For each sequence, the species name abbreviation and the gene identification numbers are indicated; env stands forsequences retrieved from env_nr database. Species abbreviations: Invir, Invertebrate iridescent virus; Crovi, Crocodilepox virus; Afrsw, African swinefever virus; Amsmo, Amsacta moorei entomopoxvirus 'L'; Arath, Arabidopsis thaliana; Ectsi, Ectocarpus siliculosus virus 1; Emihu, Emiliania huxleyivirus 86; Ostlu, Ostreococcus lucimarinus CCE9901; Ostta, Ostreococcus tauri virus 1; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursariaChlorella virus FR483; Phypa, Physcomitrella patens subsp. patens; Melsa, Melanoplus sanguinipes entomopoxvirus; Micsp, Micromonas sp. RCC299;Nitma, Nitrosopumilus maritimus SCM1; Ruesp, Ruegeria sp. TM1040; Sorbi, Sorghum bicolor; At, Thaumarchaeota; Bp, Proteobacteria; E9,Viridiplantae; c1, Asfarviridae; l1, Chloriridovirus; l2, Iridovirus; q1, Chlorovirus; q2, Coccolithovirus; q3, Phaeovirus; q6, unclassified Phycodnaviridae;u1, Chordopoxvirinae; u2, Entomopoxvirinae.

Yutin and Koonin Virology Journal 2012, 9:161 Page 9 of 18http://www.virologyj.com/content/9/1/161

principle, the opposite direction of gene transfer, fromspecific groups of NCLDV to the respective cellular lifeforms, cannot be ruled out. However, it appears excep-tionally unlikely that, for example, a diverse assortmentof bacteria and archaea received the Topo II gene specif-ically from the crocodile poxvirus lineage.Phylogenetic analysis of apurinic endonuclease 2

(AP2), a repair enzyme represented in a diverse subsetof NCLDV, shows unequivocal support for the mono-phyly of the NCLDV, assuming that the numerous homolo-gous environmental sequences belong to uncharacterizedmembers of the respective NCLDV groups (Figure 2C). Inentomopoxviruses, AP2 is fused with a nucleotidyltransfer-ase (a member of the DNA polymerase beta family)whereas ASFV and mimivirus possess a distinct gene en-coding a nucleotidyltransferase. However, the phylogenetictree of nucleotidyltransferases strongly suggests independ-ent acquisitions of this gene by several NCLDV lineages(Figure 2D).The phylogenetic tree of the YqaJ family recombinase,

an enzyme whose role in NCLDV replication and/or re-pair remains uncertain, is best compatible with multipleacquisitions from bacteriophages although monophyly ofthe NCLDV could not be ruled out (Figure 2E andAdditional file 2).The unusual case of two distinct DNA ligases was ana-

lyzed in detail previously [18]. Surprisingly, the ATP-dependent ligase that is most common among theNCLDV shows clear signs of polyphyletic origin withmultiple, independent acquisitions by several viruslineages whereas the less common NAD-dependent lig-ase appears to be monophyletic, and probably ancestral.A re-analysis performed in the course of this work usingan up to date set of sequences provided strong supportfor this conclusion (Additional file 3).

Genes involved in transcription and mRNA maturationAll the NCLDV, with the exception of the majority ofphycodnaviruses, encode two large subunits of theDNA-dependent RNA polymerase (RNAP) that are alsouniversally conserved in all cellular life forms. It hasbeen reported that in phylogenetic trees the NCLDVRNAP subunits come across as polyphyletic [28,29],

and this conclusion is supported by our present ana-lysis (Figure 3A,B). However, for the alpha subunit ofthe RNAP, the constrained tree in which the NCLDVform a clade, with the exception of the mimivirusand ASFV, actually had the highest likelihood(Figure 3C; Additional file 2). Notably, in this treethe mimivirus gene clustered with the eukaryoticRNAP 2 whereas ASFV clustered with RNAP 1(Figure 3C), suggestive of two independent displace-ments of the ancestral NCLDV gene. Similar resultswere obtained for the RNAP beta subunit: althoughin this case the original tree with polyphyleticNCLDV had the greatest likelihood, the tree with anNCLDV clade excluding mimivirus and ASFV wasnearly as well supported (Additional file 2). As in thecase of the alpha subunit, in the RNAP beta subunittree the mimivirus gene clustered with the eukaryoticRNAP 2 whereas ASFV clustered with RNAP 1(Figure 3B).Three additional RNAP subunits/transcription factors

were too divergent to reconstruct reliable phylogenetictrees (Table 1). However, for Transcription factor TFIIB,monophyly of the NCLDV was strongly supported(Figure 3D).Phylogenetic analysis of Superfamily 2 helicases hom-

ologous to Vaccinia A18 protein (a helicase involved inlate transcription [30]) revealed a strongly supportedNCLDV clade, with two NCLDV genes placed in otherparts of the tree (Figure 4A). These two anomalies in-clude the E. huxlei phycodnavirus which fell within abacteriophage cluster and one of the two members ofthis helicase family encoded by ASFV. Interestingly, thisASFV protein clustered in the tree and shared domainorganization with homologs from Kinetoplastida, sug-gesting the possibility of gene acquisition from unicellu-lar eukaryotes by an ancestral asfarvirus. Monophyly ofthis Asfarvirus protein with other NCLDV was stronglyrejected (Additional file 2). Thus, two Asfarvirus pro-teins of this family likely have different origins: one isbona fide NCLDV protein whereas another was acquiredfrom a host.Another family of helicases implicated in transcription

includes homologs of Vaccinia D6 and D11 which are

Page 10: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

A B

C D

Eukaryotes II

El Hydma221110135Mimivirus

Asco/IridoviridaeMarseillevirus

q2 Emihu73852534

Eukaryotes III

Archaea

PoxviridaeBacteria

Eukaryotes I

c1 Afrsw9628206

100

74

100

10069

81

100

98

10058

92

52

94

100

10071

0.2

Eukaryotes II91

Eukaryotic I

Poxviridae

Bacteria

Archaea

Marseillevirus

Eukaryotes III

q2 Emihu73852908

c1 Afrsw9628161

100

10053

100

77

100

65

84

57

82

100100

92

Asco/Iridoviridae

El Hydma221110129Mimivirus

env144053945

79

10093

0.1

Eukaryotes II

El Hydma221110135

Mimivirus

Asco/Iridoviridae

PoxviridaeMarseillevirus

q2 Emihu73852534

c1 Afrsw9628206

Eukaryotes I

Eukaryotes III

Archaea

Bacteria

10098

100

85

74

100

68

84

100

99

67

76

100

83

92

0.2

q1 Acatu155371663q1 Parbu155370233

q1 ParNY157952458c1 Afrsw9628176

env135257836Marseillevirus

env143819106q6 Ostvi163955150

environmental sequencesMimivirus

environmental sequences

environmental sequences

Eukaryotes

env143237987Bb Psyto91218585

Archaea74

51

100

79100

87

100

73

100

79

86100

53

90

100

94

0.2

RNA polymerase alpha subunit RNA polymerase beta subunit

RNA polymerase alpha subunit,constrained

Transcription factor TFIIB

Figure 3 Phylogenetic trees of ancestral NCLDV genes involved in transcription. (A). RNA polymerase alpha subunit. (B). RNA polymerasebeta subunit. (C). RNA polymerase alpha subunit, constrained tree with an NCLDV clade excluding mimivirus and ASFV. (D). Transcription factorTFIIB. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species name abbreviation and the geneidentification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations: Acatu, Acanthocystisturfacea Chlorella virus 1; Afrsw, African swine fever virus; Emihu, Emiliania huxleyi virus 86; Hydma, Hydra magnipapillata; Ostvi, Ostreococcus virusOsV5; Parbu, Paramecium bursaria Chlorella virus FR483; ParNY, Paramecium bursaria Chlorella virus NY2A; Psyto, Psychroflexus torquis ATCC 700755;Bb, Bacteroidetes; El, Opisthokonta; c1, Asfarviridae; q1, Chlorovirus; q2, Coccolithovirus; q6, unclassified Phycodnaviridae.

Yutin and Koonin Virology Journal 2012, 9:161 Page 10 of 18http://www.virologyj.com/content/9/1/161

present in ASFV and in mimivirus. Thus, the presenceof an ancestral gene of this family in the NCLDV ances-tor implies losses in several groups of viruses. Phylogen-etic analysis of this gene strongly supports themonophyly of the NCLDV (Figure 4B).The capping enzyme of the NCLDV is a complex,

three-domain protein. The N-terminal phosphatase do-main is too divergent for phylogenetic analysis. Phylogen-etic analysis of the other two domains, guanylyltransferaseand methyltransferase, identifies an NCLDV clade, to the

exclusion of the single representative in iridoviruses thatappears to be an independent acquisition of a eukaryotichomolog subsequent to the loss of the ancestral NCLDVgene in iridoviruses (Figure 4C,D).The RNA ligase is an RNA processing enzyme [31]

that is represented in by conserved orthologs irido-viruses, ascoviruses, and Marseillevirus and by a moredistant homolog in ASFV; the precise role of this en-zyme in the NCLDV reproduction remains unclear. Inthe phylogenetic tree, the NCLDV ligases of the NCLDV

Page 11: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

A B

env134898567env143049744

env136380008Mimivirus

env136133804env144077465

q6 Ostvi163955040q3 Ectsi13242538

q1 Acatu155371590q1 Parbu9631722

PoxviridaeIridoviridae

Marseillevirusc1 Afrsw9628229

environmental sequencesEk Leiin146094415

c1 Afrsw9628148Ek Trycr71407144

El Phano169616097Ec Plavi156081817

Ea Entdi167392006q2 Emihu73852574

BacteriaArchaea

BacteriaBacteria

Bacteria

9097

95

7765

82100

98

62100

100

99

94

53

96

71

96

81

75

100

93

81

9183

10098

10052

99

0.2

Eukaryotes

Bacteria

Bacteria

Eukaryotes

100

96

7493

53

96

96

10093

0.2

c1 Afrsw9628180

Poxviridae

Mimivirus

Mimivirus

E9 Phypa168007236env135440874

c1 Afrsw9628208env135039429

MimivirusMarseillevirus

environmental sequencesPoxviridae

Ea Dicdi66805201Ea Entdi167392533El Enccu85014195

El Entbi169806196q2 Emihu73852927

env136191089q6 Ostvi290343628

env143320606env136743363

q1 Acatu155371666q1 Parbu9631672

l4 Infsp19881469El Caeel71980521

El Bruma170577180El Anoga158292641

El Trica91083171El Strpu72005733

EukaryotesEukaryotes

Eukaryotes

El Monbr167535971Eukaryotes

Eukaryotes

96

71

9654

100

92

99

60

10053

8376

68

99

67

100

96

7484

53

100

99

91

100

0.2

C

env135954485env136245388

Mimivirus

PoxviridaeMarseillevirus

env136185156env136595843

c1 Afrsw9628208

q6 Ostvi163955066

99

67

58

58100

100

83

66

env142418996env142908014env136139714

Eukaryotes93

80

99

100

0.2

Asco/Iridoviridae

env144204233c1 Afrsw9628169

Bacteria/phages

Baculoviridaeb1 Trini116326771

env144139321env140405543

90

99

54

98

89

54

8274

91

62

70

0.2

environmental sequencesMarseillevirus

environmental sequences

D

E

A18-like helicase D6/D11-like helicase

Capping enzyme (guanylyltransferase)

Capping enzyme (methyltransferase)

RNA ligase

Figure 4 (See legend on next page.)

Yutin and Koonin Virology Journal 2012, 9:161 Page 11 of 18http://www.virologyj.com/content/9/1/161

Page 12: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

(See figure on previous page.)Figure 4 Phylogenetic trees of ancestral NCLDV genes involved in mRNA processing/maturation. (A). Superfamily 2 helicases homologousto Vaccinia A18. (B). Superfamily 2 helicases homologous to Vaccinia D6 and D11. (C). Capping enzyme (guanylyltransferase domain). (D). Cappingenzyme (methyltransferase domain). (E). RNA ligase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the speciesname abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Speciesabbreviations: Phypa, Physcomitrella patens subsp. patens; Dicdi, Dictyostelium discoideum AX4; Entdi, Entamoeba dispar SAW760; Plavi, Plasmodiumvivax SaI-1; Leiin, Leishmania infantum JPCM5; Trycr, Trypanosoma cruzi strain CL Brener; Anoga, Anopheles gambiae str. PEST; Bruma, Brugia malayi;Caeel, Caenorhabditis elegans; Enccu, Encephalitozoon cuniculi GB-M1; Entbi, Enterocytozoon bieneusi H348; Monbr, Monosiga brevicollis MX1; Phano,Phaeosphaeria nodorum SN15; Strpu, Strongylocentrotus purpuratus; Trica, Tribolium castaneum; Trini, Trichoplusia ni ascovirus 2c; Afrsw, Africanswine fever virus; Infsp, Infectious spleen and kidney necrosis virus; Acatu, Acanthocystis turfacea Chlorella virus 1; Parbu, Paramecium bursariaChlorella virus FR483; Emihu, Emiliania huxleyi virus 86; Ectsi, Ectocarpus siliculosus virus 1; Ostvi, Ostreococcus virus OsV5; E9, Viridiplantae; Ea,Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; c1, Asfarviridae; l4, Megalocytivirus; q1, Chlorovirus; q2, Coccolithovirus; q3, Phaeovirus;q6, unclassified Phycodnaviridae.

Yutin and Koonin Virology Journal 2012, 9:161 Page 12 of 18http://www.virologyj.com/content/9/1/161

form a well-supported clade (along with many uncharac-terized environmental sequences), with the exception ofthe Trichoplusia ni ascovirus which belonged to a cladewith baculovirus and diverse bacteriophages (Figure 4E).The constrained tree with an NCLDV clade was confi-dently statistically rejected (Additional file 2) indicatingthat in the Trichoplusia ni ascovirus the RNA ligasegenes was displaced by a homolog from a baculovirus.Finally, the previously investigated case of the small,

regulatory subunit of polyA polymerase is unusual inthat this gene is present only in poxviruses which alsoencode the large, catalytic subunit, and in a single mimi-virus strain which lacks the catalytic subunit. Despitethis sparse representation, the NCLDV small subunitsseem to form a strongly supported clade, suggestive ofthe possibility that this gene was present in the ancestralNCLDV [15].

Genes for enzymes of nucleotide metabolismMost of the NCLDV encode varying sets of enzymesinvolved in metabolism of deoxyribonucleotides. Thesegenes are not strictly essential for virus reproduction,given that knockouts are typically viable in cell cultures,but tend to be important in vivo. The most common en-zyme in this group is ribonucleotide reductase (RR)which consists of a large and a small subunits encodedby two distinct genes. The phylogenetic trees of both thelarge and the small subunits show complex topologiesthat are incompatible with monophyly of the NCLDV(Figure 5A,B). Moreover, for both subunits, poxvirusesand iridoviruses show clear, distinct affinities, witheukaryotic and bacteriophage homologs, respectively.The provenance of the RR in the other NCLDV is lessclear. On the whole, it seems to be a safe conclusion thatthe RR of the NCLDV are not monophyletic but ratherhave evolved in a complex evolutionary scenario thatinvolved multiple acquisitions, losses and displacement.Therefore, the results of the ML reconstruction notwith-standing, it is difficult to ascertain whether the last com-mon ancestor of the NCLDV encoded RR.

Thymidine kinase (TK) is another major enzyme ofdNTP biosynthesis that is present in a large subset ofthe NCLDV. Phylogenetic analysis reveals three distinctclusters of viral TKs (Figure 6A); a constrained tree con-taining an NCLDV clade could not be statisticallyrejected but nevertheless had a lower likelihood than theoriginal tree (Additional file 2). Qualitatively similarresults were obtained for Thymidylate kinase (TMPK), thesecond enzyme of thymidylate biosynthesis (Figure 6B).The dUTPase, an enzyme that functions at the interface ofnucleotide metabolism and repair, is present in poxviruses,iridoviruses and many phycodnaviruses, but showed clearsigns of polyphyletic origin, including statistically solid re-jection of NCLDV monophyly (Figure 6C, Additionalfile 2). Similar results have been reported from a pre-vious phylogenetic analysis for some of the genes en-coding enzymes of nucleotide metabolism [19].The majority of the sequences of putative ancestral

virion proteins and proteins involved in virus mor-phogenesis and virus-host interactions were too diver-gent to construct reliable phylogenetic trees. The onlyexception was the disulfide isomerase enzyme forwhich monophyly of the NCLDV (including numer-ous environmental sequences that presumably belongto uncharacterized viruses) could be demonstrated(Figure 7).

DiscussionEvolutionary reconstruction using patterns of genepresence-absence in viral genomes has led to the conclu-sion that the NCLDV represent a monophyletic group ofviruses that evolved from a common ancestor which wasa virus with genomic complexity comparable to that ofthe extant NCLDV. Approximately 50 genes that areconserved in different subsets of the NCLDV have beenassigned to the ancestral virus genome (Table 1), and itappears likely that the ancestral virus additionallyencompassed many lineage-specific genes. In this workwe went beyond the phyletic patterns by analyzing phy-logenies of the inferred ancestral NCLDV genes alongwith their homologs from cellular organisms. In almost

Page 13: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

A B

Bp Vibha269963258q3 Ectsi13242650

El Lodel149238952E8 Thaps224001624

env144060058q6 Ostvi163955126env144051596

Ek Leibr154340341Ek Trybr71755889

env144142045Ec Thepa71027825

Ec Parte145497945Ec Crymu209878422

PoxviridaeEl Acypi193608391

env143433140Mimivirus

El Ustma71020483E8 Phatr219110871

q3 Felsp197322445q2 Emihu73852902

Baculoviridaez3 Helze22788802

vi Musdo187903102Marseillevirus

k1 Cyphe131840167k1 Anghe282174153

o1 Shrwh17158276q1 Acatu155371048

q1 ParFR155370864q1 Parbu9632159

c1 Afrsw9628153Bp Frano118497573env136094176

env136607526env142360592env142151739l3 Lymch48843724

l3 Lymdi13358415l5 Ambti45686076l5 Frovi49237335l5 Singr56692701

env143333574l1 Aedta109287943

l2 Invir15078798env142551493

env143288148env144017272Bacteria

Bacteria67

99

100

78

92

56

100

100100

100

100

76

100

87

90

99

100

98 61100

85

97

7155

65

10087

92

100

52

8682

9659

10093

97

57

9898

83

10069

84

59

78

85

0.1

q3 Felsp197322443q3 Ectsi13242599

env135287498

Eukaryotes

EukaryotesEa Dicdi66822029

Ec Parte145537171

q2 Emihu73852496Ek Leibr154337607

Ek Trybr74025548

Bacteriaenv144044197env143932091

q6 Ostvi163955147env142408079

env142452833env136160125

q1 Parbu9632043q1 Acatu155370982

environmental sequences

MimivirusEukaryotes

Baculoviridae

Eukaryotes

Eukaryotes

u1 Canvi40556174z3 Helze22788780

l4 Infsp19881429k1 Anghe282174133

env135039280Bp Phoda269105083

c1 Afrsw9628152Marseillevirus

l5 Ambti45686047l5 Frovi49237365

l5 Singr56692684l3 Lymdi13358428

l3 Lymdi51869996Bp Frano118497575

l1 Invir109287926l2 Invir15079087

Bacteria

92

86

10098

79

89

9584

91

51

61

96

78

71

100

91

100

89

54

97

61

93

9794

100

96

98

100

61

75

8447

85

88

9572

98

76

84

66

80

93

99

0.1

100

100

Poxviridae

Ribonucleotide reductase, large subunit

Ribonucleotide reductase, small subunit

Figure 5 Phylogenetic trees of the ancestral NCLDV genes for ribonucleotide reductase subunits. (A). Ribonucleotide reductase, largesubunit. (B). Ribonucleotide reductase, small subunit. Branches with bootstrap support less than 50 were collapsed. For each sequence, thespecies name abbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Speciesabbreviations: Acatu, Acanthocystis turfacea Chlorella virus 1; Acypi, Acyrthosiphon pisum; Aedta, Invertebrate iridescent virus 3; Afrsw, Africanswine fever virus; Ambti, Ambystoma tigrinum virus; Anghe, Anguillid herpesvirus 1; Canvi, Canarypox virus; Crymu, Cryptosporidium muris RN66;Cyphe, Cyprinid herpesvirus 3; Dicdi, Dictyostelium discoideum AX4; Ectsi, Ectocarpus siliculosus virus 1; Emihu, Emiliania huxleyi virus 86; Felsp,Feldmannia species virus; Frano, Francisella novicida U112; Frovi, Frog virus 3; Helze, Heliothis zea virus 1; Infsp, Infectious spleen and kidneynecrosis virus; l1_Invir, Invertebrate iridescent virus 3; l2_Invir, Invertebrate iridescent virus 6; Leibr, Leishmania braziliensis MHOM/BR/75/M2904;Lodel, Lodderomyces elongisporus NRRL YB-4239; Lymch, Lymphocystis disease virus - isolate China; Lymdi, Lymphocystis disease virus 1; Musdo,Musca domestica salivary gland hypertrophy virus; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus NY2A; ParFR,Paramecium bursaria Chlorella virus FR483; Parte, Paramecium tetraurelia strain d4-2; Phatr, Phaeodactylum tricornutum CCAP 1055/1; Phoda,Photobacterium damselae subsp. damselae CIP 102761; Shrwh, Shrimp white spot syndrome virus; Singr, Singapore grouper iridovirus; Thaps,Thalassiosira pseudonana CCMP1335; Thepa, Theileria parva strain Muguga; Trybr, Trypanosoma brucei; Ustma, Ustilago maydis 521; Vibha, Vibrioharveyi 1DA3; Bp, Proteobacteria; E8, Stramenopiles; Ea, Amoebozoa; Ec, Alveolata; Ek, Euglenozoa; El, Opisthokonta; c1, Asfarviridae; k1, Herpesvirales;l1, Chloriridovirus; l2, Iridovirus; l3, Lymphocystivirus; l4, Megalocytivirus; l5, Ranavirus; o1, Nimaviridae; q1, Chlorovirus; q2, Coccolithovirus; q3,Phaeovirus; q6, unclassified Phycodnaviridae; u1, Chordopoxvirinae; vi, Nudivirus; z3, unclassified dsDNA viruses.

Yutin and Koonin Virology Journal 2012, 9:161 Page 13 of 18http://www.virologyj.com/content/9/1/161

Page 14: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

A BEukaryotes

PoxviridaeEukaryotes

EukaryotesPoxviridae

q6 Ostvi163955199env136422265env142443153env144065395

env143975708env144021532

env142459479env136229246Marseillevirus

env142470004env136062509

Eukaryotes (plants)c1 Afrsw9628158

o1 Shrwh17158497Mimivirus

u2 Amsmo9964330Bacteria98

95

57

100

9771

9188

83

95

79

87

82

84

69

60

9880

6076

100

52

72

100

0.1Asco/Iridoviridae Phycodnaviridae environmental sequences

environmental sequencesenv142414951

k1 Icthe9626826l4 Infsp19881437

vi Musdo187903106Ec Parte145528967

Ec Tetth118389890u1 Canvi40556137

u1 Fowvi9634729u1 Fowvi9634821

env138940489u1 Canvi40556151

Eukaryotes

Bacteria/phagesEukaryotes

68

El Lacth255712129El Cryne58265708

l2 Invir15078963Ec Tetth118351574

El Brafl260796149u1 Canvi40556108

El Homsa42544174E9 Arath145334853

Ea Dicdi66804303E9 Ostlu145348882

u1 Ectvi22164753u1 Varvi9627681u1 Vacvi66275971

Bp Psesy289676128Bb Chipi256419682Bf Desha89894606

Bf Clobo226947280q2 Emihu73852905

Ek Trycr71423920Ek Leima157865253

c1 Afrsw9628143El Caebr268573866

54

100

8493

50

62

8364

97

55

8780

72

70

100

98

51

97

51

72

86

82

5574

85

97

99

94

72

100

817481

0.2

l2 Invir15079149

E9 Phypa167999518Ea Dicdi66800487

El Apime48097825El Nasvi156541544

El Cioin198419421

l5 Singr56692686El Homsa70906444

a2 Fowad9628836a2 Fowad9633174

l5 Ambti45686051l5 Frovi49237361

d2 Spoli148368872z3 Helze22788776

u2 Amsmo9964316Poxviruses

El Caeel71988561d2 Agrse46309437

Ea Enthi67470091env138142700

env143826648env142423093

env143987160

q6 Ostta290343671env143982771

env134673310

BacteriaBp Rhosp146280038

env143370438

q2 Emihu73852871q1 Acatu155371447

q1 Parbu157953045Eukaryotes

100

99

6972

68

96

88

99

53

8685

85

74

9551

54

97

71

77

61

93

100

86

70

75

67

88

70

94

67

0.1

C

Thymidine kinase Thymidylate kinase

dUTPase

Figure 6 (See legend on next page.)

Yutin and Koonin Virology Journal 2012, 9:161 Page 14 of 18http://www.virologyj.com/content/9/1/161

Page 15: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

(See figure on previous page.)Figure 6 Phylogenetic trees of ancestral NCLDV genes encoding enzymes of nucleotide metabolism. (A). Thymidine kinase.(B). Thymidylate kinase. (C). dUTPase. Branches with bootstrap support less than 50 were collapsed. For each sequence, the species nameabbreviation and the gene identification numbers are indicated; env stands for sequences retrieved from env_nr database. Species abbreviations:Acatu, Acanthocystis turfacea Chlorella virus 1; Afrsw, African swine fever virus; Agrse, Agrotis segetum granulovirus; Ambti, Ambystoma tigrinumvirus; Amsmo, Amsacta moorei entomopoxvirus 'L'; Apime, Apis mellifera; Arath, Arabidopsis thaliana; Brafl, Branchiostoma floridae; Caebr,Caenorhabditis briggsae; Caeel, Caenorhabditis elegans; Canvi, Canarypox virus; Chipi, Chitinophaga pinensis DSM 2588; Cioin, Ciona intestinalis;Clobo, Clostridium botulinum A2 str. Kyoto; Cryne, Cryptococcus neoformans var. neoformans JEC21; Desha, Desulfitobacterium hafniense Y51; Dicdi,Dictyostelium discoideum AX4; Ectvi, Ectromelia virus; Emihu, Emiliania huxleyi virus 86; Enthi, Entamoeba histolytica HM-1:IMSS; Fowad, Fowladenovirus A; Fowvi, Fowlpox virus; Frovi, Frog virus 3; Helze, Heliothis zea virus 1; Homsa, Homo sapiens; Icthe, Ictalurid herpesvirus 1; Infsp,Infectious spleen and kidney necrosis virus; l2_Invir, Invertebrate iridescent virus 6; Lacth, Lachancea thermotolerans; Leima, Leishmania major;Musdo, Musca domestica salivary gland hypertrophy virus; Nasvi, Nasonia vitripennis; Ostlu, Ostreococcus lucimarinus CCE9901; Ostta, Ostreococcustauri virus 1; Ostvi, Ostreococcus virus OsV5; Parbu, Paramecium bursaria Chlorella virus NY2A; Parte, Paramecium tetraurelia strain d4-2; Phypa,Physcomitrella patens subsp. patens; Psesy, Pseudomonas syringae pv. syringae FF5; Rhosp, Rhodobacter sphaeroides ATCC 17025; Shrwh, Shrimpwhite spot syndrome virus; Singr, Singapore grouper iridovirus; Spoli, Spodoptera litura granulovirus; Tetth, Tetrahymena thermophila; Trybr,Trypanosoma brucei; Vacvi, Vaccinia virus; Varvi, Variola virus; Bb, Bacteroidetes; Bf, Firmicutes; Bp, Proteobacteria; E9, Viridiplantae; Ea, Amoebozoa; Ec,Alveolata; Ek, Euglenozoa; El, Opisthokonta; a2, Adenoviridae; c1, Asfarviridae; d2, Baculoviridae; k1, Herpesvirales; l2, Iridovirus; l4, Megalocytivirus; l5,Ranavirus; o1, Nimaviridae; q1, Chlorovirus; q2, Coccolithovirus; q6, unclassified Phycodnaviridae; u1, Chordopoxvirinae; u2, Entomopoxvirinae; vi,Nudivirus; z3, unclassified dsDNA viruses.

Yutin and Koonin Virology Journal 2012, 9:161 Page 15 of 18http://www.virologyj.com/content/9/1/161

all cases where the information content of the respectivemultiple sequence alignments was sufficient for phylo-genetic analysis, deviations from the simple pattern ofvertical evolution were observed (summarized inTable 1). Strikingly, phylogenetic trees for most of theconserved NCLDV genes failed to show an NCLDVclade although it was not always possible to reject themonophyly of the NCLDV at a statistically significantlevel.The results of phylogenetic analysis of viral genes have

to be interpreted with caution given the generally fastevolution of viral genomes and the ensuing possibility ofartifacts such as long branch attraction. Therefore, weconsidered the monophyly of the NCLDV to be the mostappropriate null hypothesis and generally made conclu-sions on more complex evolutionary scenarios onlywhen this hypothesis could be rejected using conserva-tive statistical tests of tree topology. Nevertheless, evenwith this conservative approach, phylogenomic analysissuggests that evolution of the core NCLDV genes, withonly a few likely exceptions (DNA polymerase, disulfideisomerase and several other genes; see Table 1), includednot only multiple gene losses that were apparent alreadyfrom the examination of the phyletic patterns, but alsomultiple cases of xenologous gene displacement [25], i.e.displacement of the ancestral gene by a homologous(and functionally analogous) gene from a differentsource such as bacteriophage or eukaryote. On someoccasions, when a gene is missing in a large group ofviruses (such as phycodnaviruses) except for a minorityof members of that group, in which it is phylogeneticallydistinct from homologs in other NCLDV, the sequenceof events leading to displacement can be inferred as lossfollowed by regain. In other cases, such as RR and TK,the course of evolution seems to have been too complexto reconstruct the specific scenario.

Thus, the present analysis reveals layers of hiddencomplexity in the history of the conserved gene core ofthe NCLDV that are not apparent from the analysis ofpatterns of gene presence-absence alone [5]. Althoughdeviations from simple vertical evolution probably oc-curred in the history of almost all core genes, theresults do not invalidate the conclusion on the evolu-tion of all known NCLDV from a single ancestral virus.The high prevalence of gene loss and xenologous genedisplacement notwithstanding, for most of the coregenes, these events affected the evolution of only a fewlineages, consistent with the origin of all NCLDV froma common ancestor, followed by isolated events compli-cating the evolutionary scenarios. However, for severalgenes that have been included in the core set on thebasis of gene presence-absence patterns, such as theenzymes of DNA precursor metabolism, multiplesources are apparent, so that the ancestral status ofthese genes becomes uncertain.Phylogenetic analysis of the core NCDLV genes reveals

multiple affinities with genes from eukaryotes, bacteriaand bacteriophages. The acquisition of genes fromeukaryotic hosts (that might not be the same for ances-tral and extant viruses) is not surprising. However, genetransfer to NCLDV, in particular those infecting unicel-lular eukaryotes, from bacteria and bacteriophages isplausible as well given that diverse parasites and sym-bionts often coexist within the same eukaryotic host. In-deed, in amoebas, with their large cells and phagocyticlife style, such coexistence is the norm, making theseorganisms veritable ‘melting pots’ of virus evolution[7,32].The general trend seems to be that bona fide essential

genes, are rarely displaced because for these, intermedi-ates lacking the gene are most like non-viable, so ifintermediate forms existed, they should have encoded

Page 16: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Protein disulfide isomeraseMarseillevirus

env136543793Iridoviridaeenvironmental sequences

env140203035env136808647

q1 ParAR157953760env136606018

env136840138Mimivirus

environmental sequencesMimivirus

env136256796

environmental sequencesq3 Felsp197322406

env141253368env134921190

q3 Ectsi13242631env134459082

environmental sequencesenv139731803l2 Invir15079059

l1 Aedta109287974Ascoviridae

c1 Afrsw9628181environmental sequences

Poxviridae

Eukaryotes

99

q2 Emihu73852597env141323596

80

69

7588

67

72

67

99

5568

84

55

81

72

87

9181

95

60

9455

9365

10060

78

9659

74

0.2

Figure 7 Phylogenetic tree of an ancestral NCLDV geneencoding an enzyme involved in virion morphogenesis: proteindisulfide isomerase. Branches with bootstrap support less than 50were collapsed. For each sequence, the species name abbreviationand the gene identification numbers are indicated; env stands forsequences retrieved from env_nr database. Species abbreviations:Afrsw, African swine fever virus; Aedta, Invertebrate iridescent virus 3;Invir, Invertebrate iridescent virus; ParAR, Paramecium bursariaChlorella virus AR158; Emihu, Emiliania huxleyi virus 86; Ectsi,Ectocarpus siliculosus virus 1; Felsp, Feldmannia species virus; c1,Asfarviridae; l1, Chloriridovirus; l2, Iridovirus; q1, Chlorovirus; q2,Coccolithovirus; q3, Phaeovirus.

Yutin and Koonin Virology Journal 2012, 9:161 Page 16 of 18http://www.virologyj.com/content/9/1/161

two forms of the respective genes, with the original genesubsequently lost and the xenologous gene retained.This evolutionary scenario might be rare due to the con-straints imposed by the requirements for the formationof multisubunit complexes (under the complexity hy-pothesis of Lake and colleagues [33]), e.g. the replisome.However, the clear-cut xenologous displacement of theprimase-helicase gene in phycodnaviruses shows thatthese obstacles are not insurmountable. In contrast, forgenes that are not strictly essential but are beneficial inmost virus-host systems, such as the precursor biosyn-thesis enzymes, parallel loss and regain in multiple

lineages seems to be the rule rather than an exception.This pattern can be linked to the viability of evolutionaryintermediates lacking the respective genes, at least in theshort term.In addition to cases of xenologous gene displacement,

phylogenies of a few core genes point to evolutionarylinks with other large DNA viruses, such as herpes-viruses and baculoviruses, as well as bacteriophages.These observations are compatible with the virus worldconcept under which viruses are linked through complexnetworks of evolutionary connections at the level of in-dividual genes and in some cases gene modules, and arealso involved in extensive gene exchange with cellularlife forms.

MethodsThe NCLDV protein sequences were extracted from theRefSeq database (NCBI, NIH, Bethesda) [34]. The mostrecently sequenced NCLDV genomes, namely the Cafe-teria roenbergensis virus [35], the megavirus [16], both ofthe Mimiviridae family, and Lausannevirus [8], a closerelative of Marseille virus, were not included. For eachcluster of orthologous NCLDV genes (NCVOG) that hasbeen mapped to the last common ancestor of theNCLDV [5], the following procedure was applied. A rep-resentative NCVOG sequence set was constructed byclustering the complete collection of the respectiveprotein sequences using the Blastclust program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) and select-ing a representative from each cluster of closely relatedsequences). Two BLASTP runs, one against the Refseqdatabase and the other one against the environmental(env_nr) database at the NCBI, were performed foreach representative sequence with the e-value cutoff of0.1. This liberal cutoff was used in order to incorpor-ate highly diverged homologs. To eliminate potentialfalse positives, all alignments were examined case bycase for the conservation of domain architecture andpresence of diagnostic motifs. All the sequences fromthe given NCVOG, the top 20–30 Refseq hits fromeach domain of the NCBI Taxonomy (Eukaryota, Bac-teria, Archaea, and Viruses), and top 10 environmentalhits for each query were combined, and nearly identi-cal sequences were eliminated using Blastclust. Theresulting sequences were aligned using MUSCLE [36];gapped columns (more than 30% of gaps) and columnswith low information content were removed from thealignment [37]. A preliminary tree was constructedusing PhyML [38], with the following parameters:WAG substitution matrix; four relative substitutionrate categories; the fraction of invariable sites andthe alpha parameter of the gamma distribution ofsite-specific evolution rates) were automaticallyselected by PhyML. From this preliminary tree, a

Page 17: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Yutin and Koonin Virology Journal 2012, 9:161 Page 17 of 18http://www.virologyj.com/content/9/1/161

subset of sequences representing each clade wereselected and re-aligned. In the next step, 8 PhyMLruns were performed for each alignment, with 8 sub-stitution models (Blosum62, Dayhoff, JTT, DCMut,RtREV, CpREV, VT, and WAG). The best topologywas chosen by the maximum log likelihood amongthe 8 trees. Bootstrap branch supports (expressed inExpected-Likelihood Weights, ELW) were calculatedusing TreeFinder [39] for the best PhyML tree usingthe best matrix. For detailed phylogenetic analysis,whenever appropriate, alternative (constrained) top-ology ML trees were constructed using TreeFinder[39], with the substitution model found to be thebest for a given alignment in the first-round analysis.Tree topologies were compared with TreeFinderusing the approximately unbiased (AU) test P value[40].

Additional files

Additional file 1: The phyletic patterns of absence-presence of theNCLDV core genes in the sequenced NCLDV genomes.

Additional file 2: Results of statistical analysis of constrained treesfor the core NCLDV genes.

Additional file 3: Phylogenetic trees for the DNA ligases of theNCLDV. A. ATP-dependent ligases. B. NAD-dependent ligases.

Competing interestsThe authors declare that they have no competing interests.

Authors’ contributionsEVK initiated and designed the study; NY collected and analyzed data; EVKwrote the manuscript that was read, edited and approved by both authors.

AcknowledgementsThe authors’ research is supported by the DHHS intramural program (NIH,national Library of Medicine).

Received: 10 February 2012 Accepted: 8 August 2012Published: 14 August 2012

References1. Forterre P: The origin of viruses and their possible roles in major

evolutionary transitions. Virus Res 2006, 117(1):5–16.2. Koonin EV, Senkevich TG, Dolja VV: The ancient virus world and evolution

of cells. Biol Direct 2006, 1(1):29.3. Iyer LM, Aravind L, Koonin EV: Common origin of four diverse families of

large eukaryotic DNA viruses. J Virol 2001, 75(23):11720–11734.4. Iyer LM, Balaji S, Koonin EV, Aravind L: Evolutionary genomics of

nucleo-cytoplasmic large DNA viruses. Virus Res 2006, 117(1):156–184.5. Yutin N, Wolf YI, Raoult D, Koonin EV: Eukaryotic large nucleo-cytoplasmic

DNA viruses: clusters of orthologous genes and reconstruction of viralgenome evolution. Virol J 2009, 6:223.

6. Koonin EV, Yutin N: Origin and evolution of eukaryotic largenucleo-cytoplasmic DNA viruses. Intervirology 2010, 53(5):284–292.

7. Boyer M, Yutin N, Pagnier I, Barrassi L, Fournous G, Espinosa L, Robert C,Azza S, Sun S, Rossmann MG, et al: Giant Marseillevirus highlights the roleof amoebae as a melting pot in emergence of chimeric microorganisms.Proc Natl Acad Sci U S A 2009, 106(51):21848–21853.

8. Thomas V, Bertelli C, Collyn F, Casson N, Telenti A, Goesmann A, Croxatto A,Greub G: Lausannevirus, a giant amoebal virus encoding histonedoublets. Environ Microbiol 2011, 13(6):1454–1466.

9. Moss B: Poxviridae: the viruses and their replication. In Fields Virology,Volume 2. Edited by Knipe DM, Howley PM. Philadelphia: LippincottWilliams & Wilkins; 2007:2905–2946.

10. Claverie JM, Abergel C, Ogata H: Mimivirus. Curr Top Microbiol Immunol2009, 328:89–121.

11. Raoult D, Boyer M: Amoebae as genitors and reservoirs of giant viruses.Intervirology 2010, 53(5):321–329.

12. Colson P, Raoult D: Gene repertoire of amoeba-associated giant viruses.Intervirology 2010, 53(5):330–343.

13. Van Etten JL, Lane LC, Dunigan DD: DNA viruses: the really big ones(giruses). Annu Rev Microbiol 2010, 64:83–99.

14. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B,Suzan M, Claverie JM: The 1.2-megabase genome sequence of Mimivirus.Science 2004, 306(5700):1344–1350.

15. Colson P, Yutin N, Shabalina SA, Robert C, Fournous G, La Scola B, Raoult D,Koonin EV: Viruses with more than 1,000 genes: Mamavirus, a newAcanthamoeba polyphaga mimivirus strain, and reannotation ofMimivirus genes. Genome Biol Evol 2011, 3:737–742.

16. Arslan D, Legendre M, Seltzer V, Abergel C, Claverie JM: Distant Mimivirusrelative with a larger genome highlights the fundamental features ofMegaviridae. Proc Natl Acad Sci U S A 2011, 108(42):17486–17491.

17. Van Etten JL: Unusual life style of giant chlorella viruses. Annu Rev Genet2003, 37:153–195.

18. Yutin N, Koonin EV: Evolution of DNA ligases of nucleo-cytoplasmic largeDNA viruses of eukaryotes: a case of hidden complexity. Biol Direct 2009, 4:51.

19. Filee J, Pouget N, Chandler M: Phylogenetic evidence for extensive lateralacquisition of cellular genes by Nucleocytoplasmic large DNA viruses.BMC Evol Biol 2008, 8:320.

20. Weynberg KD, Allen MJ, Ashelford K, Scanlan DJ, Wilson WH: From smallhosts come big viruses: the complete genome of a second Ostreococcustauri virus, OtV-1. Environ Microbiol 2009, 11(11):2821–2839.

21. Filee J, Forterre P, Laurent J: The role played by viruses in the evolution oftheir hosts: a view based on informational protein phylogenies. ResMicrobiol 2003, 154(4):237–243.

22. Monier A, Claverie JM, Ogata H: Taxonomic distribution of large DNAviruses in the sea. Genome Biol 2008, 9(7):R106.

23. Ogata H, Toyoda K, Tomaru Y, Nakayama N, Shirai Y, Claverie JM, Nagasaki K:Remarkable sequence similarity between the dinoflagellate-infectingmarine girus and the terrestrial pathogen African swine fever virus. VirolJ 2009, 6:178.

24. Felsenstein J: Inferring Phylogenies. Sunderland, MA: Sinauer Associates; 2004.25. Koonin EV, Makarova KS, Aravind L: Horizontal gene transfer in

prokaryotes: quantification and classification. Annu Rev Microbiol 2001,55:709–742.

26. Senkevich TG, Koonin EV, Moss B: Predicted poxvirus FEN1-like nucleaserequired for homologous recombination, double-strand break repair andfull-size genome formation. Proc Natl Acad Sci U S A 2009,106(42):17921–17926.

27. Everly DN Jr, Feng P, Mian IS, Read GS: mRNA degradation by the virionhost shutoff (Vhs) protein of herpes simplex virus: genetic andbiochemical evidence that Vhs is a nuclease. J Virol 2002,76(17):8560–8571.

28. Sonntag KC, Darai G: Evolution of viral DNA-dependent RNA polymerases.Virus Genes 1995, 11(2–3):271–284.

29. Lane WJ, Darst SA: Molecular evolution of multisubunit RNA polymerases:sequence analysis. J Mol Biol 2010, 395(4):671–685.

30. Lackner CA, Condit RC: Vaccinia virus gene A18R DNA helicase is atranscript release factor. J Biol Chem 2000, 275(2):1485–1494.

31. Pascal JM: DNA and RNA ligases: structural variations and sharedmechanisms. Curr Opin Struct Biol 2008, 18(1):96–105.

32. Moliner C, Fournier PE, Raoult D: Genome analysis of microorganismsliving in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev2010, 34:281–294.

33. Jain R, Rivera MC, Lake JA: Horizontal gene transfer among genomes: thecomplexity hypothesis. Proc Natl Acad Sci U S A 1999, 96(7):3801–3806.

34. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K,Chetvernin V, Church DM, Dicuccio M, Federhen S, et al: Databaseresources of the National Center for Biotechnology Information. NucleicAcids Res 2012, 40(Database issue):D13–D25.

35. Fischer MG, Allen MJ, Wilson WH, Suttle CA: Giant virus with a remarkablecomplement of genes infects marine zooplankton. Proc Natl Acad SciU S A 2010, 107(45):19508–19513.

Page 18: RESEARCH Open Access Hidden evolutionary complexity of ... · candidate family of viruses with large DNA genomes that infect diverse eukaryotes and are collectively known as Nucleo-Cytoplasmic

Yutin and Koonin Virology Journal 2012, 9:161 Page 18 of 18http://www.virologyj.com/content/9/1/161

36. Edgar RC: MUSCLE: multiple sequence alignment with high accuracy andhigh throughput. Nucleic Acids Res 2004, 32(5):1792–1797.

37. Yutin N, Makarova KS, Mekhedov SL, Wolf YI, Koonin EV: The deep archaealroots of eukaryotes. Mol Biol Evol 2008, 25(8):1619–1630.

38. Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimatelarge phylogenies by maximum likelihood. Syst Biol 2003, 52(5):696–704.

39. Jobb G, von Haeseler A, Strimmer K: TREEFINDER: a powerful graphicalanalysis environment for molecular phylogenetics. BMC Evol Biol 2004,4:18.

40. Shimodaira H: An approximately unbiased test of phylogenetic treeselection. Syst Biol 2002, 51(3):492–508.

doi:10.1186/1743-422X-9-161Cite this article as: Yutin and Koonin: Hidden evolutionary complexity ofNucleo-Cytoplasmic Large DNA viruses of eukaryotes. Virology Journal2012 9:161.

Submit your next manuscript to BioMed Centraland take full advantage of:

• Convenient online submission

• Thorough peer review

• No space constraints or color figure charges

• Immediate publication on acceptance

• Inclusion in PubMed, CAS, Scopus and Google Scholar

• Research which is freely available for redistribution

Submit your manuscript at www.biomedcentral.com/submit