Evidence supporting a viral origin of the eukaryotic nucleus

Dr Philip JL Bell
Microbiogen Pty Ltd
Correspondence should be addressed to Email [email protected]

Keywords: Viral eukaryogenesis, nucleus, eukaryote origin, viral factory, mRNA capping, phylogeny

bioRxiv preprint doi: https://doi.org/10.1101/679175. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

Abstract

The defining feature of the eukaryotic cell is the possession of a nucleus that uncouples transcription from translation. This uncoupling depends on a complex process employing hundreds of eukaryote-specific genes acting in concert and requires the 7-methylguanylate (m7G) cap to prime eukaryotic mRNA for splicing, nuclear export, and cytoplasmic translation. The origin of this complex system is currently a paradox since it is not found or needed in prokaryotic cells, which lack nuclei, yet it was apparently present and fully functional in the Last Eukaryotic Common Ancestor (LECA). According to the Viral Eukaryogenesis (VE) hypothesis, the abrupt appearance of the nucleus in the eukaryotic lineage occurred because the nucleus descends from the viral factory of a DNA phage that infected the archaeal ancestor of the eukaryotes. Consequently, the system for uncoupling transcription from translation in eukaryotes is predicted by the VE hypothesis to be viral in origin. In support of this hypothesis, it is shown here that the m7G capping apparatus that primes the uncoupling of transcription from translation in eukaryotes is present in viruses of the Mimiviridae but absent from bona fide archaeal relatives of the eukaryotes such as Lokiarchaeota. Furthermore, phylogenetic analysis of the m7G capping pathway indicates that eukaryotic nuclei and Mimiviridae obtained this pathway from a common ancestral source that predated the origin of LECA. These results support the VE hypothesis and suggest the eukaryotic nucleus and the Mimiviridae descend from a common First Eukaryotic Nuclear Ancestor (FENA).

Introduction

A membrane-bound nucleus defines the eukaryotic domain, and all cellular organisms without nuclei are prokaryotic (Sapp 2005; Stanier and Van Niel 1962). Its presence contributes to a great divide between eukaryotes and prokaryotes defined by features such as linear chromosomes, telomeres, nuclear pores, the spliceosome, mitosis, meiosis, the sexual cycle, and the endoplasmic reticulum. Since the nucleus separates the eukaryotic genome from the ribosomal apparatus, its presence also introduces an uncoupling of transcription from translation unique to the eukaryotic domain.

Lokiarchaeota reportedly bridge ‘the gap between prokaryotes and eukaryotes’ and are proposed to be bona fide archaeal relatives of the eukaryotes (Spang et al. 2015). Lokiarchaeota and related Asgardians encode Crenactins, the ESCRT-III complex, a family of small Ras-like GTPases and a ubiquitin system, making them plausible direct descendants of an archaeal ancestor of the eukaryotes (Koonin 2015). This discovery supports the Eocyte tree of life, in which eukaryotes evolved from a specific archaeon rather than representing a sister group to the archaea (Riviera and Lake 1992). If Lokiarchaeota are bona fide archaeal relatives of the eukaryotes, the last common ancestor of the Asgard archaea and the eukaryotes can be inferred to be an archaeal First Eukaryotic Common Ancestor (FECA) (Eme et al. 2017).

Despite sharing a proposed archaeal ancestor with Lokiarchaeota, the last eukaryotic common ancestor (LECA) possessed both a nucleus and a mitochondrion, and no eukaryotes are descended from any earlier intermediates without both these complex organelles (Neumann et al. 2010). Whilst the abrupt appearance of the mitochondrion in LECA is persuasively explained by its endosymbiotic descent from an alpha-proteobacterium (e.g. Lang et al. 1999), the similarly abrupt appearance of a nucleus has been much more difficult to explain (Martin 1999; Martin 2005).

The presence of the eukaryotic nucleus results in an uncoupling of transcription from translation, and this uncoupling requires mRNA to be synthesised inside the nucleus, capped, processed, and exported into the cytoplasm for translation (Kyrieleis et al. 2014). This contrasts with prokaryotic translational systems that rely on direct recognition of uncapped mRNA by the ribosomal apparatus (Benelli and Londei 2011). Evidence of archaeal methanogens 3.8 billion years ago (Battistuzzi et al. 2004) shows that prokaryotes evolved well before the eukaryotes originated 1.8 billion years ago (Parfrey et al. 2011). Accordingly, the prokaryotic system of direct recognition of mRNA by the ribosomal apparatus existed for nearly two billion years before the nucleus and its cap-based system abruptly appeared in LECA.

The change from the prokaryotic translational system found in FECA (Figure 1) to the uncoupled eukaryotic system found in LECA (Figure 2) involved the evolution of a complex molecular system involving hundreds of interacting genes. The m7G cap is critical to this process since it primes the mRNA for processing, export and translation (Figure 2). The genes required to add the m7G cap include an RNA polymerase (RNAP-II) dedicated to capped mRNA synthesis (Sentenac 1985), and the RNA triphosphatase (TPase), guanylyltransferase (GTase) and methyltransferase (MTase) required for capping mRNA (Kyrieleis et al. 2014). A cap binding protein (eIF4E) is also essential since it is required for initiating translation of the capped mRNA in the cytoplasm (Marcotrigiano et al. 1997). Paradoxically, the high level of complexity and the integrated nature of the cap-based system of uncoupling transcription from translation suggest a long evolutionary history, yet no transitional cellular forms linking the prokaryotic (Figure 1) and eukaryotic (Figure 2) systems have been described. Consequently, if only prokaryotes are considered as a source for the eukaryotic m7G cap-based system, an abrupt and currently insurmountable phylogenetic impasse is encountered.

The Viral Eukaryogenesis (VE) hypothesis proposes that the nucleus derives from an ancient DNA phage/virus and predicts that the m7G cap-based system that primes the uncoupling of transcription from translation in eukaryotes originated amongst the prokaryotic viruses (Bell 2001). Although fossil evidence for viruses is unlikely to be found, viruses almost certainly existed before the origin of LECA. For example, a prokaryotic genome that is free of genetic parasites is expected to show signs of degeneration, because a mechanism is needed to overcome the degradation of prokaryotic genomes caused by processes such as Muller’s ratchet (Iranzo et al. 2016). There are also strong biological arguments that the emergence of genetic parasites is inevitable due to the instability of parasite-free states (Koonin et al. 2017). Further experimental support for a pre-LECA origin of viruses comes from phylogenomic analysis, which shows that modern eukaryotic viruses evolved from pre-existing prokaryotic phage (Koonin et al. 2015). It can thus be anticipated that viruses would have emerged in concert with the first prokaryotes and existed for much of the 2 billion years between the appearance of the first methanogens and the appearance of LECA.

The VE hypothesis has been supported by the discovery that the Pseudomonas jumbophage 201 Φ2-1 constructs a nucleus-like viral factory that uncouples transcription from translation (Chaikeeratisak et al. 2017b). The viral factory established by 201 Φ2-1 confines phage DNA within the factory whilst excluding ribosomes (Chaikeeratisak et al. 2017b). Thus, once the factory is established, transcription occurs within the factory and the mRNA must be exported into the cytoplasm for translation. Functionally, infection results in the bacterial protoplasm being divided into a viroplasm where viral information processing occurs, and a cytoplasm where translation and metabolic enzymes are localised. Since virally encoded enzymes such as RNA polymerases and DNA polymerases must function inside the viral factory whilst components of the phage virions are assembled in the cytoplasm, it can be inferred that the boundary of these viral factories must be able to selectively sort which proteins, RNA transcripts and other factors can move across it.

Deepening the similarities between the eukaryotic nucleus and the viral factories of phage 201 Φ2-1, the phage possesses a homologue of eukaryotic tubulin (PhuZ), and this tubulin polymerises via dynamic instability, positioning the factory in the centre of the infected cell (Chaikeeratisak et al. 2017). The PhuZ spindle is the only known example of a cytoskeletal structure that shares three key properties with the eukaryotic spindle: dynamic instability, a bipolar array of filaments, and central positioning of DNA (Chaikeeratisak et al. 2017). This is a significant parallel since eukaryotic nuclei are positioned in the cell by microtubule-dependent motors during development and differentiation (Star 2009).

It was similarities between the eukaryotic nucleus and the Pox viruses that led to the original VE proposal that the nucleus was derived from a virus that infected the archaeal ancestor of the eukaryotes (Bell 2001). In particular, the observations supporting the model were that the Pox viruses could produce capped mRNA, possessed linear chromosomes, could separate transcription from translation, and had an ability to replicate entirely within the host cytoplasm (Bell 2001). Subsequently, Pox viruses were found to be members of an ancient monophyletic group, the NCLDV viruses (Iyer et al. 2001). The discovery of the giant Mimivirus in 2004 and its allocation to the NCLDV group demonstrated that pox-viral relatives were of unprecedented size and possessed a complexity comparable to prokaryotic cells (Raoult et al. 2004). Many other giant NCLDV viruses have since been discovered, including even more complex relatives such as the Klosneuvirus (Schulz et al. 2017) and Tupanvirus (Abrahão et al. 2018).

A prokaryotic viral ancestry for both the Poxviruses and the other NCLDV viruses has been supported by phylogenomic studies (Koonin and Yutin 2010) and is compatible with the NCLDV common ancestor existing at or before the origin of LECA (Boyer et al. 2010; Nasir et al. 2012; Yutin et al. 2009). Furthermore, comparison between the inferred genome of the NCLDV common ancestor (Yutin and Koonin 2012) and the modern PhiKZ-like viruses (including 201 Φ2-1) reveals that both classes of giant virus possess large genomes, encode homologues of DNA polymerases (Kazlauskas and Venclovas 2011), multi-subunit RNA polymerase (Ceyssens et al. 2014), DNA ligases (Wojtus et al. 2017) and RNA ligases (Wojtus et al. 2017), and replicate with a high degree of autonomy from their hosts (Yuan and Gao 2017). Furthermore, distantly related NCLDV viruses from the Poxviridae, Asfarviridae, Pithoviridae, Marseilleviridae and Mimiviridae families replicate partially or exclusively within large cytoplasmic viral factories (Fridmann-Sirkis et al. 2016). Since the genes for adding an m7G cap to mRNA were present in the common ancestor of all the NCLDV viruses (Iyer et al. 2001; Yutin and Koonin 2012), it can be inferred from these observations that the common ancestor of the NCLDV viruses was a virus that could produce capped mRNA and, like phage 201 Φ2-1, could establish a viral factory in its host’s cytoplasm.

In addition to inheriting the ability to add an m7G cap to mRNA from the NCLDV common ancestor, two separate groups of NCLDV viruses, the Pandoraviridae and the Mimiviridae, also possess homologues of the eukaryotic cap binding protein eIF4E (Schulz et al. 2017). Unlike many NCLDV viruses, the Pandoraviruses possess introns in their genes, strongly suggesting that at least part of the Pandoravirus genome is transcribed in the nucleus (Phillipe et al. 2013). By contrast, members of the Mimiviridae have been shown to replicate entirely in the host cytoplasm and establish a nucleus-like uncoupling of transcription from translation (Fridmann-Sirkis et al. 2016). Furthermore, the cap binding protein encoded by eIF4E is located in the cytoplasm outside the viral factory during infection (Fridmann-Sirkis et al. 2016). Thus, as shown in Figure 3, the viral factories of Mimiviruses and 201 Φ2-1 share fundamental features with each other, such as the ability to uncouple transcription from translation and to selectively control which macromolecules enter and exit the viral factory, and the Mimivirus viral factories also share further specific fundamental features with the eukaryotic nucleus. Amongst these shared features, the Mimivirus has a linear genome, establishes a nucleus-like organelle in its host’s cytoplasm, possesses its own RNA polymerase dedicated to transcribing capped mRNA, exports capped mRNA into the cytoplasm for translation, and possesses its own version of the cap binding protein (eIF4E), which is located in the host cytoplasm during infection and is presumably involved in controlling the initiation of translation of capped Mimiviral transcripts. These discoveries have led to independent suggestions that the nucleus is derived from a viral factory (e.g. Forterre and Raoult 2017).

Amongst the phylogenetically diverse NCLDV viruses, the Mimiviridae appear to be the only group that establishes a solely cytoplasmic viral factory and possesses the eIF4E gene. Thus, by analogy with the proposal that the presence of Crenactins, the ESCRT-III complex, a family of small Ras-like GTPases and a ubiquitin system makes Lokiarchaeota a plausible direct descendant of an archaeal ancestor of the eukaryotes (Koonin 2015), it is proposed here that the ability to construct a viral factory that uncouples transcription from translation, the possession of the m7G capping apparatus and the presence of the eIF4E cap binding protein make the Mimiviridae a plausible direct descendant of a viral ancestor of the eukaryotic nucleus. It is further proposed that this common ancestor of the Mimiviridae and the eukaryotic nucleus was the First Eukaryotic Nuclear Ancestor (FENA). To test this hypothesis, phylogenetic analysis was performed on the largest subunit of RNAP-II, which is required for synthesis of mRNA destined for capping; the capping apparatus, which is required to add the m7G cap to eukaryotic mRNA; and the eIF4E gene, which is required to initiate translation of capped mRNA in the cytoplasm (see Figure 2).

Results

The closest known archaeal relative of the eukaryotes shows no evidence of the eukaryotic genes required to prime the uncoupling of transcription from translation in eukaryotes

To confirm the extent to which the closest archaeal relatives of the eukaryotes lack homologues of the eukaryotic cap-based system for uncoupling transcription from translation, the genome of Heimdallarchaeota LC-3 (formerly Loki3; Spang et al. 2015; Spang et al. 2018) was searched with BLAST to identify homologues of four of the Saccharomyces cerevisiae genes required to uncouple transcription from translation.

RPO21 of S. cerevisiae was used to identify homologues of RNAP-II in Heimdallarchaeota LC-3. RPO21 was chosen because it encodes the largest subunit of RNAP-II in S. cerevisiae, including the C-Terminal Domain (CTD) containing the heptapeptide repeat (YSPTSPS) that recruits the capping enzymes to the nascent mRNA transcript and is intricately involved in further processing of capped mRNA (McCracken et al. 1997). The Homo sapiens genome was also searched for homologues to illustrate the level of homology of these genes in distantly related descendants of LECA. Despite the large evolutionary distance between yeast and humans, three different homologues of RPO21 with very significant E values were identified in H. sapiens (Table 1). These three correspond to RNAP-I, RNAP-II and RNAP-III, which are present in all eukaryotes (Sentenac 1985). By contrast, in Heimdallarchaeota LC-3 only a single RNA polymerase subunit A’ was identified as a homologue. This is consistent with Heimdallarchaeota LC-3 possessing a prokaryotic transcription system where all RNA is transcribed by the same RNA polymerase (Werner 2007).

The capping apparatus in eukaryotes requires a TPase, a GTase and an MTase. Since the TPase gene in eukaryotes apparently arose from two phylogenetically different origins (Ramanathan et al. 2016; Kyrieleis et al. 2014), only the GTase and MTase genes required for constructing the m7G cap were used to search the Heimdallarchaeota LC-3 genome. Using CEG1 (GTase) of S. cerevisiae to search for homologues in H. sapiens identifies the human GTase with a very significant E value. By contrast, although some putative homologues with non-significant E values were identified with BLAST in Heimdallarchaeota LC-3, only one of these showed homology with the known domain structure of the GTase. This gene was an ATP ligase (Table 1), a group known to share homology with the GTases (Shuman and Schwer 1995). Using the ABD1 (MTase) gene of S. cerevisiae identifies homologues with significant E values in both humans and Heimdallarchaeota LC-3. However, it is known that the methyltransferase domain of the capping enzyme shares homology with a wide family of methyltransferases, and according to the annotated genome of Heimdallarchaeota LC-3, the gene detected in this search shares affinity with the trans-aconitate 2-methyltransferases rather than the capping MTase.

Using the S. cerevisiae eIF4E gene (CDC33) to search for homologues in H. sapiens identifies the human eIF4E with a very significant E value (Table 1). By contrast, no homologues of eIF4E with significant E values were found in the Heimdallarchaeota LC-3 genome. Furthermore, none of the genes with even low degrees of homology detected in Heimdallarchaeota LC-3 possessed the conserved sites that are known to be involved when eIF4E binds the m7G cap (Marcotrigiano et al. 1997).

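The searches described above come down to collecting BLAST hits for each query and asking which of them fall below a significance cutoff. The following is a minimal Python sketch of that filtering step, assuming BLAST's standard 12-column tabular output. The 1e-5 cutoff, the subject names and every numeric value are hypothetical placeholders chosen for illustration; they are not values from this study or from Table 1.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


def parse_tabular(text: str) -> list[Hit]:
    """Parse BLAST tabular output: 12 tab-separated columns, E value in column 11."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hits.append(Hit(cols[0], cols[1], float(cols[10])))
    return hits


def significant(hits: list[Hit], cutoff: float = 1e-5) -> list[Hit]:
    """Keep hits at or below the cutoff (1e-5 is an assumed, illustrative value)."""
    return [h for h in hits if h.evalue <= cutoff]


# Hypothetical rows: every identifier and number below is made up for illustration.
ROWS = [
    ("CDC33", "hypothetical_subject_A", "45.0", "200", "110", "3",
     "1", "200", "5", "204", "2e-40", "150"),
    ("CDC33", "hypothetical_subject_B", "24.0", "80", "60", "4",
     "10", "90", "30", "110", "0.8", "28"),
    ("CDC33", "hypothetical_subject_C", "22.0", "60", "46", "2",
     "5", "65", "12", "72", "4.5", "25"),
]
SAMPLE = "\n".join("\t".join(row) for row in ROWS)

hits = parse_tabular(SAMPLE)
kept = significant(hits)
print([h.subject for h in kept])
```

An E value filter of this kind only covers the first step. As the text notes, weak hits were additionally checked against the known domain structure and conserved cap-binding sites, which a simple threshold does not capture.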
These results are consistent with the Asgard archaea being authentically archaeal in design and thus lacking a nucleus, the defining feature of the eukaryotic domain (Stanier and Van Niel 1962). The absence of any sign of a nucleus in Asgard archaea and the sudden appearance of the nucleus in LECA is strikingly similar to the sudden appearance of the mitochondrion in the eukaryotic lineage. Due to fundamental similarities between mitochondria and alpha-proteobacteria, the abrupt appearance of the mitochondrion in LECA is widely accepted to be the result of endosymbiosis between a bacterium and the ancestor of the eukaryotes (e.g. Lang et al. 1999). The similarly abrupt appearance of a highly complex nucleus in LECA is consistent with an endosymbiotic origin, but the nucleus is clearly not of prokaryotic cellular origin since it lacks an obvious homologue or precursor among prokaryotes and is primarily an information processing organelle (Martin 1999; Martin 2005).

Mimiviral and eukaryotic RNAP-II, GTase, MTase and eIF4E form two discrete monophyletic groups

Unlike any known prokaryotes, members of the Mimiviridae construct nucleus-like viral factories in the cytoplasm of their hosts that separate transcription from translation (Figure 3). They also possess a functional mRNA capping pathway that is homologous to the pathway utilised by the eukaryotes to prime the uncoupling of transcription from translation. This pathway includes an RNAP dedicated to mRNA synthesis; a TPase, GTase and MTase required to add the m7G cap to the mRNA; and eIF4E, the cytoplasmic cap binding protein required to initiate translation of the capped transcript.

In this phylogenetic analysis, members of the eukaryotic domain were carefully selected to cover the major eukaryotic supergroups and thus span the diversity of the eukaryotic domain (see Materials and Methods). Where possible, eukaryotic clades were chosen that contained at least one member that has been studied in depth at a molecular level and for which experimental knowledge of the processes of transcription and translation exists. In addition, all phylogenetic analyses use the same organisms, and only species with complete genomes in which all four genes (RNAP-II, GTase, MTase and eIF4E) could be unambiguously identified were used in tree construction.

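Restricting the analysis to species in which all four genes were identified means every gene alignment covers the same taxa, which is also the precondition for joining the alignments end to end into a single concatenated matrix. The sketch below illustrates both steps in Python under stated assumptions: the taxon names and sequences are toy placeholders, and the function names are hypothetical. The tools actually used are described in Materials and Methods, which is not reproduced here.

```python
def shared_taxa(alignments: dict[str, dict[str, str]]) -> list[str]:
    """Return the taxa for which every gene alignment contains a sequence."""
    taxon_sets = [set(seqs) for seqs in alignments.values()]
    return sorted(set.intersection(*taxon_sets))


def concatenate(alignments: dict[str, dict[str, str]],
                genes: list[str]) -> dict[str, str]:
    """Join per-gene aligned sequences, in the given gene order, for shared taxa."""
    taxa = shared_taxa({g: alignments[g] for g in genes})
    for g in genes:
        widths = {len(alignments[g][t]) for t in taxa}
        if len(widths) > 1:
            raise ValueError(f"{g}: aligned sequences differ in length")
    return {t: "".join(alignments[g][t] for g in genes) for t in taxa}


# Toy alignments: taxon names and residues are invented for illustration only.
TOY = {
    "RNAP": {"taxon_1": "MK-L", "taxon_2": "MKAL", "taxon_3": "MRAL"},
    "GTase": {"taxon_1": "GT", "taxon_2": "GS", "taxon_3": "GT"},
    "MTase": {"taxon_1": "AV", "taxon_2": "AV"},  # taxon_3 lacks this gene
    "eIF4E": {"taxon_1": "W-W", "taxon_2": "WEW", "taxon_3": "WDW"},
}
GENES = ["RNAP", "GTase", "MTase", "eIF4E"]

kept_taxa = shared_taxa(TOY)
supermatrix = concatenate(TOY, GENES)
for taxon, seq in supermatrix.items():
    print(taxon, seq)
```

In this toy case taxon_3 is dropped because one gene is missing, mirroring the stated rule that only species with all four genes were used.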
As shown in Figure 4a, the unrooted phylogenetic tree of the RNAP largest subunit resolves into two discrete monophyletic clades: the eukaryotes, which descend from LECA, and the Mimiviridae, which descend from the common ancestor of the Mimiviridae. Despite the more limited phylogenetic information contained in the GTase, MTase and eIF4E alignments, patterns similar to the RNAP tree are observed, and the monophyly of the eukaryotes and the Mimiviridae is maintained in each case. Concatenating the four genes (Figure 4e) generates a phylogenetic tree with bootstrap values higher than any of the individual trees, suggesting that the four genes have a common phylogenetic signal. Within the eukaryotic domain, clades corresponding to Holozoa, Amoebozoa, Fungi, Viridiplantae, Alveolata and Excavata were well resolved with high support. These results are consistent with studies that show LECA possessed a functional eukaryotic nucleus (Neumann et al. 2010) and that all four eukaryotic genes identified as critical in uncoupling transcription from translation primed by the m7G cap descend from a common ancestral set of genes that were present in LECA. The Mimiviridae also belong to a well-supported monophyletic group, suggesting that all four genes were also present in the common viral ancestor of the Mimiviridae. The Mimiviridae resolved into three clades that generally correspond to those previously described in the Mimiviridae (Claverie and Abergel 2018).

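In an unrooted tree, the statement that two groups form discrete monophyletic clades amounts to saying that a single branch separates the members of one group from all other taxa. A minimal Python sketch of that check follows. The nested-tuple tree, the taxon labels and the function names are hypothetical illustrations, not the trees of Figure 4.

```python
def subtree_leaf_sets(tree):
    """Return (all leaves under tree, list of leaf sets for every subtree)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    all_leaves = frozenset()
    collected = []
    for child in tree:
        leaves, sets = subtree_leaf_sets(child)
        all_leaves |= leaves
        collected.extend(sets)
    collected.append(all_leaves)
    return all_leaves, collected


def is_monophyletic_split(tree, group) -> bool:
    """True if one branch separates `group` from all remaining taxa.

    The tree is drawn with an arbitrary root, so a branch may appear either
    as a subtree holding the group or as one holding its complement.
    """
    group = frozenset(group)
    all_leaves, sets = subtree_leaf_sets(tree)
    if not group or not group < all_leaves:
        raise ValueError("group must be a non-empty proper subset of the taxa")
    return group in sets or (all_leaves - group) in sets


# Hypothetical toy tree with three 'eukaryotic' and three 'viral' tips.
TOY_TREE = ((("euk_A", "euk_B"), "euk_C"), (("vir_A", "vir_B"), "vir_C"))
EUK = {"euk_A", "euk_B", "euk_C"}
VIR = {"vir_A", "vir_B", "vir_C"}
MIXED = {"euk_A", "vir_A"}

print(is_monophyletic_split(TOY_TREE, EUK))
print(is_monophyletic_split(TOY_TREE, VIR))
print(is_monophyletic_split(TOY_TREE, MIXED))
```

The check says nothing about branch support; in practice it would be read alongside the bootstrap values reported for each tree.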
The eukaryotic RNAP-II dedicated to capping mRNA shares a common ancestor with the Mimiviridae RNAP, and the common ancestor predates the origin of LECA

Although all the phylogenetic trees in Figure 4 have been drawn with a root between the viral and eukaryotic versions of the genes, establishing the root of the MTase, GTase and eIF4E phylogenetic trees is challenging since the capping apparatus is unique to the eukaryotic domain. Thus only paralogues of these three genes exist outside the eukaryotes and the NCLDV viruses, making it difficult to establish informative outgroups. In addition, despite being conserved, these genes are short and thus possess relatively little phylogenetic information. By contrast, the RNAP gene is a large, phylogenetically informative gene that is found in all cellular domains. Since independent fossil evidence suggests that domain Archaea existed some two billion years before the appearance of LECA (Knoll 2015), and the eukaryotes apparently descend from a particular branch of the archaea (Spang et al. 2015), the archaeal RNAP large subunit is a suitable outgroup that can polarise the relationship between the eukaryotic RNAP-II and Mimiviral RNA polymerases. An additional advantage of the RNAP-based tree is that all eukaryotes possess multiple RNAPs (Sentenac 1985). Since these multiple RNAPs were present in LECA, they can be used in concert with the archaeal sequences to firmly establish the root of the RNAP tree. Since both logic and the phylogenetic analysis performed here show that the RNAP, GTase, MTase and eIF4E genes are part of a co-evolving module responsible for producing and translating capped mRNA, it can be argued that establishing the root of the RNAP tree can be used to deduce the phylogeny of the entire capping apparatus.

As shown in Figure 5, using the archaeal RNAP subunit A’ and the homologous region of the eukaryotic RNAP-III to polarise the relationship between RNAP-II and the Mimiviridae RNAP shows that both the eukaryotic and Mimiviral genes descend from a common ancestral gene that predated the origin of LECA. The high bootstrap values give confidence that there is significant phylogenetic information in the alignment. In addition, both subtrees of the eukaryotic RNAP genes recapitulate the expected phylogenetic relationships between the eukaryotes, including establishing the Excavata as the most divergent eukaryotic supergroup (Hampl et al. 2009). Furthermore, within the eukaryotic domain, all the chosen eukaryotes were assigned to their accepted branches. A parsimonious explanation of the observed tree is that the ability to produce m7G capped mRNA was a feature of the ancestor of both the eukaryotic RNAP-II and the Mimiviridae RNA polymerase, since both the eukaryotic and viral polymerases produce capped mRNA, whilst neither RNAP-III nor the archaeal RNAP is associated with producing capped mRNA. Although other interpretations may be possible, the tree is entirely consistent with descent of the eukaryotic nucleus and the Mimiviridae from an ancient viral factory that could produce capped mRNA, a defining, core component of the apparatus required to uncouple transcription from translation by the eukaryotic nucleus that has not been observed in the archaeal relatives of the eukaryotes.

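The polarising argument can be restated as a simple property of an outgroup-rooted tree: the most recent common ancestor of the RNAP-II and Mimiviridae RNAP sequences should define a clade that contains no outgroup sequence. The Python sketch below illustrates that test on a toy tree. The topology, tip labels and function names are hypothetical and are not taken from Figure 5.

```python
def leaves(tree) -> frozenset:
    """Return the set of tip labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def mrca(tree, targets):
    """Return the smallest subtree containing every label in `targets`."""
    targets = frozenset(targets)
    if not targets <= leaves(tree):
        raise ValueError("some target labels are not in the tree")
    node = tree
    while not isinstance(node, str):
        for child in node:
            if targets <= leaves(child):
                node = child  # descend while one child still holds all targets
                break
        else:
            return node  # targets are spread over several children
    return node


# Hypothetical tree already rooted on the outgroup (archaeal A' and RNAP-III tips).
ROOTED = (("arch_A", "rnap3_A"), (("rnap2_A", "rnap2_B"), ("mimi_A", "mimi_B")))
OUTGROUP = frozenset({"arch_A", "rnap3_A"})

ingroup_clade = mrca(ROOTED, {"rnap2_A", "mimi_B"})
print(sorted(leaves(ingroup_clade)))
print(leaves(ingroup_clade).isdisjoint(OUTGROUP))
```

If the ancestor of the two target groups also contained outgroup tips, the tree would not support a shared origin of RNAP-II and the viral polymerase to the exclusion of the outgroup.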
Discussion

Here it is shown that the apparatus used by eukaryotic nuclei to produce and translate capped mRNA is not found in the closest archaeal relatives of eukaryotes. This is significant since in the eukaryotic nucleus, the uncoupling of transcription from translation requires a complex, highly evolved pathway consisting of hundreds of genes acting in concert, and the m7G cap is critical to this pathway since it is used to prime mRNA for processing, nuclear export, and cytoplasmic translation (Figure 2). The absence of the m7G apparatus implies that the highly complex pathway for uncoupling transcription from translation is also absent from archaeal relatives of the eukaryotes. This presents a major biological paradox since such a complex pathway incorporating the concerted action of hundreds of genes unique to the eukaryotic domain implies a long evolutionary history, yet no sign of the pathway is found in the closest archaeal relatives of the eukaryotes.

Although the apparatus for producing capped mRNA is absent from the archaeal relatives of the eukaryotes, it is present in the Mimiviridae, which is consistent with the postulates of the VE hypothesis. Phylogenetic analysis performed here demonstrates that viral and eukaryotic genes form discrete monophyletic clades, and that both viral and eukaryotic clades descend from a common ancestor that existed prior to the appearance of LECA. This pattern is consistent with the proposal that the eukaryotic nucleus and the Mimiviridae both descend from a First Eukaryotic Nuclear Ancestor (FENA).

Prior to the discovery of the nucleus-like viral factory of 201 Φ2-1, the ability to uncouple 355
transcription from translation was thought to be an exclusive innovation of the eukaryotic nucleus. 356
Thus, arguments could be made that the viral factory of the Mimiviruses had evolved by borrowing 357
genes from the nucleus to allow it to establish the eukaryotic uncoupling of transcription from 358
translation. However, since 201 Φ2-1 infects bacteria it seems very unlikely that it obtained its 359
ability to build a viral factory and uncouple transcription from translation from the eukaryotes, but 360
rather indicates this ability has evolved in prokaryotic viruses as part of their replication cycle. Thus 361
the discovery of 201 Φ2-1 demonstrates that uncoupling of transcription from translation is most 362
likely a viral innovation, and since prokaryotes existed billions of years before the origin of the 363
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/679175doi: bioRxiv preprint
15
eukaryotes, viral factories potentially existed for billions of years before the origin of LECA. Studies 364
on a relative of phage 201 Φ2-1 (PhiKZ), show the viral factory appears to shield phage DNA from 365
host immune systems including the CrispR cas system (Mendoza et al. 2018). Thus viral factories 366
may have evolved to provide biological protection from various anti-phage systems possessed by 367
prokaryotic hosts (Hendrickson and Poole 2018). 368

The modern nucleus is clearly differentiated from any member of the Mimiviridae by its ability to
construct a fully functional translational system including ribosomes. In the absence of their own
translational machinery, all known viruses are dependent upon their host’s translation machinery to
produce polypeptides required for their own reproduction. Accordingly, all mRNAs produced by viruses
engage cellular ribosomes to ensure translation (Jan et al. 2016). However, when the
Mimivirus was first discovered, the “most unexpected discovery was the presence of numerous
genes encoding central protein-translation components” (Raoult et al. 2004). The discovery of the
Klosneuvirus increased the number of translation related genes found in viruses to levels that far
exceed those seen in the original Mimivirus (Schulz et al. 2017), and sequencing of the Tupanvirus
genome revealed that some members of the Mimiviridae possess a translation associated gene set
that ‘only lacks the ribosome’ (Abrahão et al. 2018). Amongst this set of translational genes are up to
70 tRNAs, 20 aaRS, 11 factors for all translation steps and factors related to tRNA/mRNA maturation
and ribosome protein modification (Abrahão et al. 2018). Since it appears that the ancestor of the
Mimiviridae did not possess all these functions, the appearance of so many translation related genes
in viruses such as the Klosneuvirus and the Tupanvirus suggests that they acquired these
components of the eukaryotic translational machinery via a piecemeal capture process (Schulz et al.
2017).

The expanded viral repertoire of translational genes found in the Tupanvirus and Klosneuvirinae
suggests that there is selective pressure to acquire these genes in some branches of the Mimiviridae.
This capture process may also be acting amongst the modern giant phage, where captured ribosomal
genes appear to be part of the mechanism(s) by which phage direct the host translation apparatus
to selectively translate viral mRNA (Al-Shayeb et al. 2019). If the VE hypothesis is valid and a similar
capture process was operating before the origin of the first eukaryotes, the translational apparatus
acquired by FENA could only have been captured from prokaryotic cells. Since FENA is proposed to
have infected an Asgardian ancestor, many of the translation related genes would be derived from
its archaeal Asgardian host and directed to enhancing translation of the viral transcripts by the host’s
archaeal translational system. Consistent with this proposal, it is known that eukaryotic nuclei
possess a core set of archaeal related translation initiation factors including eIF1A, eIF2, eIF2B,
eIF4A, eIF5B and eIF6 (Jagus et al. 2012), and a core set of eukaryotic specific initiation factors (eIF5,
eIF4E, eIF4G, eIF4B, eIF4H and eIF3) (Jagus et al. 2012). With the exception of eIF5, all these
eukaryotic specific initiation factors are involved with 5’-cap-binding and scanning processes
required for translation of capped eukaryotic mRNA (Jagus et al. 2012). Furthermore, in the process
of evolving into the nucleus, a viral ancestor of the nucleus must have acquired the ability to
synthesise uncapped rRNA and tRNA, and thus a part of the transition into a fully autonomous
nucleus would have been the capture by the virus of a second and a third RNA polymerase dedicated to
the synthesis of non-capped RNA associated with functioning of the ribosomes.

If the VE hypothesis can be accepted, the descent of the nucleus from a viral factory provides a
plausible resolution to several of the major paradoxes associated with the origin of the nucleus.
That is, if the nucleus descends from a viral factory and the viral factory set up by FENA was similar
in structure and function to the 201 Φ2-1 and Mimiviridae viral factories (Figure 3), the VE
hypothesis explains why the nucleus is mainly an information containing and processing
compartment, why its boundary selectively controls the entry and exit of proteins and nucleic acids,
why it exports mRNA into the cytoplasm, why it contains no functional ribosomes, why it possesses
linear rather than circular chromosomes, why it is positioned in the cell by the tubulin cytoskeleton,
and, as explored in this paper, why the eukaryotes possess highly evolved complex machinery to
allow uncoupling of transcription from translation with no prokaryotic precedents. It also provides a
rationale for the neo-functionalisation of RNA polymerases in the eukaryotes since the viral factory
introduces its own RNA polymerase specifically dedicated to the transcription of capped viral mRNA
destined for translation in the cytoplasm. The origin of the nucleus from a viral ancestry has also
been shown to provide a plausible mechanistic model for the origin of mitosis, meiosis and the
sexual cycle (Bell 2006, Bell 2013), a problem described as the queen of evolutionary problems (Bell
1982). Thus the origin of the nucleus from a viral factory addresses many of the challenges involved
in explaining the apparently abrupt appearance of a fully formed and functional nucleus in LECA,
despite its complete absence from bona-fide archaeal relatives such as members of the Asgard
archaea.

It should be noted that the VE hypothesis is not a pure ‘endosymbiotic theory’. According to the VE
hypothesis (Bell 2001), the eukaryotic cell is descended from an archaeal ancestor of the eukaryotic
cytoplasm, a bacterial ancestor of the mitochondrion, and, as explored in this paper, a viral ancestor
of the nucleus. Although the archaeal ancestor of the cytoplasm may have had a mutually beneficial
symbiotic relationship with a bacterium leading to the origin of the mitochondria, the host archaeon
did not gain any benefit from the viral infection; rather, the archaeal host was enslaved by the virus
and its genome was ultimately destroyed.

However, like the endosymbiotic theories for the origin of the mitochondria and the chloroplasts,
the VE hypothesis deals with complex irreversible events that are difficult to directly test (Margulis
1975). In the case of the mitochondria, it took nearly 100 years before the consilience of evidence
built up sufficiently for the endosymbiotic origin of the mitochondria to become (almost) universally
accepted. Although a more radical concept than endosymbiosis, if the VE hypothesis is similarly
supported by the accumulation of multiple lines of evidence, it will introduce a major paradigm shift
in our understanding of the evolution of complex life on earth. In particular, if the VE hypothesis is
ultimately accepted, it implies that the eukaryotic cell derives from a consortium of three organisms
that became integrated to such an extent that they created an emergent ‘super-organism’. The
novel features of this emergent ‘super-organism’ allowed it to escape the limitations of prokaryotic
evolution and evolve to levels of unprecedented organismal complexity.


Materials and Methods

Choice of eukaryotic organisms
i) Eukaryotes
The organisms used in this study were carefully selected to cover all the relevant groupings of
eukaryotes, whilst limiting the complexity of the phylogenetic analysis. Currently 5 or 6 eukaryotic
supergroups are proposed to cover the vast majority of eukaryotic diversity (Hampl et al. 2009). The
present study focussed on ‘model’ organisms for the phylogenetic trees so that significant
knowledge of the molecular biology of at least one member of each division was available. To represent the
Holozoa, Homo sapiens, Mus musculus, Danio rerio and Caenorhabditis elegans were chosen since
each is a model organism, and the phylogenetic relationships are well established. To represent
Amoebozoa, Dictyostelium discoideum was chosen since it is a model organism. Dictyostelium
purpureum and Acytostelium subglobosum were chosen as suitably distant relatives. To represent the
Fungi, Saccharomyces cerevisiae, Kluyveromyces marxianus and Aspergillus niger were chosen since
all three are model organisms and the phylogenetic relationships are well understood. To represent
Viridiplantae, Arabidopsis lyrata was chosen as a model species and Brassica napus was chosen as a
relatively close relative. Ostreococcus tauri was chosen as a distant algal relative of the land plants.
To represent the SAR group, focus was placed on the Alveolata group since members such as
Plasmodium and Cryptosporidium have been studied in depth at a molecular level. To ensure the
robustness of the tree and limit the effects of long-branch attraction, Plasmodium falciparum and
Plasmodium vivax were chosen as close relatives whilst Theileria equi strain WA, Cryptosporidium
muris, and Perkinsus marinus were chosen as increasingly distantly related members of the
Alveolata. To represent the Excavata, members of the Trypanosoma were chosen since they are
model organisms that have been studied in depth at the molecular level. To ensure robustness of the
tree and to minimise the effects of long-branch attraction, Trypanosoma cruzi cruzi and
Trypanosoma rangeli were chosen as close relatives whilst Leishmania mexicana, Leptomonas
seymouri and Bodo saltans were chosen as increasingly distantly related members of the Excavata.
The organisms listed above include members of all 5 or 6 major clades. In addition, complete
genomes are available for each of the organisms listed, ensuring that the phylogenetic trees
included exactly the same organisms.

ii) Mimiviridae
Only members of the Mimiviridae containing clear homologues to RNAP, GTase, MTase and eIF4E
were chosen for analysis. Based on the phylogenetic analysis of Claverie and Abergel (2018), the
following viruses were chosen to represent three informal groupings of the Mimiviridae.
Mesomimivirinae: Tetraselmis virus, Chrysochromulina ericina virus and Phaeocystis globosa
virus. Klosneuvirinae: Klosneuvirus, Catovirus, Indivirus and Bodo saltans virus. Megavirinae:
Acanthamoeba polyphaga mimivirus, Powai lake megavirus, Moumouvirus australiensis,
Acanthamoeba polyphaga moumouvirus, Tupanvirus deep ocean and Tupanvirus soda lake.
Cafeteria roenbergensis virus (CroV) is basal to the Klosneuvirinae and Megavirinae and does not
appear to have other close relatives available yet.
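
The inclusion criterion stated above amounts to a simple set test: a genome is retained only if all
four markers are present. The following is a minimal Python sketch of that test and not the procedure
used in this study. The function name, the genome names and the marker calls are hypothetical
placeholders introduced for illustration only.

```python
# Markers required for inclusion, as named in the text.
REQUIRED_MARKERS = {"RNAP", "GTase", "MTase", "eIF4E"}


def select_genomes(marker_table):
    """Return, in sorted order, the genomes that carry every required marker."""
    return sorted(
        genome
        for genome, markers in marker_table.items()
        if REQUIRED_MARKERS <= set(markers)  # subset test: all four present
    )


# Placeholder data: genome names and marker calls are hypothetical.
hypothetical_table = {
    "virus_A": {"RNAP", "GTase", "MTase", "eIF4E"},
    "virus_B": {"RNAP", "GTase", "MTase"},  # lacks eIF4E, so it is excluded
    "virus_C": {"RNAP", "GTase", "MTase", "eIF4E", "aaRS"},
}
selected_genomes = select_genomes(hypothetical_table)
```

Extra genes, such as the aaRS entry for virus_C, do not affect the outcome because only the four
required markers are tested.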

Choice of sequences
i) RNAP subunits
Of the three RNA polymerases in eukaryotes, RNAP-II is the one intimately associated with the
capping of transcripts. The largest RNAP-II subunit possesses a carboxy terminal domain (CTD)
consisting of a heptapeptide repeat region that is involved in mRNA processing including capping,
splicing and polyadenylation (McCracken et al. 1997). Homologues of RPO21, the largest subunit of
RNAP-II of S. cerevisiae, were identified. With the exception of the members of the Excavata and
Perkinsus marinus, a CTD heptapeptide repeat region was readily identified in all RNAP-II subunits
used in the phylogenetic analysis. Although the heptapeptide repeat is absent from the Excavata
studied, Trypanosoma RNAP-II genes possess a non-canonical C-terminal extension (Smith et al.
1989). As a result, the Trypanosoma cruzi cruzi RNAP-II was used to identify RNAP-II homologues in
the Excavata clade. In the Mimiviridae only one homologue of the largest subunit of RNAP-II was
detected.

ii) GTase and MTase
Although three enzymatic functions are universally required to produce capped mRNA (Kyrieleis et
al. 2014), only the GTase and MTase are monophyletic in eukaryotes, with the TPase apparently
originating from two independent sources (Ramanathan et al. 2016; Kyrieleis et al. 2014). In S.
cerevisiae and most other unicellular eukaryotes such as the Alveolata, all three functions are encoded
by separate genes. In both Holozoa and Viridiplantae the TPase and GTase are encoded in the same
polypeptide. In Excavata, two capping complexes are present (Takagi et al. 2007). Of these, the
gene encoding both the GTase and MTase in the same polypeptide is essential for growth and
for adding the m7G cap, and was thus chosen for phylogenetic analysis. In the Mimiviridae, all three
functions are present in the same polypeptide.

iii) eIF4E
Although there is only one eIF4E gene in the yeast Saccharomyces cerevisiae, the core role of eIF4E
in protein translation has meant that in higher eukaryotes several paralogous eIF4E genes have
evolved that encode distinctly featured proteins. In addition to regular translation initiation, these
paralogues are involved in the preferential translation of particular mRNAs or are tissue and/or
developmental stage specific. For example, eight such genes have been found in Drosophila and five
in Caenorhabditis (Frydryskova et al. 2016). In humans, where there are multiple paralogues, the
three isoforms of eIF4E1 bring the mRNAs to the ribosome via an interaction with scaffold protein
eIF4G (Frydryskova et al. 2016). As a result, in this study the human eIF4E1 isoform was used to
conduct BLAST searches of Holozoa, and the hits with the highest BLAST score were taken for
phylogenetic analysis. In Arabidopsis, EIF4E1 is expressed in all tissues except in the cells of the
specialization zone of the roots whereas the At.EIF4E2 mRNA is particularly abundant in floral organs
and in young developing tissues (Rodriguez et al. 1998). The Arabidopsis EIF4E1 gene was thus used
in BLAST searches of plants, and the genes with the highest homology taken for phylogenetic analysis.
Where molecular knowledge was insufficient for such rational sequence selection, the homologue
with the highest homology to the Saccharomyces gene was identified, and provided that the gene
possessed regions equivalent to the structurally important regions that bind to the m7G cap
(Marcotrigiano et al. 1997), this gene was used to identify the closest homologues within the
supergroup. With the exception of the Tupanviruses, the Mimiviridae were found to encode only
one eIF4E homologue. In the case of the Tupanviruses, two eIF4E homologues were identified, and
only one of them was included in the phylogenetic analysis. The homologue with
the highest homology to the Mimivirus homologue was used in both cases.
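
The rule of taking the hit with the highest BLAST score for each organism can be sketched in a few
lines of Python. This is an illustrative sketch only; the text does not state that the selection was
scripted. It assumes that hits have been exported in the standard BLAST tabular format (-outfmt 6), in
which the second column is the subject identifier and the twelfth column is the bit score. The function
name, the subject-to-organism lookup and all identifiers and scores in the example are hypothetical.

```python
import csv
import io


def best_hit_per_organism(tabular_text, subject_to_organism):
    """Return {organism: (subject_id, bit_score)}, keeping the top-scoring hit."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, bit_score = row[1], float(row[11])
        organism = subject_to_organism.get(subject_id)
        if organism is None:
            continue  # hit from an organism outside the chosen set
        if organism not in best or bit_score > best[organism][1]:
            best[organism] = (subject_id, bit_score)
    return best


# Hypothetical example: invented identifiers and scores, for illustration only.
example = (
    "query1\tsubjA\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-80\t250.0\n"
    "query1\tsubjB\t40.0\t190\t114\t2\t5\t195\t3\t190\t1e-30\t110.0\n"
    "query1\tsubjC\t55.0\t198\t89\t1\t1\t198\t1\t198\t1e-70\t230.0\n"
)
organisms = {"subjA": "organism_1", "subjB": "organism_1", "subjC": "organism_2"}
selected = best_hit_per_organism(example, organisms)
```

A score-based choice of this kind would still need the manual check described above, namely that the
selected protein retains the regions that bind the m7G cap.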

Phylogenetic analysis
Homology searches were carried out using the BLASTp and PSI-BLAST algorithms (Altschul et al.
1997). MEGA7 (Kumar et al. 2016) was used for all phylogenetic analysis. Unless otherwise stated,
all program parameters for homology searching and domain identification were left at their
respective defaults. Protein alignments were performed using MUSCLE. Once the alignment of a
particular protein was completed for all organisms, it was trimmed and used for
tree construction. The evolutionary histories were inferred by using the Maximum Likelihood
method based on the JTT matrix-based model (Jones et al. 1992). All bootstrap consensus trees were
inferred from 1000 replicates (Felsenstein 1985) and are taken to represent the evolutionary history of
the taxa analyzed (Felsenstein 1985). Branches corresponding to partitions reproduced in fewer than
51% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated
taxa clustered together in the bootstrap test (1000 replicates) is shown next to the branches
(Felsenstein 1985). Initial tree(s) for the heuristic search were obtained automatically by applying
Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model,
and then selecting the topology with superior log likelihood value.
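
The bootstrap support values and the collapsing rule described above can be illustrated with a short
Python sketch. It is not the MEGA7 implementation, which performs these steps internally. As a
simplifying assumption, each replicate tree is represented only by the set of clades it contains, with
each clade written as a frozenset of taxon names. The function names and the four placeholder taxa
(A to D) are hypothetical and are not data from this study.

```python
from collections import Counter


def clade_support(replicate_trees):
    """Percentage of replicate trees in which each clade (frozenset of taxa) appears."""
    counts = Counter()
    for clades in replicate_trees:
        counts.update(set(clades))  # count each clade once per replicate
    n = len(replicate_trees)
    return {clade: 100.0 * c / n for clade, c in counts.items()}


def collapse_weak(support, threshold=51.0):
    """Keep only clades whose support reaches the threshold; the rest are collapsed."""
    return {clade: pct for clade, pct in support.items() if pct >= threshold}


# Toy replicates over four placeholder taxa (A to D); not data from this study.
AB = frozenset("AB")
CD = frozenset("CD")
AC = frozenset("AC")
replicates = [{AB, CD}, {AB, CD}, {AB}, {AC}]
support = clade_support(replicates)
kept = collapse_weak(support)
```

In this toy case the clade AB appears in three of four replicates and is retained, whereas CD and AC
fall below the 51% threshold and would be collapsed into a polytomy.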


References

1. Abrahão J, Silva L, Silva LS, Khalil JYB, Rodrigues R, Arantes T, Assis F, Boratto P, Andrade M,
Kroon EG, Ribeiro B, Bergier I, Seligmann H, Ghigo E, Colson P, Levasseur A, Kroemer G,
Raoult D, La Scola B. 2018. Tailed giant Tupanvirus possesses the most complete
translational apparatus of the known virosphere. Nat Commun. 9:749.
2. Al-Shayeb B et al. 2019. Clades of huge phage from across Earth’s ecosystems. bioRxiv [Preprint].
March 11, 2019 [cited 19/06/2019]. Available from: https://doi.org/10.1101/572362
3. Battistuzzi FU, Feijao A, Hedges SB. 2004. A genomic timescale of prokaryote evolution:
insights into the origin of methanogenesis, phototrophy, and the colonization of land. BMC
Evol Biol. 4:44.
4. Bell G. 1982. The Masterpiece of Nature: The Evolution and Genetics of Sexuality. London:
Croom Helm. p. 19.
5. Bell PJ. 2001. Viral eukaryogenesis: was the ancestor of the nucleus a complex DNA virus? J
Mol Evol. 53(3):251-256.
6. Bell PJ. 2006. Sex and the eukaryotic cell cycle is consistent with a viral ancestry for the
eukaryotic nucleus. J Theor Biol. 243(1):54-63.
7. Bell PJ. 2013. Meiosis: Its Origin According to the Viral Eukaryogenesis Theory. In: Bernstein
C, Bernstein M, editors. Meiosis. IntechOpen. p. 77-99.
8. Benelli D, Londei P. 2011. Translation initiation in Archaea: conserved and domain-specific
features. Biochem Soc Trans. 39(1):89-93.
9. Boyer M, Madoui MA, Gimenez G, La Scola B, Raoult D. 2010. Phylogenetic and phyletic
studies of informational genes in genomes highlight existence of a 4th domain of life including
giant viruses. PLoS One. (12):e15530.
10. Ceyssens PJ, Minakhin L, Van den Bossche A, Yakunina M, Klimuk E, Blasdel B, De Smet J,
Noben JP, Bläsi U, Severinov K, Lavigne R. 2014. Development of giant bacteriophage ϕKZ is
independent of the host transcription apparatus. J Virol. (18):10501-10510.
11. Chaikeeratisak V, Nguyen K, Egan ME, Erb ML, Vavilina A, Pogliano J. 2017. The phage
nucleus and tubulin spindle are conserved among large Pseudomonas phages. Cell Rep.
20(7):1563-1571.
12. Chaikeeratisak V, Nguyen K, Khanna K, Brilot AF, Erb ML, Coker JK, Vavilina A, Newton GL,
Buschauer R, Pogliano K, Villa E, Agard DA, Pogliano J. 2017. Assembly of a nucleus-like
structure during viral replication in bacteria. Science. 355(6321):194-197.
13. Claverie JM, Abergel C. 2018. Mimiviridae: An expanding family of highly diverse large
dsDNA viruses infecting a wide phylogenetic range of aquatic eukaryotes. Viruses.
10(9):506.
14. Eme L, Spang A, Lombard J, Stairs CW, Ettema TJG. 2017. Archaea and the origin of
eukaryotes. Nat Rev Microbiol. 15(12):711-723.
15. Felsenstein J. 1985. Confidence limits on phylogenies: An approach using the bootstrap.
Evolution. 39:783-791.
16. Forterre P, Raoult D. 2017. The transformation of a bacterium into a nucleated virocell
reminds the viral eukaryogenesis hypothesis. Virologie. 21(4):28-30.
17. Fridmann-Sirkis Y, Milrot E, Mutsafi Y, Ben-Dor S, Levin Y, Savidor A, Kartvelishvily E, Minsky
A. 2016. Efficiency in complexity: composition and dynamic nature of Mimivirus replication
factories. J Virol. 90(21):10039-10047.
18. Frydryskova K, Masek T, Borcin K, Mrvova S, Venturi V, Pospisek M. 2016. Distinct
recruitment of human eIF4E isoforms to processing bodies and stress granules. BMC Mol
Biol. 17(1):21.
19. Hampl V, Hug L, Leigh JW, Dacks JB, Lang BF, Simpson AG, Roger AJ. 2009. Phylogenomic
analyses support the monophyly of Excavata and resolve relationships among eukaryotic
"supergroups". Proc Natl Acad Sci U S A. 106(10):3859-3864.
20. Hendrickson HL, Poole AM. 2018. Manifold routes to a nucleus. Front Microbiol. 9:2604.
21. Hernández G, Proud CG, Preiss T, Parsyan A. 2012. On the diversification of the translation
apparatus across eukaryotes. Comp Funct Genomics. 2012:256848.
22. Iranzo J, Puigbò P, Lobkovsky AE, Wolf YI, Koonin EV. 2016. Inevitability of genetic
parasites. Genome Biol Evol. 8(9):2856-2869.
23. Iyer LM, Aravind L, Koonin EV. 2001. Common origin of four diverse families of large
eukaryotic DNA viruses. J Virol. 75(23):11720-11734.
24. Jagus R, Bachvaroff TR, Joshi B, Place AR. 2012. Diversity of Eukaryotic Translational Initiation
Factor eIF4E in Protists. Comp Funct Genomics. 2012:134839.
25. Jan E, Mohr I, Walsh D. 2016. A cap-to-tail guide to mRNA translation strategies in
virus-infected cells. Annu Rev Virol. 3(1):283-307.
26. Jones DT, Taylor WR, Thornton JM. 1992. The rapid generation of mutation data matrices
from protein sequences. Comput Appl Biosci. 8:275-282.
27. Kabachinski G, Schwartz TU. 2015. The nuclear pore complex--structure and function at a
glance. J Cell Sci. 128(3):423-429.
28. Katahira J. 2015. Nuclear export of messenger RNA. Genes. 6(2):163-184.
29. Kazlauskas D, Venclovas C. 2011. Computational analysis of DNA replicases in
double-stranded DNA viruses: relationship with the genome size. Nucleic Acids Res. (19):8291-305.
30. Knoll AH. 2015. Paleobiological perspectives on early microbial evolution. Cold Spring Harb
Perspect Biol. 7(7):a018093.
31. Koonin EV, Dolja VV, Krupovic M. 2015. Origins and evolution of viruses of eukaryotes: The
ultimate modularity. Virology. 479-480:2-25.
32. Koonin EV, Wolf YI, Katsnelson MI. 2017. Inevitability of the emergence and persistence of
genetic parasites caused by evolutionary instability of parasite-free states. Biol Direct.
12(1):31.
33. Koonin EV, Yutin N. 2010. Origin and evolution of eukaryotic large nucleo-cytoplasmic DNA
viruses. Intervirology. 53(5):284-292.
34. Koonin EV. 2015. Archaeal ancestors of eukaryotes: not so elusive any more. BMC Biol.
13:84.
35. Kumar S, Stecher G, Tamura K. 2016. MEGA7: Molecular evolutionary genetics analysis
version 7.0 for bigger datasets. Mol Biol Evol. 33:1870-1874.
36. Kyrieleis OJ, Chang J, de la Peña M, Shuman S, Cusack S. 2014. Crystal structure of vaccinia
virus mRNA capping enzyme provides insights into the mechanism and evolution of the
capping apparatus. Structure. 22(3):452-465.
37. Lang BF, Gray MW, Burger G. 1999. Mitochondrial genome evolution and the origin of
eukaryotes. Annu Rev Genet. 33:351-97.
38. Marcotrigiano J, Gingras AC, Sonenberg N, Burley SK. 1997. Co-crystal structure of the
messenger RNA 5' cap-binding protein (eIF4E) bound to 7-methyl-GDP. Cell. 89(6):951-61.
39. Margulis L. 1975. Symbiotic theory of the origin of eukaryotic organelles; criteria for proof.
Symp Soc Exp Biol. (29):21-38.
40. Martin W. 1999. A briefly argued case that mitochondria and plastids are descendants of
endosymbionts, but that the nuclear compartment is not. Proc Biol Sci. 266(1426):1387.
41. Martin W. 2005. Archaebacteria (Archaea) and the origin of the eukaryotic nucleus. Curr
Opin Microbiol. (6):630-637.
42. McCracken S, Fong N, Yankulov K, Ballantyne S, Pan G, Greenblatt J, Patterson SD, Wickens
M, Bentley DL. 1997. The C-terminal domain of RNA polymerase II couples mRNA processing
to transcription. Nature. 385(6614):357-361.
43. Mendoza SD, Berry JD, Nieweglowska ES, Leon LM, Agard DA, Bondy-Denomy J.
2018. A nucleus-like compartment shields bacteriophage DNA from CRISPR-Cas and
restriction nucleases. bioRxiv [Preprint]. July 17, 2018 [cited 19/06/2019]. Available
from: https://doi.org/10.1101/370791
44. Mutsafi Y, Zauberman N, Sabanay I, Minsky A. 2010. Vaccinia-like cytoplasmic replication of
the giant Mimivirus. Proc Natl Acad Sci U S A. 107(13):5978-5982.
45. Nasir A, Kim KM, Caetano-Anolles G. 2012. Giant viruses coexisted with the cellular
ancestors and represent a distinct supergroup along with superkingdoms Archaea, Bacteria
and Eukarya. BMC Evol Biol. 12:156.
46. Neumann N, Lundin D, Poole AM. 2010. Comparative genomic evidence for a complete
nuclear pore complex in the last eukaryotic common ancestor. PLoS One. 5(10):e13241.
47. Okamura M, Inose H, Masuda S. 2015. RNA Export through the NPC in Eukaryotes. Genes
(Basel). 6(1):124-149.
48. Parfrey LW, Lahr DJ, Knoll AH, Katz LA. 2011. Estimating the timing of early eukaryotic
diversification with multigene molecular clocks. Proc Natl Acad Sci U S A.
108(33):13624-13629.
49. Philippe N, Legendre M, Doutre G, Couté Y, Poirot O, Lescot M, Arslan D, Seltzer V, Bertaux L,
Bruley C, Garin J, Claverie J-M, Abergel C. 2013. Pandoraviruses: Amoeba viruses with
genomes up to 2.5 Mb reaching that of parasitic eukaryotes. Science. 341(6143):281-286.
50. Ramanathan A, Robb GB, Chan SH. 2016. mRNA capping: biological functions and
applications. Nucleic Acids Res. 44(16):7511-7526.
51. Raoult D, Audic S, Robert C, Abergel C, Renesto P, Ogata H, La Scola B, Suzan M, Claverie JM.
2004. The 1.2-megabase genome sequence of Mimivirus. Science. 306(5700):1344-1350.
52. Rivera MC, Lake JA. 1992. Evidence that eukaryotes and eocyte prokaryotes are immediate
relatives. Science. 257:74-76.
53. Rodriguez CM, Freire MA, Camilleri C, Robaglia C. 1998. The Arabidopsis thaliana cDNAs
coding for eIF4E and eIF(iso)4E are not functionally equivalent for yeast complementation
and are differentially expressed during plant development. Plant J. (4):465-73.
54. Sapp J. 2005. The prokaryote-eukaryote dichotomy: meanings and mythology. Microbiol Mol
Biol Rev. 69(2):292-305.
55. Schulz F, Yutin N, Ivanova NN, Ortega DR, Lee TK, Vierheilig J, Daims H, Horn M, Wagner M,
Jensen GJ, Kyrpides NC, Koonin EV, Woyke T. 2017. Giant viruses with an expanded
complement of translation system components. Science. 356(6333):82-85.
56. Sentenac A. 1985. Eukaryotic RNA polymerases. Crit Rev Biochem. 18(1):31-90.
57. Shuman S, Schwer B. 1995. RNA capping enzyme and DNA ligase: a superfamily of covalent
nucleotidyl transferases. Mol Microbiol. 17(3):405-10.
58. Smith JL, Levin JR, Ingles CJ, Agabian N. 1989. In trypanosomes the homolog of the largest
subunit of RNA polymerase II is encoded by two genes and has a highly unusual C-terminal
domain structure. Cell. 56(5):815-27.
59. Spang A, Eme L, Saw JH, Caceres EF, Zaremba-Niedzwiedzka K, Lombard J, Guy L, Ettema TJG.
2018. Asgard archaea are the closest prokaryotic relatives of eukaryotes. PLoS Genet.
14(3):e1007080.
60. Spang A, Saw JH, Jørgensen SL, Zaremba-Niedzwiedzka K, Martijn J, Lind AE, van Eijk R,
Schleper C, Guy L, Ettema TJG. 2015. Complex archaea that bridge the gap between
prokaryotes and eukaryotes. Nature. 521:173-179.
61. Stanier RY, van Niel CB. 1962. The concept of a bacterium. Arch Microbiol. 42:17-35.
62. Starr DA. 2009. A nuclear-envelope bridge positions nuclei and moves chromosomes. J Cell
Sci. 122(Pt 5):577-586.
63. Takagi Y, Sindkar S, Ekonomidis D, Hall MP, Ho CK. 2007. Trypanosoma brucei encodes a
bifunctional capping enzyme essential for cap 4 formation on the spliced leader RNA. J Biol
Chem. 282(22):15995-6005.
64. Werner F. 2007. Structure and function of archaeal RNA polymerases. Mol Microbiol.
65(6):1395-404.
65. Wojtus JK, Fitch JL, Christian E, Dalefield T, Lawes JK, Kumar K, Peebles CL, Altermann E,
Hendrickson HL. 2017. Complete genome sequences of three novel Pseudomonas
fluorescens SBW25 bacteriophages, Noxifer, Phabio, and Skulduggery. Genome
Announc. 5(31):e00725-17.
66. Yuan Y, Gao M. 2017. Jumbo bacteriophages: an overview. Front Microbiol. 8:403.
67. Yutin N, Koonin EV. 2012. Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA
viruses of eukaryotes. Virol J. 9:161.
68. Yutin N, Wolf YI, Raoult D, Koonin EV. 2009. Eukaryotic large nucleocytoplasmic DNA
viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol J.
6:223.
69. Zauberman N, Mutsafi Y, Halevy DB, Shimoni E, Klein E, Xiao C, Sun S, Minsky A. 2008.
Distinct DNA exit and packaging portals in the virus Acanthamoeba polyphaga mimivirus.
PLoS Biol. 6(5):e114.
721
722
723
724
725
.CC-BY-NC-ND 4.0 International licenseauthor/funder. It is made available under aThe copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/679175doi: bioRxiv preprint
29
Figures 726
727
728 729 730 Figure 1: The coupled prokaryotic system of transcription and translation. Both archaea and bacteria utilize 731 one type of multi-component RNA polymerase (RNAP) to transcribe all RNA (Werner 2007). Transcription and 732 translation in prokaryotes are coupled since transcription and translation occur directly in the protoplasm, and 733 thus translation initiation can occur before the mRNA transcript is fully synthesised. Translation in prokaryotes 734 relies on direct recognition of mRNA by the ribosomal apparatus via sequences such as the Shine-Dalgarno 735 sequences or short UTR’s (Benelli and Londei 2011). In the case of Shine-Dalgarno sites, the 30S ribosomal 736 subunit binds to the mRNA in such a way that AUG codon lies on the peptidyl (P) site and the second codon lies 737 on aminoacyl (A) site. The initiator tRNA binds to the P site, the large ribosomal subunit docks with the small 738 subunit, the initiation factors are released and the ribosome is ready to start translation. Since prokaryotes 739 originated 3.8 billion years ago (Battistuzzi et al. 2004) the coupled prokaryotic process predates the 740 uncoupled eukaryotic system by close to 2 billion years and thus is the most ancient cellular system of 741 transcription and translation. 742 743 744 745 746 747 748 749 750
Figure 2: The eukaryotic system to uncouple transcription from translation is complex and employs hundreds of genes that act in concert. The dominant eukaryotic cap-dependent system of transcription and translation that was apparently present and fully functional in LECA (Neumann et al. 2010) is described below.
i) Subunits of RNAP-II are translated in the cytoplasm and imported into the nucleus through the Nuclear Pore Complex (NPC). RNAP-II initiates transcription of mRNA by binding to the promoter regions of protein-coding genes.
ii) After the synthesis of the first 20 to 25 nucleotides of mRNA, the polymerase pauses until the mRNA is capped (Ramanathan et al. 2016, Okamura et al. 2015). The eukaryotic m7G cap (symbolised by ) consists of 7-methylguanosine linked via a reversed 5’-5’ triphosphate linkage to the transcript and is the first modification made to RNAP-II transcribed RNA. Three enzymatic functions are required to generate the cap. First, an RNA 5’-triphosphatase (TPase) hydrolyses the 5’-triphosphate end of the nascent mRNA to generate a 5’-diphosphate. The 5’-diphosphate is then capped with guanosine monophosphate by an RNA guanylyltransferase (GTase) to generate a 5’ GpppRNA cap on the transcript. Finally, the guanosine of the GpppRNA cap is methylated by RNA (guanine-N7)-methyltransferase (MTase) (Kyrieleis et al. 2014).
iii) The nuclear cap binding complex (CBC) binds to the m7G cap, which then forms a complex with snRNPs to initiate splicing and polyadenylation (Ramanathan et al. 2016). Splicing of mRNA transcripts is unique to the eukaryotes and requires interaction of hundreds of proteins and the conserved snRNAs.
iv) The m7G cap primes the mRNA for transport through the nuclear pores into the cytoplasm (Katahira 2015) by binding trans-acting factors to form a mature messenger ribonucleoprotein (mRNP). Recruitment of the multisubunit TRanscription-EXport (TREX-1) complex requires the 5’ capping of pre-mRNA because CBP80 interacts with the AlyRef and THO sub-complexes of TREX-1 (Okamura et al. 2015).
v) The nuclear pore complex (NPC) is integral to the uncoupling of transcription from translation because the NPC acts as a gatekeeper, controlling which macromolecules enter and exit the nucleus. NPCs are unique to the eukaryotes, and a single NPC comprises ∼500 individual protein molecules collectively known as nucleoporins (Nups) (Kabachinski and Schwartz 2015). The NPC includes a nuclear ring, a central transport channel and eight cytoplasmic fibrils, and allows molecules smaller than 40-60 kDa to diffuse freely (Kabachinski and Schwartz 2015). Large molecules such as mRNA must associate with specific export receptors such as Nxf1, Crm1 or other karyopherins to be actively transported through the NPC.
vi) To initiate translation, the 43S ribosomal preinitiation complex is recruited to the 5’ end of the mRNA, a process that is co-ordinated by eIF4E through its interactions with eIF4G and the 40S ribosomal subunit associated eIF3 (Hernandez et al. 2012). Several eukaryote-specific initiation factors (eIF4E, eIF4G, eIF4B, eIF4H and eIF3) are involved in the 5’-cap-binding and scanning processes that are essential to the initiation and translation of capped eukaryotic mRNA (Jagus et al. 2012).
vii) Once the ribosome has been recruited to the capped mRNA transcript, a scanning process occurs and translation is generally initiated at the first AUG encountered.
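The three enzymatic steps of cap formation described in the Figure 2 caption (TPase, then GTase, then MTase) can be sketched as a toy state model. This is only an illustration of the order of the reactions: the string labels for the 5’ end (`ppp`, `pp`, `Gppp`, `m7Gppp`) and all function names are assumptions made for this sketch and do not come from the manuscript.

```python
from typing import List

# Toy model of the 5' end of a nascent transcript. The labels are
# illustrative assumptions: "ppp" = 5'-triphosphate, "pp" = 5'-diphosphate,
# "Gppp" = guanylated cap, "m7Gppp" = N7-methylated cap.

def tpase(five_prime_end: str) -> str:
    """RNA 5'-triphosphatase: 5'-triphosphate -> 5'-diphosphate."""
    if five_prime_end != "ppp":
        raise ValueError("TPase expects a 5'-triphosphate end")
    return "pp"

def gtase(five_prime_end: str) -> str:
    """RNA guanylyltransferase: adds GMP to the 5'-diphosphate end."""
    if five_prime_end != "pp":
        raise ValueError("GTase expects a 5'-diphosphate end")
    return "Gppp"

def mtase(five_prime_end: str) -> str:
    """RNA (guanine-N7)-methyltransferase: methylates the cap guanosine."""
    if five_prime_end != "Gppp":
        raise ValueError("MTase expects an unmethylated Gppp cap")
    return "m7Gppp"

def cap_transcript(five_prime_end: str = "ppp") -> List[str]:
    """Apply the three steps in order and return every intermediate state."""
    states = [five_prime_end]
    for enzyme in (tpase, gtase, mtase):
        states.append(enzyme(states[-1]))
    return states

capping_states = cap_transcript()
print(" -> ".join(state + "RNA" for state in capping_states))
```

Each function rejects any substrate other than the product of the previous step, which mirrors the strict ordering of the pathway in the caption.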
Figure 3: Phage 201 Φ2-1 viral factories, Mimivirus viral factories, and the eukaryotic nucleus share the ability to uncouple transcription from translation.
i a) Image of Phage 201 Φ2-1 viral factory (Chaikeeratisak et al. 2017b). ii a) Image of Mimivirus viral factory (Zauberman et al. 2008). iii a) The eukaryotic nucleus.
i b) Phage 201 Φ2-1 establishes a viral factory in the cytoplasm of the bacterial host, confining DNA replication and transcription to the viral factory. Translation is confined to the cytoplasm since host bacterial ribosomes are excluded from the viral factory (Chaikeeratisak et al. 2017b). Since PhiKZ relatives of 201 Φ2-1 can complete infection in the absence of bacterial RNA polymerase (RNAP) activity (Ceyssens et al. 2014), it can be inferred that the multi-subunit RNAP genes encoded by the phage are transcribed in the viral factory, that the transcripts are exported into the cytoplasm for translation, and that the proteins are re-imported into the viral factory to transcribe the phage DNA.
ii b) The Mimivirus also establishes a viral factory in the cytoplasm of its eukaryotic host (Mutsafi et al. 2010), confining DNA replication and transcription to the viral factory. Translation is confined to the cytoplasm since host ribosomes are excluded from the viral factory (Fridmann-Sirkis et al. 2016). Mimiviruses encode a multi-subunit RNA polymerase that transcribes Mimiviral DNA and functions within the viral factory (Fridmann-Sirkis et al. 2016). It can therefore be inferred that the Mimivirus viral factory controls which macromolecules are transported in and out of the viroplasm. Like the eukaryotic nucleus, Mimiviridae encode their own mRNA capping apparatus and a version of the eIF4E gene. In cells infected by Mimiviridae, eIF4E remains located in the host cytoplasm (Fridmann-Sirkis et al. 2016).
iii b) The eukaryotic nucleus, like the viral factories of both phage 201 Φ2-1 and the Mimivirus, is a specialised compartment located in the cytoplasm that confines DNA replication and transcription within its boundaries. Translation is confined to the cytoplasm since functional ribosomes are excluded from the nucleus. The mRNAs encoding RNAP-II subunits are transcribed within the nucleus and exported into the cytoplasm for translation, and the resulting proteins are re-imported into the nucleus to transcribe nuclear DNA. Unlike in viral factories, the mechanisms by which the nucleus sorts the macromolecules that can enter and exit are well understood and are known to be controlled by the NPCs. Like the Mimivirus viral factory, eukaryotic nuclei encode their own capping apparatus and the eIF4E gene, whose product binds to the m7G cap; both are part of a complex system to uncouple transcription from translation.
Figure 4: Unrooted phylogenetic trees of the mRNA capping pathway in selected eukaryotes and Mimiviridae. All five trees use sequences from the same set of carefully selected organisms (see Materials and Methods), and the proposed position of LECA is marked in each tree. The number of conserved amino acids in the final alignment for each gene is marked on the diagram. Trees were constructed and drawn with the maximum likelihood (ML) method using default settings in MEGA7 with 1000 bootstrap replicates. NCBI accession numbers are given for each sequence in the Materials and Methods. Mimiviridae informal grouping names are based on Claverie and Abergel 2018. a) RNAP largest subunit gene tree. b) GTase gene tree. c) MTase gene tree. d) eIF4E gene tree. e) Phylogenetic tree inferred from concatenation of all four gene sequences.
Figure 5: Maximum Likelihood tree of RNA polymerases using Archaeal RNAP subunit A’ as an outgroup. The RNAP A’ subunit of archaea was used as an outgroup to establish the root of the largest subunit of the Mimiviral RNAP and eukaryotic RNAP-II and RNAP-III genes. RNAP-II and RNAP-III are found to belong to two separate monophyletic groups. Both the RNAP-II and RNAP-III trees are robust, assign eukaryotes to their correct phylogenetic branches and recapitulate the expected phylogenetic relationships between the eukaryotes, including the early divergence of the Excavata (Hampl et al. 2009). The Mimiviridae tree is consistent with previous phylogenetic analyses of the Mimiviridae (Claverie and Abergel 2018). This tree shows that the Mimiviridae and eukaryotic RNAP-II genes share a common ancestor. This ancestor existed before LECA and is consistent with the proposal that both descend from FENA, a proposed viral ancestor of both the Mimiviridae and the eukaryotic nucleus that infected an archaeal ancestor of the eukaryotes. Since both viral and eukaryotic RNAP-II synthesise m7G capped mRNA, it can be inferred that the common RNA polymerase ancestor also produced capped mRNA. This tree was produced from an alignment of 64 sequences and 598 positions using the Maximum Likelihood method and the JTT substitution model. Bootstrap values are indicated on each branch and are based on 1000 replicates. Tree construction and computations were performed using MEGA7. NCBI accession numbers are given for each sequence in the Materials and Methods.
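The bootstrap values mentioned in the caption rest on resampling alignment columns with replacement. The sketch below shows only that resampling step on a made-up three-sequence alignment. The taxon names, the sequences and the function name are hypothetical, and the tree inference that would follow each replicate (performed in MEGA7 in the manuscript) is not reproduced here.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns drawn with replacement.

    `alignment` maps a taxon name to its aligned sequence; all sequences
    must have the same length. The same column indices are used for every
    taxon so that each resampled column stays intact.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

# Hypothetical toy alignment (not data from the manuscript).
toy_alignment = {
    "taxon_A": "MKVLAT",
    "taxon_B": "MKILAS",
    "taxon_C": "MRVLGT",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates), "replicates of", len(toy_alignment["taxon_A"]), "columns")
```

In a full analysis a tree would be inferred from every replicate, and the support for a branch would be the fraction of replicate trees that contain it.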
Tables

Table 1. Homologues of S. cerevisiae RNAP-II, GTase, MTase and eIF4E in Homo sapiens and Heimdallarchaeota LC-3 identified using BLAST.

Table 2. Summary of accession numbers used in this study.
| Saccharomyces gene name | Annotation in Homo sapiens | Homo sapiens accession | Homo sapiens E value | Heimdallarchaeota LC-3 accession | Heimdallarchaeota LC-3 E value | Annotation in Heimdallarchaeota LC-3 |
|---|---|---|---|---|---|---|
| | RNA polymerase I large subunit | NP_056240 | 1.00E-97 | | | |
| RPO21 (NP_010141) | RNA polymerase II large subunit | NP_000928 | 0 | OLS19521.1 | 0 | RNA polymerase subunit A' |
| | RNA polymerase III large subunit | NP_008986 | 0 | | | |
| CEG1 (NP_011385) | Guanylyltransferase | AAH19954 | 2.00E-17 | OLS26805.1 | 2.7 | DNA ligase |
| ABD1 (NP_009795) | Methyltransferase | NP_003790 | 2.00E-38 | OLS26405.1 | 4.00E-05 | Trans-aconitate 2-methyltransferase |
| CDC33 (NP_014502) | eIF4E | NP_001959 | 3.00E-36 | OLS27732.1 | 0.37 | 5-exo-hydroxycamphor dehydrogenase |
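One way to read the E values in Table 1 is to flag which hits pass a significance cutoff. The sketch below transcribes the E values for the four S. cerevisiae queries and applies a cutoff of 1e-10. That cutoff is an illustrative assumption, not a value stated in this section, and the function and variable names are likewise hypothetical.

```python
# E values transcribed from Table 1 (queries are S. cerevisiae proteins).
# For RPO21 only the RNA polymerase II large subunit hit is included.
TABLE1_E_VALUES = {
    "RPO21": {"Homo sapiens": 0.0, "Heimdallarchaeota LC-3": 0.0},
    "CEG1": {"Homo sapiens": 2.00e-17, "Heimdallarchaeota LC-3": 2.7},
    "ABD1": {"Homo sapiens": 2.00e-38, "Heimdallarchaeota LC-3": 4.00e-05},
    "CDC33": {"Homo sapiens": 3.00e-36, "Heimdallarchaeota LC-3": 0.37},
}

# Hypothetical cutoff chosen only for illustration.
E_VALUE_CUTOFF = 1e-10

def passes_cutoff(table, cutoff):
    """Map each query gene to {subject: True if E value <= cutoff}."""
    return {
        gene: {subject: e_value <= cutoff for subject, e_value in hits.items()}
        for gene, hits in table.items()
    }

significant = passes_cutoff(TABLE1_E_VALUES, E_VALUE_CUTOFF)
for gene, flags in significant.items():
    print(gene, flags)
```

Under this assumed cutoff every human hit passes, whereas in Heimdallarchaeota LC-3 only the RPO21 hit (annotated as RNA polymerase subunit A') does. A different cutoff could change how the ABD1 hit (E value 4.00E-05) is classified, so the threshold should be treated as a free parameter.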
| Group | Subgroup | Species | GTase | MTase | eIF4E | RNAP-II | RNAP-III |
|---|---|---|---|---|---|---|---|
| Eukarya | Fungi | Saccharomyces cerevisiae | NP_011385 | NP_009795 | NP_014502 | NP_010141 | NP_014759 |
| Eukarya | Fungi | Kluyveromyces marxianus | XP_022676394 | XP_022674569 | XP_022678436 | XP_022677581 | XP_022678447 |
| Eukarya | Fungi | Aspergillus niger | XP_001400555 | XP_001394253 | XP_001395221 | XP_001389676 | XP_001393726 |
| Eukarya | Holozoa | Homo sapiens | AAH19954 | BAA82447 | NP_001959 | NP_000928 | NP_008986 |
| Eukarya | Holozoa | Mus musculus | NP_036014 | NP_080716 | NP_031943 | AAB58418 | NP_001074716 |
| Eukarya | Holozoa | Danio rerio | NP_998032 | NP_001038465 | NP_001007778 | XP_005156282 | NP_001263425 |
| Eukarya | Holozoa | Caenorhabditis elegans | NP_001020979 | NP_492674 | NP_503124 | NP_500523 | NP_501127 |
| Eukarya | Viridiplantae | Arabidopsis lyrata | XP_002873017 | XP_002894293 | XP_020875354 | XP_020873010 | XP_020884300 |
| Eukarya | Viridiplantae | Brassica napus | XP_013647283 | XP_013640697 | AGA20262 | XP_013656472 | XP_013681133 |
| Eukarya | Viridiplantae | Ostreococcus tauri | XP_003075327 | XP_003081423 | XP_022840751 | XP_022839775 | XP_022840814 |
| Eukarya | Amoebozoa | Dictyostelium discoideum | XP_636333 | XP_642389 | XP_647593 | XP_641735 | XP_642724 |
| Eukarya | Amoebozoa | Dictyostelium purpureum | XP_003293052 | XP_003293647 | XP_003293106 | XP_003285719 | XP_003284018 |
| Eukarya | Amoebozoa | Acytostelium subglobosum LB1 | XP_012756463 | XP_012752660 | XP_012756585 | XP_012756853 | XP_012752065 |
| Eukarya | Alveolata | Plasmodium falciparum | KNC37820 | ETW19449 | XP_001351220 | XP_001351252 | XP_001350009 |
| Eukarya | Alveolata | Plasmodium vivax | KMZ83875 | SGX75114 | XP_001614562 | XP_001614530 | XP_001614080 |
| Eukarya | Alveolata | Theileria equi strain WA | XP_004828897 | XP_004828862 | XP_004829399 | XP_004831990 | XP_004830926 |
| Eukarya | Alveolata | Perkinsus marinus ATCC 50983 | XP_002774114 | XP_002774250 | XP_002774365 | XP_002767562 | XP_002778409 |
| Eukarya | Alveolata | Cryptosporidium muris RN66 | XP_002140608 | XP_002139632 | XP_002140059 | XP_002141559 | XP_002142344 |
| Eukarya | Excavata | Trypanosoma cruzi cruzi | PBJ71163 | PBJ71163 | PBJ73557 | PBJ81421 | PBJ72541 |
| Eukarya | Excavata | Trypanosoma rangeli | RNF00410 | RNF00410 | RNF02202 | RNF07318 | RNF04215 |
| Eukarya | Excavata | Leishmania mexicana | XP_003875466 | XP_003875466 | XP_003876737 | XP_003877779 | XP_003878621 |
| Eukarya | Excavata | Leptomonas seymouri | KPI83387 | KPI83387 | KPI89876 | KPI84927 | KPI86235 |
| Eukarya | Excavata | Bodo saltans | CUG90421 | CUG90421 | CUF95139 | CUI14899 | CUI14455 |

| Group | Subgroup | Species | GTase | MTase | eIF4E | RNAP |
|---|---|---|---|---|---|---|
| Mimiviridae | Klosneuvirinae | Catovirus CTV1 | ARF09224 | ARF09224 | ARF09024 | ARF09013-20 |
| Mimiviridae | Klosneuvirinae | Klosneuvirus KNV1 | ARF11732 | ARF11732 | ARF11337 | ARF11340-43 |
| Mimiviridae | Klosneuvirinae | Indivirus ILV1 | ARF09638 | ARF09638 | ARF09452 | ARF09455 |
| Mimiviridae | Klosneuvirinae | Bodo saltans virus | ATZ80933 | ATZ80933 | ATZ80516 | ATZ80519 |
| Mimiviridae | Mimivirinae | Acanthamoeba polyphaga mimivirus | AEJ34618 | AEJ34618 | AKI79272 | YP_003987013 |
| Mimiviridae | Mimivirinae | Moumouvirus australiensis | AVL94825 | AVL94825 | AVL94704 | AVL94698 AVL94696 |
| Mimiviridae | Mimivirinae | Powai lake megavirus | ANB50623 | ANB50623 | ANB50499 | ANB50494 ANB50492 |
| Mimiviridae | Mimivirinae | Tupanvirus deep ocean | AUL79325 | AUL79325 | AUL79602 | AUL79608 |
| Mimiviridae | Mimivirinae | Tupanvirus soda lake | AUL78031 | AUL78031 | AUL78296 | AUL78302 |
| Mimiviridae | Mimivirinae | Acanthamoeba polyphaga moumouvirus | YP_007354410 | YP_007354410 | YP_007354285 | YP_007354277 |
| Mimiviridae | Mesomimivirinae | Chrysochromulina ericina virus | YP_009173557 | YP_009173557 | YP_009173322 | YP_009173653 |
| Mimiviridae | Mesomimivirinae | Phaeocystis globosa virus | YP_008052553 | YP_008052553 | YP_008052407 | YP_008052581 |
| Mimiviridae | Mesomimivirinae | Tetraselmis virus 1 | AUF82182 | AUF82182 | AUF82209 | AUF82600 |
| Mimiviridae | CroV | Cafeteria roenbergensis virus BV-PW1 | YP_003969844 | YP_003969844 | YP_003969852 | YP_003970001 |

| Group | Subgroup | Species | RNAP |
|---|---|---|---|
| Archaea | Crenarchaeota | Saccharolobus solfataricus | WP_009990476 |
| Archaea | Crenarchaeota | Sulfolobus acidocaldarius | WP_011277574 |
| Archaea | Asgardarchaea | Candidatus Odinarchaeota archaeon LCB_4 | OLS17382 |
| Archaea | Euryarchaeota | Pyrococcus furiosus | WP_014835440 |