
Evolutionary history and higher order classification of AAA+ ATPases

Lakshminarayan M. Iyer, Detlef D. Leipe, Eugene V. Koonin, and L. Aravind*

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

Received 2 September 2003, and in revised form 8 October 2003

Abstract

The AAA+ ATPases are enzymes containing a P-loop NTPase domain, and function as molecular chaperones, ATPase subunits of proteases, helicases or nucleic-acid-stimulated ATPases. All available sequences and structures of AAA+ protein domains were compared with the aim of identifying the definitive sequence and structure features of these domains and inferring the principal events in their evolution. An evolutionary classification of the AAA+ class was developed using standard phylogenetic methods, analysis of shared sequence and structural signatures, and similarity-based clustering. This analysis resulted in the identification of 26 major families within the AAA+ ATPase class. We also describe the position of the AAA+ ATPases with respect to the RecA/F1, helicase superfamilies I/II, PilT, and ABC classes of P-loop NTPases. The AAA+ class appears to have undergone an early radiation into the clamp-loader, DnaA/Orc/Cdc6, classic AAA, and pre-sensor 1 β-hairpin (PS1BH) clades. Within the PS1BH clade, chelatases, MoxR, YifB, McrB, Dynein-midasin, NtrC, and MCMs form a monophyletic assembly defined by a distinct insert in helix-2 of the conserved ATPase core, and an additional helical segment between the core ATPase domain and the C-terminal α-helical bundle. At least 6 distinct AAA+ proteins, which represent the different major clades, are traceable to the last universal common ancestor (LUCA) of extant cellular life. Additionally, superfamily III helicases, which belong to the PS1BH assemblage, were probably present at this stage in virus-like selfish replicons. The next major radiation, at the base of the two prokaryotic kingdoms, bacteria and archaea, gave rise to several distinct chaperones, ATPase subunits of proteases, DNA helicases, and transcription factors. The third major radiation, at the outset of eukaryotic evolution, contributed to the origin of several eukaryote-specific adaptations related to nuclear and cytoskeletal functions. The new relationships and previously undetected domains reported here might provide new leads for investigating the biology of AAA+ ATPases.

Published by Elsevier Inc.

1. Introduction

A large part of the proteome of any organism is devoted to proteins that bind nucleoside triphosphates and, typically, utilize them as substrates in various reactions (reviewed in Vetter and Wittinghofer, 1999). Several distinct NTP-binding protein folds have been structurally characterized to date, but amongst these the P-loop NTPases (Saraste et al., 1990) are by far the most abundant class, which accounts for 10-18% of the predicted gene products in the sequenced prokaryotic and eukaryotic genomes (Koonin et al., 2000a). Proteins with P-loop NTPase domains are also present in the majority of viruses studied to date (Gorbalenya and Koonin, 1989). The P-loop NTPases are thought to be a monophyletic assemblage of protein domains, and several distinct versions of this domain are traceable to the last universal common ancestor (LUCA) of all modern cellular life forms (Kyrpides et al., 1999; Leipe et al., 2002). This suggests that the P-loop domain originated long before the time of the LUCA and had undergone considerable structural and functional diversification prior to this period. Thus, understanding the natural history of P-loop NTPases is critical for understanding the key aspects of life's evolution, ranging from the early phases to the radiation of major organismal lineages.

Most members of the P-loop NTPase fold hydrolyze the β-γ phosphate bond of a bound nucleoside triphosphate, most often ATP or GTP. The free energy of this hydrolysis reaction is typically utilized to induce conformational changes in other molecules. This constitutes the basis of the biochemical activities and biological functions of most P-loop fold proteins. In contrast, members of one major lineage of P-loop proteins, the kinases, transfer the ATP γ-phosphate to diverse substrates (Leipe et al., 2003). Structurally, P-loop domains adopt a 3-layered α/β sandwich configuration that contains regularly recurring αβ units, with the β-strands forming a central, mostly parallel β-sheet surrounded on both sides by α-helices (Milner-White et al., 1991) (see also the SCOP database (Murzin et al., 1995): http://scop.mrc-lmb.cam.ac.uk/scop/). At the sequence level, P-loop NTPases are generally characterized by two conserved sequence motifs, the Walker A and B motifs, which bind, respectively, the (β and γ) phosphate moieties of the bound NTP and a Mg2+ cation (Saraste et al., 1990; Vetter and Wittinghofer, 1999; Walker et al., 1982).

* Corresponding author. Fax: 1-301-435-7794.
E-mail address: [email protected] (L. Aravind).
1047-8477/$ - see front matter. Published by Elsevier Inc.
doi:10.1016/j.jsb.2003.10.010
Journal of Structural Biology 146 (2004) 11-31
www.elsevier.com/locate/yjsbi

Sequence and structure analyses suggest that the primary diversification event in the evolution of the P-loop fold resulted in the two principal classes of P-loop domains. The first of these, the KG (Kinase-GTPase) division, includes the kinases and GTPases, which share a number of structural similarities, such as the adjacent placement of the P-loop and Walker B strands. The other class, the ASCE division (for additional strand, catalytic E), is characterized by an additional strand in the core sheet, which is located between the P-loop strand and the Walker B strand (Leipe et al., 2002, 2003; Fig. 1). As opposed to kinases and GTPases, ATP hydrolysis by the proteins of the ASCE group typically depends on a conserved catalytic (proton-abstracting) glutamate that primes a water molecule for the nucleophilic attack on the γ-phosphate group of ATP. The ASCE division includes AAA+, ABC, PilT/VirD4, superfamily 1/2 (SF1/2) helicases, and RecA/F1/F0 superfamilies of ATPases, along with several additional, less confidently classified families.

Starting over a decade ago, the AAA ATPases (ATPases associated with a variety of cellular activities) were encountered in studies on an astonishing range of biochemical systems (Confalonieri and Duguet, 1995; Lupas and Martin, 2002; Ogura and Wilkinson, 2001). These included, among others, the eukaryotic proteasomal ATPases, CDC48, and FtsH, which are involved in processes related to protein stability and degradation in bacteria and eukaryotes; NSF, which is implicated in vesicular fusion; Pex1p, involved in peroxisome biogenesis; and Bcs1p, which participates in the assembly of mitochondrial membrane complexes. At approximately the same time, a detailed computational analysis of various cellular and viral proteins involved in nucleic acid metabolism, such as DnaA, the MCM proteins, NtrC-type transcription factors, and helicases of various RNA and DNA viruses comprising helicase superfamily 3 (SF3), suggested that all these proteins shared a conserved ATPase domain (Koonin, 1993).

Solution of the X-ray structure of the NSF protein and its comparison with the clamp loader subunit structure supported the unification of these ATPases into a monophyletic group (Guenther et al., 1997; Lenzen et al., 1998). Concomitantly, we conducted a systematic analysis of these ATPase domains using advanced sequence profile analysis methods and structural comparisons, which resulted in the unification of the bona fide AAA ATPases and the DnaA/MCM/NtrC/SFIII-related proteins into a single, monophyletic AAA+ class (Neuwald et al., 1999). Additionally, this analysis showed that various other ATPase domain families also belonged to the AAA+ class: ClpAB/Hsp100, ClpX, HslU, and Lon, which are involved in protein folding and degradation; the eukaryotic motor protein dynein; a large, conserved eukaryotic protein with 6 ATPase domains (subsequently termed midasin); magnesium and cobalt chelatases; the bacterial DNA-replication clamp loaders; and the eukaryotic replication factor C subunits. It was also proposed that the AAA+ domain might be a common denominator in the catalytic assembly or disassembly of large cellular complexes of polypeptides and nucleic acids, and that the majority of AAA+ ATPases function as oligomeric ring structures, which provide symmetric or quasi-symmetric surfaces for interactions with other molecules or a central pore for threading molecules in an extended conformation (Neuwald, 1999; Neuwald et al., 1999).

Since the publication of the original analysis of the AAA+ class, a wealth of structural and biochemical studies have been published that have strongly reinforced the monophyly of AAA+ ATPases and elucidated intricate functional details of how oligomeric rings of AAA+ proteins could be deployed in various biological contexts (Dougan et al., 2002; Lupas and Martin, 2002; Ogura and Wilkinson, 2001). Currently, over 15 structures of distinct types of the AAA+ domain are available (Fig. 1). These data, along with the genome sequences of diverse organisms from many of the principal phylogenetic lineages, provide a post-genomic vantage point from which to address several issues that have not been tractable previously: (1) a formal, unified definition of the AAA+ class that combines sequence and structural information; (2) the higher order relationships within the AAA+ class; (3) the earliest events in the evolution of AAA+ ATPases and their differentiation from the other ASCE ATPases; and (4) the trends in colonization of various functional niches during the evolution of this class of ATPases. Here, we address these problems, particularly in light of the new information that has become available since the previous survey of the AAA+ class of ATPases (Neuwald et al., 1999).


2. Results and discussion

2.1. The defining structural and catalytic features of the AAA+ ATPases

We collated all currently available structures of proteins that have been confidently assigned to the AAA+ class and prepared a multiple alignment of their sequences on the basis of their structure superposition. This allowed us to map all the major conserved sequence features to their 3D structural cognates in the core AAA+ domain. Furthermore, this structure-based alignment also enabled correction of the earlier alignments (Koonin, 1993; Neuwald et al., 1999), which were derived principally from sequence comparisons, and the recognition of previously overlooked subtle sequence signatures. Representative sequences of previously identified AAA+ proteins were then used as seeds for PSI-BLAST searches of the NR database to detect new AAA+ ATPases. All the newly detected putative AAA+ proteins were compared to the structure-based multiple alignment to establish their membership in the AAA+ class based on the presence of the defining consensus patterns.

The AAA+ ATPase domains share the ancestral Walker A and B motifs with the rest of the P-loop NTPases (Figs. 1-3). In the majority of the members of the AAA+ class, the canonical form of the Walker A (P-loop) motif is conserved, typically as GX2GXGK[ST]. At least one of the residues between the first two glycines of the signature is frequently a proline. Certain minor deviations from the canonical form are seen in the P-loop motifs of the NtrC (GX2GXGK[DE]), MCM (GX2GXAKS), and MoxR (GX2GXAK[ST]) families (Fig. 3). Additionally, the P-loop motif is disrupted in various catalytically inactive AAA+ domains, such as Sir3p, Orc4p, some of the repeats of dynein, and the δ′ subunit of the bacterial replication clamp-loader. The Walker B motif typically assumes the form hhhhDE (h is a hydrophobic residue), where the conserved glutamate primes a water molecule for a nucleophilic attack on the γ-phosphate group of ATP (Story and Steitz, 1992). Deviations from this form are again observed in domains that are likely to be catalytically inactive. The core parallel sheet of the AAA+ ATPase domain assumes a 5-1-4-3-2 topology, where strands 1 and 3 are, respectively, associated with the Walker A and B motifs (Fig. 1) (Guenther et al., 1997; Lenzen et al., 1998; Neuwald et al., 1999). This core differs from most of the other ATPases of the ASCE division, such as RecA-F1, SF1/2 helicases and PilT ATPases, in lacking additional strands to the right of strand 2. Strand 4 of the AAA+ ATPase domain is associated with another motif (termed sensor-1 or Motif C) bearing a conserved polar residue (Figs. 2, 3). Like its equivalents in several other ASCE division ATPases, this residue is likely to mediate interactions that are critical for ATP hydrolysis rather than ATP binding (Guenther et al., 1997; Putnam et al., 2001).
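The motif shorthand used above (X for any residue, h for a hydrophobic residue, a digit for a repeat count, brackets for allowed alternatives) maps directly onto regular expressions. The sketch below is a minimal illustration only, not the search procedure used in this study, which relied on PSI-BLAST profiles and structure-based alignments. It assumes that 'h' means the ACFILMVWY set listed for the h class in the Fig. 3 caption, and the test sequence is an invented toy string, not a real protein.

```python
import re
from typing import List, Tuple

# Residues counted as hydrophobic ('h'); assumed to match the 'h' class
# listed in the Fig. 3 caption.
HYDROPHOBIC = "ACFILMVWY"

# Motif shorthand taken from the text: X = any residue, h = hydrophobic,
# a digit repeats the preceding token, [..] lists allowed residues.
MOTIFS = {
    "Walker A (canonical)": "GX2GXGK[ST]",
    "Walker A (NtrC-like)": "GX2GXGK[DE]",
    "Walker A (MCM-like)": "GX2GXAKS",
    "Walker A (MoxR-like)": "GX2GXAK[ST]",
    "Walker B": "hhhhDE",
}


def pattern_to_regex(pattern: str) -> str:
    """Translate motif shorthand such as 'GX2GXGK[ST]' into a Python regex."""
    parts: List[str] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":  # residue alternatives are already valid regex syntax
            j = pattern.index("]", i)
            parts.append(pattern[i:j + 1])
            i = j + 1
            continue
        token = "." if ch == "X" else f"[{HYDROPHOBIC}]" if ch == "h" else ch
        i += 1
        count = re.match(r"\d+", pattern[i:])  # e.g. the '2' in 'X2'
        if count:
            token += "{" + count.group() + "}"
            i += count.end()
        parts.append(token)
    return "".join(parts)


def scan(seq: str) -> List[Tuple[str, int, str]]:
    """Return (motif name, 0-based start, matched text) for every hit."""
    hits = []
    for name, shorthand in MOTIFS.items():
        for m in re.finditer(pattern_to_regex(shorthand), seq):
            hits.append((name, m.start(), m.group()))
    return hits


if __name__ == "__main__":
    toy = "MSELLGPPGTGKTTLAKAIAGEAGVPFLVIDEAL"  # invented sequence
    for hit in scan(toy):
        print(hit)
    # ('Walker A (canonical)', 5, 'GPPGTGKT')
    # ('Walker B', 26, 'FLVIDE')
```

Exact patterns of this kind are far less sensitive than profile searches: divergent or catalytically inactive domains with disrupted P-loops (such as Sir3p or Orc4p) will not match, so a scan like this is at most a quick sanity check on candidate domains.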

Beyond these basic features of the core domain, the AAA+ class has several features that distinguish it from other NTPases of the ASCE division. First, the AAA+ proteins possess an additional conserved helix N-terminal to the Walker A strand. This helix typically contains a conserved glycine or a small residue that caps it at the N-terminus, and a conserved polar (usually acidic) residue that defines the N-terminal sequence motif of this class (Figs. 1-3). Most structures also contain a conserved region, which is located upstream of this helix, adopts a characteristic extended conformation and runs perpendicular to the direction of the strands in the core sheet (Figs. 1 and 2). The presence of this region in representatives from all the diverse branches of the AAA+ class suggests that it is an ancestral, defining feature of the AAA+ clade. The AAA+ class is also distinguished by the presence of a helical bundle with 4 helical segments that occurs immediately C-terminal to strand 5 of the core ATPase domain. This structure contains a conserved motif (sensor-2) with a frequently conserved arginine (Figs. 2 and 3). In the classical AAA proteins, this arginine is often replaced by an alanine. Sensor-2 appears to be critical in constraining the ATP molecule to facilitate its hydrolysis and undergoes conformational changes depending on whether ATP or ADP is bound. Thus, this helical module appears to mediate the transmission of the free energy of ATP hydrolysis by AAA+ proteins to their respective substrates (Ogura and Wilkinson, 2001).

A large number of the AAA+ ATPases characterized to date form quasi-symmetrical oligomeric ring structures that, in certain cases, have been shown to thread nucleic acids or peptides in an extended conformation through the central pore (Lenzen et al., 1998; Neuwald, 1999; Neuwald et al., 1999; Ogura and Wilkinson, 2001; VanLoock et al., 2002). In the case of dynein and midasin, which contain 6 repeats of the AAA+ domain in a single polypeptide, the protein is also likely to fold into a hexameric ring. This quaternary structure has also been consistently observed in several members of the RecA/F1 and PilT classes of ATPases, suggesting that it is an ancestral feature of the entire ASCE clade (Egelman et al., 1995; Gomis-Rüth et al., 2001; Leipe et al., 2000). The inter-protomer cooperation in ATP hydrolysis is elicited by another defining feature of the AAA+ superclass, the arginine finger. This is a conserved arginine that is located at the C-terminus of the helix upstream of strand 5 and is directed towards the ATP-containing active site of the adjacent protomer in a ring (Guenther et al., 1997; Putnam et al., 2001; Zhang et al., 2002). The arginine finger appears to be displaced relative to its position in other AAA+ domains in the DnaA and Orc/Cdc6 families (Fig. 2). The above features can be considered shared derived characters (synapomorphies) of the AAA+ clade within the ASCE division and constitute a blueprint that allows clear segregation of AAA+ ATPases from all other P-loop NTPases.
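The trans-acting arrangement of the arginine finger can be made concrete with simple ring arithmetic. The sketch below assumes, purely for illustration, a closed ring of n identical protomers numbered in the direction in which the fingers point, so that the finger of protomer i enters the ATP site of protomer (i + 1) mod n. Real rings may be asymmetric, open, or built from tandem repeats, and the finger-inactivating mutant used here is hypothetical.

```python
from typing import Dict, Set


def finger_donors(n: int = 6) -> Dict[int, int]:
    """Map each ATP site (indexed by the protomer binding the ATP) to the
    protomer whose arginine finger completes that site.

    Assumes a closed ring in which protomer i donates its finger to
    protomer (i + 1) mod n."""
    return {site: (site - 1) % n for site in range(n)}


def sites_without_finger(mutants: Set[int], n: int = 6) -> Set[int]:
    """ATP sites that lose their trans-acting arginine when the listed
    protomers carry a hypothetical finger-inactivating substitution."""
    return {site for site, donor in finger_donors(n).items() if donor in mutants}


if __name__ == "__main__":
    print(finger_donors())            # {0: 5, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4}
    print(sites_without_finger({2}))  # {3}
```

Under this simple model, a single defective protomer impairs an active site that belongs to its neighbour rather than its own, which is one way to picture why hydrolysis in these rings depends on inter-protomer cooperation.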

2.2. Evolutionary classification of the AAA+ ATPases

2.2.1. Identification of AAA+ families and relationships between them

All previously identified AAA+ ATPases and newly detected proteins that conformed to the AAA+ blueprint described above were clustered using the BLASTCLUST program with varying score density and protein-length-overlap thresholds. Those clusters that remained stable over a range of medium thresholds (score density 0.3-0.5) were considered likely to define monophyletic families or at least the cores of such families. The alignments for these clusters were then constructed using the T-Coffee program, corrected based on the template structural alignment, and analyzed to identify regions of extended conservation between and beyond the principal conserved signatures described above. This allowed us to formally define families on the basis of conserved signatures (Table 1). Typically, families included one or more orthologous lineages and, in some cases, also a cloud of more divergent paralogs. The presence of shared sequence features between families and/or consistent clustering of families based on score densities allowed us to delineate clades comprising multiple families. Further higher order groups of these clades were derived using structure-based clustering with pair-wise DALI Z-scores and through identification of unique structural features that unified multiple superfamilies (Figs. 1, 4 and Table 1). Conventional phylogenetic trees, constructed using the neighbor joining, maximum likelihood, and minimum evolution methods, were used to explore the relationships within individual families or groups of closely related families. However, as the overall sequence similarity between families decreases, only a small number of conserved positions remain shared between the families. It is not possible to obtain statistically well-supported higher order groupings from this small set of universally conserved residues using conventional phylogenetic methods. Under these conditions, such methods could also produce artificial clustering of proteins that retain primitive sequence features present in the common ancestor of the entire group. Hence, it is necessary to depend mainly on structure comparisons to identify the primitive structural state for the fold, and then delineate the structural characters that are derived in particular lineages. Based on these characters, the most parsimonious scenario (a scenario that minimizes the losses and independent innovations of particular features) that accounts for the observed structural quirks was determined and is presented here.
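BLASTCLUST groups sequences by single-linkage clustering of pairwise BLAST hits that pass a similarity threshold (interpreted as a score per aligned position when the threshold is below 3) and a length-coverage requirement. The sketch below is a simplified stand-in, not BLASTCLUST itself: it assumes the pairwise score densities are already computed, ignores the length-overlap criterion, and uses invented scores for six hypothetical proteins A-F to show what "stable over a range of thresholds" means.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Scores = Dict[Tuple[str, str], float]


def single_linkage(ids: List[str], scores: Scores, threshold: float) -> Set[FrozenSet[str]]:
    """Single-linkage clusters: two sequences share a cluster if a chain of
    pairwise scores >= threshold connects them (union-find)."""
    parent = {i: i for i in ids}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    groups: Dict[str, Set[str]] = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}


def stable_clusters(ids: List[str], scores: Scores, thresholds: List[float]) -> Set[FrozenSet[str]]:
    """Clusters recovered unchanged at every threshold in the range."""
    runs = [single_linkage(ids, scores, t) for t in thresholds]
    return set.intersection(*runs)


if __name__ == "__main__":
    ids = ["A", "B", "C", "D", "E", "F"]
    # Invented pairwise score densities, for illustration only.
    scores = {("A", "B"): 0.9, ("B", "C"): 0.6, ("C", "D"): 0.2,
              ("D", "E"): 0.8, ("E", "F"): 0.45}
    stable = stable_clusters(ids, scores, [0.3, 0.4, 0.5])
    print(sorted(sorted(c) for c in stable))  # [['A', 'B', 'C']]
```

Here A-B-C is recovered at all three thresholds, whereas D-E-F loses F at 0.5, so only A-B-C would be treated as the core of a candidate family. Single linkage is sensitive to a single spurious high-scoring pair, which is one reason to require stability across thresholds rather than trusting one cut-off.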

Additionally, we analyzed the phyletic distribution of individual families in the completely sequenced genomes. For families with a wide phyletic spread, i.e., those present in 2 or all 3 primary kingdoms (bacteria, archaea, and eukaryotes), conventional phylogenetic trees were constructed. The presence of distinct archaeal (or archaeo-eukaryotic) and bacterial branches in these trees was taken as an indication that the given family was most likely represented in LUCA. In contrast, the presence of a well-defined bacterio-eukaryotic branch, particularly in the absence of a clear archaeal clade, suggested horizontal gene transfer (HGT) from bacteria to eukaryotes, most often from the pro-mitochondrion or the pro-chloroplast. Below we briefly describe the 26 identified families of AAA+ ATPases, with an emphasis on their phyletic distribution and lineage-specific derivations, along with the predicted functions of uncharacterized groups (Table 1).
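The decision rule described above can be written as a small function. This is a deliberately simplified sketch: it assumes each tree has already been reduced to a list of well-supported branches, each given as the set of primary kingdoms its members come from, and it treats the "particularly in the absence of a clear archaeal clade" qualifier as a strict condition. The kingdom labels and the function name are hypothetical.

```python
from typing import Iterable

BACTERIA, ARCHAEA, EUKARYOTES = "bacteria", "archaea", "eukaryotes"

# Branch compositions that the rule in the text distinguishes.
ARCHAEAL_BRANCHES = {frozenset({ARCHAEA}), frozenset({ARCHAEA, EUKARYOTES})}
BACTERIAL_BRANCH = frozenset({BACTERIA})
BACTERIO_EUKARYOTIC_BRANCH = frozenset({BACTERIA, EUKARYOTES})


def interpret_branches(branches: Iterable[Iterable[str]]) -> str:
    """Classify a family from the kingdom composition of its major tree branches."""
    observed = {frozenset(b) for b in branches}
    has_archaeal = bool(observed & ARCHAEAL_BRANCHES)
    if has_archaeal and BACTERIAL_BRANCH in observed:
        return "likely present in LUCA"
    if BACTERIO_EUKARYOTIC_BRANCH in observed and not has_archaeal:
        return "likely bacteria-to-eukaryote HGT"
    return "undetermined"


if __name__ == "__main__":
    print(interpret_branches([{ARCHAEA, EUKARYOTES}, {BACTERIA}]))
    # likely present in LUCA
    print(interpret_branches([{BACTERIA}, {BACTERIA, EUKARYOTES}]))
    # likely bacteria-to-eukaryote HGT
```

A pattern in which eukaryotic sequences nest inside a bacterial radiation would be summarized here as a bacterio-eukaryotic branch. Real cases also need branch support values and the identity of the bacterial sister group (for example, proteobacteria for a mitochondrial origin), which this toy rule ignores.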

2.2.2. The clamp loader clade

In all cellular life forms, the clamp loader ATPases are responsible for loading the DNA clamp, which consists of PCNA or the DNA polymerase III β subunits, onto the DNA (Davey et al., 2002a; Davey and O'Donnell, 2000; Hingorani and O'Donnell, 1998). The AAA+ domains of this clade mostly conform to the idealized core of this class without any specialized innovations (Fig. 2). The clamp loader ATPases have a synapomorphic RC signature associated with the arginine finger (Fig. 3). Three major families can be identified within this clade. The first of these, the bacterial family, includes two pan-bacterial orthologous lineages, typified by the Escherichia coli HolB (δ′ subunit) and DnaX (γ and τ subunits) proteins, respectively. The bacterial family of clamp loaders is characterized by the insertion of a Zn cluster downstream of the helix associated with the Walker A motif (Guenther et al., 1997). The presence of members of the two orthologous lineages in almost all bacteria suggests an ancient duplication and functional diversification in the common ancestor of all known bacteria. The DnaX gene from cyanobacteria appears to have been secondarily transferred to plants and probably participates in the replication of chloroplast DNA.

The second clamp loader family, the RFC family, has an archaeo-eukaryotic distribution and consists of two major orthologous lineages. One of these lineages includes the archaeal RFC proteins typified by MTH241 and the eukaryotic RFC2, RFC3, RFC4, and RFC5 proteins. The second lineage includes the archaeal proteins typified by MTH240 and the eukaryotic RFC1, Rad24, Chl12, and Yor144c proteins. In several archaea, the representatives of the two orthologous lineages are encoded by adjacent genes in the genome. This suggests that an ancient duplication in the archaeal lineage resulted in two branches of the RFC family. Early in eukaryotic evolution, each of these lineages appears to have undergone several additional duplications giving rise to the 5 subunits of RFC and the other proteins, such as Rad24 and Chl12, which were recruited for distinct roles in DNA repair (Naiki et al., 2001). This was paralleled by the duplication of the PCNA clamp itself, resulting in specialized clamps, such as Rad1, Rad9, and Hus2, that functioned as partners for the Rad24-like ATPases (Aravind et al., 1999). Additionally, the RFC family includes the clamp-loaders of viruses, such as T4 (Davey et al., 2002b), which could represent an early diverging branch associated specifically with viral replicons.

The third major family is the WHIP family, typified by the Werner helicase interacting protein from vertebrates and yeast Mgs1p (Kawabe et al., 2001). This family is present in most major bacterial lineages and in all sequenced eukaryotic genomes. Most members of this family contain a distinct C-terminal globular domain fused to the AAA+ domain (Fig. 5) that contains a conserved acidic residue reminiscent of the catalytic residue of the RNase H fold (Bork et al., 1992). In phylogenetic trees, the eukaryotic proteins form a tight group that lies within the bacterial radiation, closer to the proteobacteria (data not shown). This suggests that the WHIP family was probably derived early in bacterial evolution from the bacterial clamp loader family and was transferred to eukaryotes from the pro-mitochondrial endosymbiont (Fig. 4).

The presence of distinct, functionally equivalent bacterial and eukaryotic families in the clamp loader clade suggests that one ancestral member of this clade was present in LUCA.

2.2.3. The DnaA/CDC6/ORC clade

This clade consists of two families, namely, the bacterial DnaA family and the archaeo-eukaryotic CDC6/ORC family, which perform analogous roles in the assembly of protein complexes associated with the replication origin (Erzberger et al., 2002; Giraldo, 2003). The synapomorphy that unifies the DnaA/CDC6/ORC clade is the presence of two helices after strand 2, which are of approximately equal size and pack against each other in a characteristic fashion (Figs. 1-3). However, the two families of this clade have greatly diverged from each other in terms of sequence. The DnaA family contains two principal orthologous groups. DnaA proper is a pan-bacterial group, typically present in a single copy in all known bacteria. The DnaA protein forms an oligomeric complex around the replication origin site and initiates local unwinding of DNA, which is required for recruitment of the DnaB helicase (Davey and O'Donnell, 2000; Erzberger et al., 2002). The second orthologous set, typified by the E. coli DnaC protein, is sporadically present in several diverse bacteria and appears to have arisen through a duplication of DnaA relatively late in bacterial evolution. In E. coli, DnaC has been shown to load the helicase DnaB onto the single-stranded DNA at the origin site (Davey et al., 2002a). In bacteria lacking DnaC, this function might be performed by DnaA itself. IstB, a member of the DnaC lineage, is encoded by transposons of the IS21 family and is required for their transposition, probably via recruitment of the replication complex to the transposon DNA.

The Orc/Cdc6 family has 1-2 representatives in almost all archaea. Interestingly, it is absent in Methanococcus, while Halobacterium shows an extensive lineage-specific expansion of this family, with 11 paralogous proteins. Early in eukaryotic evolution, the Orc/Cdc6 family appears to have differentiated into the Orc1p, Orc4p, Orc5p, and Cdc6p lineages that are present in most extant eukaryotes. These proteins cooperate in loading the MCM complex onto the origin of replication, analogously to the action of DnaA and DnaC in loading DnaB in bacteria (Lee and Bell, 2000; Liu et al., 2000). Certain eukaryotic members of the Orc/Cdc6 family, such as yeast Orc4p and Sir3, a yeast-specific paralog of Orc1p, have disrupted P-loops, suggesting that they have been recruited for non-catalytic roles like protein-protein and protein-DNA interactions.

Fig. 1. Topology diagrams of selected AAA+ ATPases and other members of the P-loop NTPase fold. Strands are shown as arrows with the arrowhead on the C-terminal side and numbered 1 through 5. Strands 1 and 3, which encompass the conserved sequence motifs GxxxxGK[ST] (Walker A) and hhhh[DE] (Walker B), are rendered in orange; the other core strands (2, 4, 5) are in light orange; non-conserved strands are in gray. Helices are shown as blue rectangles when above the plane of the β-sheet and in faint blue when below the β-sheet. The P-loop is shown as a red line and a green arrowhead marks the N-terminus of the ATPase domain. The defining features of the AAA+ ATPases, the loop that runs across the face of the β-sheet and the helix before strand 1, are highlighted in dark blue. The β-hairpin that defines the PS1BH group is rendered in purple. Broken lines indicate secondary structure elements that are not present in the PDB file or that were left out for clarity. The top panel shows two proteins from outside the AAA+ group (RecA, ASCE division; TMP kinase, KG division) for comparison purposes. Sequences are identified with the protein name, PDB code (in parentheses) and the organism name.

Fig. 2. Structures of selected AAA+ ATPases. The top left panel shows the structure of a member of the clamp loader clade labeled with all the synapomorphies of the AAA+ clade; it is close to the ideal AAA+ domain structure. The top right panel shows the structure of the archaeal Cdc6 protein with 2 helices after strand 2, which are the shared character of the ORC/CDC6 clade. The bottom left panel shows the structure of RuvB, which has the hallmark β-hairpin defining the PS1BH superclade. The bottom right panel shows the structure of Mg chelatase, which also belongs to the PS1BH clade; it additionally shows the features that define the helix-2 insert clade and the displaced C-terminal helical bundle. Walker B (DE), sensor 1 (polar residue), arginine finger and sensor 2 (arginine) are shown in all the structures.

The two families of the DnaA/CDC6/ORC clade can be traced back to the common ancestors of the bacterial and archaeo-eukaryotic lineages, respectively. This observation, taken together with the mechanistic similarity of their roles in replication initiation, suggests that the clade was represented in LUCA by the common ancestor of the two families (Giraldo, 2003). The recruitment of very different replication initiation complexes by these ATPases in archaea-eukaryotes and in bacteria might have contributed to the extensive sequence divergence between them.

2.2.4. The classical AAA domains and associated divergent sister groups

The classical AAA clade consists of all ATPases re-

lated to the proteasomal ATPases, FtsH, and CDC48

that originally were defined as the AAA superfamily

(Confalonieri and Duguet, 1995). The main structural

synapomorphy for this clade is the presence of an addi-

tional short helix immediately downstream of strand 2

(Figs. 1 and 3). The proteins of this clade strongly cluster

to the exclusion of other AAA+ proteins in similarity-

based clustering, and often have a conserved glycine

N-terminal to the arginine finger. Since several detailed

evolutionary investigations on this family have been

published, we only mention here the broad evolutionary

Fig. 3. Alignment of AAA+ ATPases. Multiple sequence alignment of selected representatives AAA+ ATPases families. The coloring reflects 80%

consensus of residue conservation. Secondary structure assignments are shown above the alignment, where E signifies ab-strand and H a a-helix. The

coloring is based on the 80% consensus shown underneath the alignment. The coloring scheme is as follows: h indicates hydrophobic residues(ACFILMVWY) shaded yellow, s indicates small residues (AGSVCDN), colored green, o indicates alcohol group containing residues (ST), shaded

blue, and p indicates polar residues (STEDKRNQHC) colored purple. Strongly conserved polar residues are colored in red (RED). The predicted

secondary structure elements are shown below the alignment. Specific synapomorphic characters have been boxed. Species abbreviations are as

follows: AAV. Adeno-associated virus; Aqae, Aquifex aeolicus; At, Arabidopsis thaliana; Af, Archaeoglobus fulgidus; AcNPV, Autographa californica

nucleopolyhedrovirus; Bs, Bacillus subtilis; phiC31, Bacteriophage phi-c31; Ce, Caenorhabditis elegans; Cj, Campylobacter jejuni; Dr, Deinococcus

radiodurans; Ddi, Dictyostelium discoideum; Dm, Drosophila melanogaster; Ec, Escherichia coli; Gila, Giardia intestinalis; Hi, Haemophilus influenzae;

Hs, Homo sapiens; Kpn, Klebsiella pneumoniae; Mlo, Mesorhizobium loti; Mjan, Methanocaldococcus jannaschii; Mac, Methanosarcina acetivorans;

Mta, Methanothermobacter thermautotrophicus; Mex, Methylobacterium extorquens; Mm, Mus musculus; Polio, Poliovirus; Pfu, Pyrococcus furiosus;

Ph, Pyrococcus horikoshii; Rhca, Rhodobacter capsulatus; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; SV40, Simian virus 40; Scoe,

Streptomyces coelicolor; Sso, Sulfolobus solfataricus; Ssp, Synechocystis sp; Tac, Thermoplasma acidophilum; Thth, Thermus thermophilus; Ter,

Trichodesmium erythraeum; VV, Vaccinia virus; and Vc, Vibrio cholerae.



9/21

patterns seen in this clade. Several distinct families can be identified within this clade, with most of the diversity observed in eukaryotes. The only pan-bacterial family, FtsH, includes membrane proteins with a single AAA+ ATPase domain fused to a C-terminal metalloprotease domain (Koonin et al., 2000b; Krzywda et al., 2002; Tomoyasu et al., 1995). Some members of the FtsH family have been laterally transferred to eukaryotes, apparently through the mitochondrial and chloroplast routes. Two major families, namely, the proteasomal ATPase family and the CDC48 family, which contains a tandem repeat of the AAA+ module, are conserved throughout the archaeo-eukaryotic branch (Beyer, 1997; Swaffield and Purugganan, 1997). In eukaryotes, the proteasomal ATPase family has undergone a massive proliferation: 6 orthologous lineages, namely S4, S6a, S6b, S7, S8, S10a, and S10b, can be traced back to the common ancestor of all crown group eukaryotes.
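The 80% consensus line described in the Fig. 3 caption can be sketched as follows. This is a minimal illustration, not the procedure used to prepare the figure. The residue classes are copied from the caption, but several details are assumptions: how gaps are handled, the '.' placeholder for columns without consensus, and the precedence given to the smallest class when several classes reach the threshold (the classes overlap; for example, S and T are both alcohol-containing and polar). The toy alignment is hypothetical.

```python
from collections import Counter

# Residue classes exactly as listed in the Fig. 3 caption.
RESIDUE_CLASSES = {
    "o": set("ST"),          # alcohol group-containing
    "s": set("AGSVCDN"),     # small
    "h": set("ACFILMVWY"),   # hydrophobic
    "p": set("STEDKRNQHC"),  # polar
}


def consensus_symbol(column, threshold=0.8):
    """Consensus symbol for one alignment column.

    Gaps ('-') count toward the column size but never match.
    A single residue at or above the threshold is reported as itself;
    otherwise the smallest class at or above the threshold wins
    (assumed precedence o, s, h, p); '.' means no consensus.
    """
    n = len(column)
    if n == 0:
        return "."
    residues = [r for r in column.upper() if r != "-"]
    counts = Counter(residues)
    if counts:
        residue, top = counts.most_common(1)[0]
        if top / n >= threshold:
            return residue
    # Try classes from smallest to largest so the most specific label wins.
    for symbol in sorted(RESIDUE_CLASSES, key=lambda k: len(RESIDUE_CLASSES[k])):
        hits = sum(1 for r in residues if r in RESIDUE_CLASSES[symbol])
        if hits / n >= threshold:
            return symbol
    return "."


def consensus_line(sequences, threshold=0.8):
    """Consensus string for equal-length aligned sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        consensus_symbol("".join(s[i] for s in sequences), threshold)
        for i in range(length)
    )


# Hypothetical five-sequence toy alignment (not taken from Fig. 3).
toy = ["GKTL", "GRSI", "GETV", "GKSF", "AQA-"]
print(consensus_line(toy))  # Gpoh
```

Ordering the classes by size makes the label as informative as possible. With this ordering, a column made up only of S and T residues is reported as o rather than as the broader p.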

Table 1

Clamp loader clade Pre-sensor 1 hairpin superclade

SR[CAT] motif associated with Arginine finger β-hairpin inserted before Sensor 1 motif

*HolB/DNAX family SFIII helicase clade [viruses]

Zn cluster between helix-1 and strand 2 *NCLDV D5 ATPase family

**HolB subfamily [B] *Positive strand RNA virus helicase family

**DNAX subfamily [B, Pl] *Other viral helicases

*RFC family HslU/ClpX/Lon/ClpAB-C clade

**MTH241/RFC2 subfamily [AE] Extended loop after strand 2

(In eukaryotes includes RFC2, RFC3, RFC4, RFC5) *HslU/ClpX family

**MTH240/RFC1 subfamily [AE] **HslU subfamily [B, Kin, Api]

(In eukaryotes includes RFC1, Chl12, RAD24, Yor144c) **ClpX subfamily [B, E]

*WHIP family [B, E] *Lon family

Fused to a C-terminal WC domain

Fused to a Lon protease at the C-terminus

**Bacterial lon protease subfamily [B, E]

DnaA/ORC clade Fused to a LAN domain at the N-terminus

Two helices of approximately equal length after strand 2 **Archaeal lon protease subfamily [A>B]

*DnaA family *ClpAB-C family

**DnaA subfamily [B] NRhD motif associated with Arginine finger

**DnaC subfamily [B] **ClpAB-C subfamily [B, E]

*CDC6/ORC family [A, E] **Torsin subfamily [E]

**CDC6/ORC [A] Torsin has a distinct α-helical N-terminal domain

**Orc1p, Orc4p, Orc5p, and Cdc6p subfamilies [E] RuvB family [B, E]

Has C-terminal wHTH domain; Helical segment between strand 5 and C-terminal bundle

Classical AAA clade and its divergent relatives

Small helix between strand-2 and helix-2, [GN]R motif associated with arginine finger

Helix-2 insert clade

*FtsH family [B, E] βαβ insert in helix-2

Fused to a metalloprotease at the C-terminus NtrC/MCM group

*CDC48 family [A, E] Have [AG][FL]T motif in helix-2 before bab insert

DPBB at N-terminus, 2 AAA+ domains *NtrC family [B]

*Proteasomal ATPase family [A, E] *MCM family [A, E]

**S4, S6a, S6b, S7, S8, S10a and S10b subfamilies [E] Fused to a Zn ribbon at N-terminus and a wHTH at the C-terminus

*Katanin p60/Fidgetin family [E] *Chelatase/YifB group

*NSF1 family [E] **Chelatase family [A, B, plants]

*Pex1/6 subfamily [E] Associated with a vWA domain

*Bcs1p subfamily [E] **YifB family [B]

(Possible relatives of classical AAA clade) Fused to a Lon protease at its N-terminus, insert of Zn cluster after strand 4

*AFG1 family [B, E] *McrB family [B>A, E]

*ClpAB-N family [B>A, E] *MoxR group

*TIP49 family [A, E] **MoxR family [A, B]

N-terminal module with characteristic H[ST]H motif, Predicted β-barrel between Helix-2 and Strand 3

**Dynein/Midasin family

6 Tandem AAA+ domains

**Dynein subfamily [E]

**Midasin subfamily [E]

Abbreviations: A, Archaea; B, Bacteria; E, Eukaryotes; Api, Apicomplexans; Kin, Kinetoplastids; and Pl, Plants. > indicates sporadic lateral transfer.
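The phyletic-pattern codes in Table 1 can be read mechanically as a rough first pass on the kind of inference made in the text. The sketch below simplifies under stated assumptions:

- Lineage-level tags (Pl, Kin, Api, plants) are treated as eukaryotic.
- The recipient side of an 'X>Y' transfer is dropped.
- A bacterio-eukaryotic pattern is read as a bacterial origin followed by later transfer.

The analysis in the text also relies on tree topology and shared structural features, which this rule ignores. The function names are hypothetical.

```python
# Superkingdom codes from the Table 1 footnote; lineage-level tags
# (Pl, Kin, Api, "plants") are folded into E (eukaryotes).
EUKARYOTIC_TAGS = {"E", "Pl", "Kin", "Api", "plants"}


def parse_pattern(tag):
    """Parse a Table 1 tag such as '[B>A, E]' into native superkingdoms.

    'X>Y' marks sporadic lateral transfer from X into Y, so only X is
    kept; 'AE' is read as archaea plus eukaryotes.
    """
    native = set()
    for item in tag.strip("[] ").split(","):
        source = item.split(">")[0].strip()
        if not source:
            continue
        if source in EUKARYOTIC_TAGS:
            native.add("E")
        elif source == "AE":
            native.update({"A", "E"})
        else:
            native.add(source)  # A, B, or a non-cellular tag such as 'viruses'
    return native


def earliest_epoch(native):
    """Deepest epoch suggested by a native distribution (simplified rule)."""
    if {"A", "B"} <= native:
        return "LUCA"
    if "A" in native:
        return "archaeo-eukaryotic ancestor" if "E" in native else "archaeal ancestor"
    if "B" in native:
        # B + E without A is read as bacterial origin plus endosymbiotic transfer.
        return "bacterial ancestor"
    if native == {"E"}:
        return "eukaryotic ancestor"
    return "undetermined"


lon = parse_pattern("[B, E]") | parse_pattern("[A>B]")
print(earliest_epoch(lon))                              # LUCA
print(earliest_epoch(parse_pattern("[B, Kin, Api]")))   # bacterial ancestor
print(earliest_epoch(parse_pattern("[A, E]")))          # archaeo-eukaryotic ancestor
```

Combining the two Lon subfamilies, or the NtrC [B] and MCM [A, E] families, places their common ancestor in LUCA. A bacterio-eukaryotic pattern such as that of the HslU subfamily is read as a bacterial origin.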




In addition to these archaeo-eukaryotic families, there are several distinct eukaryotic families that appear to have occupied specific functional niches (Beyer, 1997; Frohlich, 2001; Lupas and Martin, 2002; Ogura and Wilkinson, 2001; Swaffield and Purugganan, 1997) (also see Frohlich's website: http://aaa-proteins.uni-graz.at/AAA/Default.html). These include:

- the Katanin p60/Fidgetin family, which is involved in microtubule disassembly;
- the NSF family, with 2 AAA+ domains, which is involved in membrane fusion;
- Pex1/6, which is involved in peroxisome biogenesis;
- Bcs1p, which participates in the assembly of mitochondrial membrane protein complexes.

The NSF family shares an N-terminal double-ψ β-barrel domain with the CDC48 family and is likely to have been derived from the latter during the emergence of the eukaryotic secretory apparatus (Castillo et al., 1999; Coles et al., 1999). The Bcs1p family (Cruciat et al., 1999) is the most divergent of all the eukaryotic families. It includes a distinctive plant-specific subfamily with an uncharacterized, conserved N-terminal globular domain, which has undergone a lineage-specific gene expansion in plants: at least 15 members are encoded in monocot and dicot genomes.

In contrast to these families, the highly divergent AFG1 family (Lee and Wickner, 1992) shows a bacterio-eukaryotic phyletic pattern (Table 1). It is present in most eukaryotic lineages, whereas amongst the bacteria it is predominantly seen in proteobacteria, with a few sporadic occurrences in Deinococcus and actinomycetes. The eukaryotic members of this family group within the proteobacterial radiation, which suggests that the family originated in bacteria and was subsequently transferred horizontally (HGT) to eukaryotes via the proto-mitochondrion. Most of these families with limited or sporadic phyletic patterns could have emerged through rapid divergence from the more ancient, widely conserved members of the classical AAA clade.

Members of the ClpAB ATPase family (Hoskins et al., 2001) have two AAA+ modules that are very different from each other in sequence, structure, function, and phylogenetic affinities (Volker and Lupas, 2002). In oligomers, the two domains appear to form two distinct homotypic rings stacked on top of one another (Guo et al., 2002).

The N-terminal domain shares a characteristic feature with the classical AAA clade, namely, the additional small helix downstream of strand 2 (Figs. 1 and 3). This feature, along with the sequence similarity patterns, suggests that the ClpA N-terminal domain branched off from the classical AAA clade and then underwent rapid divergent evolution. In contrast, the C-terminal domain is related to the Lon and HslU-like ATPases (see below). The ClpAB ATPases therefore appear to have evolved through an ancient fusion of two phylogenetically distinct AAA+ ATPase modules (also see Volker and Lupas, 2002). Tandem duplication would seem the intuitively appealing scenario, given the head-to-tail arrangement of the 2 ATPase modules in these proteins, but it does not fit the distinct affinities of the two domains.

The ClpAB ATPases are important chaperones and stress-response proteins. They are found in all bacterial lineages, often in at least two versions per bacterial proteome (Hoskins et al., 2001), and some bacteria, e.g., Pseudomonas aeruginosa, encode up to 8 members of this family. In eukaryotes, the ClpAB proteins are represented by Hsp104 and the mitochondrial heat shock protein Hsp78. These proteins are absent in the archaea, except for a single member in Methanothermobacter that appears to have been transferred from the bacteria. The eukaryotic ClpAB homologs are nested within the bacterial radiation (data not shown), suggesting that they were acquired from the bacterial precursors of the mitochondria.

The classic AAA clade includes at least one pan-bacterial family (FtsH) and two pan-archaeo-eukaryotic families (the proteasomal ATPases and CDC48). A lineage represented in both prokaryotic superkingdoms is most parsimoniously traced to their common ancestor, so LUCA probably possessed a single ancestral representative of the classic AAA clade (Fig. 4). This ancestral form appears to have first diversified to occupy functional niches related to protein unfolding and degradation, and later diversified further to perform several other biological functions in eukaryotes.

2.2.5. The TIP49 family

These proteins are DNA-stimulated ATPases that associate with the TATA-binding protein. They appear to play a critical role in the assembly of complexes involved in transcriptional activation (Wood et al., 2000). The family has a single representative in several archaea and two distinct orthologous groups, pontin and reptin, in the crown group eukaryotes (Rottbauer et al., 2002). Thus, the TIP49 family appears to have emerged in the common ancestor of the archaeo-eukaryotic lineage and split into two paralogous lineages before the radiation of the eukaryotic crown group.

The family is characterized by a small conserved N-terminal module and a remarkable insert of a novel predicted β-barrel domain upstream of the Walker B strand. Family-specific inserts also occur at this location in other AAA+ families (e.g., HslU) and are likely to form a second toroidal structure stacked below the ATPase domain.

Iterative database searches with TIP49 profiles recover both classic AAA and clamp loader clade proteins as the best hits. However, secondary structure prediction based on the multiple alignment suggests an additional small helix downstream of strand 2, as in the classical AAA clade (Fig. 3). This suggests that TIP49 probably branched off from that clade early in the common ancestor of the archaeo-eukaryotic lineage and then diverged extensively in sequence. It has previously been claimed that TIP49 is related to RuvB, the helicase subunit of the bacterial resolvasome (Kurokawa et al., 1999). However, certain key features cannot be detected in the TIP49 family, such as the conserved β-hairpin that is characteristic of RuvB and its relatives (Figs. 1–3; see below).

2.2.6. The pre-sensor-1 β-hairpin (PS1BH) superclade

All the remaining lineages of AAA+ domains can be unified into a vast monophyletic assemblage. These are the SF3 helicases, HslU and ClpX, Lon, torsin, the C-terminal AAA+ domain of ClpAB, RuvB, NtrC, MoxR and its relatives, dynein, midasin, and the chelatases. The entire superclade is defined by a synapomorphic insert between the sensor-1 strand and the preceding helix.

Structures are available for T-antigen (SFIII) (Li et al., 2003), HslU (Bochtler et al., 2000), Mg chelatase (Fodje et al., 2001), and RuvB (Putnam et al., 2001; Yamada et al., 2002). In all of them, this insert forms a β-hairpin that projects out of the AAA+ core at a particular angle, and this angle is generally conserved across the diverse proteins with this feature (Fig. 2). The HslU structure reveals that the hairpin forms a lid-like band in the oligomeric ring state (Bochtler et al., 2000). Sequence conservation in this region allowed us to identify the members of this assemblage for which 3D structures are not yet available.

Hereinafter, we refer to this assemblage as the pre-sensor-1 β-hairpin (PS1BH) superclade. Below, we detail the higher order clades that could be delineated within this large group of AAA+ ATPases (Fig. 4, Table 1).

2.2.7. The SFIII helicase clade

Superfamily III helicases were first identified in three groups of mobile elements (Gorbalenya et al., 1990; Koonin, 1993):

- numerous small RNA viruses, e.g., picornaviruses and comoviruses;
- DNA viruses, such as the papovaviruses, parvoviruses, circoviruses, and baculoviruses (p143 protein);
- phages, e.g., P4.

Subsequently, we identified a specific version of the SFIII helicase, typified by the vaccinia virus D5 protein, as one of the synapomorphies of the nucleo-cytoplasmic large DNA virus clade (Iyer et al., 2001) (Table 1). However, apart from prophage remnants, cellular genomes encode no SF3 helicases. This lineage of AAA+ ATPases might therefore have originated in primitive, small replicons that are now represented only by viruses. The more complex DNA replication apparatus that emerged in the archaeo-eukaryotic and bacterial branches of life might have displaced these simpler helicases in cellular systems.

2.2.8. The HslU/ClpX/Lon/ClpAB-C clade

The chaperones and ATPase subunits of proteases appear to constitute another major monophyletic lineage within the PS1BH superclade (Fig. 4). This clade is supported by an extended loop between strand 2 and the helix downstream of it. Support for this clade also comes from the consistent reciprocal recovery of these proteins in iterative profile searches. Functional considerations also favor the emergence of this clade from an ancestral ATPase that probably acted as a cofactor for diverse proteases.

Three distinct families can be identified within this clade. The first of these, the HslU/ClpX family, is widespread in bacteria but absent from the currently available archaeal proteomes (Koonin, 1993). Within this family, two orthologous lineages, typified by the E. coli HslU and ClpX proteins respectively, are widespread in most major bacterial lineages. ClpX orthologs that function in the mitochondria have been detected in several eukaryotes (Corydon et al., 2000). HslU orthologs, which might also have a mitochondrial or plastid function, are known from kinetoplastids and apicomplexans (Couvreur et al., 2002). The phyletic pattern of the HslU/ClpX family suggests that it split into the two orthologous groups before the radiation of the major bacterial phyla, and that eukaryotes later acquired it from the pro-mitochondrial endosymbionts.

Interestingly, although HslU and ClpX belong to the same family of AAA+ ATPases, their protease partners are unrelated. HslV belongs to the NTN hydrolase superfamily (Pei and Grishin, 2003), whereas ClpP belongs to the acyl-CoA decarboxylase/isomerase superfamily (Aravind and Koonin, 1999a). The HslV protease is related to macropain, one of the proteases of the archaeo-eukaryotic proteasome. However, in the proteasome, the macropains functionally interact with the ATPases of the classical AAA clade (Seemuller et al., 1995; Unno et al., 2002).

The C-terminal domain of the ClpAB proteins and the Torsin proteins from animals make up the second family (ClpAB-C family) of the HslU/ClpX/Lon/ClpAB-C clade. As discussed above, the two AAA+ domains of the ClpAB proteins are very different from each other (Guo et al., 2002; Volker and Lupas, 2002). The C-terminal domain has the hallmark hairpin of the PS1BH superclade. In sequence profile searches, it preferentially recovers other PS1BH proteins rather than the N-terminal domain of ClpAB.

Torsin defines a group of animal proteins that appear to be involved in the assembly of protein complexes in the endoplasmic reticulum (Basham and Rose, 2001; Bassler et al., 2001). Torsin is specifically related to the ClpAB C-terminal domain. However, its phyletic pattern is far more restricted than the nearly pan-bacterio-eukaryotic spread of the ClpAB proteins (see above). It therefore seems likely that torsin arose specifically in the animal lineage, through rapid divergence of a breakaway copy of the ClpAB C-terminal domain.

The Lon proteins from archaea and bacteria define the third major family (Lon family) within the HslU/ClpX/Lon/ClpAB-C clade. Two distinct lineages can be delineated within the Lon family (Koonin et al., 2000a).

The first is the bacterial Lon lineage. It is represented in all the major groups of bacteria and is also found in eukaryotes as the mitochondrial Lon protease. These proteins have two characteristic domains flanking the AAA+ domain: the LAN domain at the N-terminus and a Lon-protease domain at the C-terminus (Fig. 5).

The second lineage, LonB or archaeal Lon (Fukui et al., 2002), is present throughout the archaea, with occasional representatives in bacteria such as low GC Gram-positive bacteria, γ-proteobacteria, Thermotoga, and Treponema. These proteins lack the LAN domain. Their C-terminal protease domain is separated from the ATPase domain by a long segment predicted to adopt a coiled-coil conformation.

These phyletic patterns suggest that bacterial Lon emerged before the diversification of bacteria and was transferred to the eukaryotes via the mitochondrial route. LonB (archaeal Lon) appears to have emerged before the archaeal diversification and was probably transferred horizontally to bacteria later. This implies that LUCA had a single representative of the Lon family, which then diversified into the bacterial and archaeal Lons as these superkingdoms of life diverged.

Fig. 4. Inferred evolutionary history of AAA+ ATPases. The figure shows several relative temporal epochs, separated by the major evolutionary transitions that mark their boundaries. The solid colored bars indicate the maximum depth to which the AAA+ lineages can be traced with respect to these epochs. The dashed lines indicate uncertainty about the exact point of origin of a lineage. The ellipses bundle groups of lineages from which a new lineage with a relatively limited phyletic pattern could have emerged via rapid divergence. Colored circles at the terminal branches indicate broad functional categories: yellow, DNA replication and repair; blue, transcription; pink, chaperone or protein unfolding/degradation; and white, other specialized functions.

Interestingly, the LAN domain appears to have acquired an independent existence during bacterial evolution (Fig. 5). In several bacteria, such as Deinococcus, cyanobacteria, and proteobacteria, it occurs in standalone proteins. In certain eukaryotic proteins, typified by the fungal CrgA and its orthologs, it is fused to an N-terminal ubiquitin-E3 ligase domain (a RING finger; Fig. 5). This domain architecture suggests that the LAN domain was reused as a general adaptor for interactions with proteins targeted for degradation.

2.2.9. The RuvB family

RuvB is the helicase subunit of the bacterial Holliday junction resolvasome. The complex also includes RuvC, the resolvase, and RuvA, an associated DNA-binding protein (Putnam et al., 2001; Yamada et al., 2002). RuvB is ubiquitous in bacteria but absent in eukaryotes and archaea. The structure of RuvB clearly shows that this family belongs to the PS1BH superclade (Figs. 1 and 3). However, it has no features that would place it specifically within any of the other clades. The phyletic pattern of RuvB suggests that it evolved before bacteria radiated from their common ancestor, probably from one of the PS1BH families already present in LUCA (Fig. 4).

Fig. 5. A graph showing the domain architectures and selected conserved functional interactions of the AAA+ ATPases. The direction of the arrow on an edge indicates the polarity of a domain fusion within a polypeptide, that is, whether a domain lies to the N- or C-terminus. A two-headed arrow indicates that the fusion may occur at either the N- or the C-terminus in different polypeptides. A black edge indicates a physical interaction between two kinds of domains in a protein complex. A barbed arrowhead on an edge indicates a domain insertion (e.g., the Zn cluster inserted in bacterial clamp loaders). The domain abbreviations are: ANK, Ankyrin repeat; BAM, bromo-associated motif; BRCT, BRCA C-terminal domain; Bromo, bromodomain; CH, calponin homology domain; ClpN, N-terminal domain of ClpAB ATPases; DPBB, double-ψ β-barrel; fHTH, Fis-type helix-turn-helix; wHTH, winged HTH; LAN, LA(LON) N-terminal domain; R, RING finger; REC, Receiver domain; REPO, insert β-barrel domain in Reptin and Pontin; WC, WHIP C-terminal domain; ZNC1/2/3, zinc clusters; ZNR, Zinc Ribbon.
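The domain-fusion graph of Fig. 5 can be viewed as a directed graph derived from N-to-C domain orders. The sketch below encodes a few architectures described in the text in simplified form; the coiled-coil linker of LonB and other regions are omitted. It shows how a two-headed arrow arises when two domains are fused in both orders in different proteins, as with the Lon protease domain in bacterial Lon and in YifB. The dictionary and function names are hypothetical, and this is not the procedure used to draw the figure.

```python
from collections import defaultdict

# Hypothetical, simplified N-to-C encodings of architectures described in the text.
ARCHITECTURES = {
    "bacterial Lon": ["LAN", "AAA+", "Lon protease"],
    "YifB": ["Lon protease", "AAA+"],
    "MCM": ["ZNR", "AAA+", "wHTH"],
    "NtrC": ["REC", "AAA+", "fHTH"],
    "CrgA": ["RING", "LAN"],
}


def fusion_edges(architectures):
    """Directed edges from each domain to the domain fused C-terminal to it."""
    edges = defaultdict(set)
    for domains in architectures.values():
        for n_side, c_side in zip(domains, domains[1:]):
            edges[n_side].add(c_side)
    return edges


def edge_kind(edges, a, b):
    """'two-headed' if a and b are fused in both orders, else '->', '<-' or None."""
    forward = b in edges.get(a, set())
    backward = a in edges.get(b, set())
    if forward and backward:
        return "two-headed"
    if forward:
        return "->"
    if backward:
        return "<-"
    return None


edges = fusion_edges(ARCHITECTURES)
print(edge_kind(edges, "AAA+", "Lon protease"))  # two-headed
print(edge_kind(edges, "LAN", "AAA+"))           # ->
```

Physical interactions (the black edges of Fig. 5) and domain insertions (barbed arrowheads) would need separate edge types. They are left out to keep the sketch focused on fusion polarity.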




2.2.10. The helix-2 insert clade

The structure of the Mg chelatase reveals a unique insert within helix-2 of the conserved ASCE division P-loop ATPase core (Fodje et al., 2001). This insert folds into two β-strands that hydrogen-bond with each other and flank a small helical region. The insert does not significantly distort the axis of helix-2; the small helical segment between the β-strands appears to be a laterally displaced fragment of helix-2 (Fig. 2). A sequence profile built from this region of the chelatases detects the insert in a specific group of AAA+ families, and their alignment shows sequence conservation in this region (Fig. 3).

The families with this helix-2 insert also typically carry a second insert: a long helical segment between strand 5 (the C-terminal-most strand of the AAA+ core) and the C-terminal α-helical bundle of the AAA+ module (Figs. 2 and 3). In the crystal structure of the Mg chelatase, this insert displaces the helical bundle from its usual position at the top of the core AAA+ P-loop module to below it (Figs. 2 and 3) (Fodje et al., 2001). Experiments on the Mg2+ chelatase suggest that, upon ATP binding, a large conformational change is likely to move the helical bundle back to the regular conformation (Hansson et al., 2002). Given that this helical region is conserved, such a conformational change is likely to be a common aspect of function throughout this clade.

Accordingly, we unified the families sharing these synapomorphic features into the helix-2 insert clade. Seven major families were identified within this lineage: NtrC, MCMs, McrB, chelatases, YifB, MoxR, and dynein-midasin.

The NtrC family is purely bacterial in its distribution and co-occurs with its functional partner, the RNA polymerase sigma factor 54. Members of this family function as transcriptional activators. They bind DNA at a site distal from the sigma 54 binding site and, upon interacting with sigma 54, catalyze the ATP-dependent structural transitions required for transcription initiation (Rombel et al., 1998; Zhang et al., 2002). Most members contain a C-terminal helix-turn-helix (HTH) domain of the FIS family, which enables them to bind DNA.

Within the bacteria, the NtrC family has diversified into several subfamilies, each with characteristic fusions to N-terminal domains (Fig. 5; Anantharaman et al., 2001):

- the CheY-like receiver domain in NtrC;
- the GAF domain in FhlA;
- the ACT and PAS domains in TyrR;
- the 4-vinyl reductase (4VR) domain in XylR.

These domains either connect the NtrC ATPase to two-component signaling systems or bind small molecules, allowing the ATPase to sense various metabolites in the environment (Rombel et al., 1998; Zhang et al., 2002).

The MCM family is ubiquitous in the archaeo-eukaryotic lineage, with at least one member in all archaea studied to date (Kelman and Hurwitz, 2003). In eukaryotes, it has diversified into six orthologous lineages that appear to have diverged from each other early in eukaryotic evolution. Members of this family are characterized by fusions to an N-terminal Zn-ribbon domain and a C-terminal DNA-binding winged helix-turn-helix domain (Aravind and Koonin, 1999b). These proteins function as hexameric or heptameric ring helicases. During initiation, they catalyze extensive unwinding of the DNA at the origin of replication (Fletcher et al., 2003; Yu et al., 2002).

The McrB family is typified by the NTPase subunit of the Mcr restriction-modification system. Unlike most AAA+ proteins, it specifically uses GTP rather than ATP (Panne et al., 2001; Pieper et al., 1999). McrB orthologs are sporadically distributed in bacteria and in the archaea Pyrobaculum and Methanosarcina. All of them are encoded by the mobile operon that encodes the Mcr-type restriction-modification system.

Interestingly, the McrB family is also represented in animals, by proteins such as Unc-53, HELAD-1, and cortactin-binding protein-2 (CBP-2). Unc-53 is involved in axonal path finding (Stringham et al., 2002). These are large multi-domain proteins that combine the AAA+ module of the McrB family with N-terminal ankyrin repeats (in CBP-2) or calponin homology domains (in Unc-53 and HELAD-1) (Fig. 5). HELAD-1 has been shown to possess 3′→5′ helicase activity, but the relevance of this activity to neuronal path finding is not clear (Ishiguro et al., 2002). These proteins are likely to also play a role in the assembly of cytoskeletal complexes. The CBP-2 proteins have a disrupted P-loop and Walker B motif. This suggests that they are catalytically inactive and that their AAA+ module probably only mediates specific interactions (Fig. 3).

Among the eukaryotes, the McrB family has so far been detected only in animals. It was therefore probably acquired from bacteria via HGT relatively late in eukaryotic evolution. McrB appears to be a remarkable case of a protein that, after trans-kingdom HGT, was recruited for a biological function completely different from its ancestral role.

Members of the chelatase family of the helix-2 insert clade insert metal ions, such as Mg2+ and Co2+, into porphyrin rings during the biosynthesis of cofactors such as chlorophyll and cobalamin (Fodje et al., 2001; Hansson et al., 2002). They are either fused to a C-terminal von Willebrand factor A (vWA) domain or interact with standalone vWA domains in a multisubunit complex. The vWA domain functionally cooperates with the AAA+ ATPase domain in the metal-insertion reaction (Fodje et al., 2001). The chelatase family is widespread in photosynthetic and autotrophic prokaryotes and in plants.

The phylogenetic tree of the chelatases shows two branches, one dominated by bacterial and the other by archaeal proteins. However, a number of archaeal and bacterial proteins cluster as sister groups in the tree (data not shown), which implies multiple HGT events in the evolution of the chelatase family. The tree also shows that the plant members of this family were clearly derived from the cyanobacterial precursor of the chloroplast.

The YifB family, typified by the E. coli YifB protein, is found in several major bacterial lineages and in a single archaeon, Methanothermobacter. The bacterial lineages include cyanobacteria, actinomycetes, Deinococcus, proteobacteria, certain spirochetes, Thermotoga, and Aquifex. The family is characterized by a fusion to an N-terminal Lon protease domain, suggesting that it probably functions as an ATP-dependent protease similar to Lon (Koonin et al., 2000a). Its members also contain a unique insertion of a Zn cluster just downstream of strand 4. Sequence comparisons show that YifB is closest to the chelatase family, with which it shares certain distinct sequence signatures (Fig. 3).

Thus, the Lon protease domain appears to have become associated with two phylogenetically distinct AAA+ domains on independent occasions in evolution. In addition, the Lon protease domain is fused to an ATPase domain of the RecA class in the bacterial Sms protein (Aravind et al., 1999).

The MoxR family is a large family represented in all the major lineages of bacteria and archaea. Some organisms, such as Mycobacterium tuberculosis, Pseudomonas, and Aeropyrum pernix, encode up to 5 distinct members. Several major subfamilies can be recognized within the MoxR family, including the classic MoxR, GvpN, YehL, APE0892, and YieN subfamilies. Despite their wide distribution, none of these proteins has been characterized experimentally in detail.

The few known functions point to roles in assembling protein complexes:

- MoxR is involved in the biogenesis of the methanol dehydrogenase complex (Van Spanning et al., 1991).
- NirQ and CbbQ of the GvpN subfamily are required for the biogenesis of the nitric oxide reductase and Rubisco complexes, respectively (Hayashi et al., 1998, 1999).
- GvpN itself appears to participate in the formation of gas vesicles in diverse prokaryotes (Horne et al., 1991).

The members of the MoxR family therefore seem to function as chaperones in the assembly of specific enzymatic complexes. Members of the YieN subfamily co-occur with genes encoding vWA domain proteins. By inference, they are likely to interact functionally with these proteins, as the Mg chelatase family proteins do.

The phylogenetic tree of the MoxR family resembles that of the chelatases, suggesting considerable lateral mobility of these genes within and between archaea and bacteria (data not shown). Nevertheless, the family's nearly ubiquitous presence in the major archaeal and bacterial lineages suggests that it emerged very early.

The dynein/midasin family consists of two giant multidomain ATPases from eukaryotes. Both dynein and midasin contain 6 tandem AAA+ domains in a single polypeptide (Mocz and Gibbons, 2001; Neuwald et al., 1999).

Dynein functions as an ATP-dependent motor in a large protein complex that interacts with microtubules. Cytoplasmic dynein transports vesicles, organelles, and chromosomes (during cell division) in the retrograde direction, whereas flagellar dynein acts as a motor for the movement of the eukaryotic flagellum (Vale, 2003). At least one copy of dynein is encoded in every sequenced eukaryotic genome, suggesting that it was present in the last common ancestor of eukaryotes. Very early in eukaryotic evolution, the dyneins diversified into forms specialized for cytoplasmic and flagellar functions. Flagellated early-branching eukaryotes such as Giardia have 12 or more dynein paralogs.

Midasin also appears to be present in all eukaryotes. It is associated with the nuclear pore complex and is involved in the cytoplasmic export of the 60S ribosomal particle (Bassler et al., 2001; Garbarino and Gibbons, 2002). By analogy to dynein, midasin might act as a motor that translocates ribosomal particles across the nuclear pore. Animals have a second midasin paralog with 4 AAA+ domains. All members of the midasin subfamily have a C-terminal vWA domain. This suggests that, as in the chelatases, the interaction of the ATPase and vWA domains is required for the proteins' function.

Within the helix-2 insert clade, several lines of evidence point to closer higher order relationships: similarity-based clustering, sequence conservation patterns, and reciprocal recovery in profile searches. Together, they group the MoxR and dynein/midasin families on one hand, and the YifB and chelatase families on the other (Fig. 4). All these families are in turn related to the McrB family, to the exclusion of the NtrC and MCM families (Fig. 4). The MCM and NtrC families might themselves share a closer higher order relationship, as suggested by the sequence conservation pattern in the first strand of the helix-2 insert. These relationships, taken together with the phyletic patterns, suggest that one or two ancestral members of this clade were present in LUCA. These ancestors then diversified into the families discussed above as the major divisions of life diversified.
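The higher order relationships proposed within the helix-2 insert clade can be written down as a nested grouping. The sketch below is one informal reading of the preceding paragraph, not a tree inferred from data. It encodes:

- MoxR together with dynein/midasin, and YifB together with the chelatases;
- McrB as the sister group of these four families;
- MCM paired with NtrC.

The helper names are hypothetical. The final query checks the "to the exclusion of" statement by listing the families in the smallest group that contains both MoxR and McrB.

```python
# One informal reading of the grouping described in the text, as nested tuples.
HELIX2_CLADE = (
    ((("MoxR", "Dynein-midasin"), ("YifB", "Chelatase")), "McrB"),
    ("MCM", "NtrC"),
)


def leaves(node):
    """All family names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def smallest_clade(node, taxa):
    """Smallest subtree whose leaves include every name in `taxa`, else None."""
    if isinstance(node, str):
        return node if taxa <= {node} else None
    for child in node:
        found = smallest_clade(child, taxa)
        if found is not None:
            return found
    return node if taxa <= leaves(node) else None


def newick(node):
    """Render the nested tuples as a Newick string (no branch lengths)."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(newick(child) for child in node) + ")"


print(newick(HELIX2_CLADE) + ";")
# ((((MoxR,Dynein-midasin),(YifB,Chelatase)),McrB),(MCM,NtrC));
group = smallest_clade(HELIX2_CLADE, {"MoxR", "McrB"})
print(sorted(leaves(group)))
# ['Chelatase', 'Dynein-midasin', 'McrB', 'MoxR', 'YifB']
```

NtrC and MCM do not appear in that group, which matches the statement that these five families are related to the exclusion of the NtrC and MCM families.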

2.2.11. Non-AAA+ ATPases previously included in this class

Several other P-loop NTPases, besides the families outlined above, have previously been included in the AAA+ class. However, the present analysis showed that they lack the defining features of this class and therefore do not belong with the bona fide AAA+ proteins. The most notable case is FtsK, which functions as a DNA pump in bacteria (Aussel et al., 2002; Donachie, 2002). Detailed analysis of the FtsK sequence shows that it lacks the key features of the AAA+ class. Instead, it has the defining features of the PilT/VirD4 class of P-loop NTPases, such as additional strands in the ATPase core (L.A., unpublished observations). Thus, like other DNA pumps such as TrwB, FtsK is a member of the PilT class (Gomis-Ruth et al., 2001).

2.3. AAA+ ATPases present in LUCA and their evolution in the pre-LUCA era

Based on the classification presented above, it can be conservatively inferred that LUCA had 5–6 distinct AAA+ domains. These include the ancestors of: (1) the clamp loader clade, (2) the DnaA/Orc/Cdc6 clade, (3) the classic AAA clade, (4) the LON ATPase family, (5) the MCM and NtrC families, and (6) the MoxR, chelatase, YifB, and McrB families (Fig. 4). Additionally, SF3 helicases were probably encoded in virus-like replicons. Thus, the founders of the higher order groups of the AAA+ class had probably already diverged from each other in the pre-LUCA era (Fig. 4).
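Tracing a family to LUCA rests on its phyletic pattern, read under the standard rooting between Bacteria and the archaeo-eukaryotic lineage (see Materials and methods). The sketch below illustrates that logic under loudly stated assumptions: presence on both sides of the root is taken to imply presence in LUCA, and horizontal gene transfer and lineage-specific loss are ignored. The function `infer_origin` and the example patterns are hypothetical and simplified; they are not the data behind Fig. 4.

```python
from typing import Dict, Set

BACTERIA, ARCHAEA, EUKARYOTA = "Bacteria", "Archaea", "Eukaryota"
KINGDOMS = {BACTERIA, ARCHAEA, EUKARYOTA}


def infer_origin(kingdoms: Set[str]) -> str:
    """Place a family's origin under the standard rooting (Bacteria vs. archaeo-eukaryotes).

    Assumes vertical inheritance only: horizontal transfer and secondary loss are ignored.
    """
    if not kingdoms or not kingdoms <= KINGDOMS:
        raise ValueError(f"empty or unrecognized phyletic pattern: {kingdoms}")
    bacterial_side = BACTERIA in kingdoms
    archaeo_eukaryotic_side = bool(kingdoms & {ARCHAEA, EUKARYOTA})
    if bacterial_side and archaeo_eukaryotic_side:
        # Present on both sides of the root: traced back to LUCA.
        return "LUCA"
    if bacterial_side:
        return "bacterial lineage"
    if kingdoms == {ARCHAEA, EUKARYOTA}:
        return "archaeo-eukaryotic ancestor"
    # Only one of Archaea or Eukaryota remains at this point.
    return "eukaryotic lineage" if EUKARYOTA in kingdoms else "archaeal lineage"


if __name__ == "__main__":
    # Hypothetical, simplified patterns for illustration only.
    patterns: Dict[str, Set[str]] = {
        "clamp loader": {BACTERIA, ARCHAEA, EUKARYOTA},
        "proteasomal ATPase": {ARCHAEA, EUKARYOTA},
        "HslU": {BACTERIA},
        "dynein": {EUKARYOTA},
    }
    for family, kingdoms in patterns.items():
        print(f"{family}: {infer_origin(kingdoms)}")
```

This is deliberately the most conservative reading: a pattern split across the root is explained by a single ancestral gene rather than by transfer, which matches the use of the standard model as a null hypothesis.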

Sliding clamps are critical for the processivity of DNA polymerases and are utilized by most large DNA replicons, including all cellular life forms and large DNA viruses. Hence, the presence of a clamp loader ATPase in LUCA points to the presence of a relatively large DNA-containing genome at this stage. This is also consistent with the presence of the DNA recombinase RecA. However, as noted previously, the enzymes that actually catalyze the critical steps of DNA replication, including the helicase that initiates replication, the primase, the DNA polymerases, the DNA ligase, proof-reading exonucleases, and Holliday junction resolvases, are not orthologous and, in many cases, not even homologous between the archaeo-eukaryotic and bacterial branches (Edgell and Doolittle, 1997; Leipe et al., 1999). Nevertheless, although the initiation helicases are non-orthologous in the bacterial and archaeo-eukaryotic clades (DnaB and MCMs, respectively), the ATPase responsible for their assembly, namely, the ancestral member of the DnaA/Cdc6/Orc clade, appears to have been present in LUCA (Giraldo, 2003; this work). This peculiar conservation pattern of the replicative components suggests that the ancestral clamp loader and DnaA/Cdc6/Orc ATPases functioned in the context of a replication system that was dramatically different from those found in modern cellular life forms. One possibility is that genome replication in LUCA occurred via reverse transcription of relatively long RNA molecules, whereas the enzymes required for direct DNA replication were invented only after the separation of the archaeo-eukaryotic and bacterial branches; the clamp loader and initiator ATPase might have been parts of such a system (Leipe et al., 1999). The other scenario is that the ancestral DNA replication enzymes of LUCA were displaced later in evolution by independently invented replication enzymes in one or both of the principal divisions of life, perhaps with an active contribution from virus-like elements (Forterre, 2002). However, the difficulty with the latter proposal is that intrakingdom non-orthologous displacement of core replication enzymes is not observed among extant life forms, although such displacements of DNA polymerases and other enzymes are common in DNA repair systems (Aravind et al., 1999; Eisen and Hanawalt, 1999). Furthermore, the displacement hypothesis would require concomitant displacement of enzymes catalyzing several distinct steps of the DNA replication process.

The SF3 helicases present another enigma in the evolution of AAA+ ATPases. While they are extremely prevalent in selfish replicons, they are not (so far) represented in cellular genomes (Iyer et al., 2001). They could potentially have been the ancestral replicative helicases that were replaced by other helicases upon the origin of distinctive DNA replication systems in the two major branches of life. However, since the SF3 helicases form a derived lineage in the PS1BH superclade, they are unlikely to represent the most ancient version of the AAA+ ATPases. The functional diversity within the helix-2 insert clade of the PS1BH assemblage does not allow us to predict the functions of its one or two ancestral representatives, which might have been present in LUCA. Both chaperone-like and helicase activities are common in different families of this clade (Neuwald et al., 1999; Ogura and Wilkinson, 2001), suggesting that the ancestral form could have been a generic ATPase possessing both these activities. Two AAA+ ATPases with potential chaperone or ATP-dependent protein unfolding activity, Lon and an ancestral classic AAA ATPase, apparently were represented in LUCA. This indicates that mechanisms for the assembly and recycling of multidomain proteins and multisubunit protein complexes were already well advanced in LUCA.

Several rounds of duplication within the AAA+ class appear to have occurred during pre-LUCA evolution. In LUCA, the AAA+ ATPases probably performed two principal biochemical functions: (1) catalysis of ATP-dependent structural transitions in proteins and (2) nucleic acid-associated or nucleic acid-stimulated ATPase or helicase activities. These biochemical activities dominate all the extant branches of the AAA+ superclass, with both activities exhibited by the bacterial LON ATPases (Fu et al., 1997). Thus, it is reasonable to assume that the common ancestor of the entire AAA+ class was a generic ATPase that performed both of these activities without much specificity. Even after the radiation of the clamp-loader, DnaA/Cdc6/Orc, classic AAA, and PS1BH lineages, most of these proteins, with the possible exception of the classic AAA family, probably retained some ability to perform both functions.

The principal biochemical functions of the AAA+ ATPases are closely linked to their ring quaternary structure, which allows them to thread peptides or nucleic acids through the central pore of the ring or provides a quasi-periodic surface for interactions (Neuwald et al., 1999; Ogura and Wilkinson, 2001; Zhang et al., 2002). This quaternary structure is even more widely represented in the ASCE division of the P-loop NTPases (Egelman et al., 1995; Gomis-Ruth et al., 2001; Leipe et al., 2000). Within the ASCE division, the AAA+ class forms one branch, whereas most of the other ASCE ATPases form a second branch. Furthermore, within this second branch, one lineage includes the ABC ATPases, whereas the other consists of the PilT, RecA/F1, and SF1/2 helicase N-terminal domains. Ring structures are also observed in the RecA/F1 and PilT classes. The common ancestor of this branch of the ASCE ATPases can be reconstructed as a nucleic acid-associated ATPase. Given that the ancestral AAA+ ATPase also probably had a nucleic acid-associated ATPase activity, we suspect that the common ancestor of the entire ASCE division had a nucleic acid-associated function, perhaps as a low-specificity helicase or a nucleic acid pump. Thus, the ring quaternary structure might have evolved as an ancestral feature of the ASCE ATPases, which formed these structures around double-stranded nucleic acids in the extended conformation. Subsequently, as the AAA+ ATPases diverged from the rest of the ASCE NTPases, they might have utilized this ancestral structure for peptide threading, in addition to nucleic acid unwinding.

Among the ASCE ATPases for which structures are currently unavailable, the AP-ATPases, NACHT GTPases, and uncharacterized archaeal ATPases constitute a major class that appears to share specific features with the AAA+ ATPases (Aravind et al., 2001; Koonin, 1997; DDL, LMI, EVK, and LA, unpublished). These apparent synapomorphies include the N-terminal helix preceding the Walker A motif and a 5-stranded core sheet ending in an α-helical extension. Thus, together with the AAA+ ATPases, these NTPases might constitute a major higher order assemblage within the ASCE division.

2.4. The adaptive radiation of the AAA+ superfamily in the primary kingdoms

The separation of the bacterial and archaeo-eukaryotic lineages appears to have been accompanied by a major, rapid diversification of the AAA+ ATPases into several specific roles, primarily related to chaperone and protein degradation activities. This dramatic radiation included the emergence of the N- and C-terminal domains of ClpAB, FtsH, ClpX, HslU, and YifB in the bacterial lineage, and of the proteasomal and CDC48-like ATPases in the archaeo-eukaryotic lineage (Fig. 4). Furthermore, the Tip49 and MCM families with helicase activity evolved in the archaeo-eukaryotic lineage, whereas the NtrC family of transcription factors underwent an explosive diversification in the bacterial lineage. The second major phase in the diversification of AAA+ ATPases was concomitant with the origin of the eukaryotes. This diversification occurred primarily within the classic AAA clade (Beyer, 1997; Swaffield and Purugganan, 1997) and appears to have generated proteins that were critical for the emergence of many defining features of the eukaryotes. A second important eukaryotic innovation occurred in the helix-2 insert clade, where the ancient MoxR family appears to have diversified into dynein and midasin. These events appear to have been critical for the emergence of the eukaryotic nucleus (midasin is involved in nucleo-cytoplasmic transport of ribosomes), cytoplasm (dynein for cytoplasmic transport), and flagella (the dynein motor). In this context, it would be of interest to investigate whether some of the prokaryotic MoxR proteins have motor activity comparable to that of dynein.

A significant aspect of the diversification of the AAA+ class is related to the fusion of the ATPase domain with other globular domains in the same polypeptide (Dougan et al., 2002). These fusions ensure the physical proximity of the respective domains and are diagnostic of functional interactions (Huynen and Snel, 2000; Marcotte et al., 1999). We examined these domain fusions, as well as known physical interactions between AAA+ ATPases and other domains, by constructing a graph that represents the entire network of such interactions (Fig. 5). Three general trends emerge from this analysis. First, the AAA+ domain underwent fusions with different protease domains on several independent occasions during evolution. The fusions include the Zn metalloprotease domain in FtsH and the Lon protease in the LON and YifB families. Furthermore, the proteasomal ATPases of the classic AAA clade interact with proteases of the NTN hydrolase (macropain) and JAB superfamilies (Verma et al., 2002). The distinct ClpP protease interacts with ClpX, whereas the ClpX paralog HslU interacts with the HslV protease of the NTN hydrolase superfamily (Fig. 5). These fusions and interactions are generally reminiscent of the fusions and interactions of the SF1/2 helicases with diverse nucleases (Aravind et al., 1999). In contrast, there are hardly any domain fusions of AAA+ ATPases with nucleases, although there are a few cases of physical association, e.g., McrB and some SF3 helicases. Fusions and associations with proteases are rare in the other branches of the ASCE division of ATPases. This observation suggests that the origin of AAA+ ATPases marked the emergence of chaperone and related protein-restructuring activities among the P-loop NTPases.
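The protease fusions and interactions listed above form a small bipartite graph of the kind summarized in Fig. 5. The sketch below encodes only the protease edges named in this paragraph, as a minimal illustration; the node labels are simplified (the proteasome core and HslV are both recorded under the NTN hydrolase superfamily), and `EDGES`, `build_graph`, and `neighbours` are hypothetical names, not the tool used to draw Fig. 5.

```python
from collections import defaultdict
from typing import DefaultDict, List, Optional, Set, Tuple

# (AAA+ family, partner domain, kind) taken from the text.
# "fusion" = same polypeptide; "interaction" = separate, non-covalently bound protein.
EDGES: List[Tuple[str, str, str]] = [
    ("FtsH", "Zn metalloprotease", "fusion"),
    ("LON", "Lon protease", "fusion"),
    ("YifB", "Lon protease", "fusion"),
    ("Proteasomal ATPase", "NTN hydrolase protease", "interaction"),  # macropain
    ("Proteasomal ATPase", "JAB protease", "interaction"),
    ("ClpX", "ClpP protease", "interaction"),
    ("HslU", "NTN hydrolase protease", "interaction"),  # HslV
]

Graph = DefaultDict[str, Set[Tuple[str, str]]]


def build_graph(edges: List[Tuple[str, str, str]]) -> Graph:
    """Build an undirected adjacency map; each neighbour is stored with the edge kind."""
    graph: Graph = defaultdict(set)
    for family, partner, kind in edges:
        graph[family].add((partner, kind))
        graph[partner].add((family, kind))
    return graph


def neighbours(graph: Graph, node: str, kind: Optional[str] = None) -> List[str]:
    """Sorted neighbours of a node, optionally restricted to one edge kind."""
    return sorted(other for other, k in graph.get(node, set()) if kind is None or k == kind)


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(neighbours(graph, "Lon protease"))            # ['LON', 'YifB']
    print(neighbours(graph, "NTN hydrolase protease"))  # ['HslU', 'Proteasomal ATPase']
    print(sorted({f for f, _, k in EDGES if k == "fusion"}))  # ['FtsH', 'LON', 'YifB']
```

Keeping the edge kind on each neighbour lets the same graph answer both questions raised in the text: which families are covalently fused to a protease, and which protease superfamilies recur as partners of unrelated AAA+ lineages.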

Second, the vWA domain is fused to, or interacts non-covalently with, different versions of the AAA+ domain, such as the chelatases, MoxR, and dynein/midasin. The S5a protein, a subunit of the eukaryotic proteasome, also contains a vWA domain that functionally interacts with the proteasomal AAA ATPases (Fig. 5). The vWA domains could function as adaptors that bring the ATPase domains to their substrates. Alternatively, in the case of the chelatases, the vWA domain might bind Mg2+ as part of the chelation reaction (Fodje et al., 2001). Other adaptor modules, such as the BAH domain in the eukaryotic ORC1 proteins, the bromodomain in the F11A10.1 proteins of the proteasomal ATPase family, and the BRCT domain in RFC1, could allow these proteins to associate specifically with other proteins of the complex eukaryotic chromatin (BAH domain and bromodomain) and DNA repair (BRCT domain) systems. Likewise, the ankyrin repeats and the CH domain could mediate the interaction of the Unc-53 and CBP-2 families with cytoskeletal components (Fig. 5).

Finally, several independent fusions of the AAA+ domain with different versions of the DNA-binding HTH domain are seen in the NtrC, MCM, Cdc6, DnaA, and RuvB families (Fig. 5). These fusions might have allowed the evolution of specific interactions between the AAA+ domains and nucleic acids. This could imply that the ancestral AAA+ domains associated non-covalently with stand-alone HTH proteins to translocate to specific sites on the DNA, such as replication origins.

3. General conclusions

Our understanding of the AAA+ ATPases, especially in structural and mechanistic terms, has vastly improved since the publication of the previous survey of this protein class (Neuwald et al., 1999). Using the wealth of structures and genomic information currently at our disposal, we identified the defining structural features of the AAA+ superclass and constructed an evolutionary classification, along with a reconstruction of some major aspects of their evolutionary history. In particular, some aspects of the earliest stages of their evolution, including the higher order divergence events and the positions of some highly divergent versions, such as SF3 helicases, dynein, and midasin, are becoming apparent. The AAA+ ATPases are an ancient group of ATPases that already showed considerable diversity in LUCA. The earliest diversification events in their evolution appear to correspond to the emergence of specific features related to the processivity of the DNA replication apparatus and the assembly of replication initiation complexes. The next great radiation gave rise to several distinct chaperones, ATPase subunits of proteases, DNA helicases, and transcription factors. The third major radiation, at the base of the eukaryotic lineage, probably contributed to the origin of eukaryote-specific adaptations related to nuclear and cytoskeletal functions. Some of the relationships and domains reported here might provide new leads in investigating the functions and biology of AAA+ ATPases.

4. Materials and methods

Sequences of AAA+ proteins were extracted from the non-redundant (NR) protein sequence database (National Center for Biotechnology Information, NIH, Bethesda) using the PSI-BLAST program (Altschul et al., 1997), with the sequences of known AAA+ proteins as queries. Sequence similarity-based protein clustering was performed using the BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/README.bcl). Multiple alignments were constructed using the Clustal X (Thompson et al., 1997) or T-Coffee (Notredame et al., 2000) programs and corrected on the basis of PSI-BLAST results and structural alignments, as previously described (Aravind and Koonin, 1999a). All newly recovered sequences were evaluated for the presence of conserved AAA+-specific motifs, such as those associated with the N-terminal helix, sensor 1, sensor 2, and the arginine finger, to differentiate them from other P-loop proteins. For each of the families recognized through these procedures, the phyletic distribution was evaluated in terms of the presence of homologues in a phylogenetically diverse sample of approximately 60 complete genomes from the three primary kingdoms, Bacteria, Archaea, and Eukaryota. The COG database was used as a guide for identifying orthologous proteins and their phyletic patterns (Tatusov et al., 2003).
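BLASTCLUST groups sequences by single-linkage clustering of pairwise BLAST hits: two sequences end up in the same cluster if a chain of sufficiently strong hits connects them. The sketch below reproduces only that linkage step with a union-find structure, as an illustration rather than a description of BLASTCLUST internals; the hit tuples, the identity and coverage thresholds, and the names `single_linkage` and `UnionFind` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical pairwise hit: (seq1, seq2, percent identity, fraction of length covered).
Hit = Tuple[str, str, float, float]


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def single_linkage(ids: Iterable[str], hits: Iterable[Hit],
                   min_identity: float = 30.0, min_coverage: float = 0.9) -> List[List[str]]:
    """Cluster sequences: any hit passing both thresholds links its two sequences."""
    uf = UnionFind()
    for seq_id in ids:
        uf.find(seq_id)  # register every sequence, including future singletons
    for a, b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            uf.union(a, b)
    clusters: Dict[str, List[str]] = {}
    for seq_id in list(uf.parent):
        clusters.setdefault(uf.find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["s1", "s2", "s3", "s4"]
    hits = [("s1", "s2", 45.0, 0.95), ("s2", "s3", 35.0, 0.92), ("s3", "s4", 20.0, 0.95)]
    print(single_linkage(ids, hits))  # [['s1', 's2', 's3'], ['s4']]
```

Note that s1 and s3 share a cluster only through s2; this transitive chaining is the defining property of single linkage and is why stricter thresholds split large superfamilies into finer families.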

Phylogenetic trees were constructed using the PROTDIST and FITCH programs of the PHYLIP package with the default parameters (Felsenstein, 1996), followed by optimization via local rearrangements conducted using the maximum likelihood (ML) method with the JTTF substitution model, as implemented in the MOLPHY package (Adachi and Hasegawa, 1992). Neighbor-joining trees were constructed using the MEGA program (Kumar et al., 1994). Support for selected tree branches was measured by 10,000 resamplings with the resampling of estimated log-likelihoods bootstrap (RELL-BP) procedure implemented in the MOLPHY package (Adachi and Hasegawa, 1992). For evolutionary reconstructions, the standard model of early evolution, which postulates the original split between the bacterial and archaeo-eukaryotic lineages, was employed as the null hypothesis (Brown and Doolittle, 1997).
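RELL avoids re-optimizing every candidate tree on each bootstrap replicate: site-wise log-likelihoods are computed once per tree, and each replicate simply resamples alignment sites with replacement and re-sums them. The sketch below shows that idea only; it is not MOLPHY's implementation, and `rell_support`, its arguments, and the example log-likelihood values are hypothetical. A tree's support is the fraction of replicates in which it has the highest summed log-likelihood, with ties credited to the tree listed first.

```python
import random
from typing import Dict, Sequence


def rell_support(site_lnl: Dict[str, Sequence[float]], replicates: int = 10_000,
                 seed: int = 1) -> Dict[str, float]:
    """RELL bootstrap proportions from precomputed per-site log-likelihoods of each tree."""
    if not site_lnl:
        raise ValueError("need at least one candidate tree")
    names = list(site_lnl)
    n_sites = len(site_lnl[names[0]])
    if any(len(values) != n_sites for values in site_lnl.values()):
        raise ValueError("all trees must have the same number of sites")
    rng = random.Random(seed)
    wins = {name: 0 for name in names}
    for _ in range(replicates):
        # Resample site indices with replacement; the same indices are used for every tree.
        idx = rng.choices(range(n_sites), k=n_sites)
        totals = {name: sum(site_lnl[name][i] for i in idx) for name in names}
        best = max(totals, key=totals.get)  # first listed tree wins ties
        wins[best] += 1
    return {name: wins[name] / replicates for name in names}


if __name__ == "__main__":
    # Hypothetical per-site log-likelihoods for two alternative topologies.
    lnl = {"T1": [-1.0, -3.0, -1.0, -3.0, -1.0], "T2": [-2.0, -2.0, -2.0, -2.0, -2.0]}
    print(rell_support(lnl, replicates=1000))
```

Because the same resampled sites are scored on every tree, the comparison is paired, which is what makes RELL a cheap approximation to a full bootstrap with re-estimation.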

For structural co
