Evolutionary history and higher order classification of AAA+ ATPases
Lakshminarayan M. Iyer, Detlef D. Leipe, Eugene V. Koonin, and L. Aravind*
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Received 2 September 2003, and in revised form 8 October 2003
Abstract
The AAA+ ATPases are enzymes containing a P-loop NTPase domain, and function as molecular chaperones, ATPase
subunits of proteases, helicases or nucleic-acid-stimulated ATPases. All available sequences and structures of AAA+ protein
domains were compared with the aim of identifying the definitive sequence and structure features of these domains and inferring
the principal events in their evolution. An evolutionary classification of the AAA+ class was developed using standard phy-
logenetic methods, analysis of shared sequence and structural signatures, and similarity-based clustering. This analysis resulted in
the identification of 26 major families within the AAA+ ATPase class. We also describe the position of the AAA+ ATPases with
respect to the RecA/F1, helicase superfamilies I/II, PilT, and ABC classes of P-loop NTPases. The AAA+ class appears to have
undergone an early radiation into the clamp-loader, DnaA/Orc/Cdc6, classic AAA, and pre-sensor 1 β-hairpin (PS1BH)
clades. Within the PS1BH clade, chelatases, MoxR, YifB, McrB, Dynein-midasin, NtrC, and MCMs form a monophyletic
assembly defined by a distinct insert in helix-2 of the conserved ATPase core, and an additional helical segment between the core
ATPase domain and the C-terminal α-helical bundle. At least 6 distinct AAA+ proteins, which represent the different major
clades, are traceable to the last universal common ancestor (LUCA) of extant cellular life. Additionally, superfamily III helicases,
which belong to the PS1BH assemblage, were probably present at this stage in virus-like selfish replicons. The next
major radiation, at the base of the two prokaryotic kingdoms, bacteria and archaea, gave rise to several distinct chaperones,
ATPase subunits of proteases, DNA helicases, and transcription factors. The third major radiation, at the outset of eukaryotic
evolution, contributed to the origin of several eukaryote-specific adaptations related to nuclear and cytoskeletal functions. The
new relationships and previously undetected domains reported here might provide new leads for investigating the biology of
AAA+ ATPases.
Published by Elsevier Inc.
1. Introduction
A large part of the proteome of any organism is de-
voted to proteins that bind nucleoside triphosphates
and, typically, utilize them as substrates in various re-
actions (reviewed in Vetter and Wittinghofer, 1999).
Several distinct NTP-binding protein folds have been
structurally characterized to date, but amongst these the
P-loop NTPases (Saraste et al., 1990) are by far the most
abundant class, which accounts for 10-18% of the predicted
gene products in the sequenced prokaryotic and
eukaryotic genomes (Koonin et al., 2000a). Proteins
with P-loop NTPase domains are also present in the
majority of viruses studied to date (Gorbalenya and
Koonin, 1989). The P-loop NTPases are thought to be a
monophyletic assemblage of protein domains, and sev-
eral distinct versions of this domain are traceable to the
last universal common ancestor (LUCA) of all modern
cellular life forms (Kyrpides et al., 1999; Leipe et al.,
2002). This suggests that the P-loop domain originated
long before the time of the LUCA and had undergone
considerable structural and functional diversification
prior to this period. Thus, understanding the natural
history of P-loop NTPases is critical for understanding
the key aspects of life's evolution, ranging from the early
phases to the radiation of major organismal lineages.
Most members of the P-loop NTPase fold hydrolyze
the β-γ phosphate bond of a bound nucleoside triphosphate,
most often ATP or GTP. The free energy of
this hydrolysis reaction is typically utilized to induce
conformational changes in other molecules. This constitutes
the basis of the biochemical activities and biological
functions of most P-loop fold proteins. In
contrast, members of one major lineage of P-loop proteins,
the kinases, transfer the ATP γ-phosphate to diverse
substrates (Leipe et al., 2003). Structurally, P-loop
domains adopt a 3-layered α/β sandwich configuration
that contains regularly recurring αβ units with the
β-strands forming a central, mostly parallel β-sheet
surrounded on both sides by α-helices (Milner-White
et al., 1991) (see also the SCOP database (Murzin et al.,
1995): http://scop.mrc-lmb.cam.ac.uk/scop/). At the sequence
level, P-loop NTPases are generally characterized
by two conserved sequence motifs, the Walker A
and B motifs, which bind, respectively, the β and γ
phosphate moieties of the bound NTP, and a Mg2+
cation (Saraste et al., 1990; Vetter and Wittinghofer,
1999; Walker et al., 1982).

* Corresponding author. Fax: 1-301-435-7794.
E-mail address: [email protected] (L. Aravind).
1047-8477/$ - see front matter. Published by Elsevier Inc.
doi:10.1016/j.jsb.2003.10.010
Journal of Structural Biology 146 (2004) 1-31
www.elsevier.com/locate/yjsbi

Sequence and structure analyses suggest that the
primary diversification event in the evolution of
the P-loop fold resulted in the two principal classes of
the P-loop domains. The first of these, the KG (Kinase
GTPase) division includes the kinases and GTPases that
share a number of structural similarities, such as the adjacent
placement of the P-loop and Walker B strands.
The other class, the ASCE division (for additional
strand, catalytic E), is characterized by an additional
strand in the core sheet, which is located between the
P-loop strand and the Walker B strand (Leipe et al.,
2002, 2003; Fig. 1). As opposed to kinases and GTPases,
ATP hydrolysis by the proteins of the ASCE group
typically depends on a conserved catalytic (proton-
abstracting) glutamate that primes a water molecule for
the nucleophilic attack on the γ-phosphate group of
ATP. The ASCE division includes AAA+, ABC, PilT/
VirD4, superfamily 1/2 (SF1/2) helicases, and RecA/F1/
F0 superfamilies of ATPases, along with several addi-
tional, less confidently classified families.
Starting over a decade ago, the AAA ATPases
(ATPases associated with a variety of cellular activities)
were encountered in studies on an astonishing range of
biochemical systems (Confalonieri and Duguet, 1995;
Lupas and Martin, 2002; Ogura and Wilkinson, 2001).
These included, among others, the eukaryotic prote-
asomal ATPases, CDC48, and FtsH, which are involved
in processes related to protein stability and degradation
in bacteria and eukaryotes, NSF, which is implicated in
vesicular fusion, Pex1p, involved in peroxisome bio-
genesis, and Bcs1p, which participates in the assembly of
mitochondrial membrane complexes. At around
the same time, a detailed computational analysis
of various cellular and viral proteins involved in nucleic
acid metabolism, such as DnaA, the MCM proteins,
NtrC-type transcription factors, and helicases of various
RNA and DNA viruses comprising the helicase
superfamily 3 (SF3), suggested that all these proteins
shared a conserved ATPase domain (Koonin, 1993).
Solution of the X-ray structure of the NSF protein
and its comparison with the clamp loader subunit
structure supported the unification of these ATPases
into a monophyletic group (Guenther et al., 1997;
Lenzen et al., 1998). Concomitantly, we conducted a
systematic analysis of these ATPase domains using
advanced sequence profile analysis methods and struc-
tural comparisons, which resulted in the unification of
the bona fide AAA ATPase and the DnaA/MCM/
NtrC/SFIII-related proteins into a single, monophyletic
AAA+ class (Neuwald et al., 1999). Additionally,
this analysis showed that various other ATPase domain
families, such as ClpAB/Hsp100, ClpX, HslU, and Lon,
which are involved in protein folding and degradation,
the eukaryotic motor protein dynein, a large, conserved
eukaryotic protein with 6 ATPase domains (subse-
quently termed midasin), magnesium and cobalt
chelatases, the bacterial DNA-replication clamp load-
ers, and eukaryotic replication factor C subunits, also
belonged to the AAA+ class. It was also proposed that
the AAA+ domain might be a common denominator in
the catalytic assembly or disassembly of large cellular
complexes of polypeptides and nucleic acids and that
the majority of AAA+ ATPases function as oligomeric
ring structures, which provide symmetric or quasi-
symmetric surfaces for interactions with other mole-
cules or a central pore for threading molecules in an
extended conformation (Neuwald, 1999; Neuwald et al.,
1999).
Since the publication of the original analysis of the
AAA+ class, a wealth of structural and biochemical
studies have been published that have strongly rein-
forced the monophyly of AAA+ ATPases and eluci-
dated intricate functional details of how oligomeric
rings of AAA+ proteins could be deployed in various
biological contexts (Dougan et al., 2002; Lupas and
Martin, 2002; Ogura and Wilkinson, 2001). Currently
over 15 structures of distinct types of the AAA+
domain are available (Fig. 1). These data, along with
the genome sequences of diverse organisms from many
of the principal phylogenetic lineages, provide a
post-genomic vantage point to address several
issues that have not been tractable previously: (1) A
formal, unified definition of the AAA+ class that
combines sequence and structural information. (2) The
higher order relationships within the AAA+ class. (3)
The earliest events in the evolution of AAA+ ATPases
and their differentiation from the other ASCE ATPases.
(4) The trends in colonization of various functional
niches during the evolution of this class of ATPases.
Here, we address these problems, particularly in light
of the new information that has become available since the
previous survey of the AAA+ class of ATPases
(Neuwald et al., 1999).
2. Results and discussion
2.1. The defining structural and catalytic features of the
AAA+ ATPases
We collated all currently available structures of proteins
that have been confidently assigned to the AAA+ class
and prepared a multiple alignment of their sequences
on the basis of their structural superposition.
This allowed us to map all the major conserved sequence
features to their 3D structural cognates in the core
AAA+ domain. Furthermore, this structure-based
alignment also enabled correction of the earlier align-
ments (Koonin, 1993; Neuwald et al., 1999), which were
derived principally from sequence comparisons, and the
recognition of previously overlooked subtle sequence
signatures. Representative sequences of previously
identified AAA+ proteins were then used as seeds for
PSI-BLAST searches of the NR database to detect new
AAA+ ATPases. All the newly detected putative AAA+
proteins were compared to the structure-based multiple
alignment to establish their membership in the AAA+
class based on the presence of the defining consensus
patterns.
The AAA+ ATPase domains share the ancestral
Walker A and B motifs with the rest of the P-loop
NTPases (Figs. 1-3). In the majority of the members of
the AAA+ class, the canonical form of the Walker A (P-
loop) motif is conserved, typically, in the form of
GX2GXGK[ST]. At least one of the residues between
the first two glycines of the signature is frequently a
proline. Certain minor deviations from the canonical
form are seen in P-loop motifs of the NtrC
(GX2GXGK[DE]), MCM (GX2GXAKS), and MoxR
(GX2GXAK[ST]) families (Fig. 3). Additionally, the
P-loop motif is disrupted in various catalytically inactive
AAA+ domains, such as Sir3p, Orc4p, some of the repeats
of dynein, and the δ′ subunit of the bacterial
replication clamp-loader. The Walker B motif typically
assumes the form hhhhDE (h is a hydrophobic residue),
where the conserved glutamate primes a water molecule
for a nucleophilic attack on the γ-phosphate group of
ATP (Story and Steitz, 1992). Deviations from this state
are again observed in forms that are likely to be cata-
lytically inactive. The core parallel sheet of the AAA+
ATPase domain assumes a 5-1-4-3-2 topology, where
strands 1 and 3 are, respectively, associated with the
Walker A and B motifs (Fig. 1) (Guenther et al., 1997;
Lenzen et al., 1998; Neuwald et al., 1999). This core
differs from most of the other ATPases of the ASCE
division, such as RecA-F1, SF1/2 helicases and PilT
ATPases, in lacking additional strands to the right of
strand 2. Strand 4 of the AAA+ ATPase domain is as-
sociated with another motif (termed sensor-1 or Motif
C) bearing a conserved polar residue (Figs. 2, 3). Like its
equivalents in several other ASCE division ATPases,
this residue is likely to mediate interactions that are
critical for ATP hydrolysis rather than ATP binding
(Guenther et al., 1997; Putnam et al., 2001).
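
The consensus patterns quoted above can be written down as regular expressions. The following Python sketch only illustrates the motif definitions; it is not the search procedure used in this study, which relied on PSI-BLAST and the structure-based alignment. The function name `scan_motifs`, the pattern labels, and the toy sequence are hypothetical, and the residue set for "h" is assumed to be the hydrophobic set listed in the legend of Fig. 3 (ACFILMVWY).

```python
import re

# Walker A (P-loop) forms quoted in the text; "." stands for X (any residue).
# The labels are hypothetical names chosen for this illustration.
PATTERNS = {
    "Walker A (canonical)": re.compile(r"G..G.GK[ST]"),  # GX2GXGK[ST]
    "Walker A (NtrC-like)": re.compile(r"G..G.GK[DE]"),  # GX2GXGK[DE]
    "Walker A (MCM-like)": re.compile(r"G..G.AKS"),      # GX2GXAKS
    "Walker A (MoxR-like)": re.compile(r"G..G.AK[ST]"),  # GX2GXAK[ST]
    # hhhhDE, with h assumed to be any of ACFILMVWY (Fig. 3 legend).
    "Walker B": re.compile(r"[ACFILMVWY]{4}DE"),
}


def scan_motifs(seq):
    """Return (label, start, matched_text) for every hit, ordered by start."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((label, match.start(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up toy sequence, not a real protein.
toy = "MSLLGPPGTGKTAAAKLIVLDEAHR"
for hit in scan_motifs(toy):
    print(hit)
```

Short patterns of this kind match many unrelated proteins by chance, so a hit would at most flag a candidate for comparison against the full structure-based alignment.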
Beyond these basic features of the core domain, the
AAA+ class has several features that distinguish it from
other NTPases of the ASCE division. First, the AAA+
proteins possess an additional conserved helix N-terminal
to the Walker A strand. This helix typically
contains a conserved glycine or a small residue that caps it
at the N-terminus, and a conserved polar (usually
acidic) residue that defines the N-terminal sequence
motif of this class (Figs. 1-3). Most structures also
contain a conserved region, which is located upstream of
this helix, adopts a characteristic extended conformation
and runs perpendicular to the direction of the strands in
the core sheet (Figs. 1 and 2). The presence of this region
in representatives from all the diverse branches of the
AAA+ class suggests that it is an ancestral, defining
feature of the AAA+ clade. The AAA+ class is also
distinguished by the presence of a helical bundle with 4
helical segments that occurs immediately C-terminal to
strand 5 of the core ATPase domain. This structure
contains a conserved motif (sensor-2) with a frequently
conserved arginine (Figs. 2 and 3). In the classical AAA
proteins this arginine is often replaced by an alanine.
Sensor-2 appears to be critical in constraining the ATP
molecule to facilitate its hydrolysis and undergoes conformational
changes depending on the presence of ATP or
ADP. Thus, this helical module appears to mediate the
transmission of the free energy of ATP hydrolysis by
AAA+ proteins to their respective substrates (Ogura
and Wilkinson, 2001).
A large number of the AAA+ ATPases characterized
to date form quasi-symmetrical oligomeric ring struc-
tures that, in certain cases, have been shown to thread
nucleic acids or peptides in extended conformation
through the central pore (Lenzen et al., 1998; Neuwald,
1999; Neuwald et al., 1999; Ogura and Wilkinson, 2001;
VanLoock et al., 2002). In the case of dynein and mi-
dasin, which contain 6 repeats of the AAA+ domain in a
single polypeptide, the protein is also likely to fold into a
hexameric ring. This quaternary structure has also been
consistently observed in several members of the RecA/
F1 and PilT classes of ATPases, suggesting that it is an
ancestral feature of the entire ASCE clade (Egelman
et al., 1995; Gomis-Ruth et al., 2001; Leipe et al., 2000).
The inter-protomer cooperation in ATP-hydrolysis is
elicited by another defining feature of the AAA+ su-
perclass, the arginine finger. This is a conserved arginine
that is located at the C-terminus of the helix upstream of
strand 5 and is directed towards the ATP-containing
active site of the adjacent protomer in a ring (Guenther
et al., 1997; Putnam et al., 2001; Zhang et al., 2002). The
arginine finger appears to be displaced relative to its
position in other AAA+ domains in the DnaA and Orc/
Cdc6 families (Fig. 2). The above features can be
considered shared derived characters (synapomorphies)
of the AAA+ clade within the ASCE division and con-
stitute a blueprint that allows clear segregation of
AAA+ ATPases from all other P-loop NTPases.
2.2. Evolutionary classification of the AAA+ ATPases
2.2.1. Identification of AAA+ families and relationships
between them
All previously identified AAA+ ATPases and newly
detected proteins, which conformed to the AAA+
blueprint described above, were clustered using the
BLASTCLUST program with varying score density and
protein-length-overlap thresholds. Those clusters that
remained stable over a range of medium thresholds
(score density 0.3-0.5) were considered likely to define
monophyletic families or at least cores of such families.
The alignments for these clusters were then constructed
using the T-Coffee program, corrected based on the
template structural alignment, and analyzed to identify
regions of extended conservation between and beyond
the principal conserved signatures described above. This
allowed us to formally define families on the basis of
conserved signatures (Table 1). Typically, families in-
cluded one or more orthologous lineages and, in some
cases, also a cloud of more divergent paralogs. The
presence of shared sequence features between families
and/or consistent clustering of families based on score
densities allowed us to delineate clades comprised of
multiple families. Further higher order groups of these
clades were derived using structure-based clustering
with pair-wise DALI Z-scores and through identification
of unique structural features that unified multiple
superfamilies (Figs. 1, 4 and Table 1). Conventional
phylogenetic trees, constructed using the neighbor
joining, maximum likelihood, and minimum evolution
methods, were used to explore the relationships within
individual families or a group of closely related families.
However, as the overall sequence similarity between
families decreases there are only a small number of
conserved positions shared between the families. It is
not possible to obtain statistically well-supported higher
order groupings based on this small set of universally
conserved residues using conventional phylogenetic
methods. Under these conditions, such methods could also cause artificial
clustering of proteins that retain primitive sequence
features present in the common ancestor of the entire
group. Hence, it is necessary to depend mainly on
structure comparisons to identify the primitive struc-
tural state for the fold, and then delineate the structural
characters that are derived in particular lineages. Based
on these characters, the most parsimonious scenario
(one that minimizes the losses and independent
innovations of particular features) that accounts for
the observed structural quirks was determined and is
presented here.
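
The threshold-stability criterion used above to nominate candidate families can be sketched in a few lines of Python. This is a minimal illustration and not a reimplementation of BLASTCLUST: it assumes that pairwise score densities are already available as numbers, uses plain single-linkage clustering, and keeps only those clusters that recur unchanged at every threshold tried. The function names, protein labels, and scores are all hypothetical.

```python
def single_linkage(names, scores, threshold):
    """Group names by linking every pair whose score is >= threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return {frozenset(group) for group in groups.values()}


def stable_clusters(names, scores, thresholds):
    """Return the clusters that appear unchanged at every threshold."""
    per_threshold = [single_linkage(names, scores, t) for t in thresholds]
    return set.intersection(*per_threshold)


# Made-up pairwise score densities for six hypothetical proteins.
names = ["A", "B", "C", "D", "E", "F"]
scores = {
    ("A", "B"): 0.8, ("B", "C"): 0.6, ("C", "D"): 0.2,
    ("D", "E"): 0.9, ("E", "F"): 0.35,
}
print(stable_clusters(names, scores, [0.3, 0.4, 0.5]))
```

In this toy data the group A, B, C survives across the whole 0.3-0.5 range, whereas D, E, F splits once the threshold passes 0.35 and so would not be treated as a stable family core.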
Additionally, we analyzed the phyletic distribution of
individual families in the completely sequenced ge-
nomes. For families with a wide phyletic spread, i.e.,
those present in 2 or all 3 primary kingdoms (bacteria,
archaea, and eukaryotes), conventional phylogenetic
trees were constructed. The presence of distinct archaeal
(or archaeo-eukaryotic) and bacterial branches in these
trees was taken as an indication that the given family
was most likely represented in LUCA. In contrast, the
presence of a well-defined bacterio-eukaryotic branch,
particularly in the absence of a clear archaeal clade,
suggested horizontal gene transfer (HGT) from bacteria
to eukaryotes, most often from the pro-mitochondrion
or the pro-chloroplast. Below we briefly describe the 26
identified families of AAA+ ATPases, with an emphasis
on their phyletic distribution and lineage-specific deri-
vations, along with the predicted functions of unchar-
acterized groups (Table 1).
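
The two interpretive rules stated above for reading a family tree can be summarized as a small decision function. This Python sketch is a deliberately simplified paraphrase: it assumes that the presence of each branch type has already been judged from the tree, and the function name and return labels are hypothetical. It does not capture the case-by-case judgment applied to individual families.

```python
def interpret_tree(archaeal_branch, bacterial_branch, bacterio_eukaryotic_branch):
    """Map observed branch types to the inferences described in the text.

    archaeal_branch: a distinct archaeal (or archaeo-eukaryotic) branch exists
    bacterial_branch: a distinct bacterial branch exists
    bacterio_eukaryotic_branch: a well-defined bacterio-eukaryotic branch exists
    """
    calls = []
    if archaeal_branch and bacterial_branch:
        calls.append("likely present in LUCA")
    if bacterio_eukaryotic_branch:
        # The text treats the absence of a clear archaeal clade as
        # strengthening the case for transfer from bacteria to eukaryotes.
        if archaeal_branch:
            calls.append("possible HGT from bacteria to eukaryotes")
        else:
            calls.append("likely HGT from bacteria to eukaryotes")
    return calls or ["undetermined"]


print(interpret_tree(True, True, False))
print(interpret_tree(False, True, True))
```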
2.2.2. The clamp loader clade
In all cellular life forms, the clamp loader ATPases
are responsible for loading the DNA clamp, which is
composed of the PCNA or DNA polymerase III β
subunits, onto the DNA (Davey et al., 2002a; Davey
and O'Donnell, 2000; Hingorani and O'Donnell, 1998).
The AAA+ domains of this clade mostly conform to the
idealized core of this class without any specialized
innovations (Fig. 2). The clamp loader ATPases have a
synapomorphic RC signature associated with the ar-
ginine finger (Fig. 3). Three major families can be
identified within this clade. The first of these, the bacterial
family, includes two pan-bacterial orthologous
lineages, typified by the Escherichia coli HolB (δ′ subunit)
and DnaX (γ and τ subunits) proteins, respectively.
The bacterial family of clamp loaders is characterized by
the insertion of a Zn cluster downstream of the helix
associated with the Walker A motif (Guenther et al.,
1997). The presence of members of the two orthologous
lineages in almost all bacteria suggests an ancient du-
plication and functional diversification in the common
ancestor of all known bacteria. The DnaX gene from
cyanobacteria appears to have been secondarily trans-
ferred to plants and probably participates in the repli-
cation of the chloroplast DNA.
The second clamp loader family, the RFC family,
has an archaeo-eukaryotic distribution and consists of
two major orthologous lineages. One of these lineages
includes the archaeal RFC proteins typified by
MTH241 and the eukaryotic RFC2, RFC3, RFC4,
and RFC5 proteins. The second lineage includes the
archaeal proteins typified by MTH240 and the eu-
karyotic RFC1, Rad24, Chl12, and Yor144c proteins.
In several archaea, the representatives of the two or-
thologous lineages are encoded by adjacent genes in
the genome. This suggests that an ancient duplication
in the archaeal lineage resulted in two branches of the
14 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131
8/14/2019 AAAplus New
5/21
RFC family. Early in eukaryotic evolution each of
these lineages appears to have undergone several ad-
ditional duplications giving rise to the 5 subunits of
RFC and the other proteins, such as Rad24 and
Chl12, which were recruited for distinct roles in DNA
repair (Naiki et al., 2001). This was paralleled by the
duplication of the PCNA clamp itself, resulting in
specialized clamps, such as Rad1, Rad9, and Hus2,
that functioned as partners for the Rad24-like ATP-
ases (Aravind et al., 1999). Additionally, the RFC
family includes the clamp-loaders of viruses, such as
T4 (Davey et al., 2002b), which could represent an
early diverging branch associated specifically with viral
replicons.
The third major family is the WHIP family, typified
by the Werner helicase interacting protein from
vertebrates and yeast Mgs1p (Kawabe et al., 2001). This
family is present in most major bacterial lineages and in
all sequenced eukaryotic genomes. Most members of
this family contain a distinct C-terminal globular do-
main fused to the AAA+ domain (Fig. 5) that contains a
conserved acidic residue reminiscent of the catalytic
residue of the RNAse H fold (Bork et al., 1992). In
phylogenetic trees, the eukaryotic proteins form a tight
group that lies within the bacterial radiation, closer to
the proteobacteria (data not shown). This suggests that
the WHIP family was probably derived early in bacterial
evolution from the bacterial clamp loader family and
was transferred to eukaryotes from the pro-mitochon-
drial endosymbiont (Fig. 4).
The presence of distinct, functionally equivalent
bacterial and eukaryotic families in the clamp loader
clade suggests that one ancestral member of this clade
was present in LUCA.
2.2.3. The DnaA/CDC6/ORC clade
This clade consists of two families, namely, the bacterial
DnaA family and the archaeo-eukaryotic CDC6/
ORC family, which perform analogous roles in the as-
sembly of protein complexes associated with the repli-
cation origin (Erzberger et al., 2002; Giraldo, 2003). The
synapomorphy that unifies the DnaA/CDC6/ORC clade
is the presence of two helices after strand 2, which are of
approximately equal size, and pack against each other in
a characteristic fashion (Figs. 1-3). However, the two
families of this clade have greatly diverged from each
other in terms of sequence. The DnaA family
in bacteria contains two principal orthologous groups.
DnaA proper is a pan-bacterial group, typically present
in a single copy in all known bacteria. The DnaA pro-
tein forms an oligomeric complex around the replication
origin site and initiates local unwinding of DNA, which
is required for recruitment of the DnaB helicase (Davey
and O'Donnell, 2000; Erzberger et al., 2002). The second
orthologous set typified by the E. coli DnaC protein is
sporadically present in several diverse bacteria and ap-
pears to have arisen through a duplication of DnaA
relatively late in bacterial evolution. In E. coli, DnaC
has been shown to load the helicase DnaB to the single-
stranded DNA at the origin site (Davey et al., 2002a). In
bacteria lacking DnaC, this function might be per-
formed by DnaA itself. IstB, a member of the DnaC
lineage, is encoded by transposons of the IS21 family
and is required for their transposition, probably via re-
cruitment of the replication complex to the transposon
DNA.
The Orc/Cdc6 family has 1-2 representatives in almost
all archaea. Interestingly, it is absent in Methanococcus,
while Halobacterium shows an extensive
lineage-specific expansion of this family, with 11 paralogous
proteins. Early in eukaryotic evolution, the
Orc/Cdc6 family appears to have differentiated into the
Orc1p, Orc4p, Orc5p, and Cdc6p lineages that are
present in most extant eukaryotes. These proteins co-
operate in loading the MCM complex to the origin of
replication site analogously to the action of DnaA and
DnaC in loading DnaB in bacteria (Lee and Bell, 2000;
Liu et al., 2000). Certain eukaryotic members of
the Orc/Cdc6 family, such as yeast Orc4p and Sir3,
a yeast-specific paralog of Orc1p, have disrupted
P-loops, suggesting that they have been recruited for
non-catalytic roles like protein-protein and protein-DNA
interactions.

Fig. 1. Topology diagrams of selected AAA+ ATPases and other members of the P-loop NTPase fold. Strands are shown as arrows with the
arrowhead on the C-terminal side and numbered 1 through 5. Strands 1 and 3 that encompass the conserved sequence motifs GxxxxGK[ST] (Walker
A) and hhhh[DE] (Walker B) are rendered in orange; the other core strands (2, 4, 5) are in light orange; non-conserved strands are in gray. Helices are
shown as blue rectangles when above the plane of the β-sheet and in faint blue when below the β-sheet. The P-loop is shown as a red line and a green
arrowhead marks the N-terminus of the ATPase domain. The defining features of the AAA+ ATPases, the loop that runs across the face of the β-sheet
and the helix before strand 1, are highlighted in dark blue. The β-hairpin that defines the PS1BH group is rendered in purple. Broken lines indicate
secondary structure elements that are not present in the PDB file or that were left out for clarity. The top panel shows two proteins from outside the
AAA+ group (RecA, ASCE division; TMP Kinase, KG division) for comparison purposes. Sequences are identified with the protein name, PDB
code (in parentheses) and the organism name.

Fig. 2. Structures of selected AAA+ ATPases. The top left panel shows the structure of a member of the clamp loader clade labeled with all the
synapomorphies of the AAA+ clade. It is close to the ideal AAA+ domain structure. The top right panel shows the structure of the archaeal Cdc6
protein with 2 helices after strand 2, which are the shared character of the ORC/CDC6 clade. The bottom left panel shows the structure of RuvB, which
has the hallmark β-hairpin defining the PS1BH superclade. The bottom right panel shows the structure of Mg chelatase, which also belongs to the
PS1BH clade. Additionally, it shows the features that define the helix-2 insert clade and the displaced C-terminal helical bundle. Walker B (DE),
sensor 1 (polar residue), arginine finger and sensor 2 (arginine) are shown in all the structures.

The two families of the DnaA/CDC6/ORC clade can
be traced back, respectively, to the common ancestors of
the archaeo-eukaryotic and bacterial clades. This ob-
servation, taken together with the mechanistic similarity
of their role in replication initiation, suggests that the
clade was represented by the common ancestor of the
two families in LUCA (Giraldo, 2003). The recruitment
of very different replication initiation complexes by these
ATPases in archaea-eukaryotes and in bacteria might
have contributed to the extensive sequence divergence
between them.
2.2.4. The classical AAA domains and associated diver-
gent sister groups
The classical AAA clade consists of all ATPases re-
lated to the proteasomal ATPases, FtsH, and CDC48
that originally were defined as the AAA superfamily
(Confalonieri and Duguet, 1995). The main structural
synapomorphy for this clade is the presence of an addi-
tional short helix immediately downstream of strand 2
(Figs. 1 and 3). The proteins of this clade strongly cluster
to the exclusion of other AAA+ proteins in similarity-
based clustering, and often have a conserved glycine
N-terminal to the arginine finger. Since several detailed
evolutionary investigations on this family have been
published, we only mention here the broad evolutionary
patterns seen in this clade. Several distinct families can
be identified within this clade, with most of the diversity
observed in eukaryotes. The only pan-bacterial family,
FtsH, includes membrane proteins with a single AAA+
ATPase domain fused to a C-terminal metalloprotease
domain (Koonin et al., 2000b; Krzywda et al., 2002;
Tomoyasu et al., 1995). Some members of the FtsH
family have been laterally transferred to eukaryotes,
apparently through the mitochondrial and chloroplast
routes. Two major families, namely, the proteasomal
ATPase family and the CDC48 family, which contains a
tandem repeat of the AAA+ module, are conserved
throughout the archaeo-eukaryotic branch (Beyer, 1997;
Swaffield and Purugganan, 1997). In eukaryotes, the
proteasomal ATPase family has undergone a massive
proliferation: 6 orthologous lineages, namely S4, S6a,
S6b, S7, S8, S10a, and S10b, can be traced back to the
common ancestor of all crown group eukaryotes.

Fig. 3. Alignment of AAA+ ATPases. Multiple sequence alignment of selected representatives of AAA+ ATPase families. Secondary structure
assignments are shown above the alignment, where E signifies a β-strand and H an α-helix. The coloring is based on the 80% consensus of
residue conservation shown underneath the alignment. The coloring scheme is as follows: h indicates hydrophobic residues
(ACFILMVWY), shaded yellow; s indicates small residues (AGSVCDN), colored green; o indicates alcohol group containing residues (ST), shaded
blue; and p indicates polar residues (STEDKRNQHC), colored purple. Strongly conserved polar residues are colored in red (RED). The predicted
secondary structure elements are shown below the alignment. Specific synapomorphic characters have been boxed. Species abbreviations are as
follows: AAV, Adeno-associated virus; Aqae, Aquifex aeolicus; At, Arabidopsis thaliana; Af, Archaeoglobus fulgidus; AcNPV, Autographa californica
nucleopolyhedrovirus; Bs, Bacillus subtilis; phiC31, Bacteriophage phi-c31; Ce, Caenorhabditis elegans; Cj, Campylobacter jejuni; Dr, Deinococcus
radiodurans; Ddi, Dictyostelium discoideum; Dm, Drosophila melanogaster; Ec, Escherichia coli; Gila, Giardia intestinalis; Hi, Haemophilus influenzae;
Hs, Homo sapiens; Kpn, Klebsiella pneumoniae; Mlo, Mesorhizobium loti; Mjan, Methanocaldococcus jannaschii; Mac, Methanosarcina acetivorans;
Mta, Methanothermobacter thermautotrophicus; Mex, Methylobacterium extorquens; Mm, Mus musculus; Polio, Poliovirus; Pfu, Pyrococcus furiosus;
Ph, Pyrococcus horikoshii; Rhca, Rhodobacter capsulatus; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; SV40, Simian virus 40; Scoe,
Streptomyces coelicolor; Sso, Sulfolobus solfataricus; Ssp, Synechocystis sp; Tac, Thermoplasma acidophilum; Thth, Thermus thermophilus; Ter,
Trichodesmium erythraeum; VV, Vaccinia virus; and Vc, Vibrio cholerae.

Table 1

Clamp loader clade
  SR[CAT] motif associated with arginine finger
  * HolB/DNAX family
      Zn cluster between helix-1 and strand 2
    ** HolB subfamily [B]
    ** DNAX subfamily [B, Pl]
  * RFC family
    ** MTH241/RFC2 subfamily [A, E]
         (In eukaryotes includes RFC2, RFC3, RFC4, RFC5)
    ** MTH240/RFC1 subfamily [A, E]
         (In eukaryotes includes RFC1, Chl12, RAD24, Yor144c)
  * WHIP family [B, E]
      Fused to a C-terminal WC domain

DnaA/ORC clade
  Two helices of approximately equal length after strand 2
  * DnaA family
    ** DnaA subfamily [B]
    ** DnaC subfamily [B]
  * CDC6/ORC family [A, E]
    ** CDC6/ORC [A]
    ** Orc1p, Orc4p, Orc5p, and Cdc6p subfamilies [E]

Classical AAA clade and its divergent relatives
  Small helix between strand-2 and helix-2, [GN]R motif associated with arginine finger
  * FtsH family [B, E]
      Fused to a metalloprotease at the C-terminus
  * CDC48 family [A, E]
      DPBB at N-terminus, 2 AAA+ domains
  * Proteasomal ATPase family [A, E]
    ** S4, S6a, S6b, S7, S8, S10a and S10b subfamilies [E]
  * Katanin p60/Fidgetin family [E]
  * NSF1 family [E]
  * Pex1/6 subfamily [E]
  * Bcs1p subfamily [E]
  (Possible relatives of classical AAA clade)
  * AFG1 family [B, E]
  * ClpAB-N family [B>A, E]
  * TIP49 family [A, E]
      N-terminal module with characteristic H[ST]H motif, predicted β-barrel between helix-2 and strand 3

Pre-sensor 1 hairpin superclade
  β-hairpin inserted before Sensor 1 motif

SFIII helicase clade [viruses]
  * NCLDV D5 ATPase family
  * Positive strand RNA virus helicase family
  * Other viral helicases

HslU/ClpX/Lon/ClpAB-C clade
  Extended loop after strand 2
  * HslU/ClpX family
    ** HslU subfamily [B, Kin, Api]
    ** ClpX subfamily [B, E]
  * Lon family
      Fused to a Lon protease at the C-terminus
    ** Bacterial lon protease subfamily [B, E]
         Fused to a LAN domain at the N-terminus
    ** Archaeal lon protease subfamily [A>B]
  * ClpAB-C family
      NRhD motif associated with arginine finger
    ** ClpAB-C subfamily [B, E]
    ** Torsin subfamily [E]
         Torsin has a distinct α-helical N-terminal domain

RuvB family [B, E]
  Has C-terminal wHTH domain; helical segment between strand 5 and C-terminal bundle

Helix-2 insert clade
  βαβ insert in helix-2
  NtrC/MCM group
      Have [AG][FL]T motif in helix-2 before βαβ insert
  * NtrC family [B]
  * MCM family [A, E]
      Fused to a Zn ribbon at N-terminus and a wHTH at the C-terminus
  * Chelatase/YifB group
    ** Chelatase family [A, B, plants]
         Associated with a vWA domain
    ** YifB family [B]
         Fused to a Lon protease at its N-terminus, insert of Zn cluster after strand 4
  * McrB family [B>A, E]
  * MoxR group
    ** MoxR family [A, B]
    ** Dynein/Midasin family
         6 tandem AAA+ domains
    ** Dynein subfamily [E]
    ** Midasin subfamily [E]

Abbreviations: A, Archaea; B, Bacteria; E, Eukaryotes; Api, Apicomplexans; Kin, Kinetoplastids; and Pl, Plants. The symbol > indicates sporadic lateral transfer.
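The short sequence signatures listed in Table 1 can be scanned for with regular expressions. The Python sketch below translates the table's motif notation into patterns and reports match positions. The helper names and the example sequences are hypothetical, and the lower-case h in NRhD is assumed to denote the hydrophobic class (ACFILMVWY) used in the alignment legend. Because the motifs are very short, a match by itself is not evidence of clade membership; in the classification described here the signatures are interpreted in their structural context.

```python
import re

# Assumed meaning of lower-case 'h' in the table's motif notation.
HYDROPHOBIC = "ACFILMVWY"

# Signature motifs as written in Table 1.
TABLE1_MOTIFS = {
    "Clamp loader clade": "SR[CAT]",
    "Classical AAA clade": "[GN]R",
    "NtrC/MCM group": "[AG][FL]T",
    "TIP49 family": "H[ST]H",
    "ClpAB-C family": "NRhD",
}


def motif_to_regex(motif):
    """Expand the class symbol 'h'; other characters are already regex syntax."""
    return motif.replace("h", "[" + HYDROPHOBIC + "]")


def find_motif(motif, sequence):
    """Return 0-based start positions of all matches, including overlaps."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Hypothetical peptide fragments, used only to exercise the functions.
    print(find_motif(TABLE1_MOTIFS["Clamp loader clade"], "MKSRCAASRT"))
    print(find_motif(TABLE1_MOTIFS["ClpAB-C family"], "AANRLDNRKD"))
```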
In addition to these archaeo-eukaryotic families,
there are several distinct eukaryotic families that appear
to have occupied specific functional niches (Beyer, 1997;
Frohlich, 2001; Lupas and Martin, 2002; Ogura and
Wilkinson, 2001; Swaffield and Purugganan, 1997) (also
see Froehlich's website: http://aaa-proteins.uni-graz.at/AAA/Default.html).
These include the Katanin p60/Fidgetin family, which is
involved in microtubule disassembly,
the NSF family with 2 AAA+ domains
involved in membrane fusion, Pex1/6, which is involved
in peroxisome biogenesis, and Bcs1p, which participates
in assembly of mitochondrial membrane protein complexes.
The NSF family, which shares an N-terminal
double-ψ-β-barrel domain with the CDC48 family, is
likely to have been derived from the latter family during
the emergence of the eukaryotic secretory apparatus
(Castillo et al., 1999; Coles et al., 1999). The Bcs1p
family (Cruciat et al., 1999) is the most divergent of all
the eukaryotic families and includes a distinctive plant-
specific subfamily with an uncharacterized, conserved
N-terminal globular domain. This subfamily has un-
dergone a lineage-specific gene expansion in plants, with
at least 15 members encoded in monocot and dicot ge-
nomes. In contrast to these families, the highly divergent
AFG1 family (Lee and Wickner, 1992) shows a bacterio-
eukaryotic phyletic pattern (Table 1). It is present in
most eukaryotic lineages, whereas, amongst the bacte-
ria, it is predominantly seen in proteobacteria, with a
few sporadic occurrences in Deinococcus and actino-
mycetes. Grouping of the eukaryotic members of this
family within the proteobacterial radiation suggests that
this family is likely to have originated in bacteria, with
subsequent HGT to eukaryotes via the proto-mitochondrion.
Most of these families with limited or sporadic phyletic
patterns could have emerged through
rapid divergence from the more ancient widely con-
served members of the classical AAA clade.
Members of the ClpAB ATPase family (Hoskins
et al., 2001) have two AAA+ modules that are very
different from each other in terms of sequence, structure,
functions, and phylogenetic affinities (Volker and Lu-
pas, 2002). Furthermore, in oligomers, the two domains
appear to form two distinct homotypic ring structures
stacked on top of one another (Guo et al., 2002). The
N-terminal domain shares a characteristic feature with
the classical AAA clade, namely, the additional small
helix downstream of strand 2 (Figs. 1 and 3). This feature,
along with the sequence similarity patterns, suggests that
the ClpA N-terminal domain was probably derived
through rapid divergent evolution after branching off
from the classical AAA clade. In contrast, the C-ter-
minal domain is related to the Lon and HslU-like
ATPases (see below). Thus, the ClpAB ATPases appear
to have evolved as a result of an ancient fusion of two
phylogenetically distinct AAA+ ATPase modules, ra-
ther than through tandem duplication (which would
seem to be an intuitively appealing scenario given the
head to tail juxtaposition of the 2 ATPase modules in
these proteins) (also see Volker and Lupas, 2002). The
ClpAB ATPases are important chaperones and stress-
response proteins that are found in all bacterial lineages,
often in at least two versions per bacterial proteome
(Hoskins et al., 2001). Some bacteria, e.g., Pseudomonas
aeruginosa, encode up to 8 members of this family. In
eukaryotes, the ClpAB proteins are represented by
Hsp104 and the mitochondrial heat shock protein
Hsp78. These proteins are absent in the archaea with the
exception of a single member in Methanothermobacter,
which appears to have been transferred from the bac-
teria. The eukaryotic ClpAB homologs are nested within
the bacterial radiation (data not shown), suggesting that
they have been acquired from the bacterial precursors of
the mitochondria.
The presence of at least one pan-bacterial and two
pan-archaeo-eukaryotic families suggests that LUCA
probably possessed a single ancestral representative of
the classic AAA clade (Fig. 4). This ancestral form ap-
pears to have initially diversified to occupy functional
niches related to protein unfolding and degradation, and
subsequently diversified to perform several other bio-
logical functions in eukaryotes.
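The parsimony argument of the preceding paragraph can be written out as a small Python sketch. Each family is assigned to the primary branch in which it is inferred to be ancestrally present, and a clade is considered traceable to the LUCA only if its families cover both branches. The dictionaries and the function name are illustrative assumptions. Assigning FtsH and AFG1 to the bacterial branch follows the interpretation given in the text, where their eukaryotic members are attributed to organellar acquisition. The sketch ignores gene loss and undetected horizontal transfer.

```python
BACTERIAL = "bacterial"
ARCHAEO_EUKARYOTIC = "archaeo-eukaryotic"

# Branch of inferred ancestral presence for widely conserved families of the
# classic AAA clade, following the interpretation given in the text.
CLASSIC_AAA = {
    "FtsH": BACTERIAL,  # eukaryotic copies treated as organelle-derived
    "Proteasomal ATPase": ARCHAEO_EUKARYOTIC,
    "CDC48": ARCHAEO_EUKARYOTIC,
}

# A single family considered on its own, for contrast.
AFG1_ONLY = {"AFG1": BACTERIAL}  # eukaryotic copies treated as acquired


def traceable_to_luca(families):
    """True if ancestral presence covers both primary branches of life."""
    return {BACTERIAL, ARCHAEO_EUKARYOTIC} <= set(families.values())


if __name__ == "__main__":
    print(traceable_to_luca(CLASSIC_AAA))
    print(traceable_to_luca(AFG1_ONLY))
```

The branch assignment is itself an interpretive step that depends on the phylogenetic trees; the function only makes the final inference explicit.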
2.2.5. The TIP49 family
These proteins are DNA-stimulated ATPases that
have been shown to associate with the TATA-binding
protein and appear to play a critical role in the assembly
of complexes related to transcriptional activation
(Wood et al., 2000). This family has a single representative
in several archaea and two distinct orthologous
groups, namely pontin and reptin (Rottbauer et al.,
2002), in the crown group eukaryotes. Thus, the TIP49
family appears to have emerged in the common ancestor
of the archaeo-eukaryotic lineage, followed by a split
into two paralogous lineages prior to the radiation of
the eukaryotic crown group. This group is characterized
by a small conserved N-terminal module and a remarkable
insert of a novel predicted β-barrel domain
upstream of the Walker B strand. Family-specific inserts
are also seen in this location in other AAA+ families (e.g.,
HslU) and are likely to form a second toroidal
structure stacked below the ATPase domain. Iterative
database searches with profiles for the TIP49 family
recover both classic AAA and clamp loader clade
proteins as the best hits. However, multiple alignment-
based secondary structure prediction suggests the pres-
ence of an additional small helix downstream of strand
2, just as in the classical AAA clade (Fig. 3). This suggests
that TIP49 probably branched off from that clade,
early in the common ancestor of the archaeo-eukaryotic
lineage, through extensive sequence divergence. It has
been claimed previously that TIP49 was related to the
helicase subunit of the bacterial resolvasome, RuvB
(Kurokawa et al., 1999). However, certain key features,
such as a conserved β-hairpin that is characteristic of
RuvB and its relatives (Figs. 1–3; see below), cannot be
detected in the TIP49 family.
2.2.6. The pre-sensor-1 β-hairpin (PS1BH) superclade
All the remaining lineages of AAA+ domains, namely, the SFIII helicases, HslU and ClpX, Lon, torsin,
the C-terminal AAA+ domain of ClpAB, RuvB, NtrC,
MoxR, and its relatives, dynein, midasin, and the
chelatases, can be unified into a vast monophyletic as-
semblage. This entire superclade is defined by the pres-
ence of a synapomorphic insert between the sensor-1
strand and the preceding helix. Analysis of the struc-
tures of T-antigen (SFIII) (Li et al., 2003), HslU
(Bochtler et al., 2000), Mg chelatase (Fodje et al., 2001),
and RuvB (Putnam et al., 2001; Yamada et al., 2002)
shows that this insert forms a β-hairpin that projects out
of the AAA+ core at a particular angle that is generally
conserved in all the diverse proteins with this feature
(Fig. 2). The HslU structure reveals that the hairpin
forms a lid-like band in the oligomeric ring state
(Bochtler et al., 2000). Sequence conservation associated
with this region allowed us to identify the members of
this assemblage for which 3D structures are not yet
available. Hereinafter, we refer to this assemblage as the
pre-sensor-1 β-hairpin (PS1BH) superclade. Below, we
detail the various higher order clades that could be de-
lineated within this large group of AAA+ ATPases
(Fig. 4, Table 1).
2.2.7. The SFIII helicase clade
Superfamily III helicases were first identified in numerous
small RNA viruses, e.g., picornaviruses and
comoviruses, DNA viruses, such as the papovaviruses,
parvoviruses, circoviruses, and baculoviruses (p143
protein), and phages, e.g., P4 (Gorbalenya et al., 1990;
Koonin, 1993). Subsequently, we identified a specific
version of the SFIII helicase, typified by the vaccinia
virus D5 protein, to be one of the synapomorphies of the
nucleo-cytoplasmic large DNA virus clade (Iyer et al.,
2001) (Table 1). However, other than prophage remnants,
there are no SFIII helicases encoded in cellular
genomes. Thus, this lineage of AAA+ ATPases might
have originally evolved in primitive, small replicons that
now are only represented by viruses. The emergence of a
more complex DNA replication apparatus in the ar-
chaeo-eukaryotic and bacterial branches of life might
have displaced these simpler helicases in the cellular
systems.
2.2.8. The HslU/ClpX/Lon/ClpAB-C clade
The chaperones and ATPase subunits of proteases
appear to constitute another major monophyletic line-
age within the PS1BH superclade (Fig. 4). This clade is
supported by an extended loop between strand 2 and the
helix downstream of it. Support for this clade is also
offered by the consistent reciprocal recovery of these
proteins in iterative profile searches. Furthermore,
functional considerations also support the emergence of
this clade from an ancestral ATPase that probably
functioned as a cofactor for diverse proteases.
Three distinct families can be identified within this
clade. The first of these, the HslU/ClpX family, is
widespread in bacteria but absent in the currently
available archaeal proteomes (Koonin, 1993). Within
this family, two orthologous lineages, typified, respectively,
by the E. coli HslU and ClpX proteins, are
widespread in most major bacterial lineages. Orthologs
of ClpX, which function in the mitochondria, have been
detected in several eukaryotes (Corydon et al., 2000),
whereas HslU orthologs, which might also have a mitochondrial
or plastid function, are known from kinetoplastids
and apicomplexans (Couvreur et al., 2002).
The phyletic pattern of the HslU/ClpX family suggests
that they diversified into the two distinct orthologous
groups prior to the radiation of the major bacterial
phyla and were subsequently acquired by eukaryotes
from the proto-mitochondrial endosymbionts. Interestingly,
although HslU and ClpX belong to the same
family of AAA+ ATPases, their protease partners, HslV
and ClpP, respectively, belong to the unrelated NTN
hydrolase (Pei and Grishin, 2003) and acyl-CoA decar-
boxylase/isomerase superfamilies (Aravind and Koonin,
1999a). The HslV protease is related to macropain, one
of the proteases of the archaeo-eukaryotic proteasome.
However, in the proteasome, the macropains functionally
interact with the ATPases of the classical AAA
clade (Seemuller et al., 1995; Unno et al., 2002).
The C-terminal domain of the ClpAB proteins and
the Torsin proteins from animals comprise the second
family (ClpAB-C family) of the HslU/ClpX/Lon/ClpAB-
C clade. As discussed above, the two AAA+ domains of
the ClpAB proteins are very different from each other
(Guo et al., 2002; Volker and Lupas, 2002). The C-ter-
minal domain has the hallmark hairpin of the PS1BH
superclade and, in sequence profile searches, the C-terminal
domain of ClpAB preferentially recovers other
PS1BH proteins rather than the N-terminal domain of
ClpAB. Torsin defines a group of animal proteins,
which appear to be involved in the assembly of protein
complexes in the endoplasmic reticulum (Basham and
Rose, 2001; Bassler et al., 2001). Torsin is specifically
related to the ClpAB C-terminal domain, but has a far
more restricted phyletic pattern compared to the nearly
pan-bacterio-eukaryotic spread of the ClpAB proteins
(see above). Thus, it seems likely that torsin was derived
specifically in the animal lineage through rapid diver-
gence of a breakaway version of the ClpAB C-ter-
minal domain.
The Lon proteins from archaea and bacteria define
the third major family (Lon family) within the HslU/
ClpX/Lon/ClpAB-C clade. Two distinct lineages can be
delineated within the Lon family (Koonin et al., 2000a).
The first of these, the bacterial Lon lineage, is repre-
sented in all the major groups of bacteria and is also
seen in eukaryotes as the mitochondrial Lon protease.
These proteins are characterized by a distinctive domain
termed the LAN domain and a Lon-protease domain,
which flank the AAA+ domain at the N- and C-termini,
respectively (Fig. 5). The second lineage, LonB or ar-
chaeal Lon (Fukui et al., 2002), has a pan-archaeal
representation, with occasional representatives in bacteria,
such as low GC Gram-positive bacteria, γ-proteobacteria,
Thermotoga and Treponema. These proteins lack the
LAN domain but have the C-terminal protease domain
separated from the ATPase domain by a long segment
predicted to assume a coiled-coil conformation. These
phyletic patterns suggest that bacterial Lon emerged
prior to the diversification of bacteria and was trans-
ferred to the eukaryotes via the mitochondrial route.
LonB (archaeal Lon) appears to have emerged prior to
the archaeal diversification and probably was horizontally
transferred to bacteria subsequently. This implies
that the LUCA had a single representative of the Lon
family that subsequently diversified into the bacterial
and archaeal Lons concomitantly with the divergence of
these superkingdoms of life.
Fig. 4. Inferred evolutionary history of AAA+ ATPases. The figure shows several relative temporal epochs separated by the major evolutionary
transitions that mark their boundaries. The solid colored bars indicate the maximum depth to which the AAA+ lineages can be traced with respect to
these temporal epochs. The dashed lines indicate uncertainty in terms of the exact point of origin of a lineage. The ellipses bundle groups of lineages
from which a new lineage with relatively limited phyletic pattern could have potentially emerged via rapid divergence. Colored circles at the terminal
branches indicate broad functional categories: yellow, DNA replication and repair; blue, transcription; pink, chaperone or protein unfolding/degradation;
and white, other specialized functions.
Interestingly, the LAN domain appears to have acquired an
independent existence of its own during bacterial evolution
(Fig. 5). In several bacteria, such as Deinococcus,
cyanobacteria, and proteobacteria, it was detected in
standalone proteins, whereas in certain eukaryotic proteins,
typified by the fungal CrgA and its orthologs, it is
fused to an N-terminal ubiquitin-E3 ligase (a RING finger;
Fig. 5). This domain architecture suggests that the LAN
domain was reutilized as a general adaptor for interactions
with proteins that are targeted for degradation.
2.2.9. The RuvB family
RuvB is the helicase subunit of the bacterial Holliday
junction resolvasome, which additionally includes RuvC,
the resolvase, and RuvA, a DNA-binding protein asso-
ciated with the complex (Putnam et al., 2001; Yamada
et al., 2002). RuvB is ubiquitous in bacteria, but is absent
in eukaryotes and archaea. The structure of RuvB clearly
shows that this family belongs to the PS1BH superclade
(Figs. 1 and 3); however, it does not have any features that
allow us to specifically place it within any of the other
clades. The phyletic pattern of RuvB suggests that it
evolved prior to the radiation of bacteria from their
common ancestor, probably from one of the PS1BH
families that was already present in the LUCA (Fig. 4).
Fig. 5. A graph showing the domain architectures and select conserved functional interactions of the AAA+ ATPases. Direction of the arrow on an
edge of the graph indicates the polarity (whether a domain is to the N- or C-terminus) of the domain fusions in a polypeptide. A two-headed arrow
indicates that the fusions may occur either at the N- or C-terminus in different polypeptides. The black edge indicates a physical interaction between two
kinds of domains in a protein complex. The barbed arrowhead on an edge indicates a domain insertion (e.g., the Zn cluster inserted in bacterial
clamp loaders). The domain abbreviations are: ANK, Ankyrin repeat; BAM, bromo-associated motif; BRCT, BRCA C-terminal domain; Bromo,
bromodomain; CH, calponin homology domain; ClpN, N-terminal domain of ClpAB ATPases; DPBB, double-ψ-β-barrel; fHTH, Fis-type
helix-turn-helix; wHTH, winged HTH; LAN, LA(LON)-N terminal domain; R, RING finger; REC, Receiver domain; REPO, insert β-barrel domain in
Reptin and Pontin; WC, WHIP C-terminal domain; ZNC1/2/3, zinc clusters; ZNR, zinc ribbon.
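The kind of graph described in this legend can be represented as a set of directed edges between adjacent domains. The Python sketch below builds such edges from a few architectures stated in the text (bacterial Lon, FtsH, MCM, and YifB) and identifies domain pairs that would be drawn with a two-headed arrow. The convention that an edge points from the N-terminal to the C-terminal domain, the dictionary layout, and the function names are assumptions of this example; the figure's own arrow convention may differ.

```python
# Domains listed from N- to C-terminus, as described in the text.
ARCHITECTURES = {
    "bacterial Lon": ["LAN", "AAA+", "Lon protease"],
    "FtsH": ["AAA+", "metalloprotease"],
    "MCM": ["ZNR", "AAA+", "wHTH"],
    "YifB": ["Lon protease", "AAA+"],
}


def fusion_edges(architectures):
    """Directed edges (n_terminal, c_terminal) between adjacent domains."""
    edges = set()
    for domains in architectures.values():
        edges.update(zip(domains, domains[1:]))
    return edges


def two_headed(edges):
    """Unordered domain pairs observed in both orientations."""
    return {frozenset(edge) for edge in edges if (edge[1], edge[0]) in edges}


if __name__ == "__main__":
    edges = fusion_edges(ARCHITECTURES)
    for pair in two_headed(edges):
        print(sorted(pair))
```

With these four architectures, only the AAA+ and Lon protease pair occurs in both orientations, which corresponds to the independent associations of the Lon protease domain with the Lon and YifB ATPase domains discussed in the text.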
2.2.10. The helix-2 insert clade
The structure of the Mg chelatase reveals the presence
of a unique insert within helix-2 of the conserved ASCE
division P-loop ATPase core (Fodje et al., 2001). This
insert folds into two β-strands that form hydrogen
bonds with each other and flank a small helical region
between them. This insert does not significantly distort
the axis of helix-2, and the small helical segment flanked
by the β-strands appears to be a laterally displaced
fragment of helix-2 (Fig. 2). A sequence profile corre-
sponding to this region from the chelatases detects the
insert in a specific group of families of AAA+ domains,
and their sequence alignment reveals the presence of
sequence conservation associated with this region
(Fig. 3). Additionally, the families possessing this insert
in helix-2 also, typically, contain an insert of a long
helical segment between strand 5 (the C-terminal-most
strand of the AAA+ core) and the C-terminal α-helical
bundle of the AAA+ module (Figs. 2 and 3). The crystal
structure of the Mg chelatase reveals that this insert
results in a displacement of the helical bundle from its
usual position at the top of the core AAA+ P-loop
module to below it (Figs. 2 and 3) (Fodje et al., 2001).
Experiments on the Mg2+ chelatase suggest that,
upon ATP binding, a large conformational
change is likely to reorient the helical bundle back to the
regular conformation (Hansson et al., 2002). Given the
conservation of this helical region it is likely that such a
conformational change is a common aspect of the
functions of this entire clade. Accordingly, we unified
the families containing these synapomorphic features
into the helix-2 insert clade. Seven major families,
namely NtrC, MCMs, McrB, chelatases, YifB, MoxR,
and dynein-midasin, were identified within this lineage.
The NtrC family is purely bacterial in its distribution
and co-occurs with its functional partner, the RNA
polymerase sigma factor 54. Members of this family
function as transcriptional activators that bind DNA at
a site distal from the sigma 54 binding site and, upon
interaction with the latter, catalyze ATP-dependent
structural transitions required for transcription initia-
tion (Rombel et al., 1998; Zhang et al., 2002). Most
members of this family contain C-terminal helix-turn-
helix (HTH) domains of the FIS family, which enable
them to bind DNA. The NtrC family has diversified
within the bacteria into several distinct subfamilies that
have characteristic fusions with various N-terminal do-
mains. These include the CheY-like receiver domain in
NtrC, the GAF domain in FhlA, the ACT and PAS
domains in TyrR, and the 4-vinyl reductase (4VR) do-
main in XylR (Fig. 5; Anantharaman et al., 2001). These
domains either connect the NtrC ATPase to the two-
component signaling systems or function as small mol-
ecule-binding domains that enable the ATPase to sense
various metabolites in the environment (Rombel et al.,
1998; Zhang et al., 2002).
The MCM family, which is ubiquitous in the archaeo-eukaryotic
lineage, has at least one member in all archaea
studied to date (Kelman and Hurwitz, 2003). In
eukaryotes, it has diversified into six distinct ortholo-
gous lineages that appear to have diverged from each
other at an early stage of eukaryotic evolution. Members
of this family are characterized by fusions to an
N-terminal Zn-ribbon domain and a C-terminal DNA-
binding winged-helix-turn-helix domain (Aravind and
Koonin, 1999b). These proteins function as hexameric
or heptameric ring helicases, which catalyze extensive
unwinding of the DNA at the origin of replication
during the initiation process (Fletcher et al., 2003; Yu
et al., 2002).
The McrB family is typified by the NTPase subunit of
the Mcr restriction-modification system and differs from
most of the AAA+ proteins in that it specifically utilizes
GTP rather than ATP (Panne et al., 2001; Pieper et al.,
1999). Orthologs of McrB are sporadically distributed in
bacteria and the archaea, Pyrobaculum and Methano-
sarcina, and are all encoded by the mobile operon that
encodes the Mcr-type restriction-modification system.
Interestingly, the McrB family is also represented in
animals by proteins such as Unc-53, which is involved in
axonal path finding (Stringham et al., 2002), HELAD-1
and cortactin-binding protein-2 (CBP-2). These are
large multi-domain proteins, which combine the AAA+
module of the McrB family with an N-terminal ankyrin
(in CBP-2) or calponin homology domain (in Unc-53
and HELAD-1) (Fig. 5). The HELAD-1 protein has been
shown to possess 3′→5′ helicase activity, but the relevance
of this function for neuronal path finding is not
clear (Ishiguro et al., 2002). It is likely that these proteins
also play a role in the assembly of cytoskeletal
complexes. The CBP-2 proteins have a disrupted P-loop
and Walker B motif, suggesting that they are catalyti-
cally inactive, and the AAA+ module probably only
mediates specific interactions (Fig. 3). Since, among the
eukaryotes, the McrB family so far has been detected
only in animals, it seems likely that it was acquired
relatively late in eukaryotic evolution via HGT from
bacteria. McrB seems to represent a remarkable case
where a protein has been recruited for a biological
function that is completely different from its ancestral
role, after trans-kingdom HGT.
Members of the Chelatase family of the helix-2 insert
clade catalyze the insertion of metal ions, such as Mg2+
and Co2+, into the porphyrin rings during the biosynthesis
of cofactors, such as chlorophyll and cobalamin
(Fodje et al., 2001; Hansson et al., 2002). Members of
this family are either fused to a C-terminal von Willebrand
factor A (vWA) domain or interact with standalone
vWA domains in a multisubunit complex. The
vWA domain functionally cooperates with the AAA+
ATPase domain in the metal-insertion reaction (Fodje
et al., 2001). The chelatase family is widespread in
photosynthetic and autotrophic prokaryotes and plants.
Although two branches, dominated, respectively, by
bacterial and archaeal proteins can be seen in the phy-
logenetic tree of the chelatases, there are a number of
archaeal and bacterial proteins that cluster as sister
groups in the tree (data not shown). This implies multiple
HGT events in the evolution of the chelatase
family. The tree also shows that the plant members of this
family have clearly been derived from the cyanobacterial
precursor of the chloroplast.
The YifB family, typified by the E. coli YifB protein,
is found in several major bacterial lineages, such as cy-
anobacteria, actinomycetes, Deinococcus, proteobacte-
ria, certain spirochetes, Thermotoga, and Aquifex, and in
a single archaeon, Methanothermobacter. This family is
characterized by a fusion to an N-terminal Lon protease
domain, suggesting that it probably functions as an
ATP-dependent protease similar to Lon (Koonin et al.,
2000a). Members of this family also contain a unique
insertion of a Zn cluster just downstream of strand 4.
Sequence comparisons show that YifB is closest to the
chelatase family with which it shares certain distinct
sequence signatures (Fig. 3). Thus, the Lon protease
domain appears to have associated with two phyloge-
netically distinct AAA+ domains on independent occa-
sions in evolution. Furthermore, the Lon protease
domain is also fused with an ATPase domain of the
RecA class in the bacterial Sms protein (Aravind et al.,
1999).
The MoxR family is a large family that is represented
in all the major lineages of bacteria and archaea. Some
organisms, such as Mycobacterium tuberculosis, Pseudomonas,
and Aeropyrum pernix, encode up to 5 distinct
members of this family. Several major subfamilies are
recognizable within the MoxR family, including the
classic MoxR subfamily, GvpN, YehL, APE0892, and
YieN subfamilies. Despite their wide distribution, none
of these proteins have been experimentally characterized
in detail. MoxR is involved in the biogenesis of the
methanol dehydrogenase complex (Van Spanning et al.,
1991), while NirQ and CbbQ of the GvpN subfamily are
required for the biogenesis of the nitric oxide reductase
and Rubisco complexes, respectively (Hayashi et al.,
1998, 1999). GvpN itself appears to participate in the
formation of gas vesicles in diverse prokaryotes (Horne
et al., 1991). Thus, the members of the MoxR family
seem to function as chaperones in the assembly of specific
enzymatic complexes. Members of the YieN subfamily
co-occur with genes encoding proteins with vWA domains
and, by inference, are likely to functionally interact
with them, similarly to the Mg chelatase family
proteins. The phylogenetic tree of the MoxR family is
similar to that of the chelatases, suggesting considerable
lateral mobility of these genes within and between ar-
chaea and bacteria (data not shown). Nevertheless, the
nearly ubiquitous presence in the major archaeal and
bacterial lineages suggests that this family emerged very
early.
The two giant multidomain ATPases from eukary-
otes comprise the dynein/midasin family. Both dynein
and midasin contain 6 tandem AAA+ domains in the
same polypeptide (Mocz and Gibbons, 2001; Neuwald
et al., 1999). Dynein functions as an ATP-dependent
motor in a large protein complex, which interacts with
microtubules. Cytoplasmic dynein transports vesicles,
organelles, and chromosomes (during cell division) in
the retrograde direction, whereas flagellar dynein acts as
the motor for the movement of the eukaryotic flagellum
(Vale, 2003). At least a single copy of dynein is encoded
in all sequenced eukaryotic genomes, suggesting that it
was present in the last common ancestor of eukaryotes.
Very early in eukaryotic evolution, the dyneins diversi-
fied into forms specialized in cytoplasmic and flagellar
functions, and 12 or more paralogs of dynein are seen in
flagellated early-branching eukaryotes, such as Giardia.
Midasin also appears to be present in all eukaryotes and
is associated with the nuclear pore complex involved in
cytoplasmic export of the 60S ribosomal particle
(Bassler et al., 2001; Garbarino and Gibbons, 2002). By
analogy to dynein, midasin might act as a motor in the
translocation of the ribosomal particles across the nu-
clear pore. Animals have a second paralog of midasin
with 4 AAA+ domains. All members of the midasin
subfamily are associated with a C-terminal vWA do-
main, suggesting that, as with the chelatases, the inter-
action of the ATPase and vWA domains is required for
the protein's function.
Within the helix-2 insert clade, similarity-based clustering,
sequence conservation patterns and reciprocal
recovery in profile searches suggested a closer higher
order relationship between the MoxR and dynein/mi-
dasin families, on one hand, and the YifB and Chelatase
families on the other hand (Fig. 4). Together, all these
families are related to the McrB family, to the exclusion
of the NtrC and MCM families (Fig. 4). The MCM and
NtrC families, in turn, might share a closer higher order
relationship, as suggested by the sequence conservation
pattern in the first strand of the helix-2 insert. These
relationships, taken together with the phyletic patterns,
suggest that one or two ancestral members of this clade
were present in the LUCA and subsequently diversified
into the families discussed above concomitant with the
diversification of the major divisions of life.
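The higher order relationships proposed in the preceding paragraph can be summarized as a nested grouping. In the Python sketch below, the grouping is written as nested tuples and converted to a Newick-style string. The data structure and function names are assumptions of this example. The relationship among the MoxR/dynein-midasin pair, the YifB/chelatase pair, and McrB is left as an unresolved trichotomy, because the text does not specify a branching order among them, and the grouping of NtrC with MCM is described only as tentative.

```python
# Nested tuples encode groupings; strings are family names.
HELIX2_INSERT_CLADE = (
    (("MoxR", "Dynein-midasin"), ("YifB", "Chelatase"), "McrB"),
    ("NtrC", "MCM"),  # tentative grouping according to the text
)


def to_newick(node):
    """Serialize a nested tuple of names as a Newick-style group."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaves(node):
    """Flat list of family names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]


if __name__ == "__main__":
    print(to_newick(HELIX2_INSERT_CLADE) + ";")
    print(len(leaves(HELIX2_INSERT_CLADE)))
```

The seven leaves correspond to the seven major families identified within the helix-2 insert clade.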
2.2.11. Non-AAA+ ATPases previously included in this
class
In addition to the families outlined above, several
P-loop NTPases have been previously included in the
AAA+ class. However, the present analysis showed that
they lacked the defining features of this class and, ac-
cordingly, do not belong with the bona fide AAA+
proteins. The most notable case is that of FtsK, which
functions as a DNA pump in bacteria (Aussel et al.,
2002; Donachie, 2002). Detailed analysis of the FtsK
sequence shows that it lacks the key features of the
AAA+ superclass and instead has the defining features
of the PilT/VirD4 class of P-loop NTPases, such as
additional strands in the ATPase core (LA, unpublished
observations). Thus, like other DNA pumps, such as
TrwB, FtsK is a member of the PilT class (Gomis-Ruth
et al., 2001).
2.3. AAA+ ATPases present in LUCA and their evolution
in the pre-LUCA era
Based on the classification presented above, it can be
conservatively inferred that LUCA had 5-6 distinct
AAA+ domains. These include ancestors of: (1) the
clamp loader clade, (2) the DnaA-Orc/Cdc6 clade, (3)
the classic AAA clade, (4) LON ATPase family, (5)
MCM and NtrC families, and (6) MoxR, Chelatase,
YifB, and McrB families (Fig. 4). Additionally, SFIII
helicases were probably encoded in virus-like replicons.
Thus, the founders of the higher order groups of the
AAA+ class had probably already diverged from each
other in the pre-LUCA era (Fig. 4).
Sliding clamps are critical for the processivity of
DNA polymerases and are utilized by most large DNA
replicons, including all cellular life forms and large
DNA viruses. Hence, the presence of a clamp loader
ATPase in the LUCA points to the presence of a rela-
tively large DNA-containing genome at this stage. This
is also consistent with the presence of the DNA
recombinase RecA. However, as noticed previously, the
enzymes that actually catalyze the critical steps of DNA
replication, including the helicase, which initiates
replication, the primase, the DNA polymerases, the
DNA-ligase, proof-reading exonucleases, and Holliday
junction resolvases, are not orthologous and, in many
cases, not even homologous between the archaeo-
eukaryotic and bacterial branches (Edgell and Doolittle,
1997; Leipe et al., 1999). However, although the initia-
tion helicases are non-orthologous in the bacterial and
archaeo-eukaryotic clade (DnaB and MCMs, respec-
tively), the ATPase responsible for their assembly,
namely, the ancestral member of the DnaA/Cdc6/Orc
clade, appears to have been present in LUCA (Giraldo,
2003; and this work). This peculiar conservation pattern
of the replicative components suggests that the ancestral
clamp loader and DnaA/Cdc6/Orc ATPases functioned
in the context of a replication system that was dramat-
ically different from those found in modern cellular life
forms. One possibility is that genome replication in
LUCA occurred via reverse transcription of relatively
long RNA molecules, whereas the enzymes required for
direct DNA replication were invented only after the
separation of the archaeo-eukaryotic and bacterial
branches; the clamp loader and initiator ATPase might
have been parts of such a system (Leipe et al., 1999). The
other scenario is that the ancestral DNA replication
enzymes of LUCA were displaced later in evolution by
independently invented replication enzymes in one or
both of the principal divisions of life, perhaps with an
active contribution from virus-like elements (Forterre,
2002). However, the difficulty with the latter proposal is
that intrakingdom non-orthologous displacement of
core replication enzymes is not observed among the
extant living forms, although such displacements of
DNA polymerases and other enzymes are common in
DNA repair systems (Aravind et al., 1999; Eisen and
Hanawalt, 1999). Furthermore, the displacement hy-
pothesis would require concomitant displacement of
enzymes catalyzing several distinct steps of the DNA
replication process.
The SF3 helicases present another enigma in the
evolution of AAA+ ATPases. While they are extremely
prevalent in selfish replicons, they are not (so far)
represented in cellular genomes (Iyer et al., 2001). They
could have potentially been the ancestral replicative
helicases that were replaced by other helicases upon the
origin of distinctive DNA replication systems in the two
major branches of life. However, since the SF3 helicases
form a derived lineage in the PS1BH superclade, they
are unlikely to represent the most ancient version of the
AAA+ ATPases. The functional diversity within the
helix-2 insert clade of the PS1BH assemblage does not
allow us to predict the functions of their 1-2 ancestral
representatives, which might have been present in
LUCA. Both chaperone-like and helicase activities are
common in different families of this clade (Neuwald
et al., 1999; Ogura and Wilkinson, 2001), suggesting that
the ancestral form could have been a generic ATPase
possessing both these activities. Two AAA+ ATPases
with potential chaperone or ATP-dependent protein
unfolding activity, Lon and an ancestral classic AAA
ATPase, apparently were represented in LUCA. This
indicates that mechanisms for assembly and recycling of
multidomain proteins and multisubunit protein com-
plexes were already well advanced in LUCA.
Several rounds of duplications within the AAA+
class appear to have occurred during pre-LUCA
evolution. In LUCA, the AAA+ ATPases probably
performed two principal biochemical functions: (1) ca-
talysis of ATP-dependent structural transitions in
proteins and (2) nucleic acid-associated or stimulated
ATPase or helicase activities. These biochemical activi-
ties dominate all the extant branches of the AAA+
superclass, with both activities exhibited by the bacterial
LON ATPases (Fu et al., 1997). Thus, it is reasonable to
assume that the common ancestor of the entire AAA+
class was a generic ATPase that performed both of these
activities without much specificity. Even after the radi-
ation of the clamp-loader, DnaA/CDC6/ORC, classic
AAA, and PS1BH lineages, most of these proteins, with
the possible exception of the classical AAA family,
probably retained some ability to perform both
functions.
The principal biochemical functions of the AAA+
ATPases are closely linked to their ring quaternary
structure, which allows them to thread peptides or
nucleic acids through the central pore of a ring or provides
a quasi-periodic surface for interactions (Neuwald et al.,
1999; Ogura and Wilkinson, 2001; Zhang et al., 2002).
This quaternary structure is even more widely repre-
sented in the ASCE division of the P-loop NTPases
(Egelman et al., 1995; Gomis-Ruth et al., 2001; Leipe
et al., 2000). Within the ASCE division, the AAA+ class
forms one branch, whereas most of the other ASCE
ATPases form a second branch. Furthermore, within
this second branch, one lineage includes the ABC
ATPases, whereas the second lineage consists of the
PilT, RecA/F1, and SF1/2 helicase N-terminal domains.
Ring structures are additionally observed in the RecA/
F1 and the PilT classes. The common ancestor of this
branch of the ASCE ATPases can be reconstructed as a
nucleic acid-associated ATPase. Given that the ancestral
AAA+ ATPase also probably had a nucleic acid-asso-
ciated ATPase activity, we suspect that the common
ancestor of the entire ASCE division had a nucleic acid-
associated function, perhaps as a low specificity helicase
or a nucleic acid pump. Thus, the ring quaternary
structure might have evolved as an ancestral feature of
the ASCE ATPases, which formed these structures
around double-stranded nucleic acids in the extended
conformation. Subsequently, as the AAA+ ATPases
diverged from the rest of the ASCE NTPases, they
might have utilized this ancestral structure in the context
of peptide threading, in addition to nucleic acid un-
winding.
Of the ASCE ATPases, for which structures are
currently unavailable, the AP-ATPases, NACHT
GTPases, and uncharacterized archaeal ATPases con-
stitute a major class that appears to share specific fea-
tures with the AAA+ ATPases (Aravind et al., 2001;
Koonin, 1997 and DDL, LMI, EVK, and LA, unpub-
lished). These apparent synapomorphic features include
the N-terminal helix preceding the Walker A motif and a
5-stranded core sheet ending in an a-helical extension.
Thus, together with AAA+ ATPases, these NTPases
might constitute a major higher order assemblage within
the ASCE division.
2.4. The adaptive radiation of the AAA+ superfamily in
the primary kingdoms
The separation of the bacterial and archaeo-eukary-
otic lineages appears to have been accompanied by a
major, rapid diversification of AAA+ into several spe-
cific roles primarily related to chaperone and protein
degradation activities. This dramatic radiation included
the emergence of ClpAB N- and C-terminal domains,
FtsH, ClpX, HslU, and YifB in the bacterial lineage,
and the proteasomal and CDC48-like ATPases in the
archaeo-eukaryotic lineage (Fig. 4). Furthermore, the
Tip49 and MCM families with helicase activity evolved
in the archaeo-eukaryotic lineage, whereas the
transcription factor NtrC underwent an explosive
diversification in the bacterial lineage. The second major phase
in the diversification of AAA+ ATPases was concomi-
tant with the origin of the eukaryotes. This diversifica-
tion occurred primarily within the classical AAA clade
(Beyer, 1997; Swaffield and Purugganan, 1997) and
appears to have generated proteins that were critical for
the emergence of many defining features of the eukary-
otes. A second important eukaryotic innovation oc-
curred in the helix-2 insert clade. Here, the ancient
MoxR family appears to have diversified into dynein
and midasin. These events appear to have been critical
for the emergence of the eukaryotic nucleus (midasin is
involved in nucleo-cytoplasmic transport of ribosomes),
cytoplasm (dynein for cytoplasmic transport), and fla-
gella (the dynein motor). In this context, it would be of
interest to investigate whether some of the prokaryotic
MoxR proteins have motor activity comparable to that
of dynein.
A significant aspect of the diversification of the
AAA+ class is related to the fusion of the ATPase
domain with other globular domains in the same polypeptide
(Dougan et al., 2002). These fusions ensure physical
proximity of the respective domains and are diagnostic
of functional interactions (Huynen and Snel, 2000;
Marcotte et al., 1999). We examined these domain
fusions, as well as known physical interactions between
AAA+ ATPases and other domains, by constructing a
graph, which represents the entire network of such in-
teractions (Fig. 5). Three general trends emerge from
this analysis. First, the AAA+ domain underwent fu-
sions with different protease domains on several inde-
pendent occasions during evolution. The fusions include
the Zn metalloprotease domain in FtsH and the Lon
protease in the LON and YifB families. Furthermore,
the proteasomal ATPases of the classic AAA clade
interact with proteases of the NTN hydrolase (macro-
pain) and JAB superfamilies (Verma et al., 2002). The
distinct ClpP protease interacts with ClpX, while the ClpX
paralog HslU interacts with the HslV protease of the NTN
hydrolase superfamily (Fig. 5). These fusions and in-
teractions are generally reminiscent of the fusions and
interactions of the SF1/2 helicases with diverse nucleases
(Aravind et al., 1999). In contrast, there are hardly any
domain fusions of AAA+ ATPases with nucleases, al-
though there are a few cases of physical association, e.g.,
McrB and some SF3 helicases. Fusions and associations
with proteases are rare in the other branches of the
ASCE division of ATPases. This observation suggests
that the origin of AAA+ ATPases marked the emergence of
chaperone and related protein restructuring activities
among the P-loop NTPases.
Second, the vWA domain is fused or interacts non-
covalently with different versions of the AAA+ domain,
such as the chelatases, MoxR, and dynein/midasin. The
S5a protein, a subunit of the eukaryotic proteasome,
also contains a vWA domain that functionally interacts
with the proteasomal AAA ATPases (Fig. 5). The vWA
domains could function as adaptors that bring the
ATPase domains to their substrates. Alternatively, in
the case of the chelatases, the vWA domain might bind
Mg2+ as part of the chelation reaction (Fodje et al.,
2001). Other adaptor modules, such as the BAM do-
main in the eukaryotic ORC1 proteins, the bromodo-
main in the F11A10.1 proteins of the proteasomal
ATPase family and the BRCT domain in RFC1 could
allow these proteins to specifically associate with other
proteins of the complex eukaryotic chromatin and DNA
repair systems, respectively. Likewise, the ankyrin re-
peats and the CH domain could mediate the interaction
of Unc-53 and CBP-2 family with cytoskeletal compo-
nents (Fig. 5).
Finally, several independent fusions of the AAA+
domain with different versions of the DNA-binding
HTH domain are seen in the NtrC, MCM, Cdc6, DnaA,
and RuvB families (Fig. 5). These fusions might have
allowed the evolution of specific interactions between
the AAA+ domains and nucleic acids. This could imply
that the ancestral AAA+ domains non-covalently as-
sociated with stand-alone HTH proteins to translocate
to specific sites on the DNA, such as replication origins.
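
As an illustration of the kind of graph described above, the following minimal Python sketch encodes a few of the fusions and interactions named in the text as labeled edges between AAA+ families and partner domains. The edge list is a hand-transcribed subset of the relationships mentioned in this section, not the full network of Fig. 5, and the function names (build_domain_graph, families_for) are hypothetical; this is not the procedure actually used to draw the figure.

```python
from collections import defaultdict

# (AAA+ family, partner domain, relation) triples transcribed from the
# paragraphs above; an illustrative subset, not the full network of Fig. 5.
EDGES = [
    ("FtsH", "Zn metalloprotease", "fusion"),
    ("LON", "Lon protease", "fusion"),
    ("YifB", "Lon protease", "fusion"),
    ("ClpX", "ClpP protease", "interaction"),
    ("HslU", "HslV protease", "interaction"),
    ("NtrC", "HTH", "fusion"),
    ("MCM", "HTH", "fusion"),
    ("Cdc6", "HTH", "fusion"),
    ("DnaA", "HTH", "fusion"),
    ("RuvB", "HTH", "fusion"),
]


def build_domain_graph(edges):
    """Map each partner domain to the set of (family, relation) pairs linked to it."""
    graph = defaultdict(set)
    for family, partner, relation in edges:
        graph[partner].add((family, relation))
    return graph


def families_for(graph, partner):
    """Return the sorted AAA+ families connected to a given partner domain."""
    return sorted(family for family, _ in graph.get(partner, ()))


graph = build_domain_graph(EDGES)
print(families_for(graph, "HTH"))           # families fused to an HTH domain
print(families_for(graph, "Lon protease"))  # families fused to a Lon protease
```

Grouping edges by partner domain makes recurrent partners, such as the protease and HTH domains discussed above, easy to spot as nodes with many incident families.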
3. General conclusions
Our understanding of the AAA+ ATPases, especially
in structural and mechanistic terms, has vastly improved
since the publication of the previous survey of this
protein class (Neuwald et al., 1999). Using the wealth of
structures and genomic information currently at our
disposal, we identified the defining structural features of
the AAA+ superclass and constructed an evolutionary
classification along with a reconstruction of some major
aspects of their evolutionary history. In particular, some
aspects of the earliest stages of their evolution, the
higher order divergence events and position of some
highly divergent versions, such as SF3 helicases, dynein,
and midasin, are becoming apparent. The AAA+
ATPases are an ancient group of ATPases, which
already showed considerable diversity in LUCA. The
earliest diversification events in their evolution appear to
correspond to the emergence of specific features related
to processivity of the DNA replication apparatus and
assembly of the replication initiation complexes. The
next great radiation gave rise to several distinct chap-
erones, ATPase subunits of proteases, DNA helicases
and transcription factors. The third major radiation at
the base of the eukaryotic lineage probably contributed
to the origin of eukaryote-specific adaptations related to
nuclear and cytoskeletal functions. Some of the rela-
tionships and domains reported here might provide new
leads in investigating the functions and biology of
AAA+ ATPases.
4. Materials and methods
Sequences of AAA+ proteins were extracted from the
non-redundant (NR) protein sequence database (Na-
tional Center for Biotechnology Information, NIH,
Bethesda) using the PSI-BLAST program (Altschul
et al., 1997), with the sequences of known AAA+ proteins
as queries. Sequence similarity-based protein clustering
was performed using the BLASTCLUST program
(ftp://ftp.ncbi.nih.gov/blast/documents/README.bcl).
Multiple alignments were constructed using the Clustal X
(Thompson et al., 1997) or T-Coffee (Notredame et al.,
2000) programs and corrected on the basis of PSI-
BLAST results and structural alignments, as previously
described (Aravind and Koonin, 1999a). All newly re-
covered sequences were evaluated on the basis of the
presence of conserved AAA+ specific motifs, such as
those associated with the N-terminal helix, sensor 1 and
2, and the arginine finger, for differentiating them from
other P-loop proteins. For each of the families recog-
nized through these procedures, the phyletic distribution
was evaluated in terms of the presence of homologues in
a phylogenetically diverse sample of 60 complete
genomes from the three primary kingdoms, Bacteria,
Archaea, and Eukaryota. The COG database was used as a
guide for identifying orthologous proteins and their
phyletic patterns (Tatusov et al., 2003).
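
The similarity-based clustering step can be sketched generically as single-linkage clustering: any two sequences whose pairwise similarity passes a cutoff are placed in the same cluster, and clusters are merged transitively. The Python sketch below is an illustration under that assumption only; it does not reproduce BLASTCLUST or its parameters, and the sequence names, scores, and threshold are invented for the example.

```python
def single_linkage_clusters(names, pair_scores, threshold):
    """Group sequences so that any pair scoring >= threshold shares a cluster.

    names: list of sequence identifiers.
    pair_scores: dict mapping (name_a, name_b) to a similarity score.
    threshold: minimum score for two sequences to be linked.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sequences and scores, for illustration only.
names = ["seqA", "seqB", "seqC", "seqD"]
pair_scores = {
    ("seqA", "seqB"): 0.9,
    ("seqB", "seqC"): 0.8,
    ("seqC", "seqD"): 0.2,
}
clusters = single_linkage_clusters(names, pair_scores, threshold=0.5)
print(clusters)
```

Here seqA and seqC end up in the same cluster even though they have no direct score, because both are linked to seqB; this transitivity is the defining property of single linkage.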
Phylogenetic trees were constructed using the
PROTDIST and FITCH programs of the PHYLIP
package with the default parameters (Felsenstein, 1996),
followed by optimization via local rearrangements
conducted using the maximum likelihood (ML) method
with the JTT-F substitution model as implemented in the
MOLPHY package (Adachi and Hasegawa, 1992).
Neighbor joining trees were constructed using the
MEGA program (Kumar et al., 1994). Support for
selected tree branches was measured by 10,000 resamplings
with the relative logarithmic bootstrap (RELL-BP)
procedure implemented in the MOLPHY package
(Adachi and Hasegawa, 1992). For evolutionary re-
constructions, the standard model of early evolution,
which postulates the original split between the bacterial
and archaeo-eukaryotic lineages, was employed as the
null hypothesis (Brown and Doolittle, 1997).
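
The idea behind RELL-type bootstrap support can be sketched as follows: rather than re-estimating each tree for every bootstrap replicate, the per-site log-likelihoods already computed for each candidate topology are resampled with replacement, and the support for a topology is the fraction of replicates in which it has the highest summed log-likelihood. The Python sketch below is a generic illustration of that resampling scheme, not the MOLPHY implementation; the function name, the input layout (one list of per-site log-likelihoods per topology), and the toy numbers are assumptions.

```python
import random


def rell_bootstrap(site_lnl, n_resamplings=10000, seed=0):
    """Estimate RELL-style support for each candidate tree topology.

    site_lnl: list of lists; site_lnl[t][i] is the log-likelihood of
              alignment site i under topology t (all rows the same length).
    Returns the fraction of resamplings in which each topology scored best.
    Ties are resolved in favor of the lowest-indexed topology.
    """
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    rng = random.Random(seed)
    wins = [0] * n_trees
    for _ in range(n_resamplings):
        # Draw site indices with replacement, as in an ordinary bootstrap.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        totals = [sum(tree[i] for i in sample) for tree in site_lnl]
        best = max(range(n_trees), key=totals.__getitem__)
        wins[best] += 1
    return [w / n_resamplings for w in wins]


# Toy per-site log-likelihoods for two hypothetical topologies.
toy_site_lnl = [
    [-1.0, -1.0, -1.0],  # topology 0 is better at every site
    [-2.0, -2.0, -2.0],  # topology 1
]
support = rell_bootstrap(toy_site_lnl, n_resamplings=200)
print(support)
```

Because only sums of precomputed values are recalculated, a large number of resamplings (such as the 10,000 used here) is computationally cheap compared with a full bootstrap.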
For structural co