AAAplus New

Embed Size (px)

Citation preview

  • 8/14/2019 AAAplus New

    1/21

    Evolutionary history and higher order classificationof AAA+ ATPases

    Lakshminarayan M. Iyer, Detlef D. Leipe, Eugene V. Koonin, and L. Aravind*

    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

    Received 2 September 2003, and in revised form 8 October 2003

    Abstract

    The AAA+ ATPases are enzymes containing a P-loop NTPase domain, and function as molecular chaperones, ATPase

    subunits of proteases, helicases or nucleic-acid-stimulated ATPases. All available sequences and structures of AAA+ protein

    domains were compared with the aim of identifying the definitive sequence and structure features of these domains and inferring

    the principal events in their evolution. An evolutionary classification of the AAA+ class was developed using standard phy-

    logenetic methods, analysis of shared sequence and structural signatures, and similarity-based clustering. This analysis resulted in

    the identification of 26 major families within the AAA+ ATPase class. We also describe the position of the AAA+ ATPases with

    respect to the RecA/F1, helicase superfamilies I/II, PilT, and ABC classes of P-loop NTPases. The AAA+ class appears to have

    undergone an early radiation into the clamp-loader, DnaA/Orc/Cdc6, classic AAA, and pre-sensor 1 b-hairpin (PS1BH)

    clades. Within the PS1BH clade, chelatases, MoxR, YifB, McrB, Dynein-midasin, NtrC, and MCMs form a monophyletic

    assembly defined by a distinct insert in helix-2 of the conserved ATPase core, and additional helical segment between the core

    ATPase domain and the C-terminal a-helical bundle. At least 6 distinct AAA+ proteins, which represent the different major

    clades, are traceable to the last universal common ancestor (LUCA) of extant cellular life. Additionally, superfamily III heli-

    cases, which belong to the PS1BH assemblage, were probably present at this stage in virus-like selfish replicons. The nextmajor radiation, at the base of the two prokaryotic kingdoms, bacteria and archaea, gave rise to several distinct chaperones,

    ATPase subunits of proteases, DNA helicases, and transcription factors. The third major radiation, at the outset of eukaryotic

    evolution, contributed to the origin of several eukaryote-specific adaptations related to nuclear and cytoskeletal functions. The

    new relationships and previously undetected domains reported here might provide new leads for investigating the biology of

    AAA+ ATPases.

    Published by Elsevier Inc.

    1. Introduction

    A large part of the proteome of any organism is de-

    voted to proteins that bind nucleoside triphosphates

    and, typically, utilize them as substrates in various re-

    actions (reviewed in Vetter and Wittinghofer, 1999).

    Several distinct NTP-binding protein folds have been

    structurally characterized to date, but amongst these the

    P-loop NTPases (Saraste et al., 1990) are by far the most

    abundant class, which accounts for 1018% of the pre-

    dicted gene products in the sequenced prokaryotic and

    eukaryotic genomes (Koonin et al., 2000a). Proteins

    with P-loop NTPase domains are also present in the

    majority of viruses studied to date (Gorbalenya and

    Koonin, 1989). The P-loop NTPases are thought to be a

    monophyletic assemblage of protein domains, and sev-

    eral distinct versions of this domain are traceable to the

    last universal common ancestor (LUCA) of all modern

    cellular life forms (Kyrpides et al., 1999; Leipe et al.,

    2002). This suggests that the P-loop domain originated

    long before the time of the LUCA and had undergone

    considerable structural and functional diversification

    prior to this period. Thus, understanding the natural

    history of P-loop NTPases is critical for understanding

    the key aspects of lifes evolution, ranging from the early

    phases to the radiation of major organismal lineages.

    Most members of the P-loop NTPase fold hydrolyze

    the bc phosphate bond of a bound nucleoside tri-

    phosphate, most often, ATP or GTP. The free energy of

    * Corresponding author. Fax: 1-301-435-7794.

    E-mail address: [email protected] (L. Aravind).

    1047-8477/$ - see front matter. Published by Elsevier Inc.

    doi:10.1016/j.jsb.2003.10.010

    Journal of Structural Biology 146 (2004) 1131

    Journal of

    StructuralBiology

    www.elsevier.com/locate/yjsbi

    http://mail%20to:%[email protected]/http://mail%20to:%[email protected]/
  • 8/14/2019 AAAplus New

    2/21

    this hydrolysis reaction is typically utilized to induce

    conformational changes in other molecules. This con-

    stitutes the basis of the biochemical activities and bio-

    logical functions of most P-loop fold proteins. In

    contrast, members of one major lineage of P-loop pro-

    teins, the kinases, transfer the ATP c-phosphate to di-

    verse substrates (Leipe et al., 2003). Structurally, P-loopdomains adopt a 3-layered a/b sandwich configuration

    that contains regularly recurring ab units with the

    b-strands forming a central, mostly parallel b-sheet

    surrounded on both sides by a-helices (Milner-White

    et al., 1991) (see also the SCOP database (Murzin et al.,

    1995): http://scop.mrc-lmb.cam.ac.uk/scop/). At the se-

    quence level, P-loop NTPases are generally character-

    ized by two conserved sequence motifs, the Walker A

    and B motifs, which bind, respectively, the b and c

    phosphate moieties of the bound NTP, and a Mg2

    cation (Saraste et al., 1990; Vetter and Wittinghofer,

    1999; Walker et al., 1982).

    Sequence and structure analyses suggest that the

    primary diversification event in the evolution of

    the P-loop fold resulted in the two principal classes of

    the P-loop domains. The first of these, the KG (Kinase

    GTPase) division includes the kinases and GTPases that

    share number of structural similarities, such as the ad-

    jacent placement of the P-loop and Walker B strands.

    The other class, the ASCE division (for additional

    strand, catalytic E), is characterized by an additional

    strand in the core sheet, which is located between the

    P-loop strand and the Walker B strand (Leipe et al.,

    2002, 2003; Fig. 1). As opposed to kinases and GTPases,

    ATP hydrolysis by the proteins of the ASCE grouptypically depends on a conserved catalytic (proton-

    abstracting) glutamate that primes a water molecule for

    the nucleophilic attack on the c-phosphate group of

    ATP. The ASCE division includes AAA+, ABC, PilT/

    VirD4, superfamily 1/2 (SF1/2) helicases, and RecA/F1/

    F0 superfamilies of ATPases, along with several addi-

    tional, less confidently classified families.

    Starting over a decade ago, the AAA ATPases

    (ATPases associated with a variety of cellular activities)

    were encountered in studies on an astonishing range of

    biochemical systems (Confalonieri and Duguet, 1995;

    Lupas and Martin, 2002; Ogura and Wilkinson, 2001).

    These included, among others, the eukaryotic prote-

    asomal ATPases, CDC48, and FtsH, which are involved

    in processes related to protein stability and degradation

    in bacteria and eukaryotes, NSF, which is implicated in

    vesicular fusion, Pex1p, involved in peroxisome bio-

    genesis, and Bcs1p, which participates in the assembly of

    mitochondrial membrane complexes. Approximately

    around the same time, a detailed computational analysis

    of various cellular and viral proteins involved in nucleic

    acid metabolism, such as DnaA, the MCM proteins,

    NtrC-type transcription factors, and helicases of various

    RNA and DNA viruses comprising the helicase

    superfamily 3 (SF3), suggested that all these proteins

    shared a conserved ATPase domain (Koonin, 1993).

    Solution of the X-ray structure of the NSF protein

    and its comparison with the clamp loader subunit

    structure supported the unification of these ATPases

    into a monophyletic group (Guenther et al., 1997;

    Lenzen et al., 1998). Concomitantly, we conducted asystematic analysis of these ATPase domains using

    advanced sequence profile analysis methods and struc-

    tural comparisons, which resulted in the unification of

    the bona fide AAA ATPase and the DnaA/MCM/

    NtrC/SFIII-related proteins into a single, monophyletic

    AAA+ class (Neuwald et al., 1999). Additionally,

    this analysis showed that various other ATPase domain

    families, such as ClpAB/Hsp100, ClpX, HslU, and Lon,

    which are involved in protein folding and degradation,

    the eukaryotic motor protein dynein, a large, conserved

    eukaryotic protein with 6 ATPase domains (subse-

    quently termed midasin), magnesium and cobalt

    chelatases, the bacterial DNA-replication clamp load-

    ers, and eukaryotic replication factor C subunits, also

    belonged to the AAA+ class. It was also proposed that

    the AAA+ domain might be a common denominator in

    the catalytic assembly or disassembly of large cellular

    complexes of polypeptides and nucleic acids and that

    the majority of AAA+ ATPases function as oligomeric

    ring structures, which provide symmetric or quasi-

    symmetric surfaces for interactions with other mole-

    cules or a central pore for threading molecules in an

    extended conformation (Neuwald, 1999; Neuwald et al.,

    1999).

    Since the publication of the original analysis of theAAA+ class, a wealth of structural and biochemical

    studies have been published that have strongly rein-

    forced the monophyly of AAA+ ATPases and eluci-

    dated intricate functional details of how oligomeric

    rings of AAA+ proteins could be deployed in various

    biological contexts (Dougan et al., 2002; Lupas and

    Martin, 2002; Ogura and Wilkinson, 2001). Currently

    over 15 structures of distinct types of the AAA+

    domain are available (Fig. 1). This data, along with

    the genome sequences of diverse organisms from many

    of the principal phylogenetic lineages, provides for a

    post-genomic vantage point to address several

    issues, which have not been tractable previously: (1) A

    formal, unified definition of the AAA+ class that

    combines sequence and structural information. (2) The

    higher order relationships within the AAA+ class. (3)

    The earliest events in the evolution of AAA+ ATPases

    and its differentiation from the other ASCE ATPases.

    (4) The trends in colonization of various functional

    niches during the evolution of this class of ATPases.

    Here, we address these problems, particularly in light

    of the new information that became available since the

    previous survey of the AAA+ class fo ATPases

    (Neuwald et al., 1999).

    12 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

    http://scop.mrc-lmb.cam.ac.uk/scop/http://scop.mrc-lmb.cam.ac.uk/scop/
  • 8/14/2019 AAAplus New

    3/21

    2. Results and discussion

    2.1. The defining structural and catalytic features of the

    AAA+ ATPases

    We collated all currently available structures of pro-

    teins that have been confidently assigned to the AAA+class and prepared a multiple alignment of their se-

    quences on the basis of their structure superposition.

    This allowed us to map all the major conserved sequence

    features to their 3D structural cognates in the core

    AAA+ domain. Furthermore, this structure-based

    alignment also enabled correction of the earlier align-

    ments (Koonin, 1993; Neuwald et al., 1999), which were

    derived principally from sequence comparisons, and the

    recognition of previously overlooked subtle sequence

    signatures. Representative sequences of previously

    identified AAA+ proteins were then used as seeds for

    PSI-BLAST searches of the NR database to detect new

    AAA+ ATPases. All the newly detected putative AAA+

    proteins were compared to the structure-based multiple

    alignment to establish their membership in the AAA+

    class based on the presence of the defining consensus

    patterns.

    The AAA+ ATPase domains share the ancestral

    Walker A and B motifs with the rest of the P-loop

    NTPases (Figs. 13). In the majority of the members of

    the AAA+ class, the canonical form of the Walker A (P-

    loop) motif is conserved, typically, in the form of

    GX2GXGK[ST]. At least one of the residues between

    the first two glycines of the signature is frequently a

    proline. Certain minor deviations from the canonicalform are seen in P-loop motifs of the NtrC

    (GX2GXGK[DE]), MCM (GX2GXAKS), and MoxR

    (GX2GXAK[ST]) families (Fig. 3). Additionally, the

    P-loop motif is disrupted in various catalytically inactive

    AAA+ domains, such as Sir3p, Orc4p, some of the re-

    peats of dynein, and the d0 subunit of the bacterial

    replication clamp-loader. The Walker B motif typically

    assumes the form hhhhDE (h is a hydrophobic residue),

    where the conserved glutamate primes a water molecule

    for a nucleophilic attack on the c-phosphate group of

    ATP (Story and Steitz, 1992). Deviations from this state

    are again observed in forms that are likely to be cata-

    lytically inactive. The core parallel sheet of the AAA+

    ATPase domain assumes a 51432 topology, where

    strands 1 and 3 are, respectively, associated with the

    Walker A and B motifs (Fig. 1) (Guenther et al., 1997;

    Lenzen et al., 1998; Neuwald et al., 1999). This core

    differs from most of the other ATPases of the ASCE

    division, such as RecA-F1, SF1/2 helicases and PilT

    ATPases, in lacking additional strands to the right of

    strand 2. Strand 4 of the AAA+ ATPase domain is as-

    sociated with another motif (termed sensor-1 or Motif

    C) bearing a conserved polar residue (Figs. 2, 3). Like its

    equivalents in several other ASCE division ATPases,

    this residue is likely to mediate interactions that are

    critical for ATP hydrolysis rather than ATP binding

    (Guenther et al., 1997; Putnam et al., 2001).

    Beyond these basic features of the core domain, the

    AAA+ class has several features that distinguish it from

    other NTPases of the ASCE division. First, the AAA+

    proteins possess an additional conserved helix N-ter-minal to the Walker A strand. This helix typically con-

    tains a conserved glycine or a small residue that caps it

    at the N-terminus, and a conserved polar (usually

    acidic) residue that defines the N-terminal sequence

    motif of this class (Figs. 13). Most structures also

    contain a conserved region, which is located upstream of

    this helix, adopts a characteristic extended conformation

    and runs perpendicular to the direction of the strands in

    the core sheet (Figs. 1 and 2). The presence of this region

    in representatives from all the diverse branches of the

    AAA+ class suggests that it is an ancestral, defining

    feature of the AAA+ clade. The AAA+ class is also

    distinguished by the presence of a helical bundle with 4

    helical segments that occurs immediately C-terminal to

    strand 5 of the core ATPase domain. This structure

    contains a conserved motif (sensor-2) with a frequently

    conserved arginine (Figs. 2 and 3). In the classical AAA

    proteins this arginine is often replaced by an alanine.

    Sensor-2 appears to be critical in constraining the ATP

    molecule to facilitate its hydrolysis and undergoes con-

    formation changes depending on the presence of ATP or

    ADP. Thus, this helical module appears to mediate the

    transmission of the free energy of ATP hydrolysis by

    AAA+ proteins to their respective substrates (Ogura

    and Wilkinson, 2001).A large number of the AAA+ ATPases characterized

    to date form quasi-symmetrical oligomeric ring struc-

    tures that, in certain cases, have been shown to thread

    nucleic acids or peptides in extended conformation

    through the central pore (Lenzen et al., 1998; Neuwald,

    1999; Neuwald et al., 1999; Ogura and Wilkinson, 2001;

    VanLoock et al., 2002). In the case of dynein and mi-

    dasin, which contain 6 repeats of the AAA+ domain in a

    single polypeptide, the protein is also likely to fold into a

    hexameric ring. This quaternary structure has also been

    consistently observed in several members of the RecA/

    F1 and PilT classes of ATPases, suggesting that it is an

    ancestral feature of the entire ASCE clade (Egelman

    et al., 1995; Gomis-Ruth et al., 2001; Leipe et al., 2000).

    The inter-protomer cooperation in ATP-hydrolysis is

    elicited by another defining feature of the AAA+ su-

    perclass, the arginine finger. This is a conserved arginine

    that is located at the C-terminus of the helix upstream of

    strand 5 and is directed towards the ATP-containing

    active site of the adjacent protomer in a ring (Guenther

    et al., 1997; Putnam et al., 2001; Zhang et al., 2002). The

    arginine finger appears to be displaced relative to its

    position in other AAA+ domains in the DnaA and Orc/

    Cdc6 families (Fig. 2). The above features can be

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 13

  • 8/14/2019 AAAplus New

    4/21

    considered shared derived characters (synapomorphies)

    of the AAA+ clade within the ASCE division and con-

    stitute a blueprint that allows clear segregation of

    AAA+ ATPases from all other P-loop NTPases.

    2.2. Evolutionary classification of the AAA+ ATPases

    2.2.1. Identification of AAA+ families and relationships

    between them

    All previously identified AAA+ ATPases and newly

    detected proteins, which conformed to the AAA+

    blueprint described above, were clustered using the

    BLASTCLUST program with varying score density and

    protein-length-overlap thresholds. Those clusters that

    remained stable over a range of medium thresholds

    (score density 0.30.5) were considered likely to define

    monophyletic families or at least cores of such families.

    The alignments for these clusters were then constructed

    using the T-Coffee program, corrected based on the

    template structural alignment, and analyzed to identify

    regions of extended conservation between and beyond

    the principal conserved signatures described above. This

    allowed us to formally define families on the basis of

    conserved signatures (Table 1). Typically, families in-

    cluded one or more orthologous lineages and, in some

    cases, also a cloud of more divergent paralogs. The

    presence of shared sequence features between families

    and/or consistent clustering of families based on score

    densities allowed us to delineate clades comprised of

    multiple families. Further higher order groups of these

    clades were derived using structure-based clustering

    with pair-wise DALI Z-scores and through identifica-tion of unique structural features that unified multiple

    superfamilies (Figs. 1, 4 and Table 1). Conventional

    phylogenetic trees, constructed using the neighbor

    joining, maximum likelihood, and minimum evolution

    methods, were used to explore the relationships within

    individual families or a group of closely related families.

    However, as the overall sequence similarity between

    families decreases there are only a small number of

    conserved positions shared between the families. It is

    not possible to obtain statistically well-supported higher

    order groupings based on this small set of universally

    conserved residues using conventional phylogenetic

    methods. In these conditions, could also cause artificial

    clustering of proteins that retain primitive sequence

    features present in the common ancestor of the entire

    group. Hence, it is necessary to depend mainly on

    structure comparisons to identify the primitive struc-

    tural state for the fold, and then delineate the structural

    characters that are derived in particular lineages. Based

    on these characters the most parsimonious scenario (a

    scenario, which minimizes the loses and independent

    innovations of particular features), which accounts for

    the observed structural quirks was determined and is

    presented here.

    Additionally, we analyzed the phyletic distribution of

    individual families in the completely sequenced ge-

    nomes. For families with a wide phyletic spread, i.e.,

    those present in 2 or all 3 primary kingdoms (bacteria,

    archaea, and eukaryotes), conventional phylogenetic

    trees were constructed. The presence of distinct archaeal

    (or archaeo-eukaryotic) and bacterial branches in thesetrees was taken as an indication that the given family

    was most likely represented in LUCA. In contrast, the

    presence of a well-defined bacterio-eukaryotic branch,

    particularly in the absence of a clear archaeal clade,

    suggested horizontal gene transfer (HGT) from bacteria

    to eukaryotes, most often from the pro-mitochondrion

    or the pro-chloroplast. Below we briefly describe the 26

    identified families of AAA+ ATPases, with an emphasis

    on their phyletic distribution and lineage-specific deri-

    vations, along with the predicted functions of unchar-

    acterized groups (Table 1).

    2.2.2. The clamp loader cladeIn all cellular life forms, the clamp loader ATPases

    are responsible for loading the DNA clamp, which is

    comprised of the PCNA or DNA polymerase III b

    subunits, onto the DNA (Davey et al., 2002a; Davey

    and ODonnell, 2000; Hingorani and ODonnell, 1998).

    The AAA+ domains of this clade mostly conform to the

    idealized core of this class without any specialized

    innovations (Fig. 2). The clamp loader ATPases have a

    synapomorphic RC signature associated with the ar-

    ginine finger (Fig. 3). Three major families can be

    identified within this clade. The first of these, the bac-

    terial family, includes two pan-bacterial orthologouslineages, typified by the Escherichia coli HolB (d0 sub-

    unit) and DnaX (c and s subunits) proteins, respectively.

    The bacterial family of clamp loaders is characterized by

    the insertion of a Zn cluster downstream of the helix

    associated with the Walker A motif (Guenther et al.,

    1997). The presence of members of the two orthologous

    lineages in almost all bacteria suggests an ancient du-

    plication and functional diversification in the common

    ancestor of all known bacteria. The DnaX gene from

    cyanobacteria appears to have been secondarily trans-

    ferred to plants and probably participates in the repli-

    cation of the chloroplast DNA.

    The second clamp loader family, the RFC family,

    has an archaeo-eukaryotic distribution and consists of

    two major orthologous lineages. One of these lineages

    includes the archaeal RFC proteins typified by

    MTH241 and the eukaryotic RFC2, RFC3, RFC4,

    and RFC5 proteins. The second lineage includes the

    archaeal proteins typified by MTH240 and the eu-

    karyotic RFC1, Rad24, Chl12, and Yor144c proteins.

    In several archaea, the representatives of the two or-

    thologous lineages are encoded by adjacent genes in

    the genome. This suggests that an ancient duplication

    in the archaeal lineage resulted in two branches of the

    14 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

  • 8/14/2019 AAAplus New

    5/21

    RFC family. Early in eukaryotic evolution each of

    these lineages appears to have undergone several ad-

    ditional duplications giving rise to the 5 subunits of

    RFC and the other proteins, such as Rad24 and

    Chl12, which were recruited for distinct roles in DNA

    repair (Naiki et al., 2001). This was paralleled by the

    duplication of the PCNA clamp itself, resulting inspecialized clamps, such as Rad1, Rad9, and Hus2,

    that functioned as partners for the Rad24-like ATP-

    ases (Aravind et al., 1999). Additionally, the RFC

    family includes the clamp-loaders of viruses, such as

    T4 (Davey et al., 2002b), which could represent an

    early diverging branch associated specifically with viral

    replicons.

    The third major family of is the WHIP family, typi-

    fied by the Werner helicase interacting protein from

    vertebrates and yeast Mgs1p (Kawabe et al., 2001). This

    family is present in most major bacterial lineages and in

    all sequenced eukaryotic genomes. Most members of

    this family contain a distinct C-terminal globular do-

    main fused to the AAA+ domain (Fig. 5) that contains a

    conserved acidic residue reminiscent of the catalytic

    residue of the RNAse H fold (Bork et al., 1992). In

    phylogenetic trees, the eukaryotic proteins form a tight

    group that lies within the bacterial radiation, closer to

    the proteobacteria (data not shown). This suggests that

    the WHIP family was probably derived early in bacterial

    evolution from the bacterial clamp loader family and

    was transferred to eukaryotes from the pro-mitochon-

    drial endosymbiont (Fig. 4).

    The presence of distinct, functionally equivalent

    bacterial and eukaryotic families in the clamp loaderclade suggests that one ancestral member of this clade

    was present in LUCA.

    2.2.3. The DnaA/CDC6/ORC cladeThis clade consists of two families, namely, the bac-

    terial DnaA family and the archaeo-eukaryotic CDC6/

    ORC family, which perform analogous roles in the as-

    sembly of protein complexes associated with the repli-

    cation origin (Erzberger et al., 2002; Giraldo, 2003). The

    synapomorphy that unifies the DnaA/CDC6/ORC clade

    is the presence of two helices after strand 2, which are of

    approximately equal size, and pack against each other in

    a characteristic fashion (Figs. 13). However, the two

    families of this clade have greatly diverged from each

    other in terms of sequence. The bacterial DnaA familyin bacteria contains two principal orthologous groups.

    DnaA proper is a pan-bacterial group, typically present

    in a single copy in all known bacteria. The DnaA pro-

    tein forms an oligomeric complex around the replication

    origin site and initiates local unwinding of DNA, which

    is required for recruitment of the DnaB helicase (Davey

    and ODonnell, 2000; Erzberger et al., 2002). The second

    orthologous set typified by the E. coli DnaC protein is

    sporadically present in several diverse bacteria and ap-

    pears to have arisen through a duplication of DnaA

    relatively late in bacterial evolution. In E. coli, DnaC

    has been shown to load the helicase DnaB to the single-

    stranded DNA at the origin site (Davey et al., 2002a). In

    bacteria lacking DnaC, this function might be per-

    formed by DnaA itself. IstB, a member of the DnaC

    lineage, is encoded by transposons of the IS21 family

    and is required for their transposition, probably via re-

    cruitment of the replication complex to the transposon

    DNA.

    The Orc/Cdc6 family has 12 representatives in al-

    most all archaea. Interestingly, it is absent in Met-

    hanococcus, while Halobacterium shows an extensive

    lineage-specific expansion of this family, with 11 par-

    alogous proteins. Early in eukaryotic evolution, the

    Orc/Cdc6 family appears to have differentiated into theOrc1p, Orc4p, Orc5p, and Cdc6p lineages that are

    present in most extant eukaryotes. These proteins co-

    operate in loading the MCM complex to the origin of

    replication site analogously to the action of DnaA and

    DnaC in loading DnaB in bacteria (Lee and Bell, 2000;

    Liu et al., 2000). Certain eukaryotic members of

    the Orc/Cdc6 family, such as yeast Orc4p and Sir3,

    a yeast-specific paralog of Orc1p, have disrupted

    Fig. 1. Topology diagrams of selected AAA+ ATPases and other members of the P-loop NTPase fold. Strands are shown as arrows with the

    arrowhead on the C-terminal side and numbered 1 through 5. Strands 1 and 3 that encompass the conserved sequence motifs GxxxxGK[ST] (WalkerA) and hhhh[DE] (Walker B) are rendered in orange; the other core strands (2,4,5) are in light orange; non-conserved strands are in gray. Helices are

    shown as blue rectangles when above the plane of theb-sheet and in faint blue when below the b-sheet. The P-loop is shown as a red line and a green

    arrowhead marks the N-terminus of the ATPase domain. The defining feature of the AAA+ ATPases, the loop that runs across the face of the b-sheet

    and the helix before strand 1 are highlighted in dark blue. The b-hairpin that defines the PS1BH group is rendered in purple. Broken lines indicate

    secondary structure elements that are not present in the PDB file or that were left out for clarity. The top panel shows two proteins from outside the

    AAA+ group (RecA, ASCE division; TMP Kinase, KG division) for comparison purposes. Sequences are identified with the protein name, PDB

    code (in parenthesis) and the organism name.

    Fig. 2. Structures of selected AAA+ ATPases. The top left panel shows the structure of a member of the clamp loader clade labeled with all the

    synapomorphies of the AAA+ clade. It is close to the ideal AAA+ domain structure. The top left panel shows the structure of the archaeal Cdc6

    protein with 2 helices after strand 2, which are the shared character of the ORC/CDC6 clade. Bottom left panel shows the structure of RuvB, which

    has the hallmark b-hairpin defining the PS1BH superclade. The bottom right panel shows the structure of Mg chelatase, which also belongs to the

    PS1BH clade. Additionally, it also shows the features that define the helix-2 insert clade and the displaced C-terminal helical bundle. Walker B (DE),

    sensor 1(polar residue), arginine finger and sensor 2 (arginine) are shown in all the structures.

    c

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 15

  • 8/14/2019 AAAplus New

    6/21

    16 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

  • 8/14/2019 AAAplus New

    7/21

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 17

  • 8/14/2019 AAAplus New

    8/21

    P-loops, suggesting that they have been recruited for

    non-catalytic roles like proteinprotein and protein

    DNA interactions.

    The two families of the DnaA/CDC6/ORC clade can

    be traced back, respectively, to the common ancestors of

    the archaeo-eukaryotic and bacterial clades. This ob-

    servation, taken together with the mechanistic similarity

    of their role in replication initiation, suggests that the

    clade was represented by the common ancestor of the

    two families in LUCA (Giraldo, 2003). The recruitment

    of very different replication initiation complexes by these

    ATPases in archaea-eukaryotes and in bacteria might

    have contributed to the extensive sequence divergence

    between them.

    2.2.4. The classical AAA domains and associated diver-

    gent sister groups

    The classical AAA clade consists of all ATPases re-

    lated to the proteasomal ATPases, FtsH, and CDC48

    that originally were defined as the AAA superfamily

    (Confalonieri and Duguet, 1995). The main structural

    synapomorphy for this clade is the presence of an addi-

    tional short helix immediately downstream of strand 2

    (Figs. 1 and 3). The proteins of this clade strongly cluster

    to the exclusion of other AAA+ proteins in similarity-

    based clustering, and often have a conserved glycine

    N-terminal to the arginine finger. Since several detailed

    evolutionary investigations on this family have been

    published, we only mention here the broad evolutionary

    Fig. 3. Alignment of AAA+ ATPases. Multiple sequence alignment of selected representatives AAA+ ATPases families. The coloring reflects 80%

    consensus of residue conservation. Secondary structure assignments are shown above the alignment, where E signifies ab-strand and H a a-helix. The

    coloring is based on the 80% consensus shown underneath the alignment. The coloring scheme is as follows: h indicates hydrophobic residues(ACFILMVWY) shaded yellow, s indicates small residues (AGSVCDN), colored green, o indicates alcohol group containing residues (ST), shaded

    blue, and p indicates polar residues (STEDKRNQHC) colored purple. Strongly conserved polar residues are colored in red (RED). The predicted

    secondary structure elements are shown below the alignment. Specific synapomorphic characters have been boxed. Species abbreviations are as

    follows: AAV. Adeno-associated virus; Aqae, Aquifex aeolicus; At, Arabidopsis thaliana; Af, Archaeoglobus fulgidus; AcNPV, Autographa californica

    nucleopolyhedrovirus; Bs, Bacillus subtilis; phiC31, Bacteriophage phi-c31; Ce, Caenorhabditis elegans; Cj, Campylobacter jejuni; Dr, Deinococcus

    radiodurans; Ddi, Dictyostelium discoideum; Dm, Drosophila melanogaster; Ec, Escherichia coli; Gila, Giardia intestinalis; Hi, Haemophilus influenzae;

    Hs, Homo sapiens; Kpn, Klebsiella pneumoniae; Mlo, Mesorhizobium loti; Mjan, Methanocaldococcus jannaschii; Mac, Methanosarcina acetivorans;

    Mta, Methanothermobacter thermautotrophicus; Mex, Methylobacterium extorquens; Mm, Mus musculus; Polio, Poliovirus; Pfu, Pyrococcus furiosus;

    Ph, Pyrococcus horikoshii; Rhca, Rhodobacter capsulatus; Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; SV40, Simian virus 40; Scoe,

    Streptomyces coelicolor; Sso, Sulfolobus solfataricus; Ssp, Synechocystis sp; Tac, Thermoplasma acidophilum; Thth, Thermus thermophilus; Ter,

    Trichodesmium erythraeum; VV, Vaccinia virus; and Vc, Vibrio cholerae.

    18 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

  • 8/14/2019 AAAplus New

    9/21

    patterns seen in this clade. Several distinct families can

    be identified within this clade, with most of the diversity

    observed in eukaryotes. The only pan-bacterial family,

    FtsH, includes membrane proteins with a single AAA+

    ATPase domain fused to a C-terminal metalloprotease

    domain (Koonin et al., 2000b; Krzywda et al., 2002;

    Tomoyasu et al., 1995). Some members of the FtsH

    family have been laterally transferred to eukaryotes,

    apparently through the mitochondrial and chloroplast

    routes. Two major families, namely, the proteasomal

    ATPase family and the CDC48 family, which contains a

    tandem repeat of the AAA+ module, are conserved

    throughout the archaeo-eukaryotic branch (Beyer, 1997;

    Swaffield and Purugganan, 1997). In eukaryotes, the

    proteasomal ATPase family has undergone a massive

    proliferation: 6 orthologous lineages, namely S4, S6a,

    S6b, S7, S8, S10a, and S10b, can be traced back to the

    common ancestor of all crown group eukaryotes.

    Table 1

    Clamp loader clade Pre-sensor 1 hairpin superclade

    SR[CAT] motif associated with Arginine finger b-hairpin inserted before Sensor 1 motif

    *HolB/DNAX family SFIII helicase clade [viruses]

    Zn cluster between helix-1 and strand 2 *NCLDV D5 ATPase family

    **Hol B subfamily [B] *Positive strand RNA virus helicase family

    **DNAX subfamily [B, Pl] *Other viral helicases

    * RFC family HslU/ClpX/Lon/ClpAB-C clade

    **MTH241/RFC2 subfamily [AE] Extended loop after strand 2

    (In eukaryotes includes RFC2, RFC3, RFC4, RFC5) *HslU/ClpX family

    **MTH240/RFC1 subfamily [AE] **HslU subfamily [B, Kin, Api]

    (In eukaryotes includes RFC1, Chl12, RAD24, Yor144c) **ClpX subfamily [B, E]

    *WHIP family [B, E] *Lon family

    Fused to a C-terminal WC domain

    Fused to a Lon protease at the C-terminus

    **Bacterial lon protease subfamily [B, E]

    DnaA/ORC clade Fused to a LAN domain at the N-terminus

    Two helices of approximately equal length after strand 2 **Archaeal lon protease subfamily [A>B]

    *DnaA family *ClpAB-C family

    **DnaA subfamily [B] NRhD motif associated with Arginine finger

    **DnaC subfamily [B] **ClpAB-C subfamily [B, E]

    *CDC6/ORC family [A, E] **Torsin subfamily [E]**CDC6/ORC [A] Torsin has a distinct a helical N-terminal domain

    ** Orc1p, Orc4p, Orc5p, and Cdc6p subfamilies [E] RuvB family [B, E]

    Has C-terminal wHTH domain; Helical segment between strand

    5 and C-terminal bundle

    Classical AAA clade and its divergent relatives

    Small helix between strand-2 and helix-2, [GN]R motif associated with

    arginine finger

    Helix-2 insert clade

    *FtsH family [B, E] bab insert in helix-2

    Fused to a metalloprotease at the C-terminus NtrC/MCM group

    *CDC48 family [A, E] Have [AG][FL]T motif in helix-2 before bab insert

    DPBB at N-terminus, 2 AAA+ domains *NtrC family [B]

    *Proteasomal ATPase family [A, E] *MCM family [A, E]

    **S4, S6a, S6b, S7, S8, S10a and S10b subfamilies [E] Fused to a Zn ribbon at N-terminus and a wHTH

    at the C-terminus

    *Katanin p60/Fidgetin family [E] *Chelatase / YifB group*NSF1 family [E] **Chelatase family [A, B, plants]

    *Pex1/6 subfamily [E] Associated with a vWA domain

    *Bcs1p subfamily [E] **YifB family [B]

    (Possibl e rel ative s of classic al AAA c lade ) Fuse d to a Lon Protease at its N-te rminus, insert of Zn c luster

    after strand 4

    *AFG1 family [B, E] *McrB family [B>A, E]

    *ClpAB-N family [B>A, E] *Mox R group

    *TIP49 family [A, E] **MoxR family [A, B]

    N-terminal module with characteristic H[ST]H motif, Predictedb barrel

    between Helix-2 and Strand 3

    **Dynein/Midasin family

    6 Tandem AAA+ domains

    **Dynein subfamily [E]

    **Midasin subfamily [E]

    Abbreviations: A, Archaea; B, Bacteria; E, Eukaryotes; Api, Apicomplexans; Kin, Kinetoplastids; and Pl, Plants >

    indicates sporadic lateraltransfer.

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 19

  • 8/14/2019 AAAplus New

    10/21

    In addition to these archaeo-eukaryotic families,

    there are several distinct eukaryotic families that appear

    to have occupied specific functional niches (Beyer, 1997;

    Frohlich, 2001; Lupas and Martin, 2002; Ogura and

    Wilkinson, 2001; Swaffield and Purugganan, 1997) (also

    see Froehlichs website: http://aaa-proteins.uni-graz.at/

    AAA/Default.html). These include the Katanin p60/Fidgetin family, which is involved in microtubule dis-

    assembly, the NSF family with 2 AAA+ domains

    involved in membrane fusion, Pex1/6, which is involved

    in peroxisome biogenesis, and Bcs1p, which participates

    in assembly of mitochondrial membrane protein com-

    plexes. The NSF family, which shares a N-terminal

    double-w-b-barrel domain with the CDC48 family, is

    likely to have been derived from the latter family during

    the emergence of the eukaryotic secretory apparatus

    (Castillo et al., 1999; Coles et al., 1999). The Bcs1p

    family (Cruciat et al., 1999) is the most divergent of all

    the eukaryotic families and includes a distinctive plant-

    specific subfamily with an uncharacterized, conserved

    N-terminal globular domain. This subfamily has un-

    dergone a lineage-specific gene expansion in plants, with

    at least 15 members encoded in monocot and dicot ge-

    nomes. In contrast to these families, the highly divergent

    AFG1 family (Lee and Wickner, 1992) shows a bacterio-

    eukaryotic phyletic pattern (Table 1). It is present in

    most eukaryotic lineages, whereas, amongst the bacte-

    ria, it is predominantly seen in proteobacteria, with a

    few sporadic occurrences in Deinococcus and actino-

    mycetes. Grouping of the eukaryotic members of this

    family within the proteobacterial radiation suggests that

    this family is likely to have originated in bacteria, withsubsequent HGT to eukaryotes via the proto-mito-

    chondrion. Most of these families with limited or spo-

    radic phyletic patterns could have emerged through

    rapid divergence from the more ancient widely con-

    served members of the classical AAA clade.

    Members of the ClpAB ATPase family (Hoskins

    et al., 2001) have two AAA+ modules that are very

    different from each other in terms of sequence, structure,

    functions, and phylogenetic affinities (Volker and Lu-

    pas, 2002). Furthermore, in oligomers, the two domains

    appear to form two distinct homotypic ring structures

    stacked on top of one another (Guo et al., 2002). The

    N-terminal domain shares a characteristic feature with

    the classical AAA clade, namely, the additional small

    helix downstream of strand 2 (Figs. 1 and 3). This feature,

    along with the sequence similarity patterns, suggests that

    the ClpA N-terminal domain was probably derived

    through rapid divergent evolution after branching off

    from the classical AAA clade. In contrast, the C-ter-

    minal domain is related to the Lon and HslU-like

    ATPases (see below). Thus, the ClpAB ATPases appear

    to have evolved as a result of an ancient fusion of two

    phylogenetically distinct AAA+ ATPase modules, ra-

    ther than through tandem duplication (which would

    seem to be an intuitively appealing scenario given the

    head to tail juxtaposition of the 2 ATPase modules in

    these proteins) (also see Volker and Lupas, 2002). The

    ClpAB ATPases are important chaperones and stress-

    response proteins that are found in all bacterial lineages,

    often in at least two versions per bacterial proteome

    (Hoskins et al., 2001). Some bacteria, e.g., Pseudomonasaeruginosa, encode up to 8 members of this family. In

    eukaryotes, the ClpAB proteins are represented by

    Hsp104 and the mitochondrial heat shock protein

    Hsp78. These proteins are absent in the archaea with the

    exception of a single member in Methanothermobacter,

    which appears to have been transferred from the bac-

    teria. The eukaryotic ClpAB homologs are nested within

    the bacterial radiation (data not shown), suggesting that

    they have been acquired from the bacterial precursors of

    the mitochondria.

    The presence of at least one pan-bacterial and two

    pan-archaeo-eukaryotic families suggests that LUCA

    probably possessed a single ancestral representative of

    the classic AAA clade (Fig. 4). This ancestral form ap-

    pears to have initially diversified to occupy functional

    niches related to protein unfolding and degradation, and

    subsequently diversified to perform several other bio-

    logical functions in eukaryotes.

    2.2.5. The TIP49 family

    These proteins are DNA-stimulated ATPases that

    have been shown to associate with the TATA-binding

    protein and appear to play a critical role in the assembly

    of complexes related to transcriptional activation

    (Wood et al., 2000). This family has a single represen-tative in several archaea and two distinct orthologous

    groups, namely pontin and reptin (Rottbauer et al.,

    2002), in the crown group eukaryotes. Thus, the TIP49

    family appears to have emerged in the common ancestor

    of the archaeo-eukaryotic lineage, followed by a split

    into two paralogous lineages prior to the radiation of

    the eukaryotic crown group. This group is characterized

    by a small conserved N-terminal module and a re-

    markable insert of a novel predicted b-barrel domain

    upstream of the Walker B strand. Family-specific inserts

    are seen in this location in other AAA+ families (e.g.,

    HlsU) also and are likely to form a second toroidal

    structure stacked below the ATPase domain. Iterative

    database searches with profiles for the TIP49 family

    recover both classic AAA and clamp loader clade

    proteins as the best hits. However, multiple alignment-

    based secondary structure prediction suggests the pres-

    ence of an additional small helix downstream of strand

    2, just as in the classical AAA clade (Fig. 3). This sug-

    gests that Tip49 probably branched off from that clade,

    early in the common ancestor of the archaeo-eukaryotic

    lineage, through extensive sequence divergence. It has

    been claimed previously that TIP49 was related to the

    helicase subunit of the bacterial resolvasome, RuvB

    20 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

    http://aaa-proteins.uni-graz.at/AAA/Default.htmlhttp://aaa-proteins.uni-graz.at/AAA/Default.htmlhttp://aaa-proteins.uni-graz.at/AAA/Default.htmlhttp://aaa-proteins.uni-graz.at/AAA/Default.html
  • 8/14/2019 AAAplus New

    11/21

    (Kurokawa et al., 1999). However, certain key features,

    such as a conserved b-hairpin that is characteristic of

    RuvB and its relatives (Figs. 13 see below), cannot be

    detected in the Tip49 family.

    2.2.6. The pre-sensor-1 b-hairpin (PS1BH) superclade

    All the remaining lineages of AAA+ domains,namely, the SF3 helicases, HslU and ClpX, Lon, torsin,

    the C-terminal AAA+ domain of ClpAB, RuvB, NtrC,

    MoxR, and its relatives, dynein, midasin, and the

    chelatases, can be unified into a vast monophyletic as-

    semblage. This entire superclade is defined by the pres-

    ence of a synapomorphic insert between the sensor-1

    strand and the preceding helix. Analysis of the struc-

    tures of T-antigen (SFIII) (Li et al., 2003), HslU

    (Bochtler et al., 2000), Mg chelatase (Fodje et al., 2001),

    and RuvB (Putnam et al., 2001; Yamada et al., 2002)

    shows that this insert forms a b-hairpin that projects out

    of the AAA+ core at a particular angle that is generally

    conserved in all the diverse proteins with this feature

    (Fig. 2). The HslU structure reveals that the hairpin

    forms a lid-like band in the oligomeric ring state

    (Bochtler et al., 2000). Sequence conservation associated

    with this region allowed us to identify the members of

    this assemblage for which 3D structures are not yet

    available. Hereinafter, we refer to this assemblage as the

    pre-sensor-1 b-hairpin (PS1BH) superclade. Below, we

    detail the various higher order clades that could be de-

    lineated within this large group of AAA+ ATPases

    (Fig. 4, Table 1).

    2.2.7. The SFIII helicase cladeSuperfamily III helicases were first identified in nu-

    merous small RNA viruses, e.g., picornaviruses and

    comoviruses, DNA viruses, such as the papovaviruses,

    parvoviruses, circoviruses, and baculoviruses (p143

    protein), and phages, e.g., P4 (Gorbalenya et al., 1990;

    Koonin, 1993). Subsequently, we identified a specific

    version of the SFIII helicase, typified by the vaccinia

    virus D5 protein, to be one of the synapomorphies of the

    nucleo-cytoplasmic large DNA virus clade (Iyer et al.,

    2001) (Table 1). However, other than prophage rem-

    nants, there are no SF3 helicases encoded in cellular

    genomes. Thus, this lineage of AAA+ ATPases might

    have originally evolved in primitive, small replicons that

    now are only represented by viruses. The emergence of a

    more complex DNA replication apparatus in the ar-

    chaeo-eukaryotic and bacterial branches of life might

    have displaced these simpler helicases in the cellular

    systems.

    2.2.8. The HslU/ClpX/Lon/ClpAB-C clade

    The chaperones and ATPase subunits of proteases

    appear to constitute another major monophyletic line-

    age within the PS1BH superclade (Fig. 4). This clade is

    supported by an extended loop between strand 2 and the

    helix downstream of it. Support for this clade is also

    offered by the consistent reciprocal recovery of these

    proteins in iterative profile searches. Furthermore,

    functional considerations also support the emergence of

    this clade from an ancestral ATPase that probably

    functioned as a cofactor for diverse proteases.

    Three distinct families can be identified within thisclade. The first of these, the HslU/ClpX family, is

    widespread in bacteria but absent in the currently

    available archaeal proteomes (Koonin, 1993). Within

    this family, two orthologous lineages, typified, respec-

    tively, by the E. coli HslU and CplX proteins are

    widespread in most major bacterial lineages. Orthologs

    of ClpX, which function in the mitochondria, have been

    detected in several eukaryotes (Corydon et al., 2000),

    whereas HlsU orthologs, which might also have a mi-

    tochondrial or plastid function, are known from kine-

    toplastids and apicomplexans (Couvreur et al., 2002).

    The phyletic pattern of the HslU/ClpX family suggests

    that they diversified into the two distinct orthologous

    groups prior to the radiation of the major bacterial

    phyla and were subsequently acquired by eukaryotes

    from the pro-mitochondrial endosymbionts. Interest-

    ingly, although HslU and ClpX belong to the same

    family of AAA+ ATPases, their protease partners, HslV

    and ClpP, respectively, belong to the unrelated NTN

    hydrolase (Pei and Grishin, 2003) and acyl-CoA decar-

    boxylase/isomerase superfamilies (Aravind and Koonin,

    1999a). The HslV protease is related to macropain, one

    of the proteases of the archaeo-eukaryotic proteasome.

    However, in the proteasome, the macropains function-

    ally interact with the ATPases of the classical AAAclade (Seemuller et al., 1995; Unno et al., 2002).

    The C-terminal domain of the ClpAB proteins and

    the Torsin proteins from animals comprise the second

    family (ClpAB-C family) of the HslU/ClpX/Lon/ClpAB-

    C clade. As discussed above, the two AAA+ domains of

    the ClpAB proteins are very different from each other

    (Guo et al., 2002; Volker and Lupas, 2002). The C-ter-

    minal domain has the hallmark hairpin of the PS1BH

    superclade and, in sequence profile searches the C ter-

    minal domain of ClpAB preferentially recovers other

    PS1BH proteins rather than the N-terminal domain of

    ClpAB. Torsin defines a group of animal proteins,

    which appear to be involved in the assembly of protein

    complexes in the endoplasmic reticulum (Basham and

    Rose, 2001; Bassler et al., 2001). Torsin is specifically

    related to the ClpAB C-terminal domain, but has a far

    more restricted phyletic pattern compared to the nearly

    pan-bacterio-eukaryotic spread of the ClpAB proteins

    (see above). Thus, it seems likely that torsin was derived

    specifically in the animal lineage through rapid diver-

    gence of a breakaway version of the ClpAB C-ter-

    minal domain.

    The Lon proteins from archaea and bacteria define

    the third major family (Lon family) within the HslU/

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 21

  • 8/14/2019 AAAplus New

    12/21

    ClpX/Lon/ClpAB-C clade. Two distinct lineages can be

    delineated within the Lon family (Koonin et al., 2000a).

    The first of these, the bacterial Lon lineage, is repre-

    sented in all the major groups of bacteria and is also

    seen in eukaryotes as the mitochondrial Lon protease.

    These proteins are characterized by a distinctive domain

    termed the LAN domain and a Lon-protease domain,

    which flank the AAA+ domain at the N- and C-termini,

    respectively (Fig. 5). The second lineage, LonB or ar-

    chaeal Lon (Fukui et al., 2002), has a pan-archaeal

    representation, with occasional representatives in bac-

    teria, such as low GC Gram-positive bacteria, c-prote-

    obacteria, Thermotoga and Treponema. They lack the

    LAN domain but have the C-terminal protease domain

    separated from the ATPase domain by a long segment

    predicted to assume a coiled-coil conformation. These

    phyletic patterns suggest that bacterial Lon emerged

    prior to the diversification of bacteria and was trans-

    ferred to the eukaryotes via the mitochondrial route.

    LonB (archaeal Lon) appears to have emerged prior to

    the archaeal diversification and probably was horizon-

    tally transferred to bacteria subsequently. This implies

    Fig. 4. Inferred evolutionary history of AAA+ ATPases. The figure shows several relative temporal epochs separated by the major evolutionary

    transitions that mark their boundaries. The solid colored bars indicate the maximum depth to which the AAA+ lineages can be traced with respect to

    these temporal epochs. The dashed lines indicate uncertainty in terms of the exact point of origin of a lineage. The ellipses bundle groups of lineages

    from which a new lineage with relatively limited phyletic pattern could have potentially emerged via rapid divergence. Colored circles at the terminal

    branches indicate broad functional categories: yellow, DNA replication and repair; blue, transcription; pink, chaperone or protein unfolding/deg-

    radation; and white, other specialized functions.

    22 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

  • 8/14/2019 AAAplus New

    13/21

    that the LUCA had a single representative of the Lon

    family that subsequently diversified into the bacterial

    and archaeal Lons concomitantly with the divergence of

    these superkingdoms of life.

    Interestingly, the LAN appears to have acquired an

    independent existence of its own during bacterial evo-

    lution (Fig. 5). In several bacteria, such as Deinococcus,

    Cyanobacteria, and proteobacteria, it was detected in

    standalone proteins, whereas in certain eukaryotic pro-

    teins, typified by the fungal CrgA and its orthologs, it is

    fused to a N-terminal ubiqutin-E3 ligase (a RING fin-

    ger; Fig. 5). This domain architecture suggests that LAN

    domain was reutilized as a general adaptor for interac-

    tions with proteins that are targeted for degradation.

    2.2.9. The RuvB family

    RuvB is the helicase subunit of the bacterial Holliday

    junction resolvasome, which additionally includes RuvC,

    the resolvase, and RuvA, a DNA-binding protein asso-

    ciated with the complex (Putnam et al., 2001; Yamada

    et al., 2002). RuvB is ubiquitous in bacteria, but is absent

    in eukaryotes and archaea. The structure of RuvB clearly

    shows that this family belongs to the PS1BH superclade

    (Figs.1 and 3); however, it does not have any features that

    allow us to specifically place it within any of the other

    clades. The phyletic pattern of RuvB suggests that it

    evolved prior to the radiation of bacteria from their

    common ancestor, probably from one of the PS1BH

    families that was already present in the LUCA (Fig. 4).

    Fig. 5. A graph showing the domain architectures and select conserved functional interactions of the AAA+ ATPase. Direction of the arrow on an

    edge of the graph indicates the polarity (whether a domain is to the N- or C-terminus) of the domain fusions in a polypeptide. A two-headed arrow

    indicates that the fusions may occur either at the N or C terminus in different polypetides. The black edge indicates a physical interaction between two

    kinds of domains in a protein complex. The barbed arrowhead on an edge indicated a domain insertion (e.g., the Zn cluster inserted in bacterial

    clamp loaders). The domain abbreviations are: ANK, Ankyrin repeat; BAM, bromo-associated motif; BRCT, BRCA C-terminal domain; Bromo,

    bromodomain; CH, calponin homology domain; ClpN- N-terminal domain of ClpAB ATPases; DPBB, double-w-b-barrel; fHTH, Fis-type helix-

    turn-helix; wHTH, winged HTH; LAN, LA(LON)-N terminal domain; R-RING finger; REC, Receiver domain; REPO, insertb-barrel domain in

    Reptin and Pontin; WC-WHIP-C-terminal domain; ZNC1/2/3, zinc clusters; ZNR-Zinc Ribbon.

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 23

  • 8/14/2019 AAAplus New

    14/21

    2.2.10. The helix-2 insert clade

    The structure of the Mg chelatase reveals the presence

    of a unique insert within helix-2 of the conserved ASCE

    division P-loop ATPase core (Fodje et al., 2001). This

    insert folds into two b-strands that form hydrogen

    bonds with each other and flank a small helical region

    between them. This insert does not significantly distortthe axis of helix-2, and the small helical segment flanked

    by the b-strands appears to be a laterally displaced

    fragment of helix-2 (Fig. 2). A sequence profile corre-

    sponding to this region from the chelatases detects the

    insert in a specific group of families of AAA+ domains,

    and their sequence alignment reveals the presence of

    sequence conservation associated with this region

    (Fig. 3). Additionally, the families possessing this insert

    in helix-2 also, typically, contain an insert of a long

    helical segment between strand 5 (the C-terminal most

    strand of the AAA+ core) and the C-terminal a-helical

    bundle of the AAA+ module (Figs. 2 and 3). The crystal

    structure of the Mg chelatase reveals that this insert

    results in a displacement of the helical bundle from its

    usual position at the top of the core AAA+ P-loop

    module to below it (Figs. 2 and 3) (Fodje et al., 2001).

    Experiments on the Mg2 chelatase suggest that that

    upon binding ATP binding a large conformational

    change is likely to reorient the helical bundle back to the

    regular conformation (Hansson et al., 2002). Given the

    conservation of this helical region it is likely that such a

    conformational change is a common aspect of the

    functions of this entire clade. Accordingly, we unified

    the families containing these synapomporphic features

    into the helix-2 insert clade. Seven major families,namely NtrC, MCMs, McrB, chelatases, YifB, MoxR,

    and dynein-midasin were identified within this lineage.

    The NtrC family is purely bacterial in its distribution

    and co-occurs with its functional partner, the RNA

    polymerase sigma factor 54. Members of this family

    function as transcriptional activators that bind DNA at

    a site distal from the sigma 54 binding site and, upon

    interaction with the latter, catalyze ATP-dependent

    structural transitions required for transcription initia-

    tion (Rombel et al., 1998; Zhang et al., 2002). Most

    members of this family contain C-terminal helix-turn-

    helix (HTH) domains of the FIS family, which enables

    them to bind DNA. The NtrC family has diversified

    within the bacteria into several distinct subfamilies that

    have characteristic fusions with various N-terminal do-

    mains. These include the CheY-like receiver domain in

    NtrC, the GAF domain in FhlA, the ACT and PAS

    domains in TyrR, and the 4-vinyl reductase (4VR) do-

    main in XylR (Fig. 5; Anantharaman et al., 2001). These

    domains either connect the NtrC ATPase to the two-

    component signaling systems or function as small mol-

    ecule-binding domains that enable the ATPase to sense

    various metabolites in the environment (Rombel et al.,

    1998; Zhang et al., 2002).

    The MCM family, which is ubiquitous in the archaeo-

    eukaryotic family, has at least one member in all ar-

    chaea studied to date (Kelman and Hurwitz, 2003). In

    eukaryotes, it has diversified into six distinct ortholo-

    gous lineages that appear to have diverged from each

    other at an early stage of eukaryotic evolution. Mem-

    bers of this family are characterized by fusions to anN-terminal Zn-ribbon domain and a C-terminal DNA-

    binding winged-helix-turn-helix domain (Aravind and

    Koonin, 1999b). These proteins function as hexameric

    or heptameric ring helicases, which catalyze extensive

    unwinding of the DNA at the origin of replication

    during the initiation process (Fletcher et al., 2003; Yu

    et al., 2002).

    The McrB family is typified by the NTPase subunit of

    the Mcr restriction-modification system and differs from

    most of the AAA+ proteins in that it specifically utilizes

    GTP rather than ATP (Panne et al., 2001; Pieper et al.,

    1999). Orthologs of McrB are sporadically distributed in

    bacteria and the archaea, Pyrobaculum and Methano-

    sarcina, and are all encoded by the mobile operon that

    encodes the Mcr-type restriction-modification system.

    Interestingly, the McrB family is also represented in

    animals by proteins such as Unc-53, which is involved in

    axonal path finding (Stringham et al., 2002), HELAD-1

    and cortactin-binding protein-2 (CBP-2). These are

    large multi-domain proteins, which combine the AAA+

    module of the McrB family with an N-terminal ankyrin

    (in CBP-2) or Calponin homology domains (in Unc-53

    and Helad-1) (Fig. 5). The Helad-1 protein has been

    shown to possess 30 ! 50 helicase activity, but the rele-

    vance of this function for neuronal path finding is notclear (Ishiguro et al., 2002). It is likely that these pro-

    teins also play a role in the assembly of cytoskeletal

    complexes. The CBP-2 proteins have a disrupted P-loop

    and Walker B motif, suggesting that they are catalyti-

    cally inactive, and the AAA+ module probably only

    mediates specific interactions (Fig. 3). Since, among the

    eukaryotes, the McrB family so far has been detected

    only in animals, it seems likely that it was acquired

    relatively late in eukaryotic evolution via HGT from

    bacteria. McrB seems to represent a remarkable case

    where a protein has been recruited for a biological

    function that is completely different from its ancestral

    role, after trans-kingdom HGT.

    Members of the Chelatase family of the helix-2 insert

    clade catalyze the insertion of metal ions, such as Mg2

    and Co2, into the porphyrin rings during the biosyn-

    thesis of cofactors, such as chlorophyll and cobalamin

    (Fodje et al., 2001; Hansson et al., 2002). Members of

    this family are either fused to a C-terminal von Wille-

    brand factor A (vWA) domains or interact with stand-

    alone vWA domains in a multisubunit complex. The

    vWA domain functionally cooperates with the AAA+

    ATPase domain in the metal-insertion reaction (Fodje

    et al., 2001). The chelatase family is widespread in

    24 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

  • 8/14/2019 AAAplus New

    15/21

    photosynthetic and autotrophic prokaryotes and plants.

    Although two branches, dominated, respectively, by

    bacterial and archaeal proteins can be seen in the phy-

    logenetic tree of the chelatases, there are a number of

    archaeal and bacterial proteins that cluster as sister

    groups in the tree (data not shown). This implies mul-

    tiple HGT events in the evolution of the chelatasefamily. Tree also shows that the plant members of this

    family have clearly been derived from the cyanobacterial

    precursor of the chloroplast.

    The YifB family, typified by the E. coli YifB protein,

    is found in several major bacterial lineages, such as cy-

    anobacteria, actinomycetes, Deinococcus, proteobacte-

    ria, certain spirochetes, Thermotoga, and Aquifex, and in

    a single archaeon, Methanothermobacter. This family is

    characterized by a fusion to an N-terminal Lon protease

    domain, suggesting that it probably functions as an

    ATP-dependent protease similar to Lon (Koonin et al.,

    2000a). Members of this family also contain a unique

    insertion of a Zn cluster just downstream of strand 4.

    Sequence comparisons show that YifB is closest to the

    chelatase family with which it shares certain distinct

    sequence signatures (Fig. 3). Thus, the Lon protease

    domain appears to have associated with two phyloge-

    netically distinct AAA+ domains on independent occa-

    sions in evolution. Furthermore, the Lon protease

    domain is also fused with an ATPase domain of the

    RecA class in the bacterial Sms protein (Aravind et al.,

    1999).

    The MoxR family is a large family that is represented

    in all the major lineages of bacteria and archaea. Some

    organisms, such as Mycobacterium tuberculosis, Pseu-domonas, and Aeropyrum pernix, encode up to 5 distinct

    members of this family. Several major subfamilies are

    recognizable within the MoxR family, including the

    classic MoxR subfamily, GvpN, YehL, APE0892, and

    YieN subfamilies. Despite their wide distribution, none

    of these proteins have been experimentally characterized

    in detail. MoxR is involved in the biogenesis of the

    methanol dehydrogenase complex (Van Spanning et al.,

    1991), while NirQ and CbbQ of the GvpN subfamily are

    required for the biogenesis of the nitric oxide reductase

    and Rubisco complexes, respectively (Hayashi et al.,

    1998, 1999). GvpN itself appears to participate in the

    formation of gas vesicles in diverse prokaryotes (Horne

    et al., 1991). Thus, the members of the MoxR family

    seem to function as chaperones in the assembly of spe-

    cific enzymatic complexes. Members of the YieN family

    co-occur with genes encoding proteins with vWA do-

    mains and, by inference are likely to functionally inter-

    act with them, similarly to the Mg chelatase family

    proteins. The phylogenetic tree of the MoxR family is

    similar to that of the chelatases, suggesting considerable

    lateral mobility of these genes within and between ar-

    chaea and bacteria (data not shown). Nevertheless, the

    nearly ubiquitous presence in the major archaeal and

    bacterial lineages suggests that this family emerged very

    early.

    The two giant multidomain ATPases from eukary-

    otes comprise the dynein/midasin family. Both dynein

    and midasin contain 6 tandem AAA+ domains in the

    same polypeptide (Mocz and Gibbons, 2001; Neuwald

    et al., 1999). Dynein functions as an ATP-dependentmotor in a large protein complex, which interacts with

    microtubules. Cytoplasmic dynein transports vesicles,

    organelles, and chromosomes (during cell division) in

    the retrograde direction, whereas flagellar dynein acts as

    motor for the movement of the eukaryotic flagellum

    (Vale, 2003). At least a single copy of dynein is encoded

    in all sequenced eukaryotic genomes, suggesting that it

    was present in the last common ancestor of eukaryotes.

    Very early in eukaryotic evolution, the dyneins diversi-

    fied into forms specialized in cytoplasmic and flagellar

    functions, and 12 or more paralogs of dynein are seen in

    flagellated early-branching eukaryotes, such as Giardia.

    Midasin also appears to be present in all eukaryotes and

    is associated with the nuclear pore complex involved in

    cytoplasmic export of the 60S ribosomal particle

    (Bassler et al., 2001; Garbarino and Gibbons, 2002). By

    analogy to dynein, midasin might act as a motor in the

    translocation of the ribosomal particles across the nu-

    clear pore. Animals have a second paralog of midasin

    with 4 AAA+ domains. All members of the midasin

    subfamily are associated with a C-terminal vWA do-

    main, suggesting that, as with the chelatases, the inter-

    action of the ATPase and vWA domains is required for

    the proteins function.

    Within the helix-2 insert clade, similarity-based clus-tering, sequence conservation patterns and reciprocal

    recovery in profile searches suggested a closer higher

    order relationship between the MoxR and dynein/mi-

    dasin families, on one hand, and the YifB and Chelatase

    families on the other hand (Fig. 4). Together, all these

    families are related to the McrB family, to the exclusion

    of the NtrC and MCM families (Fig. 4). The MCM and

    NtrC familes, in turn, might share a closer higher order

    relationship, as suggested by the sequence conservation

    pattern in the first strand of the helix-2 insert. These

    relationships, taken together with the phyletic patterns,

    suggest that one or two ancestral members of this clade

    were present in the LUCA and subsequently diversified

    into the families discussed above concomitant with the

    diversification of the major divisions of life.

    2.2.11. Non-AAA+ ATPases previously included in this

    class

    In addition to the families outlined above, several

    P-loop NTPases have been previously included in the

    AAA+ class. However, the present analysis showed that

    they lacked the defining features of this class and, ac-

    cordingly, do not belong with the bona fide AAA+

    proteins. The most notable case is that of FtsK, which

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 25

  • 8/14/2019 AAAplus New

    16/21

    functions as a DNA pump in bacteria (Aussel et al.,

    2002; Donachie, 2002). Detailed analysis of the FtsK

    sequence shows that it lacks the key features of the

    AAA+ superclass and instead has the defining features

    of the PilT/VirD4 class of P-loop NTPases, such as

    additional strands in the ATPase core (LA, unpublished

    observations). Thus, like other DNA pumps, such asTrwB, FtsK is a member of the PilT class (Gomis-Ruth

    et al., 2001).

    2.3. AAA+ ATPases present in LUCA and their evolution

    in the pre-LUCA era

    Based on the classification presented above, it can be

    conservatively inferred that LUCA had 56 distinct

    AAA+ domains. These include ancestors of: (1) the

    clamp loader clade, (2) the DnaA-Orc/Cdc6 clade, (3)

    the classic AAA clade, (4) LON ATPase family, (5)

    Mcm and NtrC families, and (6) MoxR, Chelatase,

    YifB, and Mcrb families (Fig. 4). Additionally, SFIII

    helicases were probably encoded in virus-like replicons.

    Thus, the founders of the higher order groups of the

    AAA+ class probably have already diverged from each

    other in the pre-LUCA era (Fig. 4).

    Sliding clamps are critical for the processivity of

    DNA polymerases and are utilized by most large DNA

    replicons, including all cellular life forms and large

    DNA viruses. Hence, the presence of a clamp loader

    ATPase in the LUCA points to the presence of a rela-

    tively large DNA-containing genome at this stage. This

    is also consistent with the presence of the DNA re-

    combinase RecA. However, as noticed previously, theenzymes that actually catalyze the critical steps of DNA

    replication, including the helicase, which initiates

    replication, the primase, the DNA polymerases, the

    DNA-ligase, proof-reading exonucleases, and Holliday

    junction resolvases, are not orthologous and, in many

    cases, not even homologous between the archaeo-

    eukaryotic and bacterial branches (Edgell and Doolittle,

    1997; Leipe et al., 1999). However, although the initia-

    tion helicases are non-orthologous in the bacterial and

    archaeo-eukaryotic clade (DnaB and MCMs, respec-

    tively), the ATPase responsible for their assembly,

    namely, the ancestral member of the DnaA/Cdc6/Orc

    clade, appears to have been present in LUCA ((Giraldo,

    2003) and this work). This peculiar conservation pattern

    of the replicative components suggests that the ancestral

    clamp loader and DnaA/Cdc6/Orc ATPases functioned

    in the context of a replication system that was dramat-

    ically different from those found in modern cellular life

    forms. One possibility is that genome replication in

    LUCA occurred via reverse transcription of relatively

    long RNA molecules, whereas the enzymes required for

    direct DNA replication were invented only after the

    separation of the archaeo-eukaryotic and bacterial

    branches; the clamp loader and initiator ATPase might

    have been parts of such a system (Leipe et al., 1999). The

    other scenario is that the ancestral DNA replication

    enzymes of LUCA were displaced later in evolution by

    independently invented replication enzymes in one or

    both of the principal divisions of life, perhaps with an

    active contribution from virus-like elements (Forterre,

    2002). However, the difficulty with the latter proposal isthat intrakingdom non-orthologous displacement of

    core replication enzymes is not observed among the

    extant living forms, although such displacements of

    DNA polymerases and other enzymes are common in

    DNA repair systems (Aravind et al., 1999; Eisen and

    Hanawalt, 1999). Furthermore, the displacement hy-

    pothesis would require concomitant displacement of

    enzymes catalyzing several distinct steps of the DNA

    replication process.

    The SF3 helicases present another enigma in the

    evolution of AAA+ ATPases. While they are extremely

    prevalent in selfish replicons, they are not (so far)

    represented in cellular genomes (Iyer et al., 2001). They

    could have potentially been the ancestral replicative

    helicases that were replaced by other helicases upon the

    origin of distinctive DNA replication systems in the two

    major branches of life. However, since the SF3 helicases

    form a derived lineage in the PS1BH superclade, they

    are unlikely to represent the most ancient version of the

    AAA+ ATPases. The functional diversity within the

    helix-2 insert clade of the PS1BH assemblage does not

    allow us to predict the functions of their 12 ancestral

    representatives, which might have been present in

    LUCA. Both chaperone-like and helicase activities are

    common in different families of this clade (Neuwaldet al., 1999; Ogura and Wilkinson, 2001), suggesting that

    the ancestral form could have been a generic ATPase

    possessing both these activities. Two AAA+ ATPases

    with potential chaperone or ATP-dependent protein

    unfolding activity, Lon and an ancestral classic AAA

    ATPase, apparently were represented in LUCA. This

    indicates that mechanisms for assembly and recycling of

    multidomain proteins and multisubunit protein com-

    plexes were already well advanced in LUCA.

    Several rounds of duplications within the AAA+

    class appear to have occurred during pre-LUCA

    evolution. In LUCA, the AAA+ ATPases probably

    performed two principal biochemical functions: (1) ca-

    talysis of ATP-dependent structural transitions in

    proteins and (2) nucleic acid-associated or stimulated

    ATPase or helicase activities. These biochemical activi-

    ties dominate all the extant branches of the AAA+

    superclass, with both activities exhibited by the bacterial

    LON ATPases (Fu et al., 1997). Thus, it is reasonable to

    assume that the common ancestor of the entire AAA+

    class was a generic ATPase that performed both of these

    activities without much specificity. Even after the radi-

    ation of the clamp-loader, DnaA/CDC6/ORC, classic

    AAA, and PS1BH lineages, most of these proteins, with

    26 L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131

  • 8/14/2019 AAAplus New

    17/21

    the possible exception of the classical AAA family,

    probably retained some ability to perform both

    functions.

    The principal biochemical functions of the AAA+

    ATPases are closely linked to their ring quaternary

    structure, which allows them to thread peptides or nu-

    cleic acids through the central pore of a ring or providesa quasi-periodic surface for interactions (Neuwald et al.,

    1999; Ogura and Wilkinson, 2001; Zhang et al., 2002).

    This quaternary structure is even more widely repre-

    sented in the ASCE division of the P-loop NTPases

    (Egelman et al., 1995; Gomis-Ruth et al., 2001; Leipe

    et al., 2000). Within the ASCE division, the AAA+ class

    forms one branch, whereas most of the other ASCE

    ATPases form a second branch. Furthermore, within

    this second branch, one lineage includes the ABC

    ATPases, whereas the second lineage consists of the

    PilT, RecA/F1, and SF1/2 helicase N-terminal domains.

    Ring structures are additionally observed in the RecA/

    F1 and the PilT classes. The common ancestor of this

    branch of the ASCE ATPases can be reconstructed as a

    nucleic acid-associated ATPase. Given that the ancestral

    AAA+ ATPase also probably had a nucleic acid-asso-

    ciated ATPase activity, we suspect that the common

    ancestor of the entire ASCE division had a nucleic acid-

    associated function, perhaps as a low specificity helicase

    or a nucleic acid pump. Thus, the ring quaternary

    structure might have evolved as an ancestral feature of

    the ASCE ATPases, which formed these structures

    around double-stranded nucleic acids in the extended

    conformation. Subsequently, as the AAA+ ATPases

    diverged from the rest of the ASCE NTPases, theymight have utilized this ancestral structure in the context

    of peptide threading, in addition to nucleic acid un-

    winding.

    Of the ASCE ATPases, for which structures are

    currently unavailable, the AP-ATPases, NACHT

    GTPases, and uncharacterized archaeal ATPases con-

    stitute a major class that appears to share specific fea-

    tures with the AAA+ ATPases (Aravind et al., 2001;

    Koonin, 1997 and DDL, LMI, EVK, and LA, unpub-

    lished). These apparent synapomorphic features include

    the N-terminal helix preceding the Walker A motif and a

    5-stranded core sheet ending in an a-helical extension.

    Thus, together with AAA+ ATPases, these NTPases

    might constitute a major higher order assemblage within

    the ASCE division.

    2.4. The adaptive radiation of the AAA+ superfamily in

    the primary kingdoms

    The separation of the bacterial and archaeo-eukary-

    otic lineages appears to have been accompanied by a

    major, rapid diversification of AAA+ into several spe-

    cific roles primarily related to chaperone and protein

    degradation activities. This dramatic radiation included

    the emergence of ClpAB-N and C-terminal domains,

    FtsH, ClpX, HlsU, and YifB in the bacterial lineage,

    and the proteasomal and CDC48-like ATPases in the

    archaeo-eukaryotic lineage (Fig. 4). Furthermore, the

    Tip49 and MCM families with helicase activity evolved

    in the archaeo-eukaryotic lineage, whereas the tran-

    scription factor NtrC underwent an explosive diversifi-cation in the bacterial lineage. The second major phase

    in the diversification of AAA+ ATPases was concomi-

    tant with the origin of the eukaryotes. This diversifica-

    tion occurred primarily within the classical AAA clade

    (Beyer, 1997; Swaffield and Purugganan, 1997) and ap-

    pears to have generated proteins, which were critical for

    the emergence of many defining features of the eukary-

    otes. A second important eukaryotic innovation oc-

    curred in the helix-2 insert clade. Here, the ancient

    MoxR family appears to have diversified into dynein

    and midasin. These events appear to have been critical

    for the emergence of the eukaryotic nucleus (midasin is

    involved in nucleo-cytoplasmic transport of ribosomes),

    cytoplasm (dynein for cytoplasmic transport), and fla-

    gella (the dynein motor). In this context, it would be of

    interest to investigate whether some of the prokaryotic

    MoxR proteins have motor activity comparable to that

    of dynein.

    A significant aspect of the diversification of the

    AAA+ class is related to the fusion of the ATPase do-

    main with other globular domains in same polypeptide

    (Dougan et al., 2002). These fusions ensure physical

    proximity of the respective domains and are diagnostic

    of functional interactions (Huynen and Snel, 2000;

    Marcotte et al., 1999). We examined these domain fu-sions, as well as known physical interactions between

    AAA+ ATPases and other domains, by constructing a

    graph, which represents the entire network of such in-

    teractions (Fig. 5). Three general trends emerge from

    this analysis. First, the AAA+ domain underwent fu-

    sions with different protease domains on several inde-

    pendent occasions during evolution. The fusions include

    the Zn metalloprotease domain in FtsH and the Lon

    protease in the LON and YifB families. Furthermore,

    the proteasomal ATPases of the classic AAA+ clade

    interact with proteases of the NTN hydrolase (macro-

    pain) and JAB superfamilies (Verma et al., 2002). The

    distinct ClpP protease interacts with ClpX, while its

    paralog HslU interacts with HslV protease of the NTN

    hydrolase superfamily (Fig. 5). These fusions and in-

    teractions are generally reminiscent of the fusions and

    interactions of the SF1/2 helicases with diverse nucleases

    (Aravind et al., 1999). In contrast, there are hardly any

    domain fusions of AAA+ ATPases with nucleases, al-

    though there are a few cases of physical association, e.g.,

    McrB and some SF3 helicases. Fusions and associations

    with proteases are rare in the other branches of the

    ASCE division of ATPases. This observation suggests

    that origin of AAA+ ATPases marked the emergence of

    L.M. Iyer et al. / Journal of Structural Biology 146 (2004) 1131 27

  • 8/14/2019 AAAplus New

    18/21

    chaperone and related protein restructuring activities

    among the P-loop NTPases.

    Second, the vWA domain is fused or interacts non-

    covalently with different versions of the AAA+ domain,

    such as the chelatases, MoxR, and dynein/midasin. The

    S5a protein, a subunit of the eukaryotic proteasome,

    also contains a vWA domain that functionally interactswith the proteasomal AAA ATPases (Fig. 5). The vWA

    domains could function as adaptors that bring the

    ATPase domains to their substrates. Alternatively, in

    the case of the chelatases, vWA domain might bind

    Mg2 as part of the chelation reaction (Fodje et al.,

    2001). Other adaptor modules, such as the BAM do-

    main in the eukaryotic ORC1 proteins, the bromodo-

    main in the F11A10.1 proteins of the proteasomal

    ATPase family and the BRCT domain in RFC1 could

    allow these proteins to specifically associate with other

    proteins of the complex eukaryotic chromatin and DNA

    repair systems, respectively. Likewise, the ankyrin re-

    peats and the CH domain could mediate the interaction

    of Unc-53 and CBP-2 family with cytoskeletal compo-

    nents (Fig. 5).

    Finally, several independent fusions of the AAA+

    domain with different versions of the DNA-binding

    HTH domain are seen in the NtrC, MCM, Cdc6, DnaA,

    and RuvB families (Fig. 5). These fusions might have

    allowed the evolution of specific interactions between

    the AAA+ domains and nucleic acids. This could imply

    that the ancestral AAA+ domains non-covalently as-

    sociated with stand-alone HTH proteins to translocate

    to specific sites on the DNA, such replication origins.

    3. General conclusions

    Our understanding of the AAA+ ATPases, especially

    in structural and mechanistic terms, has vastly improved

    since the publication of the previous survey of this

    protein class (Neuwald et al., 1999). Using the wealth of

    structures and genomic information currently at our

    disposal, we identified the defining structural features of

    the AAA+ superclass and constructed an evolutionary

    classification along with a reconstruction of some major

    aspects of their evolutionary history. In particular, some

    aspects of the earliest stages of their evolution, the

    higher order divergence events and position of some

    highly divergent versions, such as SF3 helicases, dynein,

    and midasin, are becoming apparent. The AAA+

    ATPases are an ancient group of ATPases, which

    already showed considerable diversity in LUCA. The

    earliest diversification events in their evolution appear to

    correspond to the emergence of specific features related

    to processivity of DNA replication apparatus and

    assembly of the replication initiation complexes. The

    next great radiation gave rise to several distinct chap-

    erones, ATPase subunits of proteases, DNA helicases

    and transcription factors. The third major radiation at

    the base of the eukaryotic lineage probably contributed

    to the origin of eukaryote-specific adaptations related to

    nuclear and cytoskeletal functions. Some of the rela-

    tionships and domains reported here might provide new

    leads in investigating the functions and biology of

    AAA+ ATPases.

    4. Materials and methods

    Sequences of AAA+ proteins were extracted from the

    non-redundant (NR) protein sequence database (Na-

    tional Center for Biotechnology Information, NIH,

    Bethesda) using the PSI-BLAST program (Altschul

    et al., 1997), with the sequences of known AAA+ proteins

    as queries. Sequence similarity-based protein clustering

    was performed using the BLASTCLUST program (ftp://

    ftp.ncbi.nih.gov/blast/documents/README.bcl). Mul-

    tiple alignments were constructed using the Clustal X

    (Thompson et al., 1997) or T-Coffee (Notredame et al.,

    2000) programs and corrected on the basis of PSI-

    BLAST results and structural alignments, as previously

    described (Aravind and Koonin, 1999a). All newly re-

    covered sequences were evaluated on the basis of the

    presence of conserved AAA+ specific motifs, such as

    those associated with the N-terminal helix, sensor 1 and

    2, and the arginine finger, for differentiating them from

    other P-loop proteins. For each of the families recog-

    nized through these procedures, the phyletic distribution

    was evaluated in terms of the presence of homologues in

    a phylogenetically diverse sample of60 complete ge-nomes from the three primary kingdoms, Bacteria, Ar-

    chaea, and Eukaryota. The COG database was used as a

    guide for identifying orthologous proteins and their

    phyletic patterns (Tatusov et al., 2003).

    Phylogenetic trees were constructed using the

    PROTDIST and FITCH programs of the PHYLIP

    package with the default parameters (Felsenstein, 1996),

    followed by optimization via local rearrangements

    conducted using the maximum likelihood (ML) method

    with the JTTF substitution model as implemented in the

    MOLPHY package (Adachi and Hasegawa, 1992).

    Neighbor joining trees were constructed using the

    MEGA program (Kumar et al., 1994). Support for

    selected tree branches was measured by 10,000 resam-

    plings with the relative logarithmic boostrap (RELL-

    BP) procedures implemented in the MOLPHY package

    (Adachi and Hasegawa, 1992). For evolutionary re-

    constructions, the standard model of early evolution,

    which postulates the original split between the bacterial

    and archaeo-eukaryotic lineages, was employed as the

    null hypothesis (Brown and Doolittle, 1997).

    For structural co