Dioxygenases

Embed Size (px)

Citation preview

  • 7/30/2019 Dioxygenases

    1/8

    http://genomebiology.com/2001/2/3/research/0007.1

    Research

    The DNA-repair protein AlkB, EGL-9, and leprecan define newfamilies of 2-oxoglutarate- and iron-dependent dioxygenases

    L Aravind and Eugene V KooninAddress: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

    Correspondence: L Aravind. E-mail: [email protected]

    Abstract

    Background: Protein fold recognition using sequence profile searches frequently allowsprediction of the structure and biochemical mechanisms of proteins with an important biologicalfunction but unknown biochemical activity. Here we describe such predictions resulting from ananalysis of the 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenases, a class of enzymes that arewidespread in eukaryotes and bacteria and catalyze a variety of reactions typically involving theoxidation of an organic substrate using a dioxygen molecule.

    Results: We employ sequence profile analysis to show that the DNA repair protein AlkB, theextracellular matrix protein leprecan, the disease-resistance-related protein EGL-9 and several

    uncharacterized proteins define novel families of enzymes of the 2OG-Fe(II) oxygenasesuperfamily. The identification of AlkB as a member of the 2OG-Fe(II) oxygenase superfamilysuggests that this protein catalyzes oxidative detoxification of alkylated bases. More distanthomologs of AlkB were detected in eukaryotes and in plant RNA viruses, leading to the hypothesisthat these proteins might be involved in RNA demethylation. The EGL-9 protein fromCaenorhabditis elegans is necessary for normal muscle function and its inactivation results inresistance against paralysis induced by the Pseudomonas aeruginosa toxin. EGL-9 and leprecan arepredicted to be novel protein hydroxylases that might be involved in the generation of substratesfor protein glycosylation.

    Conclusions: Here, using sequence profile searches, we show that several previously undetectedprotein families contain 2OG-Fe(II) oxygenase fold. This allows us to predict the catalytic activity for awide range of biologically important, but biochemically uncharacterized proteins from eukaryotes and

    bacteria.

    Published: 19 January 2001

    Genome Biology2001, 2(3):research0007.10007.8

    The electronic version of this article is the complete one and can befound online at http://genomebiology.com/2001/2/3/research/0007

    2001 Aravind and Koonin, licensee BioMed Central Ltd(Print ISSN 1465-6906; Online ISSN 1465-6914)

    Received: 17 November 2000Revised: 14 December 2000Accepted: 12 January 2001

    Background2-Oxoglutarate (2OG)- and e(II)-dependent dioxygenases

    are widespread in eukaryotes and bacteria and catalyze a

    variety of reactions typically involving the oxidation of an

    organic substrate using a dioxygen molecule [1,2]. One well-

    studied reaction catalyzed by such enzymes is the hydroxyla-

    tion of proline and lysine sidechains in collagen and other

    animal glycoproteins [3-5]. In plants, enzymes of this family

    catalyze the formation of the plant hormone ethylene by

    oxidative desaturation of 1-aminocyclopropane-1-carboxylate,

    and catalyze the hydroxylation and desaturation steps in the

    synthesis of other plant hormones, pigments and metabo-

    lites such as gibberellins, anthocyanidins and flavones

    [1,6,7]. In bacteria and fungi, several members of this family

    participate in the desaturative cyclization and oxidative ring

    expansion reactions in the biosynthesis of antibiotics such as

  • 7/30/2019 Dioxygenases

    2/8

    2 Genome Biology Vol 2 No 3 Aravind and Koonin

    penicillin and cephalosporin [8-10]. The details of the cat-

    alytic mechanism of these enzymes have been revealed by

    determination of the crystal structures of isopenicillin N syn-

    thase (IPNS), deacetoxycephalosporin C synthase (DAOCS)

    and clavaminic acid synthase (CAS) [8-11]. These structures

    showed that the catalytic core of the proteins consists of adouble-stranded >-helix (DSBH) fold containing a HX[DE]

    dyad (where X is any amino acid) and a conserved carboxy-

    terminal histidine which together chelate a single iron atom.

    The substrates are bound within a spacious cavity formed by

    the interior of the DSBH (see the Structural Classification of

    Proteins (SCOP) [12]).

    We use sequence profile analysis [13,14] to show that the

    DNA-repair protein AlkB, the extracellular matrix protein

    leprecan and the disease-resistance-related protein EGL-9

    define new families of the 2OG-e(II) dioxygenase super-

    family. AlkB is widely represented in bacteria and eukary-

    otes and has an important role in countering the toxic DNAmodifications caused by alkylating agents in both

    Escherichia coliand Homo sapiens [15-17]. Despite consid-

    erable effort, the precise biochemical mechanisms of its

    action in DNA repair remain unknown. Recent studies have

    shown that AlkB is required for specifically processing

    lesions resulting from the alkylation of single-stranded (ss)

    DNA [18]. Our findings predict an unusual role for this

    enzyme in oxidative detoxification of DNA damage. The

    EGL-9 protein from Caenorhabditis elegans is necessary for

    normal muscle function, and its inactivation results in

    strong resistance to paralysis induced by the Pseudomonas

    aeruginosa toxin [19]. We predict that EGL-9 is a novel

    hydroxylase that could elicit its action through the modifica-tion of sidechains of intracellular proteins. Similarly, we

    show that the animal extracellular matrix protein leprecan

    [20] defines a hitherto unknown family of protein hydroxy-

    lases that might be involved in the generation of substrates

    for protein glycosylation.

    Results and discussionThe 2OG-Fe(II) dioxygenase protein superfamily:

    classification and functional prediction

    The Non-redundant Protein Sequence Database (NCBI) [21]

    was searched using the PSI-BLAST program [22] run to con-

    vergence, with a profile-inclusion threshold of 0.01 and AlkB

    protein sequences from various organisms as queries. In

    addition to the AlkB orthologs, these searches retrieved from

    the database, with statistically significant expectation (e)

    values, several other more distant homologs of AlkB, includ-

    ing uncharacterized eukaryotic proteins and fragments of

    the polyproteins of plant RNA viruses from the carla-,

    tricho- and potexvirus families. Examples of homologsfound include: Leishmania L3377.4, iteration 5, e-value =

    8 x 1 0-7;Drosophila CG17807, iteration 3, e-value = 4 x 10-6;

    papaya mosaic virus, iteration 3, e-value = 2 x 10-4. urther

    iterations of the search using each of the detected proteins as

    a new query resulted in the detection of several more eukary-

    otic proteins, including EGL-9 and leprecan, several

    uncharacterized bacterial proteins and prolyl and lysyl

    hydroxylases. inally, another iteration of database searches

    initiated with the sequences of bacterial proteins, typified by

    E. coliYbiX, resulted in the unification of these proteins with

    plant dioxygenases such as leucoanthocyanidin oxidase and

    gibberellin-20 oxidase. In this context, it should be noted

    that the DNA-repair proteins typified byE. coliAlkB areunrelated to the alkane omega-hydroxylase typified by the

    Figure 1 (see pages 3 and 4)Multiple sequence alignment of the 2OG-Fe(II) dioxygenase superfamily. Individual protein families are separated by blanklines and a brief description of each family is given to the right of the alignment. The numbers at the ends of the alignmentindicate the position of the first and last of the aligned residues in the respective protein sequences. The consensus secondarystructure is shown above the alignment in uppercase letters. It was derived by taking those elements that are shared by thepredicted structures of individual families and the experimentally determined structures; H indicates = helix and E indicatesextended conformation (> strand). The lowercase letters represent extensions of the secondary structure elements that areseen in some, but not all, members of the superfamily. The conserved amino-terminal extensions that are specific only to agiven family are separated from the rest of the alignment by vertical lines. The coloring of the alignment columns is accordingto the 85% consensus that is shown underneath the alignment and includes the following categories of amino acid residues:h, hydrophobic; l, aliphatic; a, aromatic (Y, F, W, H, L, I, V, M, A, all shaded yellow); s, small (S, A, G, T, V, P, N, H, D, shaded

    blue); b, big (K, R, E, Q, W, F, Y, L, M, I, shaded gray); +, positively charged (K, R, H; colored magenta). The (predicted)catalytic residues are indicated by asterisks and with reverse red shading. The proteins are designated by the protein/genename, the species abbreviation and the gene identification (GI) number. Protein abbreviations are: CAS, clavaminic acidsynthase; DAOCS, deacetoxycephalosporin C synthetase; EFE, ethylene-forming enzyme; FLAS, flavonol synthase; Ga20Ox,giberellin 20-oxidase; IPNS, isopenicillin N synthase; LDOX, leucoanthocyanidin hydroxylase; Lep, leprecan; P4HA, prolyl-4-hydroxylase; PLO, lysyl hydroxylase; SanF and SanC, enzymes involved in nikkomycin biosynthesis. The remaining names arethe standard names of the genes that encode the respective proteins. Species abbreviations: At,Arabidopsis thaliana;Bb, Borrelia burgdorferi; Cc, Caulobacter crescentus; Ce, Caenorhabditis elegans; Ci, Ciona intestinalis; Dm, Drosophila melanogaster;Ec, Escherichia coli; Em, Emericella nidulans; Hs, Homo sapiens; Lc, Lysobacter lactamgenus; Le, Lycopersicon esculentum;Mtu,Mycobacterium tuberculosis; Nc, Neurospora crassa; Pa, Pseudomonas aeruginosa; Pet, Petunia hybrida; Rr, Rattus rattus;Sc, Saccharomyces cerevisiae; Sp, Schizosaccharomyces pombe; Sot, Solanum tuberosum; Scoe, Streptomyces coelicolor;Scan, Streptomyces ansochromogenes; Scla, Streptomyces clavuligerus; Ssp, Synechocystis; Vc, Vibrio cholerae; ASPV, apple stempitting virus; ACLSV, apple chlorotic leaf spot virus; BSV, blueberry scorch virus; GLV, garlic latent virus; GVA, grapevinevirus A; PBCV, Paramecium bursaria chlorella virus; PMV, papaya mosaic virus; SHVX, shallot virus X.

  • 7/30/2019 Dioxygenases

    3/8

    http://genomebiology.com/2001/2/3/research/0007.3

    Ps. oleovorans protein also named AlkB. ortuitously, these

    latter alkane hydroxylases are also oxygenases; however,

    they are not 2OG-e(II) dioxygenases but a distinct class of

    di-iron enzymes [23]. On the basis of the results of database

    searches with representative sequences, we delineated

    several distinct families within the 2OG-e(II) dioxygenase

    fold and constructed individual alignments for each using

    the ClustalW program [24]. Secondary structure was pre-

    dicted for each family using the PHD [25] and PSI-PRED

    [26] programs. Using the secondary structure elements

    Figure 1 (continued on the next page)

    Secondary Structure: .hhhhhhHHHHHHHHHHHhhhhhhhhhhhh........EEEEEEEEE.................eeeeeee..

    YBIX_Ec_3025022 23 PQDVARFREQLEQAEWVDGRVTTGAQGAQVKNNQQVDT|RSTLYAALQNEVLNAVNQHALFFAAA-------LPRTLSTPLFNRYQ-------------NNETYGFHVDGAV

    PA4515_Pa_9950756 11 AEEVSRIRAALEQAEWADGKATAGYQSAKAKHNLQLPQ|DHPLAREIGEAMLQRLWNHPLFMSAA-------LPLKVFPPLFNCYT-------------GGGSFDFHIDNAV

    Ybix_Bsep_5777381 11 PAEAAQIRARLEAADWVDGKVTAGYQSAQVKHNRQLSE|QHPLAQELGGLILQRLAANNLFMSAA-------LPRKIFPPLFNRYE-------------GGEAFGYHVDNAL

    XF0598_Xf_9105464 11 RTQATSMQERLAAANWTDGRETVGPQGAQVKHNLQLPE|TSPLRQELGNEILDALARSPLYFAAT-------LPLRTLPPRFNCYQ-------------ENHQYGFHVDGAV

    Lep_Rr_5805194 497 SPHTPNEKFYGVTVLKALKLGQEGK--VPLQS|AHMYYNVTEKVRRVMESYF-------------RL-DTPLYFSYSHLVCRTAIE-ESQAERKDSSHPVHVDNCI

    AK025841_Hs_10438479 159 SPHTPNEKFYGVTVFKALKLGQEGK--VPLQS|AHLYYNVTEKVRRIMESYF-------------RL-DTPLYFSYSHLVCRTAIE-EVQAERKDDSHPVHVDNCI

    FLJ10718_Hs_8922619 317 SPHTPNEKFEGATVLKALKSGYEGR--VPLKS|ARLFYDISEKARRIVESYF-------------ML-NSTLYFSYTHMVCRTALS-GQQDRRNDLSHPIHADNCL

    Lep_Ci_9229924 216 SPHSEHELFQGMTVYLAAKLAEKGK--VPPQT|AALYYKLSEEARLQVKMYF-------------KL-TQELYFDYTHLVCRTTVK-GKPVKRTDLSHPVHSDNCL

    AK025875_Hs_10438524 146 DLHSGALSVGKHFVNLYRYFGDKIQNIFSEED|FRLYREVRQKVQLTIAEAF-------------GISASSLHLTKPTFFSRINST-EARTAH-DEYWHAHVDKVT

    EGL-9_Ce_5923812 418 FHIKDIRSDHIYWYDGYDGR|AKDAATVRLLISMIDSVIQHFKKRIDHDIG------GRSRAMLAIYP------------GNGTRYVKHVDNPV

    AK025273_Hs_10437756 68 VSKRHLRGDQITWIGGNEEG|---CEAISFLLSLIDRLVLYCGSRLGKYYV-----KERSKAMVACYP------------GNGTGYVRHVDNPN

    CG1114_Dm_7296772 97 VRGDKIRGDKIKWVGGNEPG|---CSNVWYLTNQIDSVVYRVNTMKDNGILGNYHIRERTRAMVACYP------------GSGTHYVMHVDNPQ

    VCA0949_Vc_9658386 53 QRAADIRSDKIQWLDLSMGQ|PVQDYLERMEQIRCEVNRHFFLGL-----------FEYEGAHFAKYE-------------AGDFYLKHLDSFR

    PA0310_Pa_9946156 85 VIREGIRGDLTQWLEPGESE|ACDEYLGVMDSLRQALNASLFLGL-----------EDFEGCHFALYP-------------PGAYYQKHVDRFR

    P4HA_Hs_4505565 360 LSRATVHDPE--TGKLTTAQYRVSKSAWLSGYENP|-----VVSRINMRIQDLTGLDV-------------STAEELQVANYG-------------VGGQYEPHFDFAR

    CG9728_Dm_7301967 798 MHRSTVNPLP--GGQLKKSAFRVSKNAWLAYESHP|-----TMVGMLRDLKDATGLDT-------------TFCEQLQVANYG-------------VGGHYEPHWDFFR

    CG1546_Dm_10726875 351 MVRSAVAGS---GGEGTVRDLRVSQQTWLDYKS-P|-----VMNSVGRIIQFVSGFDM-------------AGAEHMQVANYG-------------VGGQYEPHPDYF-

    P4HA_Ce_1709529 352 LARATVHDSV--TGKLVTATYRISKSAWLKEWEGD|-----VVETVNKRIGYMTNLEM-------------ETAEELQIANYG-------------IGGHYDPHFDHAK

    P4HA_Dm_7301948 141 FRRATVQNSV--TGALETANYRISKSAWLKTQEDR|-----VIETVVQRTADMTGLDM-------------DSAEELQVVNYG-------------IGGHYEPHFDFAR

    CG9713_Dm_7301967 354 LKRATVYQAS--SGRNEVVKTRTSKVAWFPDGYNP|-----LTVRLNARISDMTGFNL-------------YGSEMLQLMNYG-------------LGGHYDQHYDFFN

    C14E2.4_Ce_1166595 116 MNEQKVVRDD--GEIAYSTYRQANGTITPAHSHAE|-----AQSLMDTATQLLPVFDF-------------QYTEQISALSYI-------------KGGHYALHTDFLT

    Y43F8B.4.b_Ce_3947627 347 FEEQLVVNDD--GNDIVSKIRRANGTQVFHEDHPA|-----ARSIWDTAKNLLPNLNF-------------KTAEDILALSYN-------------PGGHYAAHHDYLL

    T20B3.7_Ce_7508029 39 LEIQKTS-DF--GTSIETTHRRANGSFIPPEDSNV|-----TVEIKMQAQKRIPGLNL-------------TVAEHFSALSYL-------------PGGHYAVHYDYLD

    CG9708_Dm_10726879 1134 LERAKVFRVE--KGSDEIDPSRSADGAWLPHQNID|PDDLEVLNRIGRRIEDMTGLNT-------------RSGSKMQFLKYG-------------FGGHFVPHYDYFN

    P4HA_At_6437556 70 LQRSAVADND--NGESQVSDVRTSSGTFISKGKDP|-----IVSGIEDKLSTWTFLPK-------------ENGEDLQVLRYE-------------HGQKYDAHFDYF-

    P4AH_PBCV_9631654 76 LIKSEVGGATENDPIKLDPKSRNSEQTWFMPGEHE|VIDKIQKKTREFLNSKKHCIDK-------------YNFEDVQVARYK-------------PGQYYYHHYDGDD

    B12F1.100_Nc_9368561 314 YLPDAPIKED--GSES---SILAHNVYWIVDQTFH|DALW-ERVRPFIPTHDAGRKAR-------------GINRRFRVYRYV-------------PGAEYRCHFDGAW

    PLOD_Ce_6093732 565 ERFCEELIEEMEGFGRWSDGSNNDKRLAGGYENVPTRDIHMNQVGF|ER-QWLYFMDTYVRPVQEKTFIGYYHQ-------PVESNMMFVVRYK-----------PEEQPSLRPHHDAST

    PLO_Dm_7294743 556 DAFCDDLVAIMEAHNGWSDGSNNDNRLEGGYEAVPTRDIHMKQVGL|ER-LYLKFLQMFVRPLQERAFTGYFHN-------PPRALMNFMVRYR-----------PDEQPSLRPHHDSST

    PLO2_Hs_6093730 573 EKACDELVEEMEHYGKWSGGKHHDSRISGGYENVPTDDIHMKQVDL|EN-VWLDFIREFIAPVTLKVFAGYYT--------KGFALLNFVVKYS-----------PERQRSLRPHHDAST

    PLO1_Hs_400205 563 EVACDELVEEMEHFGQWSLGNNKDNRIQGGYENVPTIDIHMNQIGF|ER-EWHKFLLEYIAPMTEKLYPGYYT--------RAQFDLAFVVRYK-----------PDEQPSLMPHHDAST

    PLO3_Hs_6093731 574 EQMCDELVAEMEHYGQWSGGRHEDSRLAGGYENVPTVDIHMKQVGY|ED-QWLQLLRTYVGPMTESLFPGYHT--------KARAVMNFVVRYR-----------PDEQPSLRPHHDSST

    MRC8.21_At_9294077 177 PSFCEMMLAEIDNFERWVGETKFRIMRPN---TMNKYGAVLDDFGL|DT-MLDKLMEGFIRPISKVFFSDVGGA-------TLDSHHGFVVEYG-----------KDRDVDLGFHVDDSE

    K9D7.16_At_10178205 135 PDFFQKLLVEVENMRKWLHEAKLMIRKPN---NKSKYGVVLDDFGM|DI-MLKPLVEDFIFPICKVFFPQVCGT-------MFDTQHGFVIENC-----------EDRDAELGFHVENSD

    AK023553_Hs_10435524 87 APFCQALLEELEHFEQSDMPKGRPN-------TMNNYGVLLHELGL|DEPLMTPLRERFLQPLMALLYPDCGGG-------RLDSHRAFVVKYA-----------PGQDLELGCHYDNAE

    SPBC6B1.08c_Sp_7491725 91 NMLKYLRQVRDA|LY----SEEFRSHVQKITGCGP-----------LSASKKDLSVNVYS-------------KGCHLMNHDDVI-

    Yer049wp_Sc_731462 108 SRLPNLFKLRQI|LY----SKQYRDFFGYVTKAGK-----------LSGSKTDMSINTYT-------------KGCHLLTHDDVI-

    KIAA1612_Hs_10047299 115 RREPHISTLRKI|LF-----EDFRSWLSDISKID------------LESTIFDMSCAKYE--------------TDALLCHDDEL-

    CG18761_Dm_10726758 137 MPACRLLTNFLQ|VL----RKQVRPWLEKVTNLK------------LD--YVSASCSMYT-------------CGDYLLVHDDLL-

    C17G10.1_Ce_861277 104 INPVENPAVFSF|RQ--FLYKEVKEWLQNVSGVE------------LTE-QVDCNGSCYA-------------RTDSLLPHNDLI-

    SanC_Scan_7407127 77 DQVRTLPANWQQ|LVADVVSRPYREALSALTGVD------------LERCLVEARMTRYA-------------RGCWIEPHTDRP-

    SanF_Scan_8389327 71 PVFPRLPGVWRD|LVEDLRGAEFTAWLEKSTGIE------------LAGLQRSIGLYTHR-------------NGDYLSVHKDKP-

    sll0428_Ssp_1653558 203 MFLPIFAPFSEL|II-----ERVKAILPQLISDLQ--------IQPFQIDYVEAQLTAHN-------------HGNYYKVHNDNG-

    sll0191_Ssp_7469866 154 LFSQKIPALSAL|IR-----ERIKQKLPELLGQLN--------FSPFEVAEIELQLTAHN-------------DGCYYRIHNDAG-

    CAS_Scla_322266 53 FNAEGSEDGHLLLRGL--PVEADADLPTTP-----------SSTP|----APEDRSLLTMEAMLGLVGRRLGLHTG--YRELRSGTVYHDVYP-SPGAHHL-SSETSETLLEFHTEMA-

    IPNS_En_124825 104 CYLNPNFTPDHPRIQAKTPTHEVNVWPDETKHPG----FQDFAEQ|YYWDVFGLSSALLKGYALALGKEENFFARH-FKPDDTLASVVLIRYPYLDPYP3KTAADGTKLSFEWHEDVS-

    FLAS_Pet_421946 136 VEGKKGWVDHLFHKIWPPSAVNYRYWPKNPPS------YREANEE|YGKRMREVVDRIFKSLSLGLGLEGHEMIEA-AGGDEIVYLLKINYYP--PCPR-----PDLALGVVAHTDMS-

    LDOX_Pet_1730108 137 ACGQLEWEDYFFHCAFPEDKRDLSIWPKNPTD------YTPATSE|YAKQIRALATKILTVLSIGLGLEEGRLEKEVGGMEDLLLQMKINYYP--KCPQ-----PELALGVEAHTDVS-

    Srg_At_479047 135 EDQKLDWADLFFHTVQPVELRKPHLFPKLPLP------FRDTLEM|YSSEVQSVAKILIAKMARALEIKPEELEKL-FDDVDSVQSMRMNYYP--PCPQ-----PDQVIGLTPHSDSV-

    EFE_Le_398992 80 EVTDLDWESTFFLRHLPTS--NISQVPDLDEE------YREVMRD|FAKRLEKLAEELLDLLCENLGLEKGYLKNAFYGSKGPNFGTKVSNYP--PCPK-----PDLIKGLRAHTDAG-

    Ga20Ox_Sot_10800976 156 LQGSSHMVDQYFLKTM------GEDFSH----------IGKFYQE|YCNAMSTLSSGIMELLGESLGVSKNHFKQ---FFEENESIMRLNYYP--TCQK-----PDLALGTGPHCDPT-

    PA0147_Pa_9945977 97 FKETFDMALHLPAEHPDVRAGKSFYGPNRHPDLP--GWEALLEGH|Y-ADMLALARTVLRALAIALGIEEDFFDR---RFEQPVSVFRLIHYP--PASA---RQSADQPGAGAHTDYG-

    PA4191_Pa_9950401 98 WKEGLYLGSELDAEHPEVRAGTPLHGANLFPEVP---GLRETLLE|YLDATTRVGHRLMEGIALGLGLEADYFAAR--YTGDPLILFRLFNYPSQPVPE----GLDVQWGVGEHTDYG-

    ISP7_Sp_729862 181 FRESFYFGNDNLSKDRLLR---PFQGPNKWPSTAGS-SFRKALVK|YHDQMLAFANHVMSLLAESLELSPDAFDE---FCSDPTTSIRLLRYP----------SSPNRLGVQEHTDAD-

    SPCC1494.01_Sc_7491815 81 FGEKLDMEHQSRGDLKESYDLAGFPDPKLENLCPFIAEHMDEFLQ|FQRHCYKLTLRLLDFFAIGFGIPPDFFSK---SHSSEEDVLRLLKYSI-PEGV---ERREDDEDAGAHSDYG-DAOCS_Lyl_769809 89 CTNTGDYSDYAMVYSMGIS---GNIFPTAH--------FERLWSD|YFDLYYGISRQAARAVLESMDVHLNTDID---ALIDCDPVLRYRYFPDVPEDR---CAEQQPNRMAPHYDLS-

    RRPO_SHVX_548840 610 KGRLAAFYSRDGQGYSYTGYSH--|KSQGWL-EGLDKLIEACGEKPT--------------TYNQCLVQKYE-------------QGSRIGFHSDEQA

    POL_ASPV_487652 719 KGRGASFYSRDLKGYSYTGFSH--|VSRGWP-AFLDKFLSDNKIPLN--------------FYNQCLVQEYS-------------TGHGLSMHKDDES

    POL_BSV_409711 708 KGRDAAWYSKEDREYKYNGGSH--|LCRGWP-KWLQLWMQANGVDE---------------TYDCMLAQRYG-------------AQGKIGFHADNEE

    RRPO_PMV_139137 554 TGRKAWFFSKDGKPYSYTGGSH--|ASRGWP-NWLEKILAAIEIKEP------------LPEFNQCLVQQFK-------------LQAAIPFHRDDEP

    POL_GLV_1154656 638 SNRDAAFYSKGSFSYNCNGGYH--|TGQEWL-GEFDSFLAINGHDLE--------------YFNCVLFQKYD-------------GGHGIGFHRDDEE

    Pol_GVA_1405615 603 KGREVAFYSRHSKEYKYNGGSH--|RSLGWD-EALNELTQELGLDD---------------SYDHCLIQRYT-------------AGGSIGFHADDEP

    RRPO_ACLSV_1710717 704 NRKAAYFCIDYPMVYFHDKISY--|PTFEAT-GEIKQIIMRARDKWG-------------ANFNSALIQVYN-------------DGCRLPLHSDNEE

    T13L16.2_At_2708738 266 RGKGRETIQFGCCYNYAPDRAG--|NPPGILQREEVDPLPHLFKVIIRKLIK-WHVLPPTCVPDSCIVNIYD-------------EGDCIPPHIDNHD

    T19K4.220_At_3036813 264 RGKGRVTIQFGCCYNYAPDKAG--|NPPGILQRGDVDPMPSIFKV----------------IIKSCIVNIYE-------------EDDCIPPHIDNHD

    At2g48080_At_4249414 197 ETFVLFNKNTKGTKRELLQLGVP-|IFGNTTDEHSVEPIPTLVQSVIDHLLQWRL-IPEYKRPNGCVINFFDQ------------P-FQKPPHVD---

    AK000315.1_Hs_7020317 111 APLRNKYFFGEGYTYGAQLQKRGP|GQERLYPPGDVDEIPEWVHQLVIQKLVEHRVIP--EGFVNSAVINDYQ------------PGGCIVSHVDPIH

    CG17807_Dm_7291441 167 GSLKHRNVKHFGFEFLYGTNN---|VDPSKP---LEQSIPSACDILWPRLNSFASTW-DWSSPDQLTVNEYE-------------PGHGIPPHVDTHS

    CG6144_Dm_7297712 26 LSHIERTPKPRWTQLLNRRLVNY-|GGVPHPNGMIAEEIPEWLQTYVDK--VNNLGVFESQNANHVLVNEYL-------------PGQGILPHTDGPL

    CG4036_Dm_7297561 93 DLDDLPWDISQSGRRKQNFGPKTN|FKKRKLRLGSFAGFPRTTEYVQRRFED--VPLLRGFQTIEQCSLEYEPS-----------KGASIDPHVDDCW

    FLJ2001_Hs_38923019 91 LMDRDPWKLSQSGRRKQDYGPKVN|FRKQKLKTEGFCGLPSFSREVVRRMGL--YPGLEGFRPVEQCNLDYCPE-----------RGSAIDPHLDDAW

    C14B1.10_Ce_6580210 188 QSLKHRAVVHFGHVFDYSTNS---|-ASEWK---EADPIPPVINSLIDRLIS---DKYITERPDQVTANVYE-------------SGHGIPSHYDTHS

    SPAP8A3.02c_Sp_7491301 62 VQRNLINNVPKELLSIYGSGKQ--|SHLYIPFPAHINCLNDYIPSDFKQR------LWKGQDAEAIIMQVYN-------------PGDGIIPHKDLEM

    L3377.4_Lm_9989036 49 DASNLRKGYVDVYTRASDRIILND|GRFQLPPLPPASFMPLLERLEQDN-------VVPKSWLNNQTANLYE-------------PGDFIRAHIDNLF

    MTCI237.14c_Mtu_2052134 58 RRQMYDRVVDVPRLVSFHDLTI--|EDPPHPQLARMRRRLNDIYGGELG----------EPFTTAGLCYYRD-------------GSDSVAWHGDTIG

    AlkB_Cc_2055386 43 ALGSLGWTSDARGYRYVDRHPE--|TGRPWP-DMPPALLDLWTVLGD-----------PETPPDSCLVNLYA-------------TGARMGLHQDRDE

    ALKB_Ec_113638 63 NCGHLGWTTHRQGYLYSPIDPQ--|TNKPWPAMPQSFHNLCQRAATAAGY--------PDFQPDACLINRYA-------------PGAKLSLHQDKDE

    AlkB_Scoe_8894829 60 RQVCLGRHWYPYGYAATAVDGD--|GAPVKPFPARLDGLARRAVTDALGAEAV-----APAPYDIALINFYD-------------ADARMGMHRDADE

    AlkB_At_4835778 173 LLRKLRWSTLGLQFDWSKRNYD--|--VSLPHNNIPDALCQLAKTHAAI----AMPDGEEFRPEGAIVNYFG-------------IGDTLGGHLDDMEAlkB_Sp_3080529 131 VHKKLRWVTLGEQYDWTTKEYP--|DPSKSP-GFPKDLGDFVEKVVK------ESTDFLHWKAEAAIVNFYS-------------PGDTLSAHIDESE

    AlkB_Hs_2134723 89 LLEKLRWVTVGYHYNWDSKKYS--|ADHYTP--FPSDLGFLSEQVAAAC-------GFEDFRAEAGILNYYR-------------LDSTLGIHVDRSE

    Consensus(85%): ........................|..........................................h..a..................h..H.D...

    * *

  • 7/30/2019 Dioxygenases

    4/8

    from the experimentally determined structures of IPNS

    (PDB:1ips), DAOCS (PDB:1dcs), CAS (PDB:1drt) and the

    predicted secondary structures for individual families, the

    conserved core of these elements was delineated (igure 1).

    A multiple alignment for the entire superfamily was

    constructed by aligning the conserved sequence features

    shared by all the individual families. The boundaries of the

    secondary structure elements in the alignment were adjusted

    using the secondary structure conservation as a guide.

    The conserved portion of the 2OG-e(II) dioxygenase super-

    family proteins comprises the core DSBH domain seen in the

    IPNS, DAOCS and CAS structures [8-10] and part of the

    amino-terminal = helix that showed considerable variability

    4 Genome Biology Vol 2 No 3 Aravind and Koonin

    Figure 1 (continued from the previous page)

    Secondary Structure: ...................EEEEEEE.............EEEEEEE...............EEEEE...EEEEEE...............EEEEEE........EEEEEEEEEE.

    YBIX_Ec_3025022 RSHPQN--------GWMRTDLSATLFLSDPQSY------DGGELVVNDTFGQ---------HRVKLPA-GDLVLYPS-----------SSLHCVTPVT----RGVRVASFMWIQS 189\YbiX

    PA4515_Pa_9950756 RDVHGGR-------ERVRTDLSSTLFFSDPEDY------DGGELVIQDTYGL---------QQVKLPA-GDLVLYPG-----------TSLHKVNPVT----RGARYASFFWTQS 178|family

    Ybix_Bsep_5777381 RPVPGTA-------ERVRTDLSATLFFSEPDSY------DGGELVVDDTYGP---------RTVKLPA-GHMVLYPG-----------TSLHKVTPVT----RGARISAFFWLQS 178|

    XF0598_Xf_9105464 MSLPIAPG---HTPASLRSDISCTLFLNDPDEY------EGGELIIADTYGE---------HEVKLPA-GDLIIYPS-----------TSLHRVAPVT----RGMRIASFFWVQS 182/

    Lep_Rr_5805194 LNAESLVCIK-EPPAYTFRDYSAILYLN-GD-------FDGGNFYFTELDAKTV-------TAEVQPQCGRAVGFSSG---------TENPHGVKAVT----RGQRCAIALWFTL 670\

    AK025841_Hs_10438479 LNAETLVCVK-EPLAYTFRDYSAILYLN-GD-------FDGGNFYFTELDAKTV-------TAEVQPQCGRAVGFSSG---------TENPHGVKAVT----RGQRCAIALWFTL 332|

    FLJ10718_Hs_8922619 LDPEANECWK-EPPAYTFRDYSALLYMN-DD-------FEGGEFIFTEMDAKTV-------TASIKPKCGRMISFSSG---------GENPHGVKAVT----KGKRCAVALWFTL 490|Leprecans

    Lep_Ci_9229924 LK-ENGSCLK-ERPAYTWRDYSAILYLN-DE-------FEGGEFIMTDATARRV-------KVQVRPKCGRLVSFSAG---------KECLHGVKPVT----KGRRCAMALWFTM 388|

    AK025875_Hs_10438524 YG---------------SFDYTSLLYLS-NYLED----FGGGRFMFMEEGAN----------KTVEPRAGRVSFFTSG---------SENL HRVEKVH----WGTRYAITIAFSC 307/

    EGL-9_Ce_5923812 K--------------DGRCITTIYYCNENWDMA-----TDGGTLRLYPETSMT--------PMDIDPR-ADRLVFFWSD--------RRNPHEVMPVF-----RHRFAITIWYMD 566\

    AK025273_Hs_10437756 G--------------DGRCITCIYYLNKNWDAK-----LHGGILRIFPEGKSF--------IADVEPI-FDRLLFFWSD--------RRNPHEVQPSY-----ATRYAMTVWYFD 214|EGL-9

    CG1114_Dm_7296772 K--------------DGRVITAIYYLNINWDAR-----ESGGILRIRPTPGTT--------VADIEPK-FDRLIFFWSD--------IRNPHEVQPAH-----RTRYAITVWYFD 248|family

    VCA0949_Vc_9658386 G-------------NENRKLTTVFYLNENWTP------ADGGELKIYDLQDNW--------IETLAPV-AGRLVVFLS---------ERFPHEVLEAH-----ADRVSIAGWFRT 193|

    PA0310_Pa_9946156 D-------------DDARTVSAVLYLNDAWLP------EHGGALRLHLPQR----------QVDIQPT-GGSLVVFMS---------AGTEHEVLPAS-----RDRLSLTGWFRR 223/

    P4HA_Hs_4505565 KDEPDAF----RELGTGNRIATWLFYMSDVS--------AGGATVFPEVG------------ASVWPKKGTAVFWYNLFA--SGEGDYSTRHAACPVL----VGNKWVSNKWLHE 519\

    CG9728_Dm_7301967 DPNHY-------PAEEGNRIATAIFYLSEVE--------QGGATAFPFLD------------IAVKPQLGNVLFWYNLHR--SLDKDYRTKHAGCPVL----KGSKWIGNVWIHE 903|

    CG1546_Dm_10726875 EVNLP-------KNFEGDRISTSMFYLSDVE--------QGGYTVFTKLN------------VFLPPVKGALVMWHNLHR--SLHVDARTLHAGCPVI----VGSKRIGNIWMHS 504|

    P4HA_Ce_1709529 KEESKSF----ESLGTGNRIATVLFYMSQPS--------HGGGTVFTEAK------------STILPTKNDALFWYNLYK--QGDGNPDTRHAACPVL----VGIKWVSNKWIHE 511|

    P4HA_Dm_7301948 KEEQRAF----EGLNLGNRIATVLFYMSDVE--------QGGATVFTSLH------------TALFPKKGTAAFWMNLHR--DGQGDVRTRHAACPVL----TGTKWVSNKWIHE 300|

    CG9713_Dm_7301967 KTNSN------MTAMSGDRIATVLFYLTDVE--------QGGATVFPNIR------------KAVFPQRGSVVMWYNLKD--NGQIDTQTLHAACPVI----VGSKWVCNKWIRE 511|Prolyl

    C14E2.4_Ce_1166595 FANAEDSNR--HFGEMGNRLATFIMVFKKAE--------KGGGTLFPQLG------------NVFRANPGDAFLWFNCNG--NLEREAKSLHGGCPIR----AGEKIIATIWIRI 277|hydroxylase

    Y43F8B.4.b_Ce_3947627 YPSEKEWDE--WMRVNGNRFGTLIMAFGAAE--------SGGATVFPRLG------------AAVRTKPGDAFFWFNAMG--NSEQEDLSEHAGCPIY----KGQKQISTIWLRM 508|family

    T20B3.7_Ce_7508029 YRSKQDYDW--WMNKTGNRIGTLIFVLKPAE--------KGGGTVFPSIG------------STVRANAGDAFFWFNAQA--DEEKEMLSNHGGCPIY----EGRKVIATIWIRA 199|

    CG9708_Dm_10726879 SKTFS-------LETVGDRIATVLFYLNNVD--------HGGATVFPKLN------------LAVPTQKGSALFWHNIDR-KSYDYDTRTFHGACPLI----SGTKLVVPQNIS- 129|

    P4HA_At_6437556 HDKVN-------IARGGHRIATVLLYLSNVT--------KGGETVFPDAQ------------VCLKPKKGNALLFFNLQQ--DAIPDPFSLHGGCPVI----EGEKWSATKWIHV 225|

    P4AH_PBCV_9631654 CDDACP---------KDQRLATLMVYLKAPEEG------GGGETDFPTLK------------TKIKPKKGTSIFFWVADP-VTRKLYKETLHAGLPVK----SGEKIIANQWIRA 240|

    B12F1.100_Nc_9368561 PPSGIHPTDASPADKKQSSLFTFLMYLNDEF--------EGGETTFFTPSVRDGVMNAH----PVRPVMGSVAVFPHG------ENHGALLHEGTGVR----KGAKYIIRTDVEF 488/

    PLOD_Ce_6093732 ------------------FSIDIALNKKGRD-------YEGGGVRYIRYNC-----------TVPADE--VGYAMMF-PG------RLTHLHEGLATT----KGTRYIMVSFINP 730\

    PLO_Dm_7294743 ------------------YTINIAMNRAGID-------YQGGGCRFIRYNC-----------SVTDTK--KGWMLMH-PG------RLTHYHEGLLVT----NGTRYIMISFIDP 721|

    PLO2_Hs_6093730 ------------------FTINIALNNVGED-------FQGGGCKFLRYNC-----------SIESPR--KGWSFMH-PG------RLTHLHEGLPVK----NGTRYIAVSFIDP 737|Lysyl

    PLO1_Hs_400205 ------------------FTINIALNRVGVD-------YEGGGCRFLRYNC-----------SIRAPR--KGWTLMH-PG------RLTHYHEGLPTT----RGTRYIAVSFVDP 727|hydroxylase

    PLO3_Hs_6093731 ------------------FTLNVALNHKGLD-------YEGGGCRFLRYDC-----------VISSPR--KGWALLH-PG------RLTHYHEGLPTT----WGTRYIMVSFVDP 738|family

    MRC8.21_At_9294077 ------------------VTLNVCLGNQ----------FVGGELFFRGTRCEKHVN------TATKAD--ETYDYCHIPG-QAVLHRGRHRHGARATT----CGHRVNMLLWCRS 347|

    K9D7.16_At_10178205 ------------------ITLNVCLSKQ----------SEGGEILFTGTRCNKHLK------AGPKPE--EIFEYCHEPG-QAILHLGCHSHGAKAAI----SCSRANMILWCIN 306|

    AK023553_Hs_10435524 ------------------LTLNVALGKV----------FTGGALYFGGLFQAPT--------ALTEPL--EVEHVVG-QG---VLHRGGQL HGARPLG----TGERWNLVVWLRA 249/

    SPBC6B1.08c_Sp_7491725 ----------------GTRCISYILYLVEPDEGWK--PEYGGALRLFPTLQPSFP--AADFCHSIPPQ-WNQLSFFRVKP-------GHSFHDVEEVYV---DKPRMAISGWFHY 230\

    Yer049wp_Sc_731462 ----------------GSRRISFILYLPDPDRKWK--SHYGGGLRLFPSILPNVP--HSDPSAKLVPQ-FNQIAFFKVLP-------GFSFHDVEEVKV---DKHRLSIQGWYHI 247|

    KIAA1612_Hs_10047299 ----------------EGRRIAFILYLVPPWDR-----SMGGTLDLYSIDEHFQP--KQIV-KSLIPS-WNKLVFFEVS--------PVSFHQVSEVLS---EKSRLSISGWFHG 247|

    CG18761_Dm_10726758 ----------------KDRQVAFIYYLSPWEGAEEWTDEQGGCLEIFGSDDQCFP--QFPVQRKIAPK-DNQFAFFKVG--------SRSFHQVGEVTT---DYPRLTINGWFHG 275|SanF/

    C17G10.1_Ce_861277 ----------------ETRRFAFVYYITSADWDSE---VNGGDLQLFNHDKKLQP--TSVA-AQFSPL-RNSFMLFEVS--------EKSWHRVAEMLS---EEPRLSINGWFHS 240|SanC

    SanC_Scan_7407127 -----------------DKAVTHLFYFND-GWDP----EWRGDLRLLRSADMAD------CAKRVAPT-TGTSVVLVRS--------DRSW HGVPPVADT--PVDRRALLVHFVR 212|family

    SanF_Scan_8389327 -----------------TKAITVILYLNR-DWPV----EAGGQFQIFASPKEGP-------TEEISPV-GGQLLAFPPT--------DKSWHAVSKIEHP--GTERITVQIEYWL 205|

    sll0428_Ssp_1653558 ------------SPDSATRELTYVYYFNR-EPKA----FSGGELAIYDSKIENNFYVAAESFKTVQPV-NNSIVFFLS----------RYMHEVLPVNCP3-ADSRFTINGWVRK 349|

    sll0191_Ssp_7469866 ------------SEKTASRQITYVYYFYQ-EPKA----FSGGELRLYDTELKNNTITTHPKFQTITPI-NNSIIFFNS----------RCRHEVMSVVCP3-AHSRFTVNGWIRK 300/

    CAS_Scla_322266 -------------YHRLQPNYVMLACSRADHE------RTAATLVASVRK---70---VTEAVYLEPG-DLLIVDNF-----------RTTHARTPFSPRWDGKDRWLHRVYIRT 302\

    IPNS_En_124825 -------------------LITVLYQ------------SNVQNLQVETAA--------GYQDIEADDT-GYLINCGSYMAHLTNNYYKAPIHRVKWVN-----AERQSLPFFVNL 288|

    FLAS_Pet_421946 -------------------YITILVP------------NEVQGLQVFKDG--------HWYDVKYIPN-ALIVHIGDQVEILSNGKYKSVYHRTTVNK----DKTRMSWPVFLEP 309|

    LDOX_Pet_1730108 -------------------ALTFILH------------NMVPGLQLFYEG--------QWVTAKCVPN-SIIMHIGDTIEILSNGKYKSILHRGVVNK----EKVRFSWAIFCEP 311|

    Srg_At_479047 -------------------GLTVLMQV-----------NDVEGLQIKKDG--------KWVPVKPLPN-AFIVNIGDVLEIITNGTYRSIEHRGVVNS----EKERLSIATFHNV 309|

    EFE_Le_398992 -------------------GIILLFQD-----------DKVSGLQLLKDE--------QWIDVPPMRH-SIV VNLGDQLEVITNGKYKSVLHRVIAQT----DGTRMSLASFYNP 253|Small

    Ga20Ox_Sot_10800976 -------------------SLTILHQ------------DSVSGLQVFMDN--------QWRSISPNLS-AFV VNIGDTFMALSNGRYKSCLHRAVVNN----KTPRKSLAFFLCP 317|molecule

    PA0147_Pa_9945977 -------------------CVTLLYQ------------DAAGGLQVQNRQG-------EWIDAPPIDG-TFVVNIGDMMARWSNDRYRSTPHRVISPR----GVHRYSMPFFAEP 274|dioxygenases

    PA4191_Pa_9950401 -------------------LLTLLHQ------------DAIGGLQVRTPQ--------GWLEAPPIPG-SFVCNLGDMLERMTGGLYRSTPHRVARNTS---GRDRLSFPLFFDP 277|

    ISP7_Sp_729862 -------------------ALTLMSQ------------DNVKGLEILDPVSN------CFLSVSPAPG-ALIANLGDIMAILTNNRYKSSMHRVCNNS----GSDRYTIPFFLQG 353|

    SPCC1494.01_Sc_7491815 -------------------SITLLFQ------------RDAAGLEIRPPNFVKDM---DWIKVNVQPD-VVLVNIADMLQFWTSGKLRSTVHRVRIDPG---VKTRQTIAYFVTP 267|

    DACCS_Lyl_769809 -------------------IVSLILQTPCP--------NGFVSLQVEIDG--------RFVEVPPRPG-CVVVFCGSIAPLVSDGKIKAPQHRVVS-PGA4-GSNRTSSVLFLRP 268/

    RRPO_SHVX_548840 IYPKG------------NKILTVNAA-------------GSGTFGI---------------KCAKGE-TTLN LEDGD-YFQMPSGFQETHKHNVVA------VTPRLSFTFRSTV 743\

    POL_ASPV_487652 IYDIN------------HQVLTVNYS-------------GDAIFCI---------------ECLGSGF-EIPLSGPQ-MLLMPFGFQKEHRHGIKSP-----SKGRISLTFRLTK 853|POL_BSV_409711 IFMRG------------APVHTVSMD-------------GNADFGT---------------ECAAGR--QYTTLRGNVQFTMPSGFQETHK HAVRNT-----TAGRVSYTFRRLA 841|RNA

    RRPO_PMV_139137 CYPKG------------HQVLTINHS-------------GECLTQI---------------ACQKGKA-SIT MGFGD-YYLSPVGFQESHKHAVSNT-----TGGRVSLTFRCTV 690|viral

    POL_GLV_1154656 IFEKD------------SKILTVCIQ-------------GDCEFRF---------------RCATGET-GFY MEAPK-QFMMPDGFQSNHVHAVREC-----TPGRISATFRRAK 772|AlkB

    Pol_GVA_1405615 CYLPG------------GSVVTVNLH-------------GDATFEVK--------------ENQSGKIEKKE LHDGD-VYVMGPGMQQTHKHRVTSH-----TDGRCSITLRNKT 738|homologs

    RRPO_ACLSV_1710717 CYDD-------------DEILTINVV-------------GDAKFHT---------------TC-HGE--IID LRQGD-EILMPGGYQKMNKHAVEVA-----SEGRTSVTLRVHK 836/

    T13L16.2_At_2708738 FL---------------RPFCTISFL-------------SECDILFGSNLKVE------GPGDFSGSY-SIPLPVGS-VLVLNGNGADVAKHCVPAV-----PTKRISITFRKMD 420\

    T19K4.220_At_3036813 FL---------------RPFCTVSFL-------------SECNILFGSNLKVL------GPGEFSGSY-SIPLPVGS-VLVLKGNGADVAKHCVPAV-----PTKRISITFRKMD 403|

    At2g48080_At_4249414 -----------------QPISTLVL--------------SESTMVFGHRLGVD------NDGNFRGSL-TLPLKEGS-LLVMRGNSADMARHVMCPS-----PNKRVAITFFKLK 351|

    AK000315.1_Hs_7020317 IFE--------------RPIVSVSFF-------------SDSALCFGCKFQFK-------PIRVSEPVLSLPVRRGS-VTVLSGYAADEITHCIRPQDI---KERRAVIILRKTR 270|

    CG17807_Dm_7291441 AFL--------------DPILSLSLQ-------------SDVVMDFRRG---------------DDQV-QVR LPRRS-LLIMSGEARYDWTHGIRPKHID13RGKRTSLTFRRLR 325|Eukaryotic

    CG6144_Dm_7297712 FH---------------PIISTISTG-------------AHTVLEFVKREDTTTETEAGDQTTREVLF-KLL LEPRS-LLILKDTLYTDYLHAISETSED24RSPRISLTIRNVP 213|Family of

    CG4036_Dm_7297561 IWGERVVTVNC------LGDSVLTLT--PYEVQQSGKYNLDLVASYEDELLAP-LLTDDQLATFEGKVLRIP MPNLS-LIVLYGPARYQFEHSVLREDV---QERRVCVAYREFT 278|AlkB

    FLJ2001_Hs_38923019 LWGERLVSLNL------LSPTVLSMC-----REAPGSLLLCSAPSAAPEALVDSVIAPSRSVLCQEVEVAIPLPARS-LLVLTGAARHQWKHAIHRRHI---EARRVCVTFRELS 274|paralogs

    C14B1.10_Ce_6580210 AFD--------------DPIVSISLL-------------SDVVMEFKD-------------GANSARIAPVL LKARS-LCLIQGESRYRWKHGIVNRKYD10RQTRVSLTLRKIR 343|

    SPAP8A3.02c_Sp_7491301 FGDG-------------VAIFSFLSN-------------TTMIFTHPE--------------LKLKS--KIRLEKGS-LLLMSGTARYDWFHEIPFRAGD12RSQRLSVTMRRII 219|

    L3377.4_Lm_9989036 VYD--------------DIFAICSLG-------------SNCLLRFVH-------------VQNGEEL-DVMVPDRS-VYIMSGPARYVYFHMVLPV-----EAQRFSLVFRRSI 193/

    MTCI237.14c_Mtu_2052134 RGSTEDTM---------VAIVSLGAT-------------RVFALRP----------------RGRG PSLRLPLAHGD-LLVMGGSCQRTFEHAVPKTSAP--TGPRVSIQFRPRD 203\

    AlkB_Cc_2055386 ADPR-------------FPLLSISLG-------------DTAVFRIGG-------------VNRKDPTRSLRLASGD--VCRLLGPARLAFHGVDRILPG6-GGGRINLTLRRAR 190|

    ALKB_Ec_113638 PDLR-------------APIVSVSLG-------------LPAIFQFGG-------------LKRNDPLKRLLLEHGD--VVVWGGESRLFYHGIQPLKAG5-IDCRYNLTFRQAG 213|Classic

    AlkB_Scoe_8894829 RTD--------------APVVSLSLG-------------DTCVFRFGN------------PETRTRPYTDTELRSGD--LFVFGGPSRLAYHGVPRVHPG7-LRGRLNITLRVSG 215|AlkB

    AlkB_At_4835778 ADWS-------------KPIVSMSLG-------------CKAIFLLGGK-------------SKDDPPHAMYLRSGD--VVLMAGEARECFHGNLLHFQL34KTSRININIRQVF 354|

    AlkB_Sp_3080529 EDLT-------------LPLISLSMG-------------LDCIYLIGTE------------SRSEKPS-ALRLHSGD--VVIMTGTSRKAFHGKHC-------SFKYLIYSQLIA 272|

    AlkB_Hs_2134723 LDHS-------------KPLLSFSFG-------------QSAIFLLGGL------------QRDEAPP-PMFMHSGD--IMIMSGFSRLLNHAVPRVLPN39KTARVNMa RQVL 272/

    Consensus(85%): .....................sh.h................s...h....................s.....h..................H.s...........+h.h..b...

    * *

  • 7/30/2019 Dioxygenases

    5/8

    in both length and sequence between individual families

    (igures 1,2). The DSBH region includes seven conserved

    strands that are common to all these proteins and are

    arranged in two sheets in a jelly-roll topology (igures 2,3).

    However, different families have specific inserts in various

    positions between the conserved strands; some of these

    inserts contain additional secondary structures and show sig-

    nificant sequence conservation (igures 1,3). or example,

    the insert between the fifth and sixth strand in AlkB is pre-

    dicted to contain an extra strand, whereas in the small-mole-

    cule dioxygenase (IPNS/ethylene-forming enzyme (EE))

    family, the same region forms (or is predicted to form) a

    short helix (igures 1,3). The clavaminic acid synthase, anoutlier of this latter family, has its own characteristic inserts,

    including a giant (approximately 70 amino acids) insert

    between strands 4 and 5 [8], and some members of the AlkB

    family have smaller inserts in the same position (igure 1).

    This reflects the relative resilience of the core DSBH to inser-

    tions, and accounts for difficulties in unification of this

    superfamily by sequence-based methods. The multiple align-

    ment contains at least three characteristic conserved motifs

    that center, respectively, at a HXD dyad near the amino ter-

    minus, a histidine towards the carboxyl terminus, and an

    arginine or lysine further downstream (igure 1). The HXD

    dyad is located in a flexible loop that follows the first con-

    served strand and stacks with the sheet containing three of

    the core strands (igures 2,3). The second conserved histi-

    dine is associated with the beginning of the sixth strand,

    whereas the conserved basic residue (R or K) is in the begin-

    ning of the seventh strand of the DSBH core (igures 2,3).The sixth position after the conserved basic residue is invari-

    ably occupied by a bulky residue that is either arginine (in

    AlkB) or phenylalanine or tryptophan in all other members

    of this fold [8-10] (igure 1).

    The roles of all these conserved residues in catalysis are

    apparent from the crystal structures of IPNS, clavaminic

    acid synthase (CAS) and deacetoxycephalosporin C synthase

    (DAOCS) [8-10]. The conserved HXD and the carboxy-

    terminal histidine coordinate e(II) that is directly involved

    in catalysis by these enzymes (igure 2). The conserved basic

    residue interacts with the carboxylate group of the acidic

    substrate, while the bulky or aromatic residue locatedcarboxy-terminal to the basic one forms the base of the cleft

    that holds this substrate molecule (igure 2). Whereas most

    of these enzymes necessarily use 2-OG as the acidic sub-

    strate, IPNS does not bind 2-OG. Its place is occupied by its

    single substrate, L-@-(=-aminoadipoyl)-L-cysteinyl-D-valine,

    whose carboxyl sidechain interacts with the conserved basic

    residue similarly to the OG-carboxylate in the other enzymes

    of this superfamily [9,10].

    The conservation of all the residues implicated in catalysis in

    the biochemically uncharacterized proteins such as AlkB,

    EGL-9, leprecan and YbiX suggests they all catalyze oxida-

    tive reactions similar to those catalyzed by IPNS, DAOCS,CAS and related enzymes such as protein lysyl/prolyl

    hydroxylases, EE and leucoanthocyanidin oxidases [1,2].

    Using the available contextual information, we attempted to

    predict possible substrates of these proteins. AlkB binds

    ssDNA and is required for the processing of toxic DNA mod-

    ifications caused by SN2 alkylating agents such as methyl-

    methanesulfonate (MMS) specifically on ssDNA. On the

    basis of the preferential specificity of AlkB-dependent repair

    for ssDNA and for SN2 as opposed to SN1 alkylating agents,

    it has been proposed that its targets include modified bases

    such as N1-methyladenine and N3-methylcytosine [18]. The

    predicted 2OG-e(II) dioxygenase activity of AlkB is proba-

    bly involved in the detoxification of methylated bases inssDNA, possibly through hydroxylation of the methyl

    groups, resulting in less toxic base derivatives. The hydroxy-

    lation might be followed by a second oxidative step that

    could remove the hydroxymethyl group, restoring the

    normal base in DNA. These reactions are consistent with the

    observation that, unlike AlkA, AlkB has no DNA glycosylase

    activity [18]. The most intriguing finding made during our

    analysis of the AlkB family is the existence of multiple

    eukaryotic AlkB homologs, other than the actual orthologs,

    especially in plants and their viruses. The specific action of

    AlkB on ssDNA suggests that ssRNA could be the substrate

    http://genomebiology.com/2001/2/3/research/0007.5

    Figure 2A structural model of the DSBH core of the 2OG-Fe(II)dioxygenase superfamily. This is based on the Emericellanidulans isopenicillin N synthase structure (PDB:1ips). Theside chains of the amino acid residues implicated in catalysisand in substrate binding are shown (see text) and the Fe(II)ion is indicated by a red circle.

  • 7/30/2019 Dioxygenases

    6/8

    for other members of this family. Given the presence of

    RNA methylases that could modify RNA in a similar way as

    the alkylating agents the function of the uncharacterized

    AlkB-like proteins could be to reverse such modifications.

    Such modifications might also be used in the host-mediated

    inactivation of viral RNAs, and the AlkB homologs acquired

    by some plant viruses could counter this host-defense

    mechanism. The connection between nucleic acid methyla-

    tion and the AlkB homologs described here is supported bythe domain architecture of a conserved pair of animal pro-

    teins (C14B1.10 from C. elegans and CG17807 from

    Drosophila) in which an amino-terminal AlkB-like domain

    is fused to a carboxy-terminal predicted methyltransferase

    domain. These proteins could potentially be involved in reg-

    ulation of RNA stability or DNA repair via controlled

    methylation/demethylation.

    EGL-9 defines a new family within the 2OG-e(II) dioxyge-

    nase superfamily that is highly conserved in animals and is

    also represented in the pathogenic proteobacteria such as

    Ps. aeruginosa and Vibrio cholerae. In C. elegans, EGL-9 is

    required for normal egg laying, whereas loss of its function

    provides resistance to hypercontractile muscular paralysis

    caused byPs. aeruginosa [19]. The vertebrate homolog of

    EGL-9 is specifically expressed in smooth muscles and is

    likely to have a role in their function [27]. At its carboxyl

    terminus, EGL-9 contains a MYND finger domain that is

    probably involved in specific protein-protein interactions

    [28]. The closest relatives of the EGL-9 family are theproline hydroxylases with which they share a region of spe-

    cific extended conservation amino terminal to the core

    DSBH domain (igure 1). This relationship, along with the

    combination to the intracellular MYND domain and the lack

    of signal peptides, suggests that the Egl-9 family proteins are

    prolyl hydroxylases that modify intracellular proteins, unlike

    the classic prolyl hydroxylases that have been implicated pri-

    marily in the modification of collagens in the endoplasmic

    lumen. The interesting aspect of the EGL-9 family is its pres-

    ence in V. cholerae and Ps. aeruginosa (igure 1), which

    have apparently acquired these genes by horizontal transfer

    6 Genome Biology Vol 2 No 3 Aravind and Koonin

    Figure 3Topological diagrams for three members of the 2OG-Fe(II) dioxygenase superfamily. The diagrams are based on theexperimentally determined structures for E. nidulans isopenicillin N synthase (PDB: 1ips) and structural models of prolyl-4-hydroxylase and AlkB. The amino acid residues of the active site and the Fe(II) ion are shown as in Figure 2.

    Isopenicillin synthase Prolyl-4-hydroxylase

    AlkB

    Variableinsert

    Variableregion

    N

    C C

    N

    C

    N

  • 7/30/2019 Dioxygenases

    7/8

    from eukaryotes. The direct connection between this gene

    acquisition and the action of the Ps. aeruginosa toxins on

    animal muscles is unclear, but it seems possible that the bac-

    terial EGL-9-like proteins modify host proteins in a manner

    that favors the survival and spread of the pathogen. This

    might be especially pertinent if the host downregulates theendogenous ortholog in response to the infection.

    Leprecan is a proteoglycan that is associated with the base-

    ment membrane in chordates [20]. It and related proteins

    contain an amino-terminal segment rich in leucine and

    proline [20] and a carboxy-terminal globular part that

    includes the 2OG-e(II) oxygenase domain. More distant

    relatives of the leprecan-like proteins include the T23K23.7

    protein from Arabidopsis and a family of uncharacterized

    proteobacterial proteins typified by YbiX (igure 1). These

    proteins are predicted to be previously unnoticed amino-

    acid hydroxylases that catalyze modifications of intracellular

    and extracellular proteins.

    Evolutionary implications

    Sequence conservation is high within individual families of

    the 2OG-e(II) dioxygenase superfamily, with specific

    extensions typical of each family, but low between different

    families (igure 1). This observation, together with the

    phyletic distribution of these proteins, provides some clues

    to their evolutionary history. Members of this superfamily

    could not be detected in the Archaea despite extensive

    profile searches of the archaeal proteomes as well as transi-

    tive searches seeded with many divergent sequences as start-

    ing points. This suggests that, unlike bacteria and

    eukaryotes, in which the superfamily is widely represented,archaea do not encode bona fide members. A corollary of

    this is that horizontal gene transfer between bacteria and

    eukaryotes might have had a significant role in the evolution

    of this superfamily. The DSBH fold that comprises the core

    domain of the 2OG-e(II) dioxygenases is also present in a

    large number of proteins typified by the arabinose-binding

    domain of the transcription regulator AraC, plant seed pro-

    teins such as vicilin, and oxalate oxidase ([29]; see SCOP

    [12]). Despite the lack of detectable sequence similarity to

    2OG-e(II) dioxygenases, these proteins contain a conserved

    HXH motif and a carboxy-terminal histidine that appear to

    be equivalent to the metal-chelating HXD and carboxy-ter-

    minal histidine of the latter (see above). Thus, these twoprotein superfamilies could have evolved from a common

    ancestor by acquiring distinct catalytic and ligand-binding

    properties. The classic AraC-like DSBH proteins appear to

    have a more universal distribution than the 2OG-e(II)

    dioxygenases, with diverse forms represented in archaea,

    bacteria and eukaryotes. The 2OG-e(II) dioxygenase super-

    family is present in diverse bacteria that had probably

    diverged before the diversification of the eukaryotes that

    contain this superfamily. This, together with the phyletic dis-

    tributution of the AraC and 2OG-e(II) dioxygenase super-

    families, leads us to speculate that the 2OG-e(II)

    dioxygenase superfamily evolved from the AraC-like super-

    family in bacteria through drastic sequence divergence that

    eliminated significant sequence similarity, followed by fixa-

    tion of the modified active-site configuration.

    In the case of the AlkB family, a single horizontal transfer eventprobably resulted in their entry into the eukaryotic lineage,

    which was followed by adaptation to new, RNA-related roles by

    some paralogs. In the case of the small-molecule dioxygenase

    (IPNS/EE) family, a complex web of relationships can be

    discerned. The EE and the plant secondary metabolite

    biosynthesis enzymes flavonol synthase, leucoanthocyanidin

    hydroxylase and giberellin-20 oxidase show maximum

    diversity in the plant lineage [1,6,7]. Their close homologs

    are, however, also found in Pseudomonas, but not in other

    well-studied proteobacterial lineages. Similarly, fungi

    contain several enzymes involved in secondary metabolite

    biosynthesis, such as IPNS and DAOCS, that are distinctly

    related to their counterparts in actinomycetes. This patchyphyletic distribution across kingdoms is suggestive of multi-

    ple gene transfer events that have apparently led to the wide

    dissemination of these proteins in bacteria and eukaryotes.

    The close grouping of the ethylene-forming enzyme and its

    Pseudomonas homologs [30] to the exclusion of other

    members of this family in plants indicates a possible recent

    acquisition of the gene inPseudomonas from its plant hosts.

    In Arabidopsis there is an expansion of small-molecule

    dioxygenases (at least 75 members), whereasPs. aeruginosa

    has at least three recently duplicated members. This prolif-

    eration of the small-molecule dioxygenases is consistent

    with their possible role in the synthesis of secondary

    metabolites in these organisms [1,6,7,30]. The amino-acidhydroxylases show the greatest diversity in eukaryotes, and

    are represented in a number of different forms in both

    animals and plants. They were probably derived from small-

    molecule hydroxylases early in eukaryotic evolution. In con-

    trast, the predicted amino-acid hydroxylases of the EGL-9

    family are seen sporadically, in single copies, in certain bac-

    terial lineages such as V. cholerae and Ps. aeruginosa, sug-

    gesting a secondary horizontal transfer from the eukaryotes

    to bacteria.

    Conclusions

    Before this study, structure determination, biochemicalstudies and sequence comparisons of 2OG-e(II) dioxyge-

    nases [1,9] had elucidated their structural fold, active-site

    residues and reaction mechanism. Here, using sequence

    profile searches, we show that many other protein families

    contain the same constellation of active-site residues and are

    predicted to adopt the same fold. This allows us to predict

    the catalytic activity of a wide range of functionally impor-

    tant, but biochemically uncharacterized, proteins from

    eukaryotes and bacteria. In particular, we propose a specific

    mechanism of action in DNA repair, and possibly in RNA

    modification, for the AlkB protein and its homologs.

    http://genomebiology.com/2001/2/3/research/0007.7

  • 7/30/2019 Dioxygenases

    8/8

    Materials and methodsThe Non-redundant Protein Sequence Database [21], the

    Expressed Sequence Tags Database (NCBI) [31] and the

    individual protein sequence databases of completely and

    partially sequenced genomes [32] were searched using the

    gapped version of the BLAST programs (BLASTPGP for pro-teins and TBLASTNGP for translating searches of nucleotide

    databases) [22]. Sequence profile searches were performed

    using the PSI-BLAST program [22]; profiles were saved

    using the C option and retrieved using the R option. Mul-

    tiple alignments of amino acid sequences were generated

    using a combination of PSI-BLAST, CLUSTALW [24] and

    secondary structure predictions that were produced using

    the PHD program [25] and the PSI-PRED program [26],

    with multiple alignments of individual protein families used

    as queries. The three-dimensional structure visualization,

    alignment and modeling were carried out using the SWISS-

    PDB-Viewer program [33].

    References1. Prescott AG: A dilemma of dioxygenases: or where molecu-

    lar biology and biochemistry fail to meet. J Exp Bot 1993,44:849-861.

    2. Hegg EL, Que L Jr: The 2-His-1-carboxylate facial triad - anemerging structural motif in mononuclear non-hemeiron(II) enzymes. Eur J Biochem 1997, 250:625-629.

    3. Myllyharju J, Kivirikko KI: Characterization of the iron- and 2-oxoglutarate-binding sites of human prolyl 4-hydroxylase.EMBO J 1997, 16:1173-1180.

    4. Pirskanen A, Kaimio AM, Myllyla R, Kivirikko KI: Site-directedmutagenesis of human lysyl hydroxylase expressed in insectcells. Identification of histidine residues and an aspartic acidresidue critical for catalytic activity. J Biol Chem 1996,

    271:9398-9402.5. Passoja K, Myllyharju J, Pirskanen A, Kivirikko KI: Identification ofarginine-700 as the residue that binds the C-5 carboxylgroup of 2-oxoglutarate in human lysyl hydroxylase 1. FEBSLett 1998, 434:145-148.

    6. Zhang Z, Barlow JN, Baldwin JE, Schofield CJ: Metal-catalyzed oxi-dation and mutagenesis studies on the iron(II) binding siteof 1-aminocyclopropane-1-carboxylate oxidase. Biochemistry1997, 36:15999-16007.

    7. Lukacin R, Britsch L: Identification of strictly conserved histi-dine and arginine residues as part of the active site inPetunia hybrida flavanone 3beta-hydroxylase. Eur J Biochem1997, 249:748-757.

    8. Zhang Z, Ren J, Stammers DK, Baldwin JE, Harlos K, Schofield CJ:Structural origins of the selectivity of the trifunctional oxy-genase clavaminic acid synthase. Nat Struct Biol2000, 7:127-133.

    9. Valegard K, van Scheltinga AC, Lloyd MD, Hara T, Ramaswamy S,Perrakis A, Thompson A, Lee HJ, Baldwin JE, Schofield CJ, et al.:

    Structure of a cephalosporin synthase. Nature 1998, 394:805-809.10. Roach PL, Clifton IJ, Fulop V, Harlos K, Barton GJ, Hajdu J, Anders-

    son I, Schofield CJ, Baldwin JE: Crystal structure of isopenicillinN synthase is the first from a new structural family ofenzymes. Nature 1995, 375:700-704.

    11. Lange SJ, Que L Jr: Oxygen activating nonheme iron enzymes.Curr Opin Chem Biol1998, 2:159-172.

    12. Lo Conte L, Ailey B, Hubbard TJ, Brenner SE, Murzin AG, ChothiaC: SCOP: a structural classification of proteins database.Nucleic Acids Res 2000, 28:257-259.

    13. Altschul SF, Koonin EV: Iterated profile searches with PSI-BLAST - a tool for discovery in protein databases. TrendsBiochem Sci1998, 23:444-447.

    14. Aravind L, Koonin EV: Gleaning non-trivial structural, func-tional and evolutionary information about proteins by itera-tive database searches.J Mol Biol1999, 287:1023-1040.

    15. Wei YF, Carter KC, Wang RP, Shell BK: Molecular cloning andfunctional analysis of a human cDNA encoding anEscherichia coli AlkB homolog, a protein involved in DNAalkylation damage repair. Nucleic Acids Res 1996, 24:931-937.

    16. Chen BJ, Carroll P, Samson L: The Escherichia coli AlkB proteinprotects human cells against alkylation-induced toxicity. JBacteriol1994, 176:6255-6261.

    17. Kondo H, Nakabeppu Y, Kataoka H, Kuhara S, Kawabata S, SekiguchiM: Structure and expression of the alkB gene ofEscherichiacoli related to the repair of alkylated DNA.J Biol Chem 1986,261:15772-15777.

    18. Dinglay S, Trewick SC, Lindahl T, Sedgwick B: Defective process-ing of methylated single-stranded DNA by E. coli AlkBmutants. Genes Dev2000, 14:2097-2105.

    19. Darby C, Cosma CL, Thomas JH, Manoil C: Lethal paralysis ofCaenorhabditis elegans by Pseudomonas aeruginosa. Proc NatlAcad Sci USA 1999, 96:15202-15207.

    20. Wassenhove-McCarthy DJ, McCarthy KJ: Molecular characteri-zation of a novel basement membrane-associated proteo-glycan, leprecan.J Biol Chem 1999, 274:25004-25017.

    21. Non-redundant Protein Sequence Database[http://www.ncbi.nlm.nih.gov/BLAST/]

    22. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W,Lipman DJ: Gapped BLAST and PSI-BLAST: a new genera-tion of protein database search programs. Nucleic Acids Res

    1997, 25:3389-3402.23. Shanklin J, Achim C, Schmidt H, Fox BG, Munck E: Mossbauer

    studies of alkane omega-hydroxylase: evidence for a diironcluster in an integral-membrane enzyme. Proc Natl Acad SciUSA 1997, 94:2981-2986.

    24. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improvingthe sensitivity of progressive multiple sequence alignmentthrough sequence weighting, position-specific gap penaltiesand weight matrix choice. Nucleic Acids Res 1994, 22:4673-4680.

    25. Rost B, Sander C: Prediction of protein secondary structure atbetter than 70% accuracy.J Mol Biol1993, 232:584-599.

    26. Jones DT: Protein secondary structure prediction based onposition-specific scoring matrices.J Mol Biol1999, 292:195-202.

    27. Wax SD, Rosenfield CL, Taubman MB: Identification of a novelgrowth factor-responsive gene in vascular smooth musclecells.J Biol Chem 1994, 269:13041-13047.

    28. Masselink H, Bernards R: The adenovirus E1A binding proteinBS69 is a corepressor of transcription through recruitment

    of N-CoR. Oncogene 2000, 19:1538-1546.29. Gane PJ, Dunwell JM, Warwicker J: Modeling based on the struc-

    ture of vicilins predicts a histidine cluster in the active siteof oxalate oxidase.J Mol Evol1998, 46:488-493.

    30. Nagahama K, Yoshino K, Matsuoka M, Sato M, Tanase S, Ogawa T,Fukuda H: Ethylene production by strains of the plant-patho-genic bacterium Pseudomonas syringae depends upon thepresence of indigenous plasmids carrying homologous genesfor the ethylene-forming enzyme.Microbiology1994, 140:2309-2313.

    31. Expressed Sequence Tags Database[http://www.ncbi.nlm.nih.gov/blast/blast.cgi?Jform=0]

    32. Unfinished Genomes Database[http://www.ncbi.nlm.nih.gov/Microb_blast/unfinishedgenome.html]

    33. Guex N, Peitsch MC: SWISS-MODEL and the Swiss-Pdb-Viewer: an environment for comparative protein modeling.Electrophoresis 1997, 18:2714-2723.

    8 Genome Biology Vol 2 No 3 Aravind and Koonin