7
BioMed Central Page 1 of 7 (page number not for citation purposes) Biology Direct Open Access Discovery notes The Anabaena sensory rhodopsin transducer defines a novel superfamily of prokaryotic small-molecule binding domains Robson F De Souza, Lakshminarayan M Iyer and L Aravind* Address: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA Email: Robson F De Souza - [email protected]; Lakshminarayan M Iyer - [email protected]; L Aravind* - [email protected] * Corresponding author Abstract The Anabaena sensory rhodopsin transducer (ASRT) is a small protein that has been claimed to function as a signaling molecule downstream of the cyanobacterial sensory rhodopsin. However, orthologs of ASRT have been detected in several bacteria that lack rhodopsin, raising questions about the generality of this function. Using sequence profile searches we show that ASRT defines a novel superfamily of β-sandwich fold domains. Through contextual inference based on domain architectures and predicted operons and structural analysis we present strong evidence that these domains bind small molecules, most probably sugars. We propose that the intracellular versions like ASRT probably participate as sensors that regulate a diverse range of sugar metabolism operons or even the light sensory behavior in Anabaena by binding sugars or related metabolites. We also show that one of the extracellular versions define a predicted sugar-binding structure in a novel cell-surface lipoprotein found across actinobacteria, including several pathogens such as Tropheryma, Actinomyces and Thermobifida. The analysis of this superfamily also provides new data to investigate the evolution of carbohydrate binding modes in β-sandwich domains with very different topologies. Reviewers: This article was reviewed by M. Madan Babu and Mark A. Ragan. Introduction In the past decade membrane-embedded photoactive ret- inylidene-containing rhodopsins have been identified in several bacteria [1]. One such rhodopsin from the cyano- bacterium Anabaena, which utilizes a chlorophyll-based photosynthetic apparatus, was proposed to function as a sensory rhodopsin, as opposed to being a proton pump [2-4]. Inference of a sensory function for the Anabaena sensory rhodopsin (ASR) was based on several features [2], including its co-transcription with a 14 kDa protein with which it was shown to physically interact. This co- transcribed 14 kDa protein has been proposed to function as an analog of the G-protein signal transducers associated with the animal visual rhodopsins [2,5] and was accord- ingly named ASRT ("Anabaena sensory rhodopsin trans- ducer"). The solution of the three-dimensional structure of ASRT showed that it formed a tetrameric structure which was believed to mimic the G-β subunit of the ani- mal visual rhodopsin [5]. However, it was observed that orthologs of ASRT were found in other bacteria, such as Thermotoga maritma [5], whose genomes do not encode any rhodopsin homolog. This observation raised ques- tions regarding the more general function of the ASRT proteins and also its actual role in connection to ASR in Published: 14 August 2009 Biology Direct 2009, 4:25 doi:10.1186/1745-6150-4-25 Received: 12 August 2009 Accepted: 14 August 2009 This article is available from: http://www.biology-direct.com/content/4/1/25 © 2009 De Souza et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The Anabaena sensory rhodopsin transducer defines a novel

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The Anabaena sensory rhodopsin transducer defines a novel

BioMed CentralBiology Direct

ss

Open AcceDiscovery notesThe Anabaena sensory rhodopsin transducer defines a novel superfamily of prokaryotic small-molecule binding domainsRobson F De Souza, Lakshminarayan M Iyer and L Aravind*

Address: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

Email: Robson F De Souza - [email protected]; Lakshminarayan M Iyer - [email protected]; L Aravind* - [email protected]

* Corresponding author

AbstractThe Anabaena sensory rhodopsin transducer (ASRT) is a small protein that has been claimed tofunction as a signaling molecule downstream of the cyanobacterial sensory rhodopsin. However,orthologs of ASRT have been detected in several bacteria that lack rhodopsin, raising questionsabout the generality of this function. Using sequence profile searches we show that ASRT definesa novel superfamily of β-sandwich fold domains. Through contextual inference based on domainarchitectures and predicted operons and structural analysis we present strong evidence that thesedomains bind small molecules, most probably sugars. We propose that the intracellular versionslike ASRT probably participate as sensors that regulate a diverse range of sugar metabolismoperons or even the light sensory behavior in Anabaena by binding sugars or related metabolites.We also show that one of the extracellular versions define a predicted sugar-binding structure ina novel cell-surface lipoprotein found across actinobacteria, including several pathogens such asTropheryma, Actinomyces and Thermobifida. The analysis of this superfamily also provides new datato investigate the evolution of carbohydrate binding modes in β-sandwich domains with verydifferent topologies.

Reviewers: This article was reviewed by M. Madan Babu and Mark A. Ragan.

IntroductionIn the past decade membrane-embedded photoactive ret-inylidene-containing rhodopsins have been identified inseveral bacteria [1]. One such rhodopsin from the cyano-bacterium Anabaena, which utilizes a chlorophyll-basedphotosynthetic apparatus, was proposed to function as asensory rhodopsin, as opposed to being a proton pump[2-4]. Inference of a sensory function for the Anabaenasensory rhodopsin (ASR) was based on several features[2], including its co-transcription with a 14 kDa proteinwith which it was shown to physically interact. This co-transcribed 14 kDa protein has been proposed to function

as an analog of the G-protein signal transducers associatedwith the animal visual rhodopsins [2,5] and was accord-ingly named ASRT ("Anabaena sensory rhodopsin trans-ducer"). The solution of the three-dimensional structureof ASRT showed that it formed a tetrameric structurewhich was believed to mimic the G-β subunit of the ani-mal visual rhodopsin [5]. However, it was observed thatorthologs of ASRT were found in other bacteria, such asThermotoga maritma [5], whose genomes do not encodeany rhodopsin homolog. This observation raised ques-tions regarding the more general function of the ASRTproteins and also its actual role in connection to ASR in

Published: 14 August 2009

Biology Direct 2009, 4:25 doi:10.1186/1745-6150-4-25

Received: 12 August 2009Accepted: 14 August 2009

This article is available from: http://www.biology-direct.com/content/4/1/25

© 2009 De Souza et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Page 1 of 7(page number not for citation purposes)

Page 2: The Anabaena sensory rhodopsin transducer defines a novel

Biology Direct 2009, 4:25 http://www.biology-direct.com/content/4/1/25

Anabaena. This prompted us to systematically investigateASRT homologs with the view of gleaning new functionalinformation regarding this family by means of compara-tive genomics analysis. As a consequence we show thatASRT defines a novel family of prokaryotic small-mole-cule-binding domains that function in a wide range ofcontexts, predominantly as a carbohydrate sensor.

Results and discussionDetection of novel ASRT homologsThe 3D structure of ASRT (PDB: 2II7) shows that theentire protein is comprised of a single globular domainwith an eight-stranded β-sandwich fold [5]. All previouslyreported homologs of ASRT are similarly sized single-domain proteins (completely congruent to the PFAMalignment DUF1362 [6]) from various prokaryotes. Toidentify potential previously undetected homologs we ini-tiated a series of sequence profile searches with knownrepresentatives of ASRT from different bacteria. A PSI-BLAST search with the Thermotoga maritma ASRT ortholog(PDB: 1NC7) against NR database detected segments ofputative secreted amidases from certain chloroflexi with e-values of borderline significance (e = 0.013) and then con-verged. These alignments corresponded to regions thatdid not overlap with the amidase catalytic domain anddid not correspond to any previously described domains.These segments in the amidases were also independentlydetected in a search with the ASRT alignment as a HMMagainst the NR database using the recently releasedHMMER3 program (e = .024). Despite their borderline e-values, the alignment with the ASRT starting pointextended throughout the entire globular domain in thelatter and showed conservation of structurally criticalhydrophobic residues. To test this relationship we per-formed a search with the above-detected sequence seg-ments from the amidases against a library of HMMsderived from PDB structures as seeds using HHpred,which performs profile-profile comparisons, and recov-ered the Anabaena ASRT and its Thermotoga orthologs asthe best hits with significant e-values (E = 10-3). Likewise,a reciprocal PSI-BLAST search initiated with one of theseregions detected in the amidases (residues 373471 ofgi:148657162; Roseiflexus sp. RS-1) identified AnabaenaASRT and its Thermotoga orthologs (e-values = 7*10-3, iter-ation 2) and in subsequent iterations retrieved all knownASRT homologs. This search also detected multiplerepeats of the segment related to ASRT in the Roseiflexusamidase-domain-containing protein. In further transitivesearches using representatives of these additional seg-ments as query (e.g. 148270795, 679789), we retrieved,in addition to the previously identified homologs,sequences from actinobacteria such as Tropheryma whip-plei. Reciprocal PSI-BLAST searches with representatives ofthe segments detected in these actinobacterial sequences(e.g. gi:28493092) retrieved several other actinobacterial

sequences together with representatives of sequencesfound in the initial searches (see Additional file 1 fordetails). None of these newly found sequences was recov-ered by the PFAM model DUF1362 or has been previouslyreported as being related to ASRT. The secondary-structureprediction for the newly detected sequences with signifi-cant relationship to ASRT suggested an all-β fold congru-ent with that determined for ASRT and it T. maritimaortholog. Taken together, these investigations indicatedthat ASRT and its cognates from other prokaryotesbelonged to a larger superfamily of homologous domains,which might occur in more than one copy and in combi-nation with other domains in a polypeptide (see below).Hereinafter, we refer to this domain as the ASRAH (forAnabaena sensory ahodopsin- associated omology)domain.

Classification and structural analysis of ASRAH domainsBased on clustering with BLAST scores and domain archi-tectural features the ASRAH domains can be divided into3 families with distinctive phyletic patterns (see Addi-tional file 1, Table S1 and Fig. 1 for details). The first ofthese, defined by the stand-alone, intracellular versions ofthe domain similar to ASRT, are found in addition to Ther-motoga and Anabaena in some actinobacteria, proteobacte-ria, firmicutes, Chthoniobacter flavus (verrucomicrobia),the extremely thermophilic Dictyoglomus [7] as well as thearchaeon Natrialba magadii. The second family is presentin secreted proteins, always fused to an amidase domain,and is predominantly found in photosynthetic bacteriasuch as Roseiflexus, Chloroflexus and Herpetosiphon. Thethird family, also secreted, is found mainly in actinobac-teria and is defined by unique N- and C-terminal con-served extensions flanking the ASRAH domains. Amultiple sequence alignment of the ASRAH domain wasgenerated by editing a preliminary alignment produced bythe KALIGN program based on PSI-BLAST HSPs and sec-ondary structure predictions (Fig. 2). Examination of thestructures reveals that key structural elements of the Ana-baena representative have been mutated to obtain its crys-tal structure. As consequence there are major distortionsin this structure and it is unlikely to represent the nativecondition of the ASRAH domain. Hence, we continuedfurther analysis using the Thermotoga ASRT structure deter-mined as part of the Structural Genomics initiative(1NC7). Superposition of the secondary structure of thisprotein onto the alignment of the ASRAH superfamilyshows that the sequence conservation is comprised pre-dominantly of hydrophobic residues defining the corethat stabilizes the β-sandwich. However, there are a fewresidues that distinguish this fold from other previouslycharacterized β-sandwich folds. These include: 1) a well-conserved tryptophan, usually following a polar residue,present at the start of the first strand. This tryptophanappears to be central to a hydrophobic interaction

Page 2 of 7(page number not for citation purposes)

Page 3: The Anabaena sensory rhodopsin transducer defines a novel

Biology Direct 2009, 4:25 http://www.biology-direct.com/content/4/1/25

required to hold the two β-sheets of the sandwich together(Fig. 2). 2) A nearly absolutely conserved asparaginelocated at the end of the second β-strand. The asparagineforms two hydrogen bonds with the backbone carbonylsof the residues 2 and 4 positions downstream from it. Fur-ther, there are no gaps in this part of the alignment sug-gesting that this conserved asparagine helps in stabilizingthe characteristic tight turn between strand 2 and 3 of thestructure.

An examination of the structure of the ASRAH domainsuggests that the β-sandwich is composed of two internalrepeats of four strands each the first, third and fourthstrand of each repeat are in one sheet of the β-sandwichwhile the second is in the opposite sheet. This topologydistinguishes the ASRAH domain from other structurally

comparable β-sandwich domains such as those found inthe β-galactosidase [8] (shows evidence for independentduplication and lacks the C-terminal strand relative to theASRAH β-sandwich) and the carbohydrate binding mod-ule CMB4 from the thermostable xylanase from Rhodo-thermus marinus [9] (which has additional N-terminalstrands and displays a circular permutation of the laststrand relative to the ASRAH β-sandwich; see Fig. 2). Thestructure shows that the solo ASRAH domain from Ther-motoga exists as a tetramer with each unit containing adeep cleft that could potentially form a binding-pocket onone face of the sheet. In two of the monomers in the struc-ture we observed that the pocket contains a 1,2-ethane-diol molecule (Fig. 2), but contains considerable room fora much larger molecule such as a pentose or a larger sugar.Interestingly, the binding cleft is lined by two residues

Domain architectures and conserved gene neighborhoods of ASRAH homologsFigure 1Domain architectures and conserved gene neighborhoods of ASRAH homologs. Gene neighborhoods are labeled by the organisms or taxonomic groups in which they are detected, of which the prototype species is depicted in bold letters. Complete descriptions for the gene abbreviations used in gene neighborhood and operon depictions are provided in the key below. The signal peptides in the actinomycete protein with two ASRAH domains are accompanied by a lipobox-like cysteine. Colors denote different functional classes, e.g. red represents enzymes or enzymatic domains, orange transporters, purple sen-sors. For a complete list of architectures and gene neighborhoods see Additional file 1.

Page 3 of 7(page number not for citation purposes)

Page 4: The Anabaena sensory rhodopsin transducer defines a novel

Biology Direct 2009, 4:25 http://www.biology-direct.com/content/4/1/25

Figure 2 (see legend on next page)

TD

h: hydrophobic: FWYILVACMV p: polar: CDEHKNQRST s: small: AGSCDNPTV u: tiny: GASa: aromatic: FHWY Strand: E E E E

W

Predicted Secondary StructureTM1070_Tmar_15643828 1ebA4555_Aaro_56478016 1azo1767_Asp._119898058 1Nham_4189_Nham_92109356 3A2cp1_0859_Adeh_220915973 1Adeh_0807_Adeh_86157234 1AnaeK_0855_Asp._197121269 1Smlt2639_Smal_190574565 5Smed_4041_Smed_150376186 2alr3166_Ana_17230658 2Ava_3864_Avar_75910068 2ORF12_Scel_6724249 31ORF4_Scel_6724241 2Noc_0463_Noce_77163991 2Rxyl_2717_Rxyl_108805507 3NmagDRAFT_3474_Nmag_224821217 1SM_b20098_Smel_16263846 1Smed_4039_Smed_150376184 1NmagDRAFT_2046_Nmag_224822529 23RUMGNA_03032_Rgna_154505495 1Rxyl_1686_Rxyl_108804522 6GYMC10DRAFT_4815_Gsp._192812314 1BL00814_Blic_52081863 1Dtur_0418_Dtur_217966830 2DICTH_0300_Dthe_206901662 2Csac_1017_Csac_146296051 1Tpet_1674_Tpet_148270795 1CTN_1499_Tnea_222100473 1RoseRS_3053_Rsp._148657162 681Rcas_2671_Rcas_156742631 681Caur_1864_Caur_163847426 693Cagg_1595_Cagg_219848499 693Haur_1193_Haur_159897722 637RoseRS_3053_Rsp._148657162 580Rcas_2671_Rcas_156742631 580Caur_1864_Caur_163847426 592Cagg_1595_Cagg_219848499 592Haur_1193_Haur_159897722 535RoseRS_3053_Rsp._148657162 468Rcas_2671_Rcas_156742631 468Caur_1864_Caur_163847426 482Cagg_1595_Cagg_219848499 482Haur_1193_Haur_159897722 434RoseRS_3053_Rsp._148657162 369Rcas_2671_Rcas_156742631 369Caur_1864_Caur_163847426 383Cagg_1595_Cagg_219848499 383Haur_1193_Haur_159897722 335Arth_1194_Asp._116669755 232Achl_1265_Achl_220912036 213AAur_1312_Aaur_119961677 206RSal33209_3015_Rsal_163841745 219A20C1_05961_Mact_88855619 193CMS_0648_Cmic_170781085 155CMM_1031_Cmic_148272211 149Acel_0461_Acel_117927670 147Tfu_2513_Tfus_72162912 143SGR_4505_Sgri_182438298 172SCO3032_Scoe_21221475 168ORF8_Snod_212007852 166SSEG_08070_Ssvi_197778862 166SAV_5044_Save_29831587 164SSCG_04211_Scla_197768628 169SSBG_01882_Ssp._197763853 2JNB_18748_Jsp._84498155 162TWT125_Twhi_28493092 121TW136_Twhi_28572299 121ACTODO_02044_Aodo_154509512 115BL1006_Blon_23465575 178BloniC5_010100003899_Blon_224287920 178Blon_1768_Blon_213692635 178BIFBRE_01033_Bbre_223467316 178BAD_0661_Bado_119025679 176BIFADO_01360_Bado_154487505 166BIFCAT_01052_Bcat_212716137 111BIFPSEUDO_03463_Bpse_225351861 111BIFDEN_02346_Bden_171743218 111BbifN4_010100003254_Bbif_224282825 176BIFANG_01091_Bang_217415110 174BIFGAL_00375_Bgal_217412440 184BIFLAC_02412_Bani_183601472 184FRAAL1267_Faln_111220725 154Franean1_5862_Fsp._158317606 157BlinB01002916_Blin_62423497 163Francci3_0747_Fsp._86739460 1292Noca_1413_Nsp._119715649 143Acel_0461_Acel_117927670 242Noca_1413_Nsp._119715649 241SSBG_01882_Ssp._197763853 96SSCG_04211_Scla_197768628 263SSEG_08070_Ssvi_197778862 260ORF8_Snod_212007852 260SCO3032_Scoe_21221475 262SAV_5044_Save_29831587 258Tfu_2513_Tfus_72162912 236consensus/70%

MMMGMMMATSSEDSAMMMLMPMMNDMMMGGGGGGGGGGGGGGGSSDDDCCCCCCCCCCCCCCCRCCCCCCCCCCCCCCCCCCCCCCGGGGLLLLQs

NDSVRRRSHLLKKTE-REANVSKKRKNNAVAAAVMIIAIVIVIAARTQQLQQRSSEATTGSTTAAQQAVVVEVVVVVDVVILEGESQATEDDDDR.

-PSAAAACASSPPPHAAAP-G--EEA--VMQRATTDDVMADDSAASALQQAVAEEAEAAVAAPRGQQASSSTVVTVAVVNNPQAPAIARADDDDSs

-IIVIIIIIIIIIIITIIIYEK-IIL--AASEEEETQARHRRRIMTTTPPPPVAAPPPPPPPPRPPPATTTTPPPPPGPTTPSASPPEDEKKKKG.

GGGGGGGGGGGGGGGGGGGGGGGGGGGGPPPPPLLLLPPPLLWVVVVVASGAGTTVSDDDDDEDGRRRSSSAAAAAAAAAQRRARALLATLLTLRs

AKKRHHHHSRRRRKRKKKESAEEAAKAAAAAADSSSGSSSSSSRRRRRNNNNGAATSTTTTSTTSQQTLLLLLLLLLLILLAASAPGTGGGGGGD.

RRTRRRRRRTTTTTRTRRTKLKTKRKRKYYYYYRRRRNRRRRRSSAAKDDDDDDDDSDEEEEDDDEESSSSSKKKKEESENRRNRHTQGSGGGGAp

EKRRVTTTRRCCRRCRTRRRIVQHIIVKRSSRRVEERQTVVVVTRRRRRMQALSAAATFFFFFLFLSSQQQQQQQQQQQQQQSTFVAADDDDDDDD.

EWWWWWWWWWWWW

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW

WWWWWWWWWWWWWWWWWWWW

SSSAASSSSATSWW

WWWWWWWWWWW

EFAAAAAAAAAAAAAATTTEILIILLVFFRRVAHHHLFYYYYYYFFYYTLLLLLLLFFFFFFFILLFLIFFFFFFLLFFFFFFFIFFLLLLLLLLVh

EFIIIIIIIIIIIIIIIIIIFVIIIIIFFFFFFFFFFFFFFFFFFFFFFVALVSVVIAPPPPPPPLLLALLLLLLMMLVLLLAALTTPPVTATTAPh

PAAAPPPAAAAAAAAPAAPPPPPPPPPPAAIIAAAAAAAAAAAAAAAVGGGGGGGGAGGGGGGGAGGTLLLLLLVVVLLVIGGAGGRAPGAAAAAs

DEEEEEEEDEEEEEEEGGGDDDDDDDDDDDDDEEEEEEEEEDEEEEEEAAATGGGAPAAVAAAVGGGGSSSSSSSSTPPPSPPGPVAQAASSAST.

GGGGGGGGGGGGGGGGGGGAGGGAAGGGGGGGGGGGGGGGGGGGGGGSNSNSAAASGSSSSGSDGAAADDGGGGGGGDSSGSASSGDAAATAAAAu

YYYYYYYYYYYYYYYYRHHEFYFYYYYYRRRRRTTTTASSSSDNNNNNTTTTTTTTGTTTTTTTGTTTSSGTTTTTTTTTTTTGTAGEDRDDDDA.

IILILLLIIIIIIIIIIIVCIIILLIIITTTTTTTTTTTTTTGPPTTTSTADTTTRTAASAAASAGGTTTTAKKKKQQGAKTVAVGPPPPPPPPP.

PPPPPPPPPPPPPPPPPPPPPPPPPPPPSGSSAAAEELEETTSRRAARLVVITTTAEKAEADKKAVVATTTTTTTTATTTEAAAAAAAAASSAAAs

NPPGPPPASPPGGPSELLVPEPPEESNCGGDDDRPGGDGGNENDDGGDGGGGGGGGLSGSDGEEGFFGGGGGGGGGGGGGGGGGGGTTPAGDGGTs

GSQQRRRWSYYRGSYGEGEEREEIIEGGDDAAPPPEEPDDQQD-----RRRRRRRAKRRRRRRRRDDATTTTLLIMMTSHTRRSRHTDRTSTDSS.

KSNSGEGSSGSSSSSSSSSGSSSGGGKKYY--AFFFFFKKTTRYYYYY--------------------------------------LNLALLLLH.

RPGVRRRNANNNNTNNNNTVTSSDDSRR----------------------------------------------------------TLVVVVVVV.

GDESPPPGSGGGGGGGGGGNGGGPPMGG----------------------------------------------------------VLLLLLLLI.

YARGEEEPDPPPPPPPPMPSGESNNSYY----------------------------------------------------------PMPPPPPPP.

-SGDDSDSDEEEEAEEEEE-------------------------------------------------------------------GGGGGGGGG.

--------------------------------------------------------------------------------------FLIIIIIII.

-YRRRRRPPPPPPPPPPPP-------------------------------------------------------------------LAPPPPPPP.

-ERAAAAEAQQQQQQEEEE-------------------------------------------------------------------GPDAKEKKG.

LLMLLLLMLFFMMMFMFFLILLLLLILL----------------------------------------------------------GGDDDDDDG.

VKVVEEEQVIITTTTLTTVPETTPPPVA----------------------------------------------------------DGAAAAAAD.

SSSSSSSSSSSSSSSSSSSGSSSSSGSS----------------------------------------------------------GGTTTTTTG.

HHHHHHHHHHHHHHHHRRNHHHHHHHHHQQSTTRRRRVRRQQSEESSNTTTTSTTQDQTTTTQSQNNDTTTTTTTTTTNTTDDNDTGRKSADADNp

EEEEEEEEEEEEEEEEEDDEEEEEEEEEEQQYYEMMTTTTTTTTEEQQEASAATTTLLDDDDDDDETTMQQQQQQQQQQQNQPPSPSRRLVVVVVRp

ESATTAAATSTTTTTTSEEKSASASSCSSYYYYFSSQQTFFYYLRRRRRVVVILFFRRYYYYYYYRLLVQQQQQQQQQQQQQIVQVVTTRRRRRRR.

ELAAAAAACAVVAAAAIIILIVVITTLLLLLLLLLLFLLLLLLLLLLLILLLLIVVLLLVVVVVVLLLLLLLLLLLLLLLLLVLLVLLLLLLLLLLl

ECCCCCCCCCCCCCCCCAACICCCCCCCCLLCCLLLLLAVVIIWAASSMNVNTSVVVHHHHHHHHIIIIIIIIVVVVVVVVLIYLIEFVVVVIVVIh

EIILIVVVVVIILLIIIVVLIVVVIIIIIIILLVVVVVLLLIILVVMFLLLVLLLLLLLLLLLLLLLLLVVVIVVVVVVLIILWLWLLVAAAALALl

EXLLLLLLLLLLLLLLLLLLTLLLLLVMMSSSSLVLLLLFFFFYLLYYFSSSATDDDATTTTTTTTGGSSSSAAAAAAAASASTSSATAFYFFFFA.

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

AAATTAN

TATAAAAATAAAAATTCCTTALCVVTTVPPVVPPPPPPPPPPPPPPPPAAASPPPPIPPPPPPPTPPPLPPPPPPPPSPPPLTPLPPGPPPPPPVs

GGGSGGGGGGGSSTGSCSGCSTHNNGGNSSSSNNNNNNQQQQQGGAATSSSSTSSDDEDDDDDDGSSGSSSSSSSSSSSTTAGGADGTGGGGGGSs

DDRDDDDDDDDDDDDDDDPDREEDDDDKQRPPQGGNSPSSPPPDSPGNSSEAEGGDEDDDDDDDADDPTTTTAAAAASSDDDPDDSDDTAEDGDDs

ERYEEEEETEERRQAESSECRQEKKVEERSLLRDDQQVEENNSEERRQTSTTVVVVTSSSSSSTNSSTKKKKKKKKKKKQKQRVEGIDDHSATSEp

TLDDDDDPPDDDDEPDPPMDEDNPPDTTQQSSPRRPPSPPDDESPPPIPPPAASSPAAAAAAPAEQQAPPPPDDAAAAPPSPPPPTDEDDDDDDPs

EAAAAAAAAAAAAAAAAAAAAAVAAAAAAAATAAAAAAAVVTAVAAATAAAAASSSAAAAAAAAAVTTSTTTTTTTTTTTTTAAVAAAVAAAAAATu

EKQHEHHHHKHHQQQRSRRTHRVNKKKKKRRIMQNNNNAGGRSVNTTTRTTTTTTTTTVVVVVVVTSSVTTTTSSSSSVTTTQRQLVTRSEDDDDTp

EIIVILLLVVVVVVIIVLLLILVLVVVIIIIAVVIVVVIAAAAAVVAATVVVVVVVVVAVVVVVVVVVVVVVVVVVVLVVVVVVAIAVALLLLLLVl

ERKRAEEETEEEAAAAERRETSTAKKKRKDDTTKTTEETRRQQNRRRRISSNNTDDDNDDDDDDDDSSSDDDETTNNTDSTQGSENDSDKKKKKKQp

EIIIILLLLIIIIIIIIIILLLIIFFLIIVVIIVVVAATVVIILFFVIVLLLLLLLLVIIVIIIILFFIIIIIIIIIVVVVLVVVVILVVVVVVLLl

ETTTTRRRTTTTTTTTTHQTTTHTTTTTWAAEETRRIITLLTTLIITTTEEDDDAADDEEQEEEAQTTDAAAKRRKKKREQSRGARTRRQEQRRRRp

EFIVLVVVLLIIVVVVVIILLFAVFFVFFFFVTLFFFFFLLYYMIILLFLLLLIIILLLLLLLMLVVVGVVVIIIVVVVVLMVALVVVVFLLLLLLl

ELFYFFFFYFYYYYYYYHHYFYYYYYFLFIIIAGLLRRMMIMMMVVFFLYFYFFTTYYYYYYYFYLYYYWWWWWWWWWWYWTFVMLYTVAAAAAALh

EFFFFFFFFFYYFFFFFYYYFFFFFFFFFLLFFKTTGGTTTRRPRRPPPGGGGDGGSAGGGGGGGGSSGGGGGGGGGGGGGGTSTSGSTRGSSSST.

ETSTAAAESSSSSSATAAATEEEEEEEEAAGGDPPPPEDNRRTQQPEGAGHSEEEDPPKKKKAPGGGASSSSSSSSATTATDSPDRPDKPPPPPPs

DDDDDDDDDDDDDDDDDDNDDDDDDDDDDDDDDDDDQEDDDDDPPTVQQKENQDEQDEDDDDNKKNNPEEEKDDDDDEKQE--S-GDDSESSS

c

--------------------------------------------------------------------TTTQKKKKAQN----------------.

--------------------------------------------------------------------AAASSSASSSVSP--------------.

SRRRRRRRRKKRRRRRRRGKRRRRKRSRGGGGGGGGGGGGGGGGGGGTGGGGGGGDGGGGGGGGGEEGGGGGGGGGGGGGGGGGGGGSSGGGGGGs

KEEEEEEDEEEDDEEGPPHDEDPDDDEDSSTTTTTIVSTTTTIAVQQNQQQVAAAVPAATAAEEVKKPKKKEAAAAAAKRQTQQPVSVTRLQLQE.

-------------------------------------TTT--------IIIVVVV-TLLLLLLLVKKLMMMMLLLLLLMMI----VFFIIIIIIV.

-------------------------------------FFF--------QQQQSSS-FKKKDKRR-FF-AAAAAAAAAASAV----DTAVVTTTTD.

PPPPPPPPPPPPPPPPAVEPPPPPPPPPRRKKTSSAV-VVVVTEEPPAAAAAAAAPSSSSSSTSP--NLLLLLLLLLLLSMTTRAAPPPPPPPPAs

VAACAAAVAVVAAIAVVVAVVLLIIIVLAATTQLLAA-YYFFAIVIVVPPPAAPPTLETTTTTES--ASSSSSSSSSSAAAGAGTPTKAAAVAAEs

-GGGGGGGGGGGGGGGGGGQRLEEEE------------PP--------GGGGGGGSEVVVLLLVP--ATTTTTTTTTTTTTPPESRGGGGGGGGT.

VPPPPPPPPPAPPPPPPPPLGGGSSSVSSSSEVAATTTPPETTIIIVRSSAGTAASTGGGGGTGTSSPQQQQGGGGGGSNQPRPPLYLHSNNHNLs

EHYYYYYYYFYYYYYYFYYYSEMIKKDHHRRLLVRRRRRFFQQRRRRRRRRRKSSSRRDEEEEEEGSSRSSSSSSSSSSGSSQVTQRDHEEEEEET.

EELLRRRRRRRRRRQREEDPEGESEEYEESSSGERRRRRDDEDQDEEEEGGGGGGGGGGGGGGGGSGGQTTTSTTTTTTNTSDEEDGQENSTTTTDp

EVLITLLLVFLLVVFVKVVLVVTEIIFVVLLVVYYYYYYSSVVVVTILYLLLLIIIIIIVIIIIITVVVLLLLLLLLLLVLVIVVIVVILLLLVLVl

EETTKTTTTTTTTTTTTEESTVITIIVEEVVRRTAAAAQEEQQLTTQQELLLLVSSRTTTTTTPQVPPTSSSATTTTTTTTVSTSTSTRLHHHHHEp

EIVVVVVVVVVVVVIVVVIVVVVVVVVIVMMIIIIIIIVPPIILIVVVIVVVVVVVVVVVVVVVIVVVVVVVIVVVVVVVVVVVIVVVIVVVLVVVl

SPPGGGGPAPPPPPPPAAANPPHLPKSPPPPPPPPPPPIVGGPAAPPQAAAPPPPPDPPQAQPPPPPPPPPPAAGGAARQGPPKPPPAKKKKKKAs

PPAAAAAAAAAAAAPAAAPAAAGPPSPPAAAARPPPPATVAAPPPGGPPPPAAAAGPAPPPPAPASSAAAAASSAAAAGAAPVAPGAPSAASAAPs

XRQRRRRRRRRRRRRERRERERKEKKMGGGGGTLLRRLLLQRFGGGGGGGGGNGGRYRHHHHRLRRRREEESGGGGNNNGGGGGEGGQGGGGGGA.

KRRRRRRRRRRRRRRRRRRRRRRRRRKRSSAAGAAASAALNNTAAGGQTTTSGGGATSGSSSASGSSSGGGGKKKKGAGGEGRKHTQSMMMMMMA.

ESTTTTTTTTTTTTTTTVVVVTTTTTTSSRRRRRRRQQRPPRRRRRRRRTTTSQRRQESSSSSGTRLLSEEEEEEEEEEQTQVTQASIVTTTTTTTp

ELCVLLLLCVKKRRRLKRRRRRKKWWILLYYYYILLLLLQRLLTAAAAVRRRRRRRTVVEEDELVVRRASSSNTTSSTSSTRVVRVVTQAAATTAT.

EHHHHHHHHHHHHHHHHRRHCHHHHHHHHTTTTSASAADQQVVQNNDDNSSSSVVVAEASPPPPPAFFSSSSAVVIVVVTTTTSAIRARSSSAGAR.

ELLLLLLLLLVVVVVLLIIVLLIIVVALLMMLLVIIIIVRRVVVLLVLIIVVILLLLLVLVIIIVLVVVMMMLLLLLMLMMVRRIRLVVLIAVVVVl

ERRRRRRRRRRRRRRRRRRRRRRRRRWRRAAAALDDDDDITAADVVNNIVVNVSSSASLLLLLLLLNNLEEEDNNNNNSDNDRRRRDDADDDDDDSp

ELFFFLLLFFFFFFFFFFFVTTTTLLLLLVVLLLVVVVLVVVVLIIVVALLLLLLLIVLLLLLLLVLLLLLLLLLLLLLLLLLLLLLLVLLLLLLLl

DDNNDDDNNNNNNNNNNNNNDSSDDNDDHHQHDNNNNRVIHHSNNNNNAAAASAADLSSSSSSSDAAASSSASSAAASSSAAAGAGRSSGGGGGEs

KEDDDDDDDDDDDEEEDDDSRSSKKKKKEEEEAEDEEEVVDDSEQEEEGGGGGGGQPTTTTTTTAGGGAAAAAAAAAAAAAAEGSSPHGDDDEDGs

LLLLLLLLLLLLLLLFLLLEPLLSSPLLLLAAIVVVVIMMIINIVILILLLLFLLLLLLLLLLLIIIWAAAAAAAAAAAAALLLLVVVIVVVVVPl

-DRARRRQDNNTTNEEIIIQERHEEE------D-----GGTTYFFAVFAAAAAAALVTSITTTSTAAFVVVAAAAAAAAAAAAAAILLTTTTTTM.

-NDDDDDDDDDEEDDEFFDDETREEL------P-------LLPNNPPSPPPAPPPPEADDEDAGGPPPDDDPSSSSASPAPPPGPP--HRRRRRE.

-------------------FVG-III-------------------------------------S-------------------------------.

-------------------GGDSLLG------------DDPP-----------------------------------------------------.

-PPPPPPPPPPPPPPPPPPNGGGGGN------------LLDD-----------------------------------------------------.

-EEEEEEAEAAEEEEEEEYYAQEVVL------N-------GG-DDNGDDGGGGDDGHEVDEEDPAGGDGGGNGGEGKGGNNEEGERRGGGGGGG-.

GPPPPPPPPPPPPPPEAAATRGRKKVGG----L-------ST-TTAATQEEQVLLAAAKQKTVEELLEQQQQQQQQQQQQQAAAARGPEEEEEE-.

IIIVIIIVVIIIIIIILLPVVIILLIIIYFFF-VVVVVLLRR---------------------------------------------A-------.

PPPPPPPPPPPPPPPPPPPPPPPEEPPPPPPP-PPPPPPPPP---------------------------------------------I-------.

----------------------------------------LL---------------------------------------------S-------.

KVLRRRRRRHHRRKCTLLLIRPERRKKRDDDN-----S--AA-----------------------------------------------------.

CNDDGGGGDDDDDDDGDDASGGNNNECCQQEE-EEDHPNNDN---------------VQEQDA------------------------DAPAAAAR.

KTTTTTTTTTTTTTTTEVEEVIVVVVKKTTTSQLLLLQRAATKPSSNFEESKTASNEDTSETPSSSSSHHHQDDDDDKDDDATETDAGGGGGGGRs

PPDSDDDGPDDDDDDPDDDQPPPPPPPPTIAASGGGGGSSSNADDAANKQQDSSSLAQDDNDDDSRRAGGGGAAAAAGGAAVVSAEDASSSSSSDs

EYYYYYYYFYFFYYYYFFFYYYYYFFYYYIIVVHVVFFVFFFFFAALLLLLLLPPPVVLLLLVVLPIILLLLLLLLLLLLLLTTLTLATLLLLLLAh

ESSAAAAAAAAAAAAAAGGGAAAAAASSSSSATTAAAAGGGGGGPPPPPSSAAVVAAATTTTTTTAAAAFFFYYYYYYYYWRATAAAAGVIVVVLTs

EISSSTTTTSSSSSSSSAAIIVMIIILIITAAVITTTTTMMMMLAAAAIIVVVVVVVVAVVVVVAVVVVVVVVAAVVVVVVVLVVLIVLLVLLLLLh

EXVVVVVVLLVVVVIVVVVVMREEKKVMMTTIISTTEEMRRRRRIIIIEHRHHHHHHHHHHHHHHRQQHTTTTTTTTTTKDNDDDDETLSSTTTTV.

EAIIIIIIVIIIIIIIIIIVLVVIVVVAAVVVVVIIVVLVVVVIIVVVVAVVAVVVVVVVVVVVVVVVVVVVVVVVVVVVVLVVVVVIVPPPSPPVl

EEEEERRREEQQEEQEEDEEEETAEEEEERRRREIIIKTTTIIAEEEESRRRKTEQTRTGSNTSTRKKTSSSSTTSSSSTIVEDKQVTTEETTTSIp

SSSSSSSSASSSSSSSAASSSSSSSSSSAASSAAAAASAAAAAASAASSSSSSSSVTTVAVVTAASSASSSSGGSSSSSSSVVVVASADEDGDDAu

NTDTDDDDDNNDDNDDDDNDDSDDDDNTTTNNNDDDDSTSDNTNNNNNATTNTRRRNRRRRRRTTDDDKKKKTTAVDRESQRHDRSDTPGTTRTDs

------------------------------------------------GGGGGGGSASSSSSSTGDDGEEEEDDDDEQSDGATGS---KASPSS-.

----------------------------------------------------------------------------------------AR-----.

----------------------------------------------------------------------------------------PA-----.

VVRVVVVQTVVVAVVVIVVCVAVVVQATQQQQLRRRRQQQQLAAAEEQGGGGGGGGGGGGGGGGGVVGTTTTTTTTTTTTSGGGGRQATTVVVVHs

EPPPPPPPPPPPPPPPPPPPPPPPKKNPPPPPPPPPPPPPPPPPPPAAAPPPPQQQRRRRRRRRRSPPGPPPPPPPPPPAAPRRVRGPPPPPPPPPs

EVVIIVVVIVIIIIIVVVVIVVVVIIIVVIVIIVIVVVIIIVVIVIIIIVIVVIVVVVILVVVIIVVVVIIIIVVVVVVVVVVVVVRLVVVVVVVVl

EVVVVVVVIVVVVVVVVVVVIVFIVVVVVVVVVVAAAAVVVAAAIILLVTAASTVVVAGGAGAGSRYYAAAAAAAAAASAAALLSLLVTVVVVVVVh

EXVVVVVVVVVVVVVVCVVVAVVVVVVMMVVAAAAAVVAAAVVAAAAAGAASAAAAPAAAAAAAAASSASSSAAAAAAAACSASGTGAAAAAAAAVs

EQQQQQQQQQQQQQQQQQQQQQQQQQQQQEEEEEEEEEEEEEEEEEEED-------GAVAAAAAAVVVW-------------WW-WAGTAAAAAAG.

ELHHHHHHHQHHHHHHHFFWYHYYHHMLLRRRRRRRRRRRRRRRRRRRR-------LL-------MLLA-------------MM-V-GLLLLLVLV.

ESTSTTTTTTTTTTTTTTTSGSSSSSSSSSSSSTAAAATTTTTTFFSST----------------S--Q---------------------------.

----------------------------IIILILLMMSMMMMVMMMMM----------------D--A---------------------------.

RRRRRRRRRRRRRRRRRRRRRRRRRRRRYYYFLYYYRYIIRRLNNAALVATVERRFYVVVVVVVIQQSVVVVIIVVISSTVVVSASSRERRQRRA.

LLLLLLLLLLLLLLLLLLQAMLLLLLLLAAPPIWWFFFFFFFVFFLLFIIIILLLTALQQQQRDWHHLVVVVVVVVVVVIIDDLDVVSVIVVVVIh

DDDDDDDDDDDDDDDDDDDDDDDDDDDDGGGGDRRAANGGGGGGGTTAQQQQQEEQQSAAAAVALSSMRRRRRRRRRRRRRRRQRLAFTVVVVVR.

VSSSAAASSSSSSSSSSSTPTSTTTTAVAPNGNNNGGAPPPPPAATSNQQQQQQQVRALLLLGGDIINTTTTTTTTSVVMTSAEPDLVRRRRRRHs

GRRRRRRRRRRRRRRRGGRRGTTRRRGGPPGGTGDGGGGGNGTDDDDDSSSSSSSMSDDDDDETGVVGVVVVVVIIIILIVSSTSRPEGGGGGGP.

KQHNRRRQQQQQQQQQTTQTQQQQQLKKTAVVSAAANNSSGGKLLIIAVVVVITTAGDDDDDARSKKETTTSSSTAARVDSLGVAYQGKEKKKKD.

NAAPAAAPPAAAAAAAAGAVEAPPPGDESARRGVAVVSTTSSGSSDDLLLLLVVVGRQKKKKEDIGGIMMMMMMMMMTMMMGSLGDSDGGGDGGE.

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

1

7

7

7

7

7

6

6

6

6

6

5

5

5

5

5

4

4

4

4

4

3

3

3

3

2

2

2

2

2

2

2

2

2

2

2

9

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

2

1

2

3

3

2

3

3

3

3

3

3

0

1

1

1

1

1

1

1

1

1

1

4

1

1

1

1

1

1

3

0

1

0

0

1

1

1

0

0

7

7

8

8

3

7

7

8

8

3

6

6

8

8

2

6

6

7

7

3

2

0

0

1

8

5

4

4

4

6

6

6

6

6

6

9

6

1

1

1

7

7

7

7

7

6

0

0

0

7

7

8

8

4

5

5

3

3

4

5

0

7

6

6

7

6

4

1

1

1

3

1

1

1

5

2

2

2

1

2

2

3

0

1

1

3

9

5

9

7

1

1

0

1

1

7

7

8

8

3

5

5

7

7

1

9

9

5

5

8

4

4

8

8

0

7

8

1

4

8

0

4

2

0

9

5

3

3

1

6

0

5

5

3

5

5

5

5

3

3

8

8

8

3

1

0

0

8

1

6

8

8

8

1

7

4

9

9

1

7

2

\SOLOS||||||||||||||||||||||||||/\TETRA||||||||||||||||||/\DUOS|||||||||||||||||||||||||||||||||||||||||||||/

6

1K42: - 1-2 4 1 2 3 4 1 2 3

N

C

1NC7: 1 2 3 4 1 2 3 4

NC

H

C

N

W

C

N

C29

W7

N32

H25

A B C

D

S

L

C

A

R

l: aliphatic: ILV c: charged: DEHKR

Page 4 of 7(page number not for citation purposes)

Page 5: The Anabaena sensory rhodopsin transducer defines a novel

Biology Direct 2009, 4:25 http://www.biology-direct.com/content/4/1/25

from the second strand containing the SHEshChhN signa-ture (where "h" is a hydrophobic and "s" a small residue),which is nearly absolutely conserved in the ASRT-like fam-ily. The H and the C from this signature are seen to contactthe 1,2-ethanediol in the structure, suggesting that theymight be key to binding a ligand in this family. While thisstrand is well-conserved in the other families, these fami-lies possess their own distinctive residues instead of the Hand C. This suggests that though they are likely to containa similar cleft as the ASRT-like family, they might bind dis-tinct ligands. Further, the walls of this cleft are formed inpart by the dimer interface (Fig. 2). In line with this, wefound that the ASRAH domains in the other familiesalways occur either as dyads (the actinomycete family, Fig.1) or as tetrads (the amidase associated versions, Fig. 1),suggesting that they might all have the potential to formhomo- or hetero- tetrameric structures.

A ligand-binding pocket on one face of the sheet is partic-ularly common in other β-sandwich domains, especiallythose involved in carbohydrate binding e.g. the CMB4 ofxylanases and the DOMON domain. However the ASRAHdomain differs from the former in having its predicted-lig-and pocket on the opposite face relative to the CMB4domains and in being topologically very distinct from theten-stranded DOMON domains [10].

Domain architectures and gene neighborhoods point to a carbohydrate-binding role for the ASRAH domainWe sought to obtain further evidence for the ligand-bind-ing role of the ASRAH domains based on their domainarchitectures and predicted operonic associations. Thepredicted gene neighborhoods of the solo versions of theASRAH domain show multiple associations with genesencoding various carbohydrate metabolism proteins. Onegene neighborhood conserved across many phylogeneti-cally distant bacteria, including Thermotoga, combines agene for the solo ASRT-like ASRAH proteins with genesencoding a sugar isomerase, a sugar aldolase and a sugarkinase. Another widely distributed bacterial conservedgene neighborhood combines the solo ASRT-like gene

with a cluster of genes encoding an ABC transporter sys-tem specific for monosaccharide (Fig. 1). ASRT-like genesare also frequently combined in operons with genesencoding other potential sugar metabolism-relatedenzymes such as amylases (e.g. Sinorhizobium), pyruvatedecarboxylase/oxidase, a sugar/sugar acid dehydratase/racemase and a FAD-dependent oxidoreductase related tothe glycolate oxidase (Fig. 1). Interestingly, in the actino-bacterium Rubrobacter xylanophilus two paralogous copiesof ASRT orthologs are each associated with either the iso-merase, aldolase and kinase gene neighborhood or thepyruvate decarboxylase/oxidase, dehydratase/racemaseneighborhood. Thus, these linkages strongly support theASRT-like domains regulating distinct sugar metabolismprocess in cis. The amidase domain found fused to theASRAH domain is of the N-acetylmuramoyl-L-alanineamidase, which is consistent with a potential role for theASRAH domain in binding elements of peptidoglycan.The cell-surface versions found mainly in the actinomyc-etes are predicted to be lipoproteins because their N-ter-minal region contains a conserved signal peptidesequence followed by an absolutely conserved cysteinereminiscent of the "lipobox" [11]. They additionally con-tain an internal cysteine, just N-terminal to the ASRAHdomain (Fig. 2), which could form the site of a second dis-tinct lipid modification. Their predicted gene-neighbor-hood context, which is conserved in practically allactinomycetes encoding such an ASRAH domain protein,shows a tight linkage to a transcription factor of the WhiBfamily, a glycosyltransferase and an inactive version of azincin-like metallopeptidase. In some organisms the glyc-osyltransferase and the ASRAH domain containing pro-tein are fused into a single polypeptide (Frankia sp. CcI3)supporting the functional linkage between these proteins.Thus, these cell-surface proteins also display contextualconnections that are compatible with them binding sug-ars. It is likely that they are lipid-modified themselves andform an actinobacteria-specific cell-surface structure bybinding a polysaccharide created by the linked glycosyl-transferase or recruiting the same enzyme to a precursorpolysaccharide.

Conserved structural and sequence features of the ASRAH domainFigure 2 (see previous page)Conserved structural and sequence features of the ASRAH domain. A) Topology diagrams of Thermotoga TM1070 and the Rhodothermus marinus carbohydrate-binding module (CBM4-2) present in xylanase 10A. Equivalent strands are colored similarly. Numbers below each topology diagram illustrate the strand order. Note the circular permutation in the CBM4-2 domain. Additional N-terminal strands characteristic of CBM4-2 are colored orange. B) Cartoon representation of Thermotoga ASRAH homolog TM1070 (PDB: 1NC7), with key residue side-chains represented as sticks. Predicted ligand-binding residues are marked in green (H25 and C29), and other conserved residues (W7 and N32) in yellow. C) Surface model of the TM1070 tetramer, with the conserved histidine (H25) and cysteine (C29) of chain A marked in blue and yellow respectively. The 1,2-ethanediol molecule detected in the ligand-binding pocket of Thermotoga TM1070 is rendered as spheres. D) Comprehensive multiple sequence alignment of the ASRAH domain. Proteins are represented by their gene names, species abbreviations and gis. The coloring reflects the consensus at 70% conservation and highly conserved residues are shaded in red. Consensus abbreviations are listed in the lower panel. Refer to the Additional file 1 for species abbreviations.

Page 5 of 7(page number not for citation purposes)

Page 6: The Anabaena sensory rhodopsin transducer defines a novel

Biology Direct 2009, 4:25 http://www.biology-direct.com/content/4/1/25

Thus, the contextual evidence, combined with the struc-tural evidence for ligand-binding, favors the ASRAHdomain being a sugar-binding domain, in the manner ofmany other β-sandwich structures. This is not unexpectedfor the secreted or cell-surface versions of the ASRAHdomain because it is consistent with the function of manyother extracellular β-sandwich structures in binding cell-surface polysaccharides, which are parts of the peptidogly-can, glycoprotein and other polymeric carbohydrate lay-ers that decorate prokaryotic cell surfaces. However, theintracellular standalone ASRAH domains are somewhatfunctionally unexpected because they appear to representa novel regulatory mechanism a potential sugar-sensorthat might influence the function of other enzymes orsugar transporters encoded by the conserved gene-neigh-borhood via physical interactions with them dependingon sugar concentrations. In light of this, we propose thatthe unusual linkage to the rhodopsin, which is observedonly in the cyanobacterium Nostoc, might represent anovel mechanism in which the behavior of a light sensor(ASR) is influenced via interaction with an intracellularsensor of a sugar or a related metabolite (ASRT). It is plau-sible that this interaction might have a role in phototaxisin response to intracellular nutrient status.

General discussionOur investigation demonstrates that ASRT belongs to asuperfamily of domains predicted to bind small mole-cules, most likely sugars, in both extracellular and intrac-ellular locations in various bacterial and few archaealproteins. On one hand the presence of sugar bindingcapability in a β-sandwich scaffold is hardly unprece-dented for domains found in extracellular or cell-surfaceproteins. Nevertheless, the identification of such a poten-tial sugar-binding element in a novel cell-surface lipopro-tein is of interest especially given its presence acrossactinobacteria, including several pathogens such as Troph-eryma, Actinomyces and Thermobifida. On the other hand,the identification of such a function in an intracellularcontext is of interest especially because the tetramer-form-ing standalone versions are predicted to regulate a diverserange of sugar metabolism operons or even the light sen-sory behavior in a cyanobacterium. This observationpoints to a previously unreported mechanism of regula-tion by a potential standalone small-molecular sensorthat probably occurs at the level of protein-protein inter-actions rather than via the sensor domain of a one-com-ponent system transcription factor. In this respect such aregulatory process is closer to the allosteric regulation bysmall molecules.

While "sideways" ligand binding via one of the exposedsheets has been observed in several β-sandwich scaffolds(e.g. DOMON domain and other carbohydrate-bindingdomains [12]), the exact mode of binding is not estab-

lished for all such folds. The analysis of the ASRAHdomain presented here identifies the binding site for onemore of these β-sandwich scaffolds. The proposed bind-ing site for this superfamily of domains reinforces anobservation that presents an interesting evolutionaryconundrum: though several β-sandwich scaffolds bindligands in a "sideways" fashion, their topologies greatlydiffer from each other. This leads to the question as towhether there was repeated convergent evolution of the"sideways" binding mode in various β-sandwiches orwhether it represents an ancestral binding mode for the β-sandwich scaffolds, which was preserved despite theextensive topological rearrangements occurring as conse-quence of duplication of internal units or accretion ofadditional strands. Hence we hope that these observationswould also contribute to the more general understandingof evolution of ligand binding in β-sandwiches.

Materials and methodsGene neighborhoods were determined using a customscript that uses completely sequenced genomes or wholegenome shotgun sequences to derive a table of geneneighbors centered on a query gene. Then the BLAST-CLUST program is used to cluster products in the neigh-borhood and establish conserved co-occurring genes.These conserved gene neighborhoods are then sorted asper a ranking scheme based on occurrence in at least oneother phylogenetically distinct lineage ("phylum" inNCBI Taxonomy database), complete conservation in aparticular lineage ("phylum") and physical closeness onthe chromosome indicating sharing of regulatory -10 and-35 elements. Profile searches were conducted using thePSI-BLAST program [13] with a default profile inclusionexpectation (E) value threshold of 0.01. Profile-profilecomparisons were performed using the HHpred program[14]. HMM searches were conducted using the newlyreleased HMMER3 program [15]. Multiple alignmentswere constructed using Kalign [16] followed by manualadjustments based on PSI-BLAST results. Protein second-ary structure was predicted using a multiple alignment asthe input for the JPRED program [17].

Competing interestsThe authors declare that they have no competing interests.

Authors' contributionsLMI and LA were involved in the discovery process. RdeSand LA performed the analysis. RdeS and LA wrote thepaper. The figures were prepared by RdeS and LMI. Allauthors have read and approved the final manuscript.

Reviewers' commentsReviewer 1M. Madan Babu, MRC Laboratory of Molecular Biology,University of Cambridge, UK.

Page 6 of 7(page number not for citation purposes)

Page 7: The Anabaena sensory rhodopsin transducer defines a novel

Biology Direct 2009, 4:25 http://www.biology-direct.com/content/4/1/25

In this manuscript, De Souza, Iyer and Aravind report thecharacterization of a family of proteins which are relatedto the Anabena sensory rhodopsin transducer. Through arigorous computational analysis the authors first identi-fied homologs of the ASRT gene in several bacteria thatlack a sensory rhodopsin. Through a combination ofgenomic context and structural analysis, the authorsreport that this domain is likely to bind to small mole-cules, which are most likely to be sugars. In addition, theauthors identify a subset of such proteins that are pre-dicted to be secreted or membrane associated via lipidmodification in several actinobacteria species, which alsoincludes certain pathogens. Additional file 1 is compre-hensive and provides a good substrate for experiments. Inshort, this is an excellent study and I would support pub-lication of this work in Biology Direct upon minor revi-sion:

1. It is currently unclear how the authors identifiedsecreted proteins and the membrane associated proteins.Could the authors clarify this point, please? In addition, itwould be good to see a comment on the functional signif-icance of secreting sugar binding proteins rather than hav-ing them membrane associated.

Authors' responseSecreted proteins where identified based on detection ofan amino-terminal signal peptide by the program SignalP,followed by manual inspection. Transmembrane regionswere detected using TMHMM. Except for the Frankia sp.homolog, which is fused to glycosyltransferase, all pro-teins with signal peptides (dyads and tetrads) did not havetransmembrane regions and, therefore, are probablysecreted. The functional significance of secreting sugar-binding proteins: these sugar binding domains in theextracellular proteins are likely to provide a means forassociating with the sugar moieties in extracellular biopol-ymers such as peptidoglycans and other glycans present insurface capsules and gums.

Reviewer 2Mark A. Ragan, Institute for Molecular Bioscience, TheUniversity of Queensland

Brisbane, Australia.

I support the publication of your manuscript "The Ana-baena sensory rhodopsin transducer defines a novelsuperfamily of prokaryotic small-molecule bindingdomains", by RF De Souza, LM Iyer and yourself, in Biol-ogy Direct as a Discovery Note.

Additional material

AcknowledgementsWork by the authors is supported by the intramural funds of the National Library of Medicine at the National Institutes of Health, USA.

References1. Man D, Wang W, Sabehi G, Aravind L, Post AF, Massana R, Spudich

EN, Spudich JL, Beja O: Diversification and spectral tuning inmarine proteorhodopsins. Embo J 2003, 22:1725-1731.

2. Jung KH, Trivedi VD, Spudich JL: Demonstration of a sensoryrhodopsin in eubacteria. Mol Microbiol 2003, 47:1513-1522.

3. Vogeley L, Sineshchekov OA, Trivedi VD, Sasaki J, Spudich JL, LueckeH: Anabaena sensory rhodopsin: a photochromic color sen-sor at 2.0 A. Science 2004, 306:1390-1393.

4. Furutani Y, Kawanabe A, Jung KH, Kandori H: FTIR spectroscopyof the all-trans form of Anabaena sensory rhodopsin at 77 K:hydrogen bond of a water between the Schiff base andAsp75. Biochemistry 2005, 44:12287-12296.

5. Vogeley L, Trivedi VD, Sineshchekov OA, Spudich EN, Spudich JL,Luecke H: Crystal structure of the Anabaena sensory rho-dopsin transducer. J Mol Biol 2007, 367:741-751.

6. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G,Forslund K, Eddy SR, Sonnhammer EL, Bateman A: The Pfam pro-tein families database. Nucleic Acids Res 2008, 36:D281-288.

7. Morris DD, Gibbs MD, Chin CW, Koh MH, Wong KK, Allison RW,Nelson PJ, Bergquist PL: Cloning of the xynB gene from Dicty-oglomus thermophilum Rt46B.1 and action of the geneproduct on kraft pulp. Appl Environ Microbiol 1998, 64:1759-1765.

8. Jacobson RH, Zhang XJ, DuBose RF, Matthews BW: Three-dimen-sional structure of beta-galactosidase from E. coli. Nature1994, 369:761-766.

9. Simpson PJ, Jamieson SJ, Abou-Hachem M, Karlsson EN, Gilbert HJ,Holst O, Williamson MP: The solution structure of the CBM4-2carbohydrate binding module from a thermostable Rhodo-thermus marinus xylanase. Biochemistry 2002, 41:5712-5719.

10. Iyer LM, Anantharaman V, Aravind L: The DOMON domains areinvolved in heme and sugar recognition. Bioinformatics 2007,23:2660-2664.

11. Babu MM, Priya ML, Selvan AT, Madera M, Gough J, Aravind L, Sanka-ran K: A database of bacterial lipoproteins (DOLOP) withfunctional assignments to predicted lipoproteins. J Bacteriol2006, 188:2761-2773.

12. Hashimoto H: Recent structural studies of carbohydrate-bind-ing modules. Cell Mol Life Sci 2006, 63:2954-2967.

13. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lip-man DJ: Gapped BLAST and PSI-BLAST: a new generation ofprotein database search programs. Nucleic Acids Res 1997,25:3389-3402.

14. Soding J, Biegert A, Lupas AN: The HHpred interactive serverfor protein homology detection and structure prediction.Nucleic Acids Res 2005, 33:W244-248.

15. Eddy SR: A probabilistic model of local sequence alignmentthat simplifies statistical significance estimation. PLoS ComputBiol 2008, 4:e1000069.

16. Lassmann T, Sonnhammer EL: Kalignan accurate and fast multi-ple sequence alignment algorithm. BMC Bioinformatics 2005,6:298.

17. Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ: JPred: a con-sensus secondary structure prediction server. Bioinformatics1998, 14:892-893.

Additional file 1Supplementary material. Material and methods and a complete list of conserved gene neighborhoods and comprehensive alignment of the: ftp://ftp.ncbi.nih.gov/pub/aravind/ASRAH/.Click here for file[http://www.biomedcentral.com/content/supplementary/1745-6150-4-25-S1.html]

Page 7 of 7(page number not for citation purposes)