

JOURNAL OF BACTERIOLOGY, Jan. 2009, p. 23–31 Vol. 191, No. 1
0021-9193/09/$08.00+0 doi:10.1128/JB.01017-08
Copyright © 2009, American Society for Microbiology. All Rights Reserved.

MINIREVIEW

Bioinformatics Resources for the Study of Gene Regulation in Bacteria▿

Julio Collado-Vides,1* Heladia Salgado,1 Enrique Morett,2 Socorro Gama-Castro,1 Verónica Jiménez-Jacinto,1 Irma Martínez-Flores,1 Alejandra Medina-Rivera,1 Luis Muñiz-Rascado,1 Martín Peralta-Gil,1 and Alberto Santos-Zavaleta1

Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, México,1 and Departamento de Ingeniería Celular y Biocatálisis, Instituto de Biotecnología, Universidad Nacional Autónoma de México, A.P. 510-3, Cuernavaca, Morelos 62100, México2

Genomics, which has been identified as the science of the century, is dramatically changing the historically weak relationship between experimental and theoretical biology. The addition to the Journal of Bacteriology of a section for computational biology marks a turning point in the history of this dialogue. This minireview is focused on the computational biology of gene regulation in bacteria, defined as the extensive use of bioinformatics tools to increase our understanding of the regulation of gene expression.

The study of gene regulation has been radically affected by the elucidation of full-genome DNA sequences and the subsequent development of high-throughput methodologies for deciphering their expression. Before the genomics era, most research was focused on individual biological systems. A large number of our colleagues, contributors to this journal, have devoted much of their academic careers to the understanding of individual regulatory units, describing operons, regulators, and promoters and their roles in the physiology of the cell. These contributions have provided fundamental information to support the most recent efforts toward the integrative knowledge of the cell that the genomics sciences are achieving.

Genomics offers the microbiologist studying gene regulation the opportunity to understand individual systems in the context of the whole cell. These integrative sciences have also changed the landscape of data available for new discoveries in the evolution of gene regulation. The major challenge of the genomics era is dealing with large amounts of data at all molecular levels and being able to generate integrated biological knowledge from the data. Bioinformatics is essential to progress in this direction, as it provides what is necessary to deal with large amounts of data: databases, algorithms to generate genomic answers to standard questions, overviews, and navigation capabilities, as well as statistical methods to perform and validate analyses.

Current knowledge of gene regulation in prokaryotes is quite diverse. At one end is the constantly increasing number of full genome sequences for which very little experimental work has been performed, including the many genomes of organisms that cannot yet be grown in the laboratory and of the little-studied archaea. At the other end are the few highly characterized bacterial genomes, such as those of Bacillus subtilis and Pseudomonas aeruginosa, with Escherichia coli K-12 being by far the best-known bacterium and free-living organism. Figure 1 shows this strongly unequal distribution of knowledge, using as an example the number of publications on gene regulation for the different microbial genomes.

The first exhaustive historical set of regulated promoters and their associated transcription factors (TFs) and TF DNA-binding sites (TFBSs) gathered around 120 σ70 and σ54 promoters of E. coli K-12 (16, 29). This information and the experience obtained were the seeds for what is now RegulonDB (http://regulondb.ccg.unam.mx/), the original source of expert-curated knowledge on the regulation of transcription initiation and operon organization in E. coli K-12. It contains what is currently the major electronically encoded regulatory network for any free-living organism (25). This information is also contained in EcoCyc, the E. coli model organism database, with added curated knowledge on metabolism and transport (http://ecocyc.org/). We estimate that around 25% of the interactions of the full cellular regulatory network of transcription initiation have been assembled so far. RegulonDB should be conceived of not as a database but as an environment for genomic regulatory investigations, linked to bioinformatics tools that facilitate analyses of upstream regions, together with data sets and tools for microarray analyses and, more recently, direct access to the full papers supporting its knowledge. We are not only curating up-to-date original papers but have also initiated “active annotation,” to use Jean-Michel Claverie’s terminology, to map promoters experimentally and more precisely by using a high-throughput strategy.

This minireview has been organized taking into account the fact that most of the readers of the Journal of Bacteriology are experimentalists. We start with a few examples of lessons on gene regulation from bioinformatics, focusing on promoters, their definition and regulation, and operon structure. The second section summarizes how RegulonDB has been useful to experimentalists, as well as its role as the “gold standard” for implementing bioinformatics predictive methods, topological analyses of the network, and models of the cell. The last section offers a compendium of links to bioinformatics resources on gene regulation in bacteria, illustrating their usage with flowcharts associated with questions on the regulation of gene expression that are entailed in the use of RegulonDB and associated bioinformatics resources. We illustrate the very efficient text analysis obtained using Textpresso to access more than 2,400 full papers that support the E. coli regulatory network. A cautionary note: the examples of the three sections are biased toward cases in E. coli and its RegulonDB regulatory database. This bias is natural, since, first of all, our direct experience is with RegulonDB, and second, especially for bacterial-gene regulation (Fig. 1), we may well quote Fred Neidhardt (87): “Not everyone is mindful of it, but all cell biologists have two organisms of interest: the one they are studying and Escherichia coli!”

* Corresponding author. Mailing address: Programa de Genómica Computacional, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, A.P. 565-A, Cuernavaca, Morelos 62100, Mexico. Phone: 52 (777) 3139877. Fax: 52 (777) 3175581. E-mail: collado@ccg.unam.mx.

▿ Published ahead of print on 31 October 2008.

Downloaded from http://jb.asm.org/ on August 22, 2020 by guest

LESSONS FROM OVERVIEWS IN GENE REGULATION: PROMOTER DEFINITION, PROXIMAL-SITE REQUIREMENT FOR RNAP-σ70, AND OPERON STRUCTURE

Historically, knowledge of gene regulation started with a model of the lac operon, its cis-regulatory elements, and the notion of allosterism in E. coli (43, 65). This repression-based model soon had to be expanded to accommodate the more elaborate positive mechanisms of gene regulation. Since then, we have witnessed a gradual expansion in the diversity of knowledge about the molecular anatomy of gene regulation and the rich mechanisms that together compose the decision machinery of the cell.

The discovery of a conserved motif in E. coli promoters, the −10 box (also called the Pribnow box), is a striking example of how a pattern discovered by visual inspection of as few as seven DNA sequences has remained valid (75). This was an early contribution of bioinformatics to the study of gene regulation. Indeed, the identification of a promoter as the physical region for the binding of RNA polymerase (RNAP) results from the combination of experimental mapping of transcription initiation and pattern recognition, initially by visual inspection and now by multiple-alignment methods (37, 39). Promoters in this sense come from a combination of experimental and bioinformatics evidence.
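The sketch below makes the idea of pattern recognition from aligned promoter sequences concrete. It builds a log-odds position weight matrix from a handful of aligned −10 hexamers and then scans a sequence for the best-scoring window. It is a minimal illustration, not the multiple-alignment methods cited above. The training hexamers are hypothetical placeholders rather than curated RegulonDB sites, and the pseudocount, the uniform background frequency, and the single-strand scan are assumptions.

```python
import math

# Hypothetical aligned -10 hexamers (placeholders, not curated RegulonDB sites).
SITES = ["TATAAT", "TAAAAT", "TATACT", "GATAAT", "TATGAT", "TACAAT", "TATATT"]
BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Return a log2-odds position weight matrix (one dict per column)."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def score(pwm, kmer):
    """Sum of per-position log-odds scores for a k-mer of the PWM's width."""
    return sum(col[base] for col, base in zip(pwm, kmer))


def best_window(pwm, seq):
    """Scan one strand of an uppercase ACGT sequence; return (start, score)."""
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    start = max(range(len(seq) - width + 1),
                key=lambda i: score(pwm, seq[i:i + width]))
    return start, score(pwm, seq[start:start + width])


pwm = build_pwm(SITES)
print(best_window(pwm, "GGGGGTATAATGGGG"))
```

The pseudocount keeps unseen bases from producing log(0). A genome-scale scan would also score the reverse complement and would set a score threshold calibrated against known promoters.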

We should keep in mind, though, that gathering a large collection of data in biology does not guarantee that we can make sense of it or that new knowledge will emerge. We illustrate this by showing what we have learned from an exhaustive genomic collection of components of the regulatory network, that is, regulated promoters and the relative distances to their corresponding activator and repressor binding sites, and with a second example demonstrating how a very simple analysis of operon structure enabled the development of a method to predict operons in E. coli and then in any bacterial genome.

As mentioned above, in 1991 we gathered knowledge about around 120 promoters in E. coli K-12 (16, 29), and we have continued since then, increasing the data set (see http://regulondb.ccg.unam.mx/html/Database_summary.jsp for the history of the accumulated curated knowledge). One of the clearest lessons from that collection effort was the conclusion that regulation of transcription initiation by the σ70 holoenzyme (Eσ70) always requires a proximal DNA site (53, 74). A proximal site is defined by its position relative to the transcription initiation site, such that a direct contact of the TF with RNAP is assumed. Seventeen years later, version 6.2 of RegulonDB (July 2008) has 1,754 promoters. Of these, 697 are σ70 promoters, 421 of which have at least one characterized binding site for a TF. This collection of promoters has 1,382 associated binding sites with coordinate positions. The definition of proximal sites in that initial review was from −65 to +20 (16); however, single activation and coactivation from −90 by cyclic AMP receptor protein (CRP) were later reported (12, 13, 116). We also learned, after 1991, that the flexibility of the α C-terminal domain of RNAP expands the proximal region to around −100, supporting direct contact with CRP (5, 57). Thus, in principle, the range of positions enabling direct contact with RNAP can be set from −95 to +20. Analyses with current data show that only 26 promoters, accounting for less than 5.9% of all regulated promoters, currently lack sites within this proximal range. In principle, apart from σ54 promoters (67), which we do not discuss in detail here, no promoter should be regulated only from remote positions. The distribution of proximal sites in this range is shown in Fig. 2. We observe the same general tendencies discussed in the 1991 review: repressors are distributed across the whole proximal region, with the downstream −30 to +20 interval being dominant. Repressors either prevent RNAP from interacting with the proximal region between −30 and +20 or prevent activators from interacting at the −40, −50, and −60 central positions. The purB gene, cotranscribed with hflD, is repressed by PurR through a single operator located at +892.5. This operator could interfere with transcription initiation or act as a roadblock that obstructs the progress of the transcribing polymerase (30, 33).
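The proximal-site argument above reduces to a simple positional test. The sketch below flags regulated promoters in which no TF binding-site center falls within the −95 to +20 window discussed in the text. The promoter names and coordinates are hypothetical, and the second one only loosely mimics a single remote operator like the one described for purB. Positions are relative to the transcription start site (+1), with negative values upstream.

```python
# Proximal window relative to the transcription start site (+1), as in the text.
PROXIMAL_LO, PROXIMAL_HI = -95, 20


def is_proximal(center):
    """True if a binding-site center lies inside the proximal window."""
    return PROXIMAL_LO <= center <= PROXIMAL_HI


def promoters_lacking_proximal_sites(sites_by_promoter):
    """Names of regulated promoters (at least one site) with no proximal site."""
    return sorted(
        name
        for name, centers in sites_by_promoter.items()
        if centers and not any(is_proximal(c) for c in centers)
    )


# Hypothetical promoters and binding-site centers (not RegulonDB records).
example = {
    "promoterA": [-41.5, -61.5],
    "promoterB": [892.5],
    "promoterC": [-10.0, 150.0],
}
lacking = promoters_lacking_proximal_sites(example)
regulated = sum(1 for centers in example.values() if centers)
print(lacking, f"{100 * len(lacking) / regulated:.1f}% of regulated promoters")
```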

FIG. 1. Number of published articles per organism. We searched through PubMed by using a collection of keywords that we regularly use to gather information for RegulonDB across different bacteria. The search required the name of the organism to be in the title.

As shown in Fig. 2, all activators, not only CRP, tend to prefer interacting near positions −40 and −70. The strongly diminished occupancy around −50 has been shown to be due to the binding of the α C-terminal domain of RNAP (4, 83). On the other hand, some activators overlap the promoter and can bind downstream of the +1 site. TFs of the MerR family activate transcription by binding at a palindrome located between the −35 and −10 elements of the promoter (71, 93). Other regulators activate transcription through auxiliary binding sites located between +10 and +20 (38). A striking, well-documented case is activation from −1848 by IHF, a TF known to bend the DNA to favor transcription (1).
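Profiles such as the one in Fig. 2, where activator sites cluster near −40 and −70, start from tallying site centers in 10-bp intervals across the proximal window. The sketch below performs only that counting step, on hypothetical activator-site centers. The normalization used for the published figure is not reproduced, and the choice of bin edges (starting at −95) is an assumption.

```python
from collections import Counter


def bin_site_centers(centers, lo=-95, hi=20, width=10):
    """Count site centers in [lo, hi] per interval, keyed by each bin's start."""
    counts = Counter()
    for c in centers:
        if lo <= c <= hi:
            start = lo + int((c - lo) // width) * width
            counts[start] += 1
    return dict(sorted(counts.items()))


# Hypothetical activator-site centers (bp relative to +1); 150 is outside the window.
activator_centers = [-41.5, -40.0, -61.5, -71.5, -72.0, 19.9, 150.0]
for start, n in bin_site_centers(activator_centers).items():
    print(f"bin starting at {start} bp: {n}")
```

Running the same tally separately for activators and repressors, and dividing by the number of promoters, would give per-interval frequencies comparable in spirit to Fig. 2.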

Understanding gene regulation requires a detailed knowledge of operon structure. Thanks to the efforts of Mary Berlyn, around 1998 we populated RegulonDB with an initial set of putative operons. We have since been expanding this set using both experimental information and predictions based on the rules learned from the original sets. Figure 3 shows the distribution of intergenic distances within operons (572 in 2002 versus 1,839 in 2008) and the contrasting distribution of distances between genes at operon boundaries that are transcribed in the same direction (346 in 2002 versus 1,311 in 2008). Distances within operons are very short, whereas intergenic regions upstream of operons are longer. This striking, clear-cut difference was the basis for predicting operons, even for genes without an assigned function, first in the E. coli genome and subsequently in many bacterial genomes (66, 84). As mentioned below, a high-quality curated corpus of knowledge such as this one has supported the development of bioinformatics methods capable of predicting many aspects of the regulatory network.
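The operon-prediction idea above can be sketched as a log-likelihood score. For a given intergenic distance, the score compares how often that distance occurs between adjacent same-strand genes within known operons with how often it occurs between genes at operon boundaries. This is a minimal sketch in the spirit of the distance-based method cited above, not its published implementation. The training distances are hypothetical, and the 10-bp distance classes, the pseudocount, and the zero threshold are assumptions.

```python
import math
from collections import Counter


def distance_class(d, width=10):
    """Bin an intergenic distance in bp; overlaps (negative d) get negative bins."""
    return d // width


def log_likelihood_table(within, boundary, width=10, pseudocount=1):
    """ln P(class | same operon) - ln P(class | operon boundary), per class."""
    w = Counter(distance_class(d, width) for d in within)
    b = Counter(distance_class(d, width) for d in boundary)
    classes = set(w) | set(b)
    n_w = len(within) + pseudocount * len(classes)
    n_b = len(boundary) + pseudocount * len(classes)
    return {
        k: math.log((w[k] + pseudocount) / n_w) - math.log((b[k] + pseudocount) / n_b)
        for k in classes
    }


def predict_same_operon(d, table, width=10, threshold=0.0):
    """True/False for a same-strand gene pair, or None for an unseen distance class."""
    ll = table.get(distance_class(d, width))
    return None if ll is None else ll > threshold


# Hypothetical training distances in bp (not RegulonDB data).
within_operons = [-4, -1, 5, 12, 20, 15, -8, 30]
at_boundaries = [150, 220, 95, 300, 180, 60, 250]
table = log_likelihood_table(within_operons, at_boundaries)
print(predict_same_operon(-4, table), predict_same_operon(180, table))
```

Returning None for an unseen distance class keeps the unknown case explicit. A fuller version would smooth the two distributions so that every distance receives a score.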

We have recalled how the definitions of promoter elements are currently the basis of more elaborate bioinformatics methods for pattern recognition. We have briefly discussed two examples in which gathering large amounts of data on individual cases has been fruitful for studying the biology of gene regulation: (i) promoters and their regulation and (ii) operon structure. It is true, though, that these are large collections of “the same types of stamps.” As we have said, the major challenge in genomics is nonetheless a different one: integrating a particular system or gene and its product with the expression patterns of similarly regulated genes as, for instance, the cellular environment changes.

HOW AN ELECTRONIC CORPUS ON GENE REGULATION HAS BEEN USEFUL FOR EXPERIMENTALISTS

Here, we illustrate how experimental and bioinformatics scientists studying gene regulation, not only in E. coli but also in many other organisms, have made extensive use of RegulonDB to gain insights into several aspects of gene regulation. RegulonDB contains detailed, accurate, and up-to-date information about operon organization, regulatory DNA sites for TFs, promoters, terminators, and RNA regulatory elements, both experimentally determined and predicted. Together, these elements constitute the known transcriptional regulatory network of E. coli. The information for this section came primarily from questions addressed to the database by users and from a search of the literature for articles citing or making use of RegulonDB. We are certain that these questions are valid for other databases and for any other bacterial species. Table 1 lists selected articles published since the creation of RegulonDB, emphasizing the purposes for which they used RegulonDB. Researchers studying gene expression or modulon architectures with microarray or proteomics data, for many wild-type and mutant strains grown under different conditions, have taken the operon structure of E. coli into account to make sense of their experimental data. For instance, in work by Yooseph et al. (114), the 7.7 million Global Ocean Sampling sequences were analyzed using the collection of transcription units in RegulonDB, both for statistical analysis and to identify same-operon gene pairs. RegulonDB has also been used to identify regulatory binding sites and to determine how this information correlates with gene expression in wild-type and TF mutant strains. Promoter and TF binding-site mapping by genomic strategies (chromatin immunoprecipitation [ChIP]-chip), and even the analysis of proteomics data, have also relied on this database as the source of primary information. In the genome-wide location of RNAP promoter sites using ChIP-chip, the 961 identified promoters in RegulonDB were used to set the 26% negative-detection rate (35).

FIG. 2. Distribution of TFBSs. RegulonDB version 6.2 has 697 σ70 promoters, 421 of which have at least one characterized binding site for a TF. The figure displays the distribution of central positions of activator and repressor DNA-binding sites in the −95 to +20 interval. The percentage of promoters was divided by the number of activator or repressor DNA-binding sites with the center position within each interval of 10 bp. This figure can be compared to Fig. 2 in reference 16.

FIG. 3. Intergenic distances of genes within and at transcription unit boundaries. The sharply different distributions of these distances enabled the use of a direct method to predict transcription units in the complete E. coli genome. This figure is very similar to Fig. 3 in reference 84.
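A rate like the 26% negative-detection rate quoted above amounts to asking what fraction of a reference set of known promoters an experiment misses. The sketch below counts a known transcription start site as missed when no ChIP-chip peak center lies within a fixed window of it. The window size, the coordinates, and the function name are hypothetical, and the cited study may define detection differently.

```python
import bisect


def negative_detection_rate(known_tss, peak_centers, window=250):
    """Fraction of known TSS coordinates with no peak center within +/- window bp."""
    if not known_tss:
        raise ValueError("need at least one known promoter")
    peaks = sorted(peak_centers)
    missed = 0
    for tss in known_tss:
        i = bisect.bisect_left(peaks, tss - window)  # first peak >= tss - window
        if i == len(peaks) or peaks[i] > tss + window:
            missed += 1
    return missed / len(known_tss)


# Hypothetical genome coordinates (bp), not data from the cited study.
known = [1000, 5000, 9000, 12000]
peaks = [1100, 8800, 20000]
print(negative_detection_rate(known, peaks))
```

Sorting the peaks once and using binary search keeps the check fast even for thousands of promoters and peaks.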

The corpus of knowledge on gene regulation has been essential for diverse implementations relying on bioinformatics. As shown in Table 1, this corpus provides the means to generate and test predictions for a large collection of regulatory elements, such as promoters, TFs, and TFBSs; complete genomic repertoires of TFs and operons; and even unidentified network interactions. It has also been used to model the overall dynamics of the network and to propose novel biological concepts, such as the network motifs reported by the group of Uri Alon (90) or the notion of hierarchical and modular networks described by Ravasz and colleagues (79).
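Network motifs such as those reported by Alon's group are found by counting small subgraphs in the TF-to-gene network. The sketch below counts feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z) in a directed edge list. It covers only the counting step. Calling a subgraph a motif also requires comparing its count with counts in randomized networks, which is not shown. The edge list is illustrative, loosely modeled on the CRP/AraC/araBAD system, and is not an export of RegulonDB.

```python
def count_feed_forward_loops(edges):
    """Count FFLs X->Y, Y->Z, X->Z over distinct nodes; self-loops are ignored."""
    targets = {}
    for src, dst in edges:
        if src != dst:
            targets.setdefault(src, set()).add(dst)
    count = 0
    for x, x_targets in targets.items():
        for y in x_targets:
            for z in targets.get(y, ()):
                # x != y and y != z already hold because self-loops were dropped.
                if z != x and z in x_targets:
                    count += 1
    return count


# Illustrative (regulator, target) edges; the araC self-loop adds no FFL.
edges = [("crp", "araC"), ("araC", "araBAD"), ("crp", "araBAD"), ("araC", "araC")]
print(count_feed_forward_loops(edges))
```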

Annotated genomes and ingenious ways to transfer knowledge, with all the associated risks of assuming orthologous relationships, nonetheless enable us to estimate that the wiring of the regulatory network differs across organisms, especially among bacteria (59, 60). We now know better that the evolutionary origin of regulatory interactions in bacteria depends on gene duplication and specialization, operon reorganization, binding-site duplications, and horizontal gene transfer (76, 97).
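One common way to transfer regulatory knowledge across genomes maps genes through reciprocal best hits and carries over only those TF-target interactions whose partners both have an ortholog. This approach carries exactly the orthology risks noted above. The sketch below assumes that best-hit tables from a sequence-similarity search are already available. The genome-B identifiers are hypothetical. A transferred interaction is only a prediction, and the rewiring described above may well invalidate it.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Gene pairs (a, b) where each gene is the other's best hit."""
    return {a: b for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def transfer_interactions(interactions, orthologs):
    """Map (TF, target) pairs into genome B when both partners have an ortholog."""
    return sorted(
        (orthologs[tf], orthologs[target])
        for tf, target in interactions
        if tf in orthologs and target in orthologs
    )


# Hypothetical best-hit tables; genome-B identifiers are placeholders.
best_ab = {"lexA": "geneB1", "recA": "geneB2", "sulA": "geneB3"}
best_ba = {"geneB1": "lexA", "geneB2": "recA", "geneB3": "otherGene"}
orthologs = reciprocal_best_hits(best_ab, best_ba)
print(transfer_interactions([("lexA", "recA"), ("lexA", "sulA")], orthologs))
```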

A COMPENDIUM OF BIOINFORMATICS RESOURCES FOR STUDYING BACTERIAL GENE REGULATION AND INTRODUCTORY PROTOCOLS

Designing a representation of the rich and variable knowledge of biological systems in order to encode it into a formal database management system, together with the corresponding data-gathering and curation processes, is one of the major infrastructure-building efforts that characterize computational biology. This is apparent from the first issue of each year's volume of Nucleic Acids Research, which is devoted to databases. We did not find an integrated collection of databases and bioinformatics tools devoted to gene regulation; therefore, we have gathered here an exhaustive selection of resources specifically dealing with gene regulation in bacteria. Based on the compendium gathered by Galperin in 2008 (24), plus EcoliHub (http://www.ecolicommunity.org/) and BIOPAX pathway databases (94), we identified approximately 100 resources (out of 240) that were directly related to prokaryotic-gene regulation. Our compendium groups sites for, among others, TFs and gene regulation (e.g., devoted specifically to the AraC/XylR families [102] or to TFs [110]); RNAs; biological pathways and regulatory networks; microarray databases; and some other related themes, such as signal transduction pathways (http://genomics.ornl.gov/mist/), protein-protein interactions, genome databases, the published literature, and metadatabases. Users should be aware that we did not specify for each resource the date of its last update, which is quite variable. This compendium should be a useful resource for those interested in searching or analyzing regulatory fea-

TABLE 1. Uses of RegulonDB

Description    Reference

Experimental
  Gene expression analysis (microarray)
    Operons improve estimation for cDNA microarrays    113
    Large-scale validation of regulation from expression profiles    22
    Cross talk between the plasmid and the chromosome    32
    Microarray of a Shewanella oneidensis etrA mutant    3
    Clustering gene expression data with error information    100
    Affymetrix microarray coexpression of genes in operons    31
    Quantitative description of large-scale microarray data    86
    Analysis of the NsrR regulon    23
    The Rcs phosphorelay and intrinsic antibiotic resistance    52
    Growth defects and cross-regulation of gene expression    92
    σS-dependent genes and their promoters    50
    DNA adenine methyltransferase and gene regulation    89
    The CRP regulon: in vitro and in vivo transcriptional profiling    115
    Genome-wide expression analysis of FNR regulation    46
    Microarrays of genes in quorum sensing    19
    Transcriptome analysis of E. coli    101
    An extended regulon of the methionine repressor    62
    Transcriptome polymorphism in E. coli/Shigella species    54, 88
    Global RNA half-life analysis and patterns of transcript degradation
    Early osmostress gene expression using microarrays    111
    Transcriptome determination of transcription regulators    47
    Constraint-based in silico models of E. coli    80
  Analysis of ChIP-chip data
    RNA polymerase binding sites by ChIP-chip    35
  Analysis of protein abundance
    Protein abundance profiling of the cytosol    42
  Analysis of regulatory mechanisms
    Outer membrane vesicle and membrane instability    64
  DNA sequence annotation
    The symbiotic plasmid of Rhizobium etli CFN42    27
  Analysis of specific biological systems
    Mutant release factor 1 and 16S rRNA maturation    45
    The PTS system of Vibrio fischeri    108
  Analysis of gene expression dynamics
    SoxRS-dependent transcriptional networks    7
  Metagenomics
    The Global Ocean Sampling: expanding protein families    114
Bioinformatics
  Regulatory-network prediction
    Prediction of new members of regulons    96
    Predicting transcriptional regulatory interactions    107
  Promoter prediction
    Recognition and prediction of σ70 promoters    56
    Improved prediction of transcription start sites    28
    Improving promoter prediction in E. coli    11
  Transcription factor prediction
    Predicted transcriptional regulators in E. coli    73
  DNA-binding site prediction
    Genomic prediction of transcriptional regulatory sites    98
    Binding sites in bacterial genomes    55
    TF binding sites in E. coli and Streptomyces coelicolor    51
  Operon prediction
    Operon prediction in Pyrococcus furiosus    103
    Operon prediction without expt    6
    A Bayesian network approach to operon prediction    8
  Analysis of promoters
    Computational promoter analyses in 32 genomes    91
  Orthologous regulatory identification
    Orthologous TFs have different functions    77
    Transcription in archaea    49
  Architecture of operons, promoters, or DNA-binding sites
    Gene expression with combinatorial promoters    17
    Positional distribution of DNA motifs in promoter regions    14
    Transcriptional units and operons in Bacillus and E. coli    70
  Evolutionary analysis of elements involved in transcription
    Evolution of noncoding DNA in prokaryotic genomes    82
    Evolution of DNA regulatory regions for gammaproteobacteria    78
    Coevolution of TFs depends on mode of regulation    36
  Prediction of gene/protein biological function
    Gene Function Predictor based on context analysis    21
    Protein function and linkages based on genome organization    95
    Inference of functional relationships from predicted operons    44
  Inferring the role of transcription factors
    Inferring the role of TFs in regulatory networks    106
  Source for other databases
    Multidimensional annotation of the E. coli K-12 genome    48
    Regulatory interactions in gammaproteobacterial genomes    72
Modeling and topology of the network
  Modeling the overall dynamics of the network
    Genetic and metabolic regulatory networks of Bacillus    26
    Reconstruction of microbial transcriptional regulatory networks    34
  Metabolic network and connectivity analyses
    Hierarchy and modularity in metabolic networks    79
    Low-degree metabolites in biological networks    85
  Regulatory network analysis
    Logical types of network control in gene expression    63
    Dynamics in regulatory networks from four kingdoms    2
    Gene expression in negatively autoregulated circuits    61

Novel topological conceptsNetwork motifs in the regulation network of E. coli................................................90


tures across bacterial genomes. The name of each site and its URL, together with a short description and a list of the tools available, can now be accessed at http://regulondb.ccg.unam.mx/Additional_resources.jsp.

Several resources provide documentation, tutorials, and demonstrations to help the user. Even though bioinformaticians devoted to database construction and maintenance invest considerable effort in designing and implementing user-friendly interfaces, first-time users often have trouble finding the best way to address their questions of interest. Take the following question as an example: "If I have a gene, how can I find out all that is known about its regulation and operon organization?" This simple question has a rather complex and rich answer, including the sequences, coordinates, and regulatory effects of every TFBS and every promoter of the gene, as well as its operon organization. Figure 4 shows a flowchart explaining how this information can be obtained in a few navigation steps, all of which, in this case, occur within RegulonDB. Many other resources and databases display the same information differently. For instance, the PRODORIC genome browser can zoom in to the level of the DNA sequence, showing the binding sites and promoters (69).

FIG. 4. Flowchart for gathering all regulation information for a single gene. Navigation options are shown, starting from the main page of RegulonDB with the name of a gene, melA or melR in this example. The MelR-CRP complex regulon is shown.

RegulonDB is linked to Gene Expression Tools (GETools) (40) and to the Regulatory Sequence Analysis Tools (RSAT) website (99), which contains a suite of tools built to predict and analyze regulatory regions for 663 available microbial genomes, in addition to 62 eukaryotic ones. These tools are designed to answer questions about groups of genes suspected to be coregulated, a very common situation whenever a research group has performed a microarray experiment or obtained a group of genes from a ChIP-chip (or ChIP-sequencing) experiment. Figure 5 shows a flowchart for a set of genes from a ChIP-chip experiment with LexA (109). RSAT generates a collection of upstream sequences given the gene set as input, while RegulonDB contains a position-specific matrix derived from the collection of experimentally characterized binding sites (TFBSs). Such matrices can be used to scan sequences in order to predict putative TFBSs in the whole set of upstream regions of the genome, and the predictions can then be displayed in a graph. The RSAT team has recently published several protocols for a variety of similar questions. The main goal of these introductory protocols is to illustrate and motivate the use of bioinformatics resources for the study of gene regulation through example questions, showing in clear flowcharts an easy way to answer them. In work reported by Defrance et al. (18), flowcharts and protocols describe in detail how to discover the TFBSs common to a regulon obtained from RegulonDB or, in fact, from any other bacterial database. The whole interaction network of E. coli reported in RegulonDB can also be analyzed (10). Given the names of a set of coexpressed genes from, for instance, a microarray or ChIP-chip experiment with any bacterial genome, one can use RSAT to obtain their upstream regions and search for common motifs or TF binding sites and cis-regulatory modules (104).
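The two core steps of the workflow above (and of Fig. 5), retrieving upstream regions and scanning them with a position-specific matrix built from known sites, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not RSAT's retrieve-seq or matrix-scan code: gene coordinates are assumed to be 1-based and inclusive, the sequence is treated as linear, the background model is equiprobable (25% per base), and pseudocounts are spread according to that background. All function names and the toy sites are hypothetical. RSAT's own programs offer richer options, such as Markov background models and P-value thresholds.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome: str, start: int, end: int, strand: str, length: int) -> str:
    """Return up to `length` bp upstream of a gene, written 5'->3' on the gene's strand.

    Assumes 1-based inclusive gene coordinates and a linear sequence
    (the region is truncated at the sequence ends, not wrapped around).
    """
    if strand == "+":
        return genome[max(0, start - 1 - length):start - 1]
    # Minus-strand gene: its 5' end is at `end`, so upstream lies at higher coordinates.
    return revcomp(genome[end:end + length])


def build_pssm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds matrix from aligned, equal-length sites against a uniform background."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    background = 1.0 / len(BASES)
    total = len(sites) + pseudocount
    matrix = []
    for i in range(width):
        column = [s[i].upper() for s in sites]
        matrix.append({
            b: math.log2((column.count(b) + pseudocount * background) / total / background)
            for b in BASES
        })
    return matrix


def scan(seq: str, pssm: List[Dict[str, float]], threshold: float) -> List[Tuple[int, str, str, float]]:
    """Report (1-based position, strand, word, score) for windows scoring >= threshold."""
    width = len(pssm)
    hits = []
    for i in range(len(seq) - width + 1):
        word = seq[i:i + width].upper()
        if any(b not in BASES for b in word):
            continue  # skip windows containing N or other ambiguity codes
        for strand, w in (("+", word), ("-", revcomp(word))):
            score = sum(col[b] for col, b in zip(pssm, w))
            if score >= threshold:
                hits.append((i + 1, strand, w, round(score, 2)))
    return hits


if __name__ == "__main__":
    # Hypothetical toy data: not real LexA sites or E. coli coordinates.
    sites = ["ACGTTA", "ACGTTC", "ACGATA", "ACGTTA"]
    genome = "GGGGGGACGTTAGGGGATGCCCCCC"
    region = upstream_region(genome, start=17, end=22, strand="+", length=12)
    print(region, scan(region, build_pssm(sites), threshold=8.0))
```

The threshold of 8.0 bits is arbitrary here; lowering it increases sensitivity at the cost of more false-positive predictions, which is why tools that report P-values are preferable for genome-wide scans.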

EcoCyc contains a large collection of graphic and text displays, including a genome browser that shows genes by Gene Ontology class, operons, and all elements of gene regulation in a region of the genome. It also provides a network display of regulatory interactions, in addition to the Omics Viewers, which let the user display pathways and regulatory networks based on, for example, an input data file (48). Graphic displays of a set of genes can be obtained for several genomes with the PRODONET tool (http://www.prodonet.tu-bs.de/). For many bacteria, the functional classes of a given set of genes can be obtained (http://www.jprogo.de/). Several graphics tools show the genomic context of a gene within its genome, as well as the contexts of its orthologs within several related genomes (see, for example, GeConT [15] or http://img.jgi.doe.gov/cgi-bin/pub/main.cgi). The number of possible elaborate questions is certainly very large. We invite the interested reader to use the compendium of resources and to see additional flowcharts at http://regulondb.ccg.unam.mx/Flow_charts.jsp. We hope these will serve as examples that users can adapt to find, intuitively, equivalent procedures in other bacterial databases.
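Summarizing the functional classes of a gene set, as with the service at http://www.jprogo.de/ mentioned above, is often done with an over-representation test. The sketch below uses a one-sided hypergeometric test. This choice is an assumption made for illustration, not a description of how that service or any other tool cited here computes its results; the function names, class names, and gene identifiers are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric with N genes in the genome,
    K genes in the class, and n genes in the query set."""
    upper = min(n, K)
    if k > upper:
        return 0.0
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def enriched_classes(query: Set[str], classes: Dict[str, Set[str]],
                     genome_size: int) -> List[Tuple[str, int, float]]:
    """Rank functional classes by the over-representation P-value of the query genes.

    `classes` maps a class name to the set of genes annotated with it
    (a hypothetical input format). Classes with no query genes are skipped.
    """
    results = []
    for name, members in classes.items():
        k = len(query & members)
        if k:
            p = hypergeom_upper_tail(k, len(query), len(members), genome_size)
            results.append((name, k, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical annotation of a 10-gene toy genome.
    classes = {"class_1": {"g1", "g2", "g3"}, "class_2": {"g4", "g5", "g6", "g7"}}
    print(enriched_classes({"g1", "g2", "g3"}, classes, genome_size=10))
```

When many classes are tested at once, the P-values should be corrected for multiple testing (for example, with a Bonferroni or false-discovery-rate correction) before any class is called enriched.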

ACCESS TO SPECIFIC GENE REGULATION LITERATURE AND FULL PAPERS REQUIRES A SEPARATE MENTION

Textpresso, a powerful text-mining engine for studying the scientific literature (68), was implemented for RegulonDB (http://regulondb.ccg.unam.mx/Textpresso/); 2,472 full-text papers, 3,125 abstracts, and more than 4,200 curator notes can be searched directly (81). This tool lets experimental researchers search the full papers that support the electronically encoded transcriptional regulatory network of E. coli, combining categories, keywords, and ontology classes with the specific gene, promoter, operon, or TF of interest.
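The kind of query described above, a keyword combined with one or more categories and evaluated sentence by sentence, can be illustrated with a toy in-memory search. This is a hypothetical sketch of the general idea only: the category lexicons, the corpus, and the names geneX and RegY are invented, and it does not reproduce Textpresso's ontology, indexing, or query interface.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical category lexicons; real literature-mining ontologies are far larger.
CATEGORIES: Dict[str, Set[str]] = {
    "regulation": {"activates", "represses", "induces", "regulates"},
    "binding": {"binds", "binding", "footprint", "footprinting"},
}


def sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence: str) -> Set[str]:
    """Lowercased word tokens (letters, digits, and hyphens)."""
    return set(re.findall(r"[a-z0-9-]+", sentence.lower()))


def search(corpus: Dict[str, str], keyword: str, categories: List[str]) -> List[Tuple[str, str]]:
    """Return (document id, sentence) pairs whose sentence contains the keyword
    and at least one term from every requested category."""
    keyword = keyword.lower()
    results = []
    for doc_id, text in corpus.items():
        for sentence in sentences(text):
            words = tokens(sentence)
            if keyword in words and all(words & CATEGORIES[c] for c in categories):
                results.append((doc_id, sentence))
    return results


if __name__ == "__main__":
    corpus = {
        "paper1": "RegY binds upstream of geneX. The operon is long.",
        "paper2": "RegY activates geneX expression.",
    }
    print(search(corpus, "geneX", ["binding"]))
    print(search(corpus, "geneX", ["regulation"]))
```

Restricting matches to single sentences, rather than whole documents, is what makes such results useful to a curator: the returned sentence usually states the evidence directly.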

CONCLUSIONS

As we have mentioned, genomics has shifted the focus from individual systems to an understanding of the whole cell. The study of gene regulation, like almost any other aspect of modern molecular and cellular biology, requires bioinformatics tools and methods to manipulate and analyze the large amounts of available information and eventually generate a more integrated perspective and knowledge of the cell. The global analysis of TFBSs and their positions relative to transcription initiation vividly combines the details of individual systems with the search for a unified understanding, based on the need for bound TFs to interact with RNAP. This is one example, among many others, of the opportunities and challenges that genomic perspectives offer. To paraphrase Whitehead (112), integration of gene regulation involves both the cautious gathering of details and a passion for understanding.
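As a simple illustration of the positional analysis mentioned above, the following sketch computes the center of each binding site relative to the transcription start site (TSS) on the promoter's own strand and bins the offsets into a coarse profile. It is hypothetical illustrative code under assumptions stated in the comments: genome coordinates are 1-based and inclusive, the plain coordinate difference is used (so the TSS itself is 0 rather than the conventional +1), and the example coordinates are invented.

```python
import math
from collections import Counter
from typing import Iterable


def site_offset(site_start: int, site_end: int, tss: int, strand: str) -> float:
    """Center of a binding site relative to the TSS, on the promoter's strand.

    Negative values are upstream. Uses the plain coordinate difference, so the
    TSS itself is 0 (not the conventional +1).
    """
    center = (site_start + site_end) / 2
    # On the minus strand, upstream lies at higher genome coordinates.
    return center - tss if strand == "+" else tss - center


def positional_profile(offsets: Iterable[float], bin_width: int = 10) -> Counter:
    """Count offsets per bin; each key is the lower edge of a bin_width-bp bin."""
    return Counter(math.floor(o / bin_width) * bin_width for o in offsets)


if __name__ == "__main__":
    # Invented coordinates: (site_start, site_end, tss, strand)
    sites = [(40, 59, 100, "+"), (241, 260, 200, "-"), (95, 110, 100, "+")]
    offsets = [site_offset(*s) for s in sites]
    print(offsets)
    print(sorted(positional_profile(offsets).items()))
```

Making the offset strand-aware is the essential step: without it, sites upstream of minus-strand promoters would appear downstream and blur any positional pattern.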

The bioinformatics infrastructure for microbial gene regulation requires a substantial effort to maintain and update the constantly growing body of knowledge in this field, which results from experiments performed in many laboratories over the years. In fact, this paper celebrates the 10 years since the first publication of RegulonDB (41). Of course, no matter how much human effort goes into databases, they remain the tip of the iceberg compared with what is found in each paper. New methodologies are helping to manage these large amounts of information more intelligently, for example, computational querying of a domain-specific full-text corpus of literature or the scientific community's participation in EcoliWiki within EcoliHub (http://www.ecolicommunity.org/), among many others.

FIG. 5. Flowchart for ChIP-chip data and genes with similar DNA-binding site motifs. The example uses as input a set of genes from a ChIP-chip experiment with LexA (109). RSAT (99, 105) was used to obtain the collection of upstream sequences, given the ChIP-chip gene set. The position-specific scoring matrix (PSSM) for LexA was obtained by selecting Downloads → Data sets → Matrix alignment from the RegulonDB main menu and was then pasted into RSAT to run a matrix scan. Given a threshold, this program searches for predicted sites in the complete set of upstream regions of the genome, and the results can be displayed graphically with the feature map program.

This review will fulfill its purpose if it helps experimentalists appreciate the usefulness of databases and programs devoted to the study of gene regulation. The emphasis of this review has been on the use of bioinformatics resources, not on their implementation or on the challenges ahead, which are numerous. For instance, what is the best way to curate experiments as new high-throughput technologies emerge? Much work remains to expand the prominent databases beyond the level of transcription initiation and to integrate the evolving knowledge about other levels of gene regulation, for instance, that obtained from single-molecule and single-cell experiments (20). Furthermore, feedback loops, whether of the type supporting multistationarity and stochasticity (20, 58) or those responsible for returning systems to their initial state, have not yet been systematically described in databases (however, see Goelzer et al. [26] for further information). A major challenge, given the increasing number of sequenced genomes, will be to generate predictive tools that can extend our knowledge of a few bacteria, to a similar extent, to the many more organisms for which there is currently much less experimental support (9).
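Systematically describing feedback loops, as called for above, starts with enumerating the circuits of a signed regulatory network. The sketch below lists every elementary circuit together with its sign (the product of its edge signs). It is a hypothetical illustration, not a feature of any database cited here; the graph format and node names are assumptions. By R. Thomas's rules, a positive circuit is a necessary (not sufficient) condition for multistationarity, and a negative circuit is necessary for sustained oscillations or homeostasis, the return of a system to its initial state. Exhaustive enumeration grows exponentially with network size, so a genome-scale network would call for a dedicated algorithm such as Johnson's.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical format: regulator -> {target: +1 (activation) or -1 (repression)}
SignedGraph = Dict[str, Dict[str, int]]


def feedback_loops(graph: SignedGraph) -> List[Tuple[List[str], int]]:
    """Enumerate elementary circuits (simple directed cycles) with their signs.

    Each circuit is reported once, starting from its alphabetically smallest node,
    by only extending paths through nodes that sort after the start node.
    """
    nodes = sorted(set(graph) | {t for targets in graph.values() for t in targets})
    rank = {n: i for i, n in enumerate(nodes)}
    loops: List[Tuple[List[str], int]] = []

    def extend(start: str, node: str, path: List[str], sign: int, seen: Set[str]) -> None:
        for target, edge_sign in graph.get(node, {}).items():
            if target == start:
                loops.append((list(path), sign * edge_sign))  # circuit closed
            elif rank[target] > rank[start] and target not in seen:
                path.append(target)
                seen.add(target)
                extend(start, target, path, sign * edge_sign, seen)
                path.pop()
                seen.remove(target)

    for start in nodes:
        extend(start, start, [start], 1, {start})
    return loops


if __name__ == "__main__":
    # Toy network: A activates B and B represses A (a negative loop);
    # C activates itself (a positive autoregulatory loop).
    network = {"A": {"B": 1}, "B": {"A": -1, "C": 1}, "C": {"C": 1}}
    for circuit, sign in feedback_loops(network):
        print(" -> ".join(circuit), "positive" if sign > 0 else "negative")
```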

The study of the regulation of gene expression in bacterial systems will no doubt make important contributions in this century of genomics, both to the molecular-level understanding of the capabilities of the basic unit of life, a single cell, and to many potential technological applications. Multidisciplinary teams and future, younger generations are welcome to join this enterprise.

ACKNOWLEDGMENTS

We dedicate this article to our colleagues and collaborators who have contributed to RegulonDB over the 10 years since its first publication. The experimental work behind RegulonDB has been carried out mainly by L. Olvera, M. Olvera, and A. Mendoza. J.C.-V. also recognizes long-term collaborations with Jacques van Helden, Rick Gourse, Robert Gunsalus, and Jim Hu, as well as important discussions with Jaime Mora. We acknowledge the comments and suggestions of the editor and two anonymous reviewers, which motivated major modifications to the previous version of this review.

This work was funded by the National Institutes of Health, grants R01 GM071962-05 and GM077678, and by UNAM, PAPIIT grant IN214905.

REFERENCES

1. Abouhamad, W. N., and M. D. Manson. 1994. The dipeptide permease of Escherichia coli closely resembles other bacterial transport systems and shows growth-phase-dependent expression. Mol. Microbiol. 14:1077–1092.

2. Balleza, E., E. R. Alvarez-Buylla, A. Chaos, S. Kauffman, I. Shmulevich, and M. Aldana. 2008. Critical dynamics in genetic regulatory networks: examples from four kingdoms. PLoS ONE 3:e2456.

3. Beliaev, A. S., D. K. Thompson, M. W. Fields, L. Wu, D. P. Lies, K. H. Nealson, and J. Zhou. 2002. Microarray transcription profiling of a Shewanella oneidensis etrA mutant. J. Bacteriol. 184:4612–4616.

4. Belyaeva, T. A., J. A. Bown, N. Fujita, A. Ishihama, and S. J. Busby. 1996. Location of the C-terminal domain of the RNA polymerase alpha subunit in different open complexes at the Escherichia coli galactose operon regulatory region. Nucleic Acids Res. 24:2242–2251.

5. Belyaeva, T. A., V. A. Rhodius, C. L. Webster, and S. J. Busby. 1998. Transcription activation at promoters carrying tandem DNA sites for the Escherichia coli cyclic AMP receptor protein: organisation of the RNA polymerase alpha subunits. J. Mol. Biol. 277:789–804.

6. Bergman, N. H., K. D. Passalacqua, P. C. Hanna, and Z. S. Qin. 2007. Operon prediction for sequenced bacterial genomes without experimental information. Appl. Environ. Microbiol. 73:846–854.

7. Blanchard, J. L., W. Y. Wholey, E. M. Conlon, and P. J. Pomposiello. 2007. Rapid changes in gene expression dynamics in response to superoxide reveal SoxRS-dependent and independent transcriptional networks. PLoS ONE 2:e1186.

8. Bockhorst, J., M. Craven, D. Page, J. Shavlik, and J. Glasner. 2003. A Bayesian network approach to operon prediction. Bioinformatics 19:1227–1235.

9. Bonneau, R., M. T. Facciotti, D. J. Reiss, A. K. Schmid, M. Pan, A. Kaur, V. Thorsson, P. Shannon, M. H. Johnson, J. C. Bare, W. Longabaugh, M. Vuthoori, K. Whitehead, A. Madar, L. Suzuki, T. Mori, D. E. Chang, J. Diruggiero, C. H. Johnson, L. Hood, and N. S. Baliga. 2007. A predictive model for transcriptional control of physiology in a free living cell. Cell 131:1354–1365.

10. Brohee, S., K. Faust, G. Lima-Mendez, O. Sand, R. Janky, G. Vanderstocken, Y. Deville, and J. van Helden. 2008. NeAT: a toolbox for the analysis of biological networks, clusters, classes and pathways. Nucleic Acids Res. 36:W444–W451.

11. Burden, S., Y. X. Lin, and R. Zhang. 2005. Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics 21:601–607.

12. Busby, S., and R. H. Ebright. 1999. Transcription activation by catabolite activator protein (CAP). J. Mol. Biol. 293:199–213.

13. Busby, S., D. West, M. Lawes, C. Webster, A. Ishihama, and A. Kolb. 1994. Transcription activation by the Escherichia coli cyclic AMP receptor protein. Receptors bound in tandem at promoters can interact synergistically. J. Mol. Biol. 241:341–352.

14. Casimiro, A. C., S. Vinga, A. T. Freitas, and A. L. Oliveira. 2008. An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance. BMC Bioinform. 9:89.

15. Ciria, R., C. Abreu-Goodger, E. Morett, and E. Merino. 2004. GeConT: gene context analysis. Bioinformatics 20:2307–2308.

16. Collado-Vides, J., B. Magasanik, and J. D. Gralla. 1991. Control site location and transcriptional regulation in Escherichia coli. Microbiol. Rev. 55:371–394.

17. Cox, R. S., III, M. G. Surette, and M. B. Elowitz. 2007. Programming gene expression with combinatorial promoters. Mol. Syst. Biol. 3:145.

18. Defrance, M., R. Janky, O. Sand, and J. van Helden. 2008. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nat. Protoc. 3:1589–1603.

19. DeLisa, M. P., C. F. Wu, L. Wang, J. J. Valdes, and W. E. Bentley. 2001. DNA microarray-based identification of genes controlled by autoinducer 2-stimulated quorum sensing in Escherichia coli. J. Bacteriol. 183:5239–5247.

20. Elf, J., G. W. Li, and X. S. Xie. 2007. Probing transcription factor dynamics at the single-molecule level in a living cell. Science 316:1191–1194.

21. Enault, F., K. Suhre, and J. M. Claverie. 2005. Phydbac “Gene Function Predictor”: a gene annotation tool based on genomic context analysis. BMC Bioinform. 6:247.

22. Faith, J. J., B. Hayete, J. T. Thaden, I. Mogno, J. Wierzbowski, G. Cottarel, S. Kasif, J. J. Collins, and T. S. Gardner. 2007. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol. 5:e8.

23. Filenko, N., S. Spiro, D. F. Browning, D. Squire, T. W. Overton, J. Cole, and C. Constantinidou. 2007. The NsrR regulon of Escherichia coli K-12 includes genes encoding the hybrid cluster protein and the periplasmic, respiratory nitrite reductase. J. Bacteriol. 189:4410–4417.

24. Galperin, M. Y. 2008. The Molecular Biology Database Collection: 2008 update. Nucleic Acids Res. 36:D2–D4.

25. Gama-Castro, S., V. Jimenez-Jacinto, M. Peralta-Gil, A. Santos-Zavaleta, M. I. Penaloza-Spinola, B. Contreras-Moreira, J. Segura-Salazar, L. Muniz-Rascado, I. Martinez-Flores, H. Salgado, C. Bonavides-Martinez, C. Abreu-Goodger, C. Rodriguez-Penagos, J. Miranda-Rios, E. Morett, E. Merino, A. M. Huerta, L. Trevino-Quintanilla, and J. Collado-Vides. 2008. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Res. 36:D120–D124.

26. Goelzer, A., F. B. Brikci, I. Martin-Verstraete, P. Noirot, P. Bessieres, S. Aymerich, and V. Fromion. 2008. Reconstruction and analysis of the genetic and metabolic regulatory networks of the central metabolism of Bacillus subtilis. BMC Syst. Biol. 2:20.

27. Gonzalez, V., P. Bustos, M. A. Ramirez-Romero, A. Medrano-Soto, H. Salgado, I. Hernandez-Gonzalez, J. C. Hernandez-Celis, V. Quintero, G. Moreno-Hagelsieb, L. Girard, O. Rodriguez, M. Flores, M. A. Cevallos, J. Collado-Vides, D. Romero, and G. Davila. 2003. The mosaic structure of the symbiotic plasmid of Rhizobium etli CFN42 and its relation to other symbiotic genome compartments. Genome Biol. 4:R36.

28. Gordon, J. J., M. W. Towsey, J. M. Hogan, S. A. Mathews, and P. Timms. 2006. Improved prediction of bacterial transcription start sites. Bioinformatics 22:142–148.


29. Gralla, J. D., and J. Collado-Vides. 1996. Organization and function of transcription regulatory elements, p. 1232–1245. In F. C. Neidhardt, R. Curtiss III, J. Ingraham, E. C. C. Lin, K. B. Low, B. Magasanik, W. Reznikoff, M. Schaechter, H. E. Umbarger, and M. Riley (ed.), Cellular and molecular biology: Escherichia coli and Salmonella, 2nd ed. ASM Press, Washington, DC.

30. Green, S. M., T. Malik, I. G. Giles, and W. T. Drabble. 1996. The purB gene of Escherichia coli K-12 is located in an operon. Microbiology 142:3219–3230.

31. Harr, B., and C. Schlotterer. 2006. Comparison of algorithms for the analysis of Affymetrix microarray data as evaluated by co-expression of genes in known operons. Nucleic Acids Res. 34:e8.

32. Harr, B., and C. Schlotterer. 2006. Gene expression analysis indicates extensive genotype-specific crosstalk between the conjugative F-plasmid and the E. coli chromosome. BMC Microbiol. 6:80.

33. He, B., J. M. Smith, and H. Zalkin. 1992. Escherichia coli purB gene: cloning, nucleotide sequence, and regulation by purR. J. Bacteriol. 174:130–136.

34. Herrgard, M. J., M. W. Covert, and B. O. Palsson. 2004. Reconstruction of microbial transcriptional regulatory networks. Curr. Opin. Biotechnol. 15:70–77.

35. Herring, C. D., M. Raffaelle, T. E. Allen, E. I. Kanin, R. Landick, A. Z. Ansari, and B. O. Palsson. 2005. Immobilization of Escherichia coli RNA polymerase and location of binding sites by use of chromatin immunoprecipitation and microarrays. J. Bacteriol. 187:6166–6174.

36. Hershberg, R., and H. Margalit. 2006. Co-evolution of transcription factors and their targets depends on mode of regulation. Genome Biol. 7:R62.

37. Hertz, G. Z., and G. D. Stormo. 1999. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15:563–577.

38. Hudson, J. M., and M. G. Fried. 1991. The binding of cyclic AMP receptor protein to two lactose promoter sites is not cooperative in vitro. J. Bacteriol. 173:59–66.

39. Huerta, A. M., and J. Collado-Vides. 2003. σ70 promoters in Escherichia coli: specific transcription in dense regions of overlapping promoter-like signals. J. Mol. Biol. 333:261–278.

40. Huerta, A. M., J. D. Glasner, R. M. Gutierrez-Rios, F. R. Blattner, and J. Collado-Vides. 2002. GETools: gene expression tool for analysis of transcriptome experiments in E. coli. Trends Genet. 18:217–218.

41. Huerta, A. M., H. Salgado, D. Thieffry, and J. Collado-Vides. 1998. RegulonDB: a database on transcriptional regulation in Escherichia coli. Nucleic Acids Res. 26:55–59.

42. Ishihama, Y., T. Schmidt, J. Rappsilber, M. Mann, F. U. Hartl, M. J. Kerner, and D. Frishman. 2008. Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 9:102.

43. Jacob, F., and J. Monod. 1961. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3:318–356.

44. Janga, S. C., J. Collado-Vides, and G. Moreno-Hagelsieb. 2005. Nebulon: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons. Nucleic Acids Res. 33:2521–2530.

45. Kaczanowska, M., and M. Ryden-Aulin. 2004. Temperature sensitivity caused by mutant release factor 1 is suppressed by mutations that affect 16S rRNA maturation. J. Bacteriol. 186:3046–3055.

46. Kang, Y., K. D. Weber, Y. Qiu, P. J. Kiley, and F. R. Blattner. 2005. Genome-wide expression analysis indicates that FNR of Escherichia coli K-12 regulates a large number of genes of unknown function. J. Bacteriol. 187:1135–1160.

47. Kao, K. C., Y. L. Yang, R. Boscolo, C. Sabatti, V. Roychowdhury, and J. C. Liao. 2004. Transcriptome-based determination of multiple transcription regulator activities in Escherichia coli by using network component analysis. Proc. Natl. Acad. Sci. USA 101:641–646.

48. Karp, P. D., I. M. Keseler, A. Shearer, M. Latendresse, M. Krummenacker, S. M. Paley, I. Paulsen, J. Collado-Vides, S. Gama-Castro, M. Peralta-Gil, A. Santos-Zavaleta, M. I. Penaloza-Spinola, C. Bonavides-Martinez, and J. Ingraham. 2007. Multidimensional annotation of the Escherichia coli K-12 genome. Nucleic Acids Res. 35:7577–7590.

49. Kyrpides, N. C., and C. A. Ouzounis. 1999. Transcription in archaea. Proc. Natl. Acad. Sci. USA 96:8545–8550.

50. Lacour, S., and P. Landini. 2004. σS-dependent gene expression at the onset of stationary phase in Escherichia coli: function of σS-dependent genes and identification of their promoter sequences. J. Bacteriol. 186:7186–7195.

51. Laing, E., K. Sidhu, and S. J. Hubbard. 2008. Predicted transcription factor binding sites as predictors of operons in Escherichia coli and Streptomyces coelicolor. BMC Genomics 9:79.

52. Laubacher, M. E., and S. E. Ades. 2008. The Rcs phosphorelay is a cell envelope stress response activated by peptidoglycan stress and contributes to intrinsic antibiotic resistance. J. Bacteriol. 190:2065–2074.

53. Law, E. C., N. J. Savery, and S. J. Busby. 1999. Interactions between the Escherichia coli cAMP receptor protein and the C-terminal domain of the alpha subunit of RNA polymerase at class I promoters. Biochem. J. 337:415–423.

54. Le Gall, T., P. Darlu, P. Escobar-Paramo, B. Picard, and E. Denamur. 2005. Selection-driven transcriptome polymorphism in Escherichia coli/Shigella species. Genome Res. 15:260–268.

55. Li, H., V. Rhodius, C. Gross, and E. D. Siggia. 2002. Identification of the binding sites of regulatory proteins in bacterial genomes. Proc. Natl. Acad. Sci. USA 99:11772–11777.

56. Li, Q. Z., and H. Lin. 2006. The recognition and prediction of σ70 promoters in Escherichia coli K-12. J. Theor. Biol. 242:135–141.

57. Lloyd, G. S., S. J. Busby, and N. J. Savery. 1998. Spacing requirements for interactions between the C-terminal domain of the alpha subunit of Escherichia coli RNA polymerase and the cAMP receptor protein. Biochem. J. 330:413–420.

58. Losick, R., and C. Desplan. 2008. Stochasticity and cell fate. Science 320:65–68.

59. Lozada-Chavez, I., S. C. Janga, and J. Collado-Vides. 2006. Bacterial regulatory networks are extremely flexible in evolution. Nucleic Acids Res. 34:3434–3445.

60. Madan Babu, M., S. A. Teichmann, and L. Aravind. 2006. Evolutionary dynamics of prokaryotic transcriptional regulatory networks. J. Mol. Biol. 358:614–633.

61. Maithreye, R., R. R. Sarkar, V. K. Parnaik, and S. Sinha. 2008. Delay-induced transient increase and heterogeneity in gene expression in negatively auto-regulated gene circuits. PLoS ONE 3:e2972.

62. Marincs, F., I. W. Manfield, J. A. Stead, K. J. McDowall, and P. G. Stockley. 2006. Transcript analysis reveals an extended regulon and the importance of protein-protein co-operativity for the Escherichia coli methionine repressor. Biochem. J. 396:227–234.

63. Marr, C., M. Geertz, M. T. Hutt, and G. Muskhelishvili. 2008. Dissecting the logical types of network control in gene expression profiles. BMC Syst. Biol. 2:18.

64. McBroom, A. J., A. P. Johnson, S. Vemulapalli, and M. J. Kuehn. 2006. Outer membrane vesicle production by Escherichia coli is independent of membrane instability. J. Bacteriol. 188:5385–5392.

65. Monod, J., J. P. Changeux, and F. Jacob. 1963. Allosteric proteins and cellular control systems. J. Mol. Biol. 6:306–329.

66. Moreno-Hagelsieb, G., and J. Collado-Vides. 2002. A powerful non-homology method for the prediction of operons in prokaryotes. Bioinformatics 18(Suppl. 1):S329–S336.

67. Morett, E., and M. Buck. 1988. NifA-dependent in vivo protection demonstrates that the upstream activator sequence of nif promoters is a protein binding site. Proc. Natl. Acad. Sci. USA 85:9401–9405.

68. Muller, H. M., E. E. Kenny, and P. W. Sternberg. 2004. Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol. 2:e309.

69. Munch, R., K. Hiller, A. Grote, M. Scheer, J. Klein, M. Schobert, and D. Jahn. 2005. Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes. Bioinformatics 21:4187–4189.

70. Okuda, S., S. Kawashima, K. Kobayashi, N. Ogasawara, M. Kanehisa, and S. Goto. 2007. Characterization of relationships between transcriptional units and operon structures in Bacillus subtilis and Escherichia coli. BMC Genomics 8:48.

71. Outten, C. E., F. W. Outten, and T. V. O’Halloran. 1999. DNA distortion mechanism for transcriptional activation by ZntR, a Zn(II)-responsive MerR homologue in Escherichia coli. J. Biol. Chem. 274:37517–37524.

72. Perez, A. G., V. E. Angarica, A. T. Vasconcelos, and J. Collado-Vides. 2007. Tractor_DB (version 2.0): a database of regulatory interactions in gamma-proteobacterial genomes. Nucleic Acids Res. 35:D132–D136.

73. Perez-Rueda, E., and J. Collado-Vides. 2000. The repertoire of DNA-binding transcriptional regulators in Escherichia coli K-12. Nucleic Acids Res. 28:1838–1847.

74. Pinkney, M., and J. G. Hoggett. 1988. Binding of the cyclic AMP receptor protein of Escherichia coli to RNA polymerase. Biochem. J. 250:897–902.

75. Pribnow, D. 1975. Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter. Proc. Natl. Acad. Sci. USA 72:784–788.

76. Price, M. N., P. S. Dehal, and A. P. Arkin. 2008. Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli. Genome Biol. 9:R4.

77. Price, M. N., P. S. Dehal, and A. P. Arkin. 2007. Orthologous transcription factors in bacteria have different functions and regulate different genes. PLoS Comput. Biol. 3:1739–1750.

78. Rajewsky, N., N. D. Socci, M. Zapotocky, and E. D. Siggia. 2002. The evolution of DNA regulatory regions for proteo-gamma bacteria by inter-species comparisons. Genome Res. 12:298–308.

79. Ravasz, E., A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi. 2002. Hierarchical organization of modularity in metabolic networks. Science 297:1551–1555.

80. Reed, J. L., and B. O. Palsson. 2003. Thirteen years of building constraint-based in silico models of Escherichia coli. J. Bacteriol. 185:2692–2699.

81. Rodriguez-Penagos, C., H. Salgado, I. Martinez-Flores, and J. Collado-Vides. 2007. Automatic reconstruction of a bacterial regulatory network using Natural Language Processing. BMC Bioinform. 8:293.

82. Rogozin, I. B., K. S. Makarova, D. A. Natale, A. N. Spiridonov, R. L. Tatusov, Y. I. Wolf, J. Yin, and E. V. Koonin. 2002. Congruent evolution of different classes of non-coding DNA in prokaryotic genomes. Nucleic Acids Res. 30:4264–4271.

83. Ross, W., S. E. Aiyar, J. Salomon, and R. L. Gourse. 1998. Escherichia colipromoters with UP elements of different strengths: modular structure ofbacterial promoters. J. Bacteriol. 180:5375–5383.

84. Salgado, H., G. Moreno-Hagelsieb, T. F. Smith, and J. Collado-Vides. 2000.Operons in Escherichia coli: genomic analyses and predictions. Proc. Natl.Acad. Sci. USA 97:6652–6657.

85. Samal, A., S. Singh, V. Giri, S. Krishna, N. Raghuram, and S. Jain. 2006.Low degree metabolites explain essential reactions and enhance modularityin biological networks. BMC Bioinform. 7:118.

86. Sangurdekar, D. P., F. Srienc, and A. B. Khodursky. 2006. A classificationbased framework for quantitative description of large-scale microarraydata. Genome Biol. 7:R32.

87. Schaechter, M., and F. Neidhardt. 1987. Introduction, p. 1–2. In F. C.Neidhardt, J. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E.Umbarger (ed.), Cellular and molecular biology: Escherichia coli and Sal-monella, 1st ed. ASM Press, Washington, DC.

88. Selinger, D. W., R. M. Saxena, K. J. Cheung, G. M. Church, and C.Rosenow. 2003. Global RNA half-life analysis in Escherichia coli revealspositional patterns of transcript degradation. Genome Res. 13:216–223.

89. Seshasayee, A. S. 2007. An assessment of the role of DNA adenine meth-yltransferase on gene expression regulation in E. coli. PLoS ONE 2:e273.

90. Shen-Orr, S. S., R. Milo, S. Mangan, and U. Alon. 2002. Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet. 31:64–68.

91. Sinoquet, C., S. Demey, and F. Braun. 2008. Large-scale computational and statistical analyses of high transcription potentialities in 32 prokaryotic genomes. Nucleic Acids Res. 36:3332–3340.

92. Soupene, E., W. C. van Heeswijk, J. Plumbridge, V. Stewart, D. Bertenthal, H. Lee, G. Prasad, O. Paliy, P. Charernnoppakul, and S. Kustu. 2003. Physiological studies of Escherichia coli strain MG1655: growth defects and apparent cross-regulation of gene expression. J. Bacteriol. 185:5611–5626.

93. Stoyanov, J. V., J. L. Hobman, and N. L. Brown. 2001. CueR (YbbI) of Escherichia coli is a MerR family regulator controlling expression of the copper exporter CopA. Mol. Microbiol. 39:502–511.

94. Stromback, L., and P. Lambrix. 2005. Representations of molecular pathways: an evaluation of SBML, PSI MI and BioPAX. Bioinformatics 21:4401–4407.

95. Strong, M., P. Mallick, M. Pellegrini, M. J. Thompson, and D. Eisenberg. 2003. Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach. Genome Biol. 4:R59.

96. Tan, K., G. Moreno-Hagelsieb, J. Collado-Vides, and G. D. Stormo. 2001. A comparative genomics approach to prediction of new members of regulons. Genome Res. 11:566–584.

97. Teichmann, S. A., and M. M. Babu. 2004. Gene regulatory network growth by duplication. Nat. Genet. 36:492–496.

98. Thieffry, D., H. Salgado, A. M. Huerta, and J. Collado-Vides. 1998. Prediction of transcriptional regulatory sites in the complete genome sequence of Escherichia coli K-12. Bioinformatics 14:391–400.

99. Thomas-Chollier, M., O. Sand, J. V. Turatsinze, R. Janky, M. Defrance, E. Vervisch, S. Brohee, and J. van Helden. 2008. RSAT: regulatory sequence analysis tools. Nucleic Acids Res. 36:W119–W127.

100. Tjaden, B. 2006. An approach for clustering gene expression data with error information. BMC Bioinform. 7:17.

101. Tjaden, B., R. M. Saxena, S. Stolyar, D. R. Haynor, E. Kolker, and C. Rosenow. 2002. Transcriptome analysis of Escherichia coli using high-density oligonucleotide probe arrays. Nucleic Acids Res. 30:3732–3738.

102. Tobes, R., and J. L. Ramos. 2002. AraC-XylS database: a family of positive transcriptional regulators in bacteria. Nucleic Acids Res. 30:318–321.

103. Tran, T. T., P. Dam, Z. Su, F. L. Poole II, M. W. Adams, G. T. Zhou, and Y. Xu. 2007. Operon prediction in Pyrococcus furiosus. Nucleic Acids Res. 35:11–20.

104. Turatsinze, J. V., M. Thomas-Chollier, M. Defrance, and J. van Helden. 2008. Using RSAT to scan genome sequences for transcription factor binding sites and cis-regulatory modules. Nat. Protoc. 3:1578–1588.

105. van Helden, J. 2003. Regulatory sequence analysis tools. Nucleic Acids Res. 31:3593–3596.

106. Veber, P., C. Guziolowski, M. Le Borgne, O. Radulescu, and A. Siegel. 2008. Inferring the role of transcription factors in regulatory networks. BMC Bioinform. 9:228.

107. Veiga, D. F., F. F. Vicente, M. F. Nicolas, and A. T. Vasconcelos. 2008. Predicting transcriptional regulatory interactions with artificial neural networks applied to E. coli multidrug resistance efflux pumps. BMC Microbiol. 8:101.

108. Visick, K. L., T. M. O’Shea, A. H. Klein, K. Geszvain, and A. J. Wolfe. 2007. The sugar phosphotransferase system of Vibrio fischeri inhibits both motility and bioluminescence. J. Bacteriol. 189:2571–2574.

109. Wade, J. T., N. B. Reppas, G. M. Church, and K. Struhl. 2005. Genomic analysis of LexA binding reveals the permissive nature of the Escherichia coli genome and identifies unconventional target sites. Genes Dev. 19:2619–2630.

110. Wall, M. E., W. S. Hlavacek, and M. A. Savageau. 2004. Design of gene circuits: lessons from bacteria. Nat. Rev. Genet. 5:34–42.

111. Weber, A., and K. Jung. 2002. Profiling early osmostress-dependent gene expression in Escherichia coli using DNA microarrays. J. Bacteriol. 184:5502–5507.

112. Whitehead, A. N. 1967. Adventures of ideas. Free Press, New York, NY.

113. Xiao, G., B. Martinez-Vaz, W. Pan, and A. B. Khodursky. 2006. Operon information improves gene expression estimation for cDNA microarrays. BMC Genomics 7:87.

114. Yooseph, S., G. Sutton, D. B. Rusch, A. L. Halpern, S. J. Williamson, K. Remington, J. A. Eisen, K. B. Heidelberg, G. Manning, W. Li, L. Jaroszewski, P. Cieplak, C. S. Miller, H. Li, S. T. Mashiyama, M. P. Joachimiak, C. van Belle, J. M. Chandonia, D. A. Soergel, Y. Zhai, K. Natarajan, S. Lee, B. J. Raphael, V. Bafna, R. Friedman, S. E. Brenner, A. Godzik, D. Eisenberg, J. E. Dixon, S. S. Taylor, R. L. Strausberg, M. Frazier, and J. C. Venter. 2007. The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 5:e16.

115. Zheng, D., C. Constantinidou, J. L. Hobman, and S. D. Minchin. 2004. Identification of the CRP regulon using in vitro and in vivo transcriptional profiling. Nucleic Acids Res. 32:5874–5893.

116. Zhou, Y., T. J. Merkel, and R. H. Ebright. 1994. Characterization of the activating region of Escherichia coli catabolite gene activator protein (CAP). II. Role at class I and class II CAP-dependent promoters. J. Mol. Biol. 243:603–610.
