16
6762–6777 Nucleic Acids Research, 2007, Vol. 35, No. 20 Published online 5 October 2007 doi:10.1093/nar/gkm631 Discovery of Fur binding site clusters in Escherichia coli by information theory models Zehua Chen 1 , Karen A. Lewis 1 , Ryan K. Shultzaberger 1 , Ilya G. Lyakhov 1,2 , Ming Zheng 3 , Bernard Doan 3,4 , Gisela Storz 3 and Thomas D. Schneider 1, * 1 National Cancer Institute at Frederick, Center for Cancer Research Nanobiology Program, 2 Basic Research Program, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, MD 21702-1201, 3 National Institute of Child Health and Human Development, Cell Biology and Metabolism Branch and 4 Division of Extramural Activities, Referral and Program Analysis Branch, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA Received June 5, 2007; Revised July 28, 2007; Accepted July 31, 2007 ABSTRACT Fur is a DNA binding protein that represses bacterial iron uptake systems. Eleven footprinted Escherichia coli Fur binding sites were used to create an initial information theory model of Fur binding, which was then refined by adding 13 experimentally confirmed sites. When the refined model was scanned across all available footprinted sequences, sequence walk- ers, which are visual depictions of predicted binding sites, frequently appeared in clusters that fit the footprints (»83% coverage). This indicated that the model can accurately predict Fur binding. Within the clusters, individual walkers were separated from their neighbors by exactly 3 or 6 bases, consistent with models in which Fur dimers bind on different faces of the DNA helix. When the E. coli genome was scanned, we found 363 unique clusters, which includes all known Fur-repressed genes that are involved in iron metabolism. In contrast, only a few of the known Fur-activated genes have predicted Fur binding sites at their promoters. These observa- tions suggest that Fur is either a direct repressor or an indirect activator. The Pseudomonas aeruginosa and Bacillus subtilis Fur models are highly similar to the E. coli Fur model, suggesting that the Fur–DNA recognition mechanism may be conserved for even distantly related bacteria. INTRODUCTION The protein Fur is the 16.8 kDa product of the fur (ferric uptake regulation) gene in Escherichia coli (1), so named because it was first observed to repress the transcription of genes that code for components of ferric (Fe +3 ) uptake systems found in the cell membrane. Since then, Fur also has been found to regulate other genes that are not directly related to iron transport, such as those encoding hemolysin, Shiga-like toxin and manganese superoxide dismutase (2–5). Fur binds to DNA and represses transcription in the presence of divalent metal ions. The ion is thought to be Fe +2 in vivo (6), however, DNase I footprinting experi- ments have shown that Fur also binds to DNA in the presence of Mn +2 , Co +2 , Cu +2 , Cd +2 , and Zn +2 (7). Recent studies have suggested that purified Fur contains at least one Zn +2 ion as a structural stabilizer (8). Fur has been observed to bind to DNA as a dimer and in higher order polymers (7,9), and electron microscopy has shown polymerization of Fur on DNA under high concentrations of protein and metal ions (2). Numerous strategies have been employed to find new Fur binding sites. Various consensus sequences have been derived from both footprinted and non- footprinted Fur binding sites (3,7,10) and these have been compared to sequences in the promoter region of suspected iron-regulated genes. Putative Fur targets were then investigated further through genetic and biochem- ical experiments. Stojiljkovic et al. created a successful The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors. Present address: Karen A. Lewis, Department of Physiology, University of Texas Southwestern Medical Center at Dallas, Dallas, TX 75390-9040, USA. Ryan K. Shultzaberger, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720-3202, USA. Ming Zheng, Dupont Central Research and Development, Experimental Station E328-B31, Wilmington, DE 19880-0328, USA. Zehua Chen, Department of Biology, Boston College, MA 02467, USA. *To whom correspondence should be addressed. Tel: +1 301 846 5581; Fax: +1 301 846 5598; Email: [email protected] ß 2007 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/ by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

6762ndash6777 Nucleic Acids Research 2007 Vol 35 No 20 Published online 5 October 2007doi101093nargkm631

Discovery of Fur binding site clusters inEscherichia coli by information theory modelsZehua Chen1 Karen A Lewis1 Ryan K Shultzaberger1 Ilya G Lyakhov12 Ming Zheng3

Bernard Doan34 Gisela Storz3 and Thomas D Schneider1

1National Cancer Institute at Frederick Center for Cancer Research Nanobiology Program 2Basic ResearchProgram SAIC-Frederick Inc National Cancer Institute at Frederick Frederick MD 21702-12013National Institute of Child Health and Human Development Cell Biology and Metabolism Branch and4Division of Extramural Activities Referral and Program Analysis Branch National Institute of Allergy andInfectious Diseases Bethesda MD 20892 USA

Received June 5 2007 Revised July 28 2007 Accepted July 31 2007

ABSTRACT

Fur is a DNA binding protein that represses bacterialiron uptake systems Eleven footprinted Escherichiacoli Fur binding sites were used to create an initialinformation theory model of Fur binding which wasthen refined by adding 13 experimentally confirmedsites When the refined model was scanned acrossall available footprinted sequences sequence walk-ers which are visual depictions of predicted bindingsites frequently appeared in clusters that fit thefootprints (raquo83 coverage) This indicated that themodel can accurately predict Fur binding Withinthe clusters individual walkers were separated fromtheir neighbors by exactly 3 or 6 bases consistentwith models in which Fur dimers bind on differentfaces of the DNA helix When the E coli genome wasscanned we found 363 unique clusters whichincludes all known Fur-repressed genes that areinvolved in iron metabolism In contrast only a fewof the known Fur-activated genes have predictedFur binding sites at their promoters These observa-tions suggest that Fur is either a direct repressor oran indirect activator The Pseudomonas aeruginosaand Bacillus subtilis Fur models are highly similar tothe E coli Fur model suggesting that the FurndashDNArecognition mechanism may be conserved for evendistantly related bacteria

INTRODUCTION

The protein Fur is the 168 kDa product of the fur (ferricuptake regulation) gene in Escherichia coli (1) so namedbecause it was first observed to repress the transcription ofgenes that code for components of ferric (Fe+3) uptakesystems found in the cell membrane Since then Fur alsohas been found to regulate other genes that are notdirectly related to iron transport such as those encodinghemolysin Shiga-like toxin and manganese superoxidedismutase (2ndash5)

Fur binds to DNA and represses transcription in thepresence of divalent metal ions The ion is thought to beFe+2 in vivo (6) however DNase I footprinting experi-ments have shown that Fur also binds to DNA in thepresence of Mn+2 Co+2 Cu+2 Cd+2 and Zn+2 (7)Recent studies have suggested that purified Fur containsat least one Zn+2 ion as a structural stabilizer (8) Fur hasbeen observed to bind to DNA as a dimer and in higherorder polymers (79) and electron microscopy has shownpolymerization of Fur on DNA under high concentrationsof protein and metal ions (2)

Numerous strategies have been employed to findnew Fur binding sites Various consensus sequenceshave been derived from both footprinted and non-footprinted Fur binding sites (3710) and these havebeen compared to sequences in the promoter region ofsuspected iron-regulated genes Putative Fur targets werethen investigated further through genetic and biochem-ical experiments Stojiljkovic et al created a successful

The authors wish it to be known that in their opinion the first two authors should be regarded as joint First AuthorsPresent addressKaren A Lewis Department of Physiology University of Texas Southwestern Medical Center at Dallas Dallas TX 75390-9040 USARyan K Shultzaberger Department of Molecular and Cell Biology University of California Berkeley CA 94720-3202 USAMing Zheng Dupont Central Research and Development Experimental Station E328-B31 Wilmington DE 19880-0328 USAZehua Chen Department of Biology Boston College MA 02467 USA

To whom correspondence should be addressed Tel +1 301 846 5581 Fax +1 301 846 5598 Email tomsncifcrfgov

2007 The Author(s)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (httpcreativecommonsorglicenses

by-nc20uk) which permits unrestricted non-commercial use distribution and reproduction in any medium provided the original work is properly cited

lsquoFur titration assayrsquo to locate new Fur binding sitesusing an fhuFlacZ fusion and Fur consensus sequence-containing plasmid titrant on MacConkey plates (1)Several new iron-regulated genes in E coli were discoveredusing this consensus sequence-based technique In addi-tion to the above studies have also been carried out usingE coli Fur for DNase I footprinting with non-E coliDNA (1112) Recently transcriptional profiles of E coligenes have been used to determine those that are regulatedby iron and Fur by evaluating mRNA levels in the absenceof iron or Fur protein (13)

Another method for finding Fur-regulated genes is touse molecular information theory to locate new bindingsites Using this approach classical information theory(1415) is applied to molecular biology (16) First a set ofbinding sites is aligned by maximizing the informationcontent (17) and then the average pattern at the sitesis represented by a computer graphic called a sequencelogo (18) Next the conservation of bases in the alignedset is used to create a weight matrix model that assignsa weight in bits to each base at each position according toits frequency in the data set (19) This can be displayedusing the sequence walker graphic (20)

In addition to displaying details of binding sitessequence logos can be used to understand the mechanismof binding In instances where factors bind in overlappingclusters it is difficult to assign the relative contribution ofa base in an overlapping region to the appropriate binderor to determine the range of the binding site Here wetested several Fur binding site models that were obtainedby multiply aligning Fur binding sequences using differentwindow sizes and identified the model that best representsbinding by a single Fur dimer

Information theory has previously been used to buildtwo models to evaluate and predict Fur binding sites(1321) Both models used ad hoc variations of informa-tion theory to assign scores to the predicted binding sitesrather than classical information content in bits In onecase the model was built using some sites that had notbeen footprinted by Fur and were probably not aligned tomaximize the information content (21)

The most rigorous approach to model building is to usea data set comprised of only footprinted binding sitesfrom one species By restricting the data set to experi-mentally proven sites one is certain that the model willreflect the binding characteristics of the protein the use ofa single species ensures that the protein and DNA bindingsequences evolved together and therefore correspondto one another (22) Many biases from previous modelsare thereby avoided but not all (23) The resultingexperimentally supported model is then scanned acrossthe entire genome of the species looking for sequencesthat contain a positive amount of information asevaluated by the weight matrix (19) Sequence walkerswhich are graphical representations of individual bindingsites then display probable binding sites on the genomebased on the model of proven sites (20) This method wassuccessfully used to discover that the OxyR transcriptionfactor controls the expression of the fur gene (24) toidentify additional sites for proteins such as Fis SoxSOxyR and PXRRXR (2325ndash27) to characterize splice

junctions (28) and to discover T7 islands a unique classof mobile genetic elements (2930) In this studyinformation theory has allowed us to identify new Furbinding sites 13 of which we confirmed experimentally

MATERIALS AND METHODS

Programs

Programs used in this study are available at httpwwwccrnpncifcrfgov~toms A web-based toolfor searching for Fur sites is available at httpwwwccrnpncifcrfgov~tomsdelilaserverhtml

Creating the Fur model

Eleven footprinted E coli Fur binding sequences wereextracted from the E coli K-12 genome (NC_000913)by the delila program (31) These sites are from thepromoters of the genes cir fecA fecI fur sodA tonB andiucA along with two bidirectional promoter regions forthe genes fepA-fes and fepB-entC (832ndash39) The promoterregion at fepB-entC has two distinctly protected regionseach region was included in the data set as individualsequences (fepB and entC) The promoter of iucA has anexceptionally long secondary footprint and so two regionswere used from the E coli plasmid ColV-K30 (M10930from 347 to 370 and from 365 to 393) (7) The comple-ment of each footprinted sequence was also includedsince Fur binds as a dimer (67940) The programmalign was used to obtain an alignment of the sequencesthat maximizes the information content within definedwindows of the alignment (17)As the initial model described above contains only

11 sites we then refined the model by including 13 moresites that were identified in the genome by a searchand confirmed by gel shift experiments in this study(see subsequently) The validity of the refined modelwas tested by scanning it across all available E coliFur footprinted regions using the programs scan andlister to create sequence walkers (1920) The model isverified if the walkers correspond to DNase I footprintregions For further verification the model was alsoscanned across synthetic oligonucleotides that containGATAAT repeats and which had been previouslyfootprinted (5)To compare Fur models of different bacteria we also

built Pseudomonas aeruginosa and Bacillus subtilis modelsby using footprinted Fur binding sequences from thesetwo species (4142)

Scanning for Fur binding sites

Fur in high concentration can bind weak sites while inlow concentration it binds only strong sites The affinity ofFur to its binding site also varies significantly with theavailability of metal Furthermore as demonstrated byfootprint data Fur binding can extend to flanking regionsthat bear little resemblance to the strong Fur binding sitesThis suggests that it may not be practical to use a fixedcutoff for a Fur binding site model under different

Nucleic Acids Research 2007 Vol 35 No 20 6763

conditions In our study we used two cutoff values fordifferent scansFor the whole E coli genome scan we used the lowest

information in the set of sites used to build the model asthe cutoff This should minimize false positives in ourpredictions as the model was built from experimentallyproven sites Groups of sites that were within 100 basesof each other were identified using the localbest programand the strongest one was selected to represent theregion This ensured that each region in the data set wasunique All sites with Ri values greater than 170 bits wereextracted (Table 4) This value was chosen simply to allowfor a manageable set of regions for further analysisAs Fur can extend binding to much weaker regions in

the footprints we should use a lower cutoff for scanningfootprinted sequences The second law of thermodynamicssets a theoretical lower bound for the information contentof a binding site (Ri) at zero bits (19) Therefore we useda zero-bit cutoff when scanning footprinted sequencesFur has been suggested to repress genes by binding to

their promoter elements (35 and 10 regions) therebyblocking the access of polymerase to the promoters(3343ndash46) To determine how many of the predictedFur binding clusters overlap with promoters we used aninformation theory-based flexible s70 binding model toscan the Fur clusters The s70 model was built from401 experimentally proven E coli promoters by uniformlytaking into account the information present in the 10the 35 and the uncertainty of the spacing between them(47) the same method has been successfully used to modelprokaryotic ribosome binding sites (48) The informationcontent (average conservation) of the promoter model islow (x frac14 65 bits SD=28 bits for the Ri distribution)and the individual information for a site in the modelranges from 03 to 126 bits The probability that a site isless than 4 bits is only 186 so we used 4 bits as a cutoffto predict reasonably strong non-activated promoters (47)

Gel mobility shift assays

Two sets of oligonucleotides containing predictedFur binding sites in E coli were designed andsynthesized (Oligos Etc) All oligonucleotides were self-complementary had a 50-GCTA-30 overhang on the50 end and contained a hairpin loop Oligos exbB exbDand fhuF contained a hairpin of the sequence 50-CGCGAAGC G-30 while the other 12 oligos [yoeAfepD-entS (formerly ybdA) gpmA ryhB fhuA nohAoppA gspC garP (formerly yhaU) yahA fadD and ygaC]contained a hairpin of the sequence 50-ACGATCGCGCGAAGC GCGATCGT-30 in the center Such loopsform a structure that is stabilized by base pairing betweenG3 and A5 of the central seven bases of each loop (49) andthey are convenient for use in DNA mobility shift assaysbecause of the exact equimolar concentrations of thecomplementary strands and their high stability (25)Three oligos containing the promoter regions for exbB

ygaC and the upstream region of exbD were created totest previously published consensus sequence predictions(5051) Eight oligos contained potential Fur-controlledpromoters identified using both an 11-site Fur model and

a s70 model as described above [yoeA fepD-entS (formerlyybdA) gpmA ryhB fhuA nohA oppA and fhuF] We werealso interested in whether Fur binds intragenically so fouroligos were synthesized that contained strong predictedsites found within gene coding regions (gspC garP yahAand fadD) These sequences can be found in SupplementaryTable S1 and Figure S1

Gel mobility shift assays were performed using the15 oligos (Figure 3) The oligonucleotides were labeled bya fill-in reaction with Taq DNA polymerase The 20 ml oflabeling mixture [20 pmol oligo 2 mM MgCl2 1PCRbuffer 5 units Taq polymerase (GibcoBRL) and 10 mMtetramethylrhodamine-6-dUTP (NEN)] was incubated at72C for 1 h 80 ml of ddH2O was added to the mixturefollowed by two phenol extractions a phenolchloroformextraction and a chloroform extraction The labeledoligonucleotides were diluted 15 in TE (10 mM TrisndashClpH 75 1 mM EDTA) boiled for 10 min and then placedon ice to prevent dimerization and trimerization

The labeled oligonucleotides (5 ml each 02 pmol) werethen incubated in Fur binding buffer (10 mM BisndashTrisndashHCl pH 75 5 mgml sonicated salmon sperm DNA 5glycerol 100 mM MnCl2 100 mgml BSA 1 mM MgCl240 mM KCl) with 150 nM Fur protein at 37C for 13 min(7) Fur was a gift from T OrsquoHalloran and C OuttenSamples were electrophoresed on a 5 polyacrylamidegel in Fur electrophoresis buffer (01 M BisndashTrisndashHClpH 75 10 mM MnCl2) at 120V for 1 h

Bands were visualized with an FMBIO II fluore-scent scanner (Hitachi) with an excitation wavelengthof 532 nm and a 585 nm filter for detection oftetramethylrhodamine

Footprinting

To generate the fhuF promoter construct (pGSO129) usedto test for Fur binding a 240-bp fragment amplified byPCR from genomic DNA (using the primers 50-GCGGCT GGA GAT GAA TTC GCC AGA TG and 50-GCCCTG CAA TCA GGG ATC CCG GCA GC) was clonedinto the BamHI and EcoRI sites of pUC18 (27) PurifiedFur protein generously provided by C Outten andT OrsquoHalloran was incubated with a Mn2+-containingbuffer according to de Lorenzo et al (7) DNase I foot-printing then was carried out as described previously (52)The 240-bp BamHIndashEcoRI fragment of pGSO129 waslabeled with [g-32P]ATP at either the BamHI site (topstrand) or the EcoRI site (bottom strand) The labeledfragments were incubated with 0 50 100 200 and 400 nMpurified Fur protein at room temperature for 5 minThe samples then were subjected to limited DNase Idigestion purified and separated on 8 polyacrylamidesequencing gels

RESULTS

E coli Fur binding model

Eleven footprinted Fur binding sites (see Table 3 andSupplementary Figure S1) (832ndash39) from E coliK-12 (53)were used to create an initial information theory model

6764 Nucleic Acids Research 2007 Vol 35 No 20

Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the

sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12

A

B

C

M12208 bits(minus12 +12)

M9181 bits(minus9 +9)

M6154 bits(minus6 +6)

===========gtolt===========

========gtolt========

=====gtolt=====

========gtolt================gtolt========

=====gtolt==========gtolt=====

0

1

2

bit

s

5prime

minus20

GTAC

minus19

minus18

CGTA

minus17

CGTA

minus16

GACT

minus15

ACGT

minus14

GTCA

minus13

minus12

GTA

minus11

GTA

minus10

GACT

minus9CTAG

minus8

TCGA

minus7

GCTA

minus6

TCA

minus5

TA

minus4

CT

minus3

TAG

minus2

CGA

minus1

CGT

0

TA

1

GCA

2

GCT

3

ATC

4

GA

5

AT

6

GAT

7

GCTA

8

AGCT

9

GATC

10

CTGA

11

CAT

12

CAT

13 14

CAGT

15

TGCA

16

CTGA

17

GCAT

18

GCAT

19 20

CATG

3prime

0

1

2

bit

s

5prime

minus20

CGTA

minus19

minus18

minus17

GTCA

minus16

GCAT

minus15

GCTA

minus14

GTA

minus13

GACT

minus12

CAGT

minus11

CGA

minus10

GCAT

minus9

TA

minus

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

CA

minus4

GAT

minus3

TA

minus2

GCA

minus1

GCT

0 1

CGA

2

CGT

3

AT

4

CTA

5

GT

6

GTC

7

GCTA

8CAT

9AT10

CGTA

11

GACT

12

GTCA

13

TCGA

14

CAT

15

CGAT

16

CGTA

17

CAGT

18 19 20

GCAT

3prime

0

1

2

bit

s

5prime minus20

GCA

minus19

GCAT

minus18

GCTA

minus

minus17

GCTA

minus16

GCAT

minus15

CAGT

minus14

TGCA

minus13

GTAC

minus12

GCTA

minus11

GTA

minus10

GACT

minus9

CAG

minus8

CTGA

minus7

GCAT

minus6

A

minus5

TA

minus4

CT

minus3

TCAG

minus2

CGA

minus1

AGT

0

TA

1

TCA

2

GCT

3

GATC

4

GA

5

AT

6

T7

GCTA

8

GACT

9

GATC

10

TCGA

11

CAT

12

GCAT

13

CTGA

14

ACGT

15

GTCA

16

GTCA

17

GCAT

18

GCAT

19

CGTA

20

CAGT

3prime

Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos

Nucleic Acids Research 2007 Vol 35 No 20 6765

The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best

represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo

A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window

Confirmation of 13 predicted sites and model refinement

Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)

McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur

To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)

Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M122

walkers (bits)Number of M92

walkers (bits)Number of M62

walkers (bits)

F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)

1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5

6766 Nucleic Acids Research 2007 Vol 35 No 20

As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable

P aeruginosa and B subtilis Fur models

To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments

(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur

binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos

A

B

fepB binds 2 Fur dimers

entS binds 3 Fur dimers

C

D

feuA binds 1 Fur dimer

dhbA binds 2 Fur dimers

10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3

M12 157 bits

M9 109 bits

M9 82 bits

M6 152 bits

10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3

M12 156 bits

M12 172 bits

M9 75 bits

M9 172 bits

M9 95 bits

M6 83 bits

M6 122 bits

10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3

M9 223 bits

M6 103 bits

M6 81 bits

10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3

M9 256 bits

M9 299 bits

M6 149 bits

M6 208 bits

M6 171 bits

Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7

Nucleic Acids Research 2007 Vol 35 No 20 6767

For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa

Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species

Scans of footprinted regions

Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated

Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and

93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)

There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content

A

B

yoeA fepD-entS gpmA ryhB fhuA nohA oppA

bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus +

gspC garP yahA fadD ygaC exbB exbD fhuF

bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus + minus +

Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift

6768 Nucleic Acids Research 2007 Vol 35 No 20

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 2: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

lsquoFur titration assayrsquo to locate new Fur binding sitesusing an fhuFlacZ fusion and Fur consensus sequence-containing plasmid titrant on MacConkey plates (1)Several new iron-regulated genes in E coli were discoveredusing this consensus sequence-based technique In addi-tion to the above studies have also been carried out usingE coli Fur for DNase I footprinting with non-E coliDNA (1112) Recently transcriptional profiles of E coligenes have been used to determine those that are regulatedby iron and Fur by evaluating mRNA levels in the absenceof iron or Fur protein (13)

Another method for finding Fur-regulated genes is touse molecular information theory to locate new bindingsites Using this approach classical information theory(1415) is applied to molecular biology (16) First a set ofbinding sites is aligned by maximizing the informationcontent (17) and then the average pattern at the sitesis represented by a computer graphic called a sequencelogo (18) Next the conservation of bases in the alignedset is used to create a weight matrix model that assignsa weight in bits to each base at each position according toits frequency in the data set (19) This can be displayedusing the sequence walker graphic (20)

In addition to displaying details of binding sitessequence logos can be used to understand the mechanismof binding In instances where factors bind in overlappingclusters it is difficult to assign the relative contribution ofa base in an overlapping region to the appropriate binderor to determine the range of the binding site Here wetested several Fur binding site models that were obtainedby multiply aligning Fur binding sequences using differentwindow sizes and identified the model that best representsbinding by a single Fur dimer

Information theory has previously been used to buildtwo models to evaluate and predict Fur binding sites(1321) Both models used ad hoc variations of informa-tion theory to assign scores to the predicted binding sitesrather than classical information content in bits In onecase the model was built using some sites that had notbeen footprinted by Fur and were probably not aligned tomaximize the information content (21)

The most rigorous approach to model building is to usea data set comprised of only footprinted binding sitesfrom one species By restricting the data set to experi-mentally proven sites one is certain that the model willreflect the binding characteristics of the protein the use ofa single species ensures that the protein and DNA bindingsequences evolved together and therefore correspondto one another (22) Many biases from previous modelsare thereby avoided but not all (23) The resultingexperimentally supported model is then scanned acrossthe entire genome of the species looking for sequencesthat contain a positive amount of information asevaluated by the weight matrix (19) Sequence walkerswhich are graphical representations of individual bindingsites then display probable binding sites on the genomebased on the model of proven sites (20) This method wassuccessfully used to discover that the OxyR transcriptionfactor controls the expression of the fur gene (24) toidentify additional sites for proteins such as Fis SoxSOxyR and PXRRXR (2325ndash27) to characterize splice

junctions (28) and to discover T7 islands a unique classof mobile genetic elements (2930) In this studyinformation theory has allowed us to identify new Furbinding sites 13 of which we confirmed experimentally

MATERIALS AND METHODS

Programs

Programs used in this study are available at httpwwwccrnpncifcrfgov~toms A web-based toolfor searching for Fur sites is available at httpwwwccrnpncifcrfgov~tomsdelilaserverhtml

Creating the Fur model

Eleven footprinted E coli Fur binding sequences wereextracted from the E coli K-12 genome (NC_000913)by the delila program (31) These sites are from thepromoters of the genes cir fecA fecI fur sodA tonB andiucA along with two bidirectional promoter regions forthe genes fepA-fes and fepB-entC (832ndash39) The promoterregion at fepB-entC has two distinctly protected regionseach region was included in the data set as individualsequences (fepB and entC) The promoter of iucA has anexceptionally long secondary footprint and so two regionswere used from the E coli plasmid ColV-K30 (M10930from 347 to 370 and from 365 to 393) (7) The comple-ment of each footprinted sequence was also includedsince Fur binds as a dimer (67940) The programmalign was used to obtain an alignment of the sequencesthat maximizes the information content within definedwindows of the alignment (17)As the initial model described above contains only

11 sites we then refined the model by including 13 moresites that were identified in the genome by a searchand confirmed by gel shift experiments in this study(see subsequently) The validity of the refined modelwas tested by scanning it across all available E coliFur footprinted regions using the programs scan andlister to create sequence walkers (1920) The model isverified if the walkers correspond to DNase I footprintregions For further verification the model was alsoscanned across synthetic oligonucleotides that containGATAAT repeats and which had been previouslyfootprinted (5)To compare Fur models of different bacteria we also

built Pseudomonas aeruginosa and Bacillus subtilis modelsby using footprinted Fur binding sequences from thesetwo species (4142)

Scanning for Fur binding sites

Fur in high concentration can bind weak sites while inlow concentration it binds only strong sites The affinity ofFur to its binding site also varies significantly with theavailability of metal Furthermore as demonstrated byfootprint data Fur binding can extend to flanking regionsthat bear little resemblance to the strong Fur binding sitesThis suggests that it may not be practical to use a fixedcutoff for a Fur binding site model under different

Nucleic Acids Research 2007 Vol 35 No 20 6763

conditions In our study we used two cutoff values fordifferent scansFor the whole E coli genome scan we used the lowest

information in the set of sites used to build the model asthe cutoff This should minimize false positives in ourpredictions as the model was built from experimentallyproven sites Groups of sites that were within 100 basesof each other were identified using the localbest programand the strongest one was selected to represent theregion This ensured that each region in the data set wasunique All sites with Ri values greater than 170 bits wereextracted (Table 4) This value was chosen simply to allowfor a manageable set of regions for further analysisAs Fur can extend binding to much weaker regions in

the footprints we should use a lower cutoff for scanningfootprinted sequences The second law of thermodynamicssets a theoretical lower bound for the information contentof a binding site (Ri) at zero bits (19) Therefore we useda zero-bit cutoff when scanning footprinted sequencesFur has been suggested to repress genes by binding to

their promoter elements (35 and 10 regions) therebyblocking the access of polymerase to the promoters(3343ndash46) To determine how many of the predictedFur binding clusters overlap with promoters we used aninformation theory-based flexible s70 binding model toscan the Fur clusters The s70 model was built from401 experimentally proven E coli promoters by uniformlytaking into account the information present in the 10the 35 and the uncertainty of the spacing between them(47) the same method has been successfully used to modelprokaryotic ribosome binding sites (48) The informationcontent (average conservation) of the promoter model islow (x frac14 65 bits SD=28 bits for the Ri distribution)and the individual information for a site in the modelranges from 03 to 126 bits The probability that a site isless than 4 bits is only 186 so we used 4 bits as a cutoffto predict reasonably strong non-activated promoters (47)

Gel mobility shift assays

Two sets of oligonucleotides containing predictedFur binding sites in E coli were designed andsynthesized (Oligos Etc) All oligonucleotides were self-complementary had a 50-GCTA-30 overhang on the50 end and contained a hairpin loop Oligos exbB exbDand fhuF contained a hairpin of the sequence 50-CGCGAAGC G-30 while the other 12 oligos [yoeAfepD-entS (formerly ybdA) gpmA ryhB fhuA nohAoppA gspC garP (formerly yhaU) yahA fadD and ygaC]contained a hairpin of the sequence 50-ACGATCGCGCGAAGC GCGATCGT-30 in the center Such loopsform a structure that is stabilized by base pairing betweenG3 and A5 of the central seven bases of each loop (49) andthey are convenient for use in DNA mobility shift assaysbecause of the exact equimolar concentrations of thecomplementary strands and their high stability (25)Three oligos containing the promoter regions for exbB

ygaC and the upstream region of exbD were created totest previously published consensus sequence predictions(5051) Eight oligos contained potential Fur-controlledpromoters identified using both an 11-site Fur model and

a s70 model as described above [yoeA fepD-entS (formerlyybdA) gpmA ryhB fhuA nohA oppA and fhuF] We werealso interested in whether Fur binds intragenically so fouroligos were synthesized that contained strong predictedsites found within gene coding regions (gspC garP yahAand fadD) These sequences can be found in SupplementaryTable S1 and Figure S1

Gel mobility shift assays were performed using the15 oligos (Figure 3) The oligonucleotides were labeled bya fill-in reaction with Taq DNA polymerase The 20 ml oflabeling mixture [20 pmol oligo 2 mM MgCl2 1PCRbuffer 5 units Taq polymerase (GibcoBRL) and 10 mMtetramethylrhodamine-6-dUTP (NEN)] was incubated at72C for 1 h 80 ml of ddH2O was added to the mixturefollowed by two phenol extractions a phenolchloroformextraction and a chloroform extraction The labeledoligonucleotides were diluted 15 in TE (10 mM TrisndashClpH 75 1 mM EDTA) boiled for 10 min and then placedon ice to prevent dimerization and trimerization

The labeled oligonucleotides (5 ml each 02 pmol) werethen incubated in Fur binding buffer (10 mM BisndashTrisndashHCl pH 75 5 mgml sonicated salmon sperm DNA 5glycerol 100 mM MnCl2 100 mgml BSA 1 mM MgCl240 mM KCl) with 150 nM Fur protein at 37C for 13 min(7) Fur was a gift from T OrsquoHalloran and C OuttenSamples were electrophoresed on a 5 polyacrylamidegel in Fur electrophoresis buffer (01 M BisndashTrisndashHClpH 75 10 mM MnCl2) at 120V for 1 h

Bands were visualized with an FMBIO II fluore-scent scanner (Hitachi) with an excitation wavelengthof 532 nm and a 585 nm filter for detection oftetramethylrhodamine

Footprinting

To generate the fhuF promoter construct (pGSO129) usedto test for Fur binding a 240-bp fragment amplified byPCR from genomic DNA (using the primers 50-GCGGCT GGA GAT GAA TTC GCC AGA TG and 50-GCCCTG CAA TCA GGG ATC CCG GCA GC) was clonedinto the BamHI and EcoRI sites of pUC18 (27) PurifiedFur protein generously provided by C Outten andT OrsquoHalloran was incubated with a Mn2+-containingbuffer according to de Lorenzo et al (7) DNase I foot-printing then was carried out as described previously (52)The 240-bp BamHIndashEcoRI fragment of pGSO129 waslabeled with [g-32P]ATP at either the BamHI site (topstrand) or the EcoRI site (bottom strand) The labeledfragments were incubated with 0 50 100 200 and 400 nMpurified Fur protein at room temperature for 5 minThe samples then were subjected to limited DNase Idigestion purified and separated on 8 polyacrylamidesequencing gels

RESULTS

E coli Fur binding model

Eleven footprinted Fur binding sites (see Table 3 andSupplementary Figure S1) (832ndash39) from E coliK-12 (53)were used to create an initial information theory model

6764 Nucleic Acids Research 2007 Vol 35 No 20

Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the

sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12

A

B

C

M12208 bits(minus12 +12)

M9181 bits(minus9 +9)

M6154 bits(minus6 +6)

===========gtolt===========

========gtolt========

=====gtolt=====

========gtolt================gtolt========

=====gtolt==========gtolt=====

0

1

2

bit

s

5prime

minus20

GTAC

minus19

minus18

CGTA

minus17

CGTA

minus16

GACT

minus15

ACGT

minus14

GTCA

minus13

minus12

GTA

minus11

GTA

minus10

GACT

minus9CTAG

minus8

TCGA

minus7

GCTA

minus6

TCA

minus5

TA

minus4

CT

minus3

TAG

minus2

CGA

minus1

CGT

0

TA

1

GCA

2

GCT

3

ATC

4

GA

5

AT

6

GAT

7

GCTA

8

AGCT

9

GATC

10

CTGA

11

CAT

12

CAT

13 14

CAGT

15

TGCA

16

CTGA

17

GCAT

18

GCAT

19 20

CATG

3prime

0

1

2

bit

s

5prime

minus20

CGTA

minus19

minus18

minus17

GTCA

minus16

GCAT

minus15

GCTA

minus14

GTA

minus13

GACT

minus12

CAGT

minus11

CGA

minus10

GCAT

minus9

TA

minus

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

CA

minus4

GAT

minus3

TA

minus2

GCA

minus1

GCT

0 1

CGA

2

CGT

3

AT

4

CTA

5

GT

6

GTC

7

GCTA

8CAT

9AT10

CGTA

11

GACT

12

GTCA

13

TCGA

14

CAT

15

CGAT

16

CGTA

17

CAGT

18 19 20

GCAT

3prime

0

1

2

bit

s

5prime minus20

GCA

minus19

GCAT

minus18

GCTA

minus

minus17

GCTA

minus16

GCAT

minus15

CAGT

minus14

TGCA

minus13

GTAC

minus12

GCTA

minus11

GTA

minus10

GACT

minus9

CAG

minus8

CTGA

minus7

GCAT

minus6

A

minus5

TA

minus4

CT

minus3

TCAG

minus2

CGA

minus1

AGT

0

TA

1

TCA

2

GCT

3

GATC

4

GA

5

AT

6

T7

GCTA

8

GACT

9

GATC

10

TCGA

11

CAT

12

GCAT

13

CTGA

14

ACGT

15

GTCA

16

GTCA

17

GCAT

18

GCAT

19

CGTA

20

CAGT

3prime

Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos

Nucleic Acids Research 2007 Vol 35 No 20 6765

The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best

represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo

A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window

Confirmation of 13 predicted sites and model refinement

Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)

McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur

To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)

Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M122

walkers (bits)Number of M92

walkers (bits)Number of M62

walkers (bits)

F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)

1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5

6766 Nucleic Acids Research 2007 Vol 35 No 20

As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable

P aeruginosa and B subtilis Fur models

To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments

(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur

binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos

A

B

fepB binds 2 Fur dimers

entS binds 3 Fur dimers

C

D

feuA binds 1 Fur dimer

dhbA binds 2 Fur dimers

10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3

M12 157 bits

M9 109 bits

M9 82 bits

M6 152 bits

10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3

M12 156 bits

M12 172 bits

M9 75 bits

M9 172 bits

M9 95 bits

M6 83 bits

M6 122 bits

10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3

M9 223 bits

M6 103 bits

M6 81 bits

10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3

M9 256 bits

M9 299 bits

M6 149 bits

M6 208 bits

M6 171 bits

Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7

Nucleic Acids Research 2007 Vol 35 No 20 6767

For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa

Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species

Scans of footprinted regions

Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated

Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and

93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)

There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content

A

B

yoeA fepD-entS gpmA ryhB fhuA nohA oppA

bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus +

gspC garP yahA fadD ygaC exbB exbD fhuF

bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus + minus +

Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift

6768 Nucleic Acids Research 2007 Vol 35 No 20

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 3: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

conditions In our study we used two cutoff values fordifferent scansFor the whole E coli genome scan we used the lowest

information in the set of sites used to build the model asthe cutoff This should minimize false positives in ourpredictions as the model was built from experimentallyproven sites Groups of sites that were within 100 basesof each other were identified using the localbest programand the strongest one was selected to represent theregion This ensured that each region in the data set wasunique All sites with Ri values greater than 170 bits wereextracted (Table 4) This value was chosen simply to allowfor a manageable set of regions for further analysisAs Fur can extend binding to much weaker regions in

the footprints we should use a lower cutoff for scanningfootprinted sequences The second law of thermodynamicssets a theoretical lower bound for the information contentof a binding site (Ri) at zero bits (19) Therefore we useda zero-bit cutoff when scanning footprinted sequencesFur has been suggested to repress genes by binding to

their promoter elements (35 and 10 regions) therebyblocking the access of polymerase to the promoters(3343ndash46) To determine how many of the predictedFur binding clusters overlap with promoters we used aninformation theory-based flexible s70 binding model toscan the Fur clusters The s70 model was built from401 experimentally proven E coli promoters by uniformlytaking into account the information present in the 10the 35 and the uncertainty of the spacing between them(47) the same method has been successfully used to modelprokaryotic ribosome binding sites (48) The informationcontent (average conservation) of the promoter model islow (x frac14 65 bits SD=28 bits for the Ri distribution)and the individual information for a site in the modelranges from 03 to 126 bits The probability that a site isless than 4 bits is only 186 so we used 4 bits as a cutoffto predict reasonably strong non-activated promoters (47)

Gel mobility shift assays

Two sets of oligonucleotides containing predictedFur binding sites in E coli were designed andsynthesized (Oligos Etc) All oligonucleotides were self-complementary had a 50-GCTA-30 overhang on the50 end and contained a hairpin loop Oligos exbB exbDand fhuF contained a hairpin of the sequence 50-CGCGAAGC G-30 while the other 12 oligos [yoeAfepD-entS (formerly ybdA) gpmA ryhB fhuA nohAoppA gspC garP (formerly yhaU) yahA fadD and ygaC]contained a hairpin of the sequence 50-ACGATCGCGCGAAGC GCGATCGT-30 in the center Such loopsform a structure that is stabilized by base pairing betweenG3 and A5 of the central seven bases of each loop (49) andthey are convenient for use in DNA mobility shift assaysbecause of the exact equimolar concentrations of thecomplementary strands and their high stability (25)Three oligos containing the promoter regions for exbB

ygaC and the upstream region of exbD were created totest previously published consensus sequence predictions(5051) Eight oligos contained potential Fur-controlledpromoters identified using both an 11-site Fur model and

a s70 model as described above [yoeA fepD-entS (formerlyybdA) gpmA ryhB fhuA nohA oppA and fhuF] We werealso interested in whether Fur binds intragenically so fouroligos were synthesized that contained strong predictedsites found within gene coding regions (gspC garP yahAand fadD) These sequences can be found in SupplementaryTable S1 and Figure S1

Gel mobility shift assays were performed using the15 oligos (Figure 3) The oligonucleotides were labeled bya fill-in reaction with Taq DNA polymerase The 20 ml oflabeling mixture [20 pmol oligo 2 mM MgCl2 1PCRbuffer 5 units Taq polymerase (GibcoBRL) and 10 mMtetramethylrhodamine-6-dUTP (NEN)] was incubated at72C for 1 h 80 ml of ddH2O was added to the mixturefollowed by two phenol extractions a phenolchloroformextraction and a chloroform extraction The labeledoligonucleotides were diluted 15 in TE (10 mM TrisndashClpH 75 1 mM EDTA) boiled for 10 min and then placedon ice to prevent dimerization and trimerization

The labeled oligonucleotides (5 ml each 02 pmol) werethen incubated in Fur binding buffer (10 mM BisndashTrisndashHCl pH 75 5 mgml sonicated salmon sperm DNA 5glycerol 100 mM MnCl2 100 mgml BSA 1 mM MgCl240 mM KCl) with 150 nM Fur protein at 37C for 13 min(7) Fur was a gift from T OrsquoHalloran and C OuttenSamples were electrophoresed on a 5 polyacrylamidegel in Fur electrophoresis buffer (01 M BisndashTrisndashHClpH 75 10 mM MnCl2) at 120V for 1 h

Bands were visualized with an FMBIO II fluore-scent scanner (Hitachi) with an excitation wavelengthof 532 nm and a 585 nm filter for detection oftetramethylrhodamine

Footprinting

To generate the fhuF promoter construct (pGSO129) usedto test for Fur binding a 240-bp fragment amplified byPCR from genomic DNA (using the primers 50-GCGGCT GGA GAT GAA TTC GCC AGA TG and 50-GCCCTG CAA TCA GGG ATC CCG GCA GC) was clonedinto the BamHI and EcoRI sites of pUC18 (27) PurifiedFur protein generously provided by C Outten andT OrsquoHalloran was incubated with a Mn2+-containingbuffer according to de Lorenzo et al (7) DNase I foot-printing then was carried out as described previously (52)The 240-bp BamHIndashEcoRI fragment of pGSO129 waslabeled with [g-32P]ATP at either the BamHI site (topstrand) or the EcoRI site (bottom strand) The labeledfragments were incubated with 0 50 100 200 and 400 nMpurified Fur protein at room temperature for 5 minThe samples then were subjected to limited DNase Idigestion purified and separated on 8 polyacrylamidesequencing gels

RESULTS

E coli Fur binding model

Eleven footprinted Fur binding sites (see Table 3 andSupplementary Figure S1) (832ndash39) from E coliK-12 (53)were used to create an initial information theory model

6764 Nucleic Acids Research 2007 Vol 35 No 20

Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the

sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12

A

B

C

M12208 bits(minus12 +12)

M9181 bits(minus9 +9)

M6154 bits(minus6 +6)

===========gtolt===========

========gtolt========

=====gtolt=====

========gtolt================gtolt========

=====gtolt==========gtolt=====

0

1

2

bit

s

5prime

minus20

GTAC

minus19

minus18

CGTA

minus17

CGTA

minus16

GACT

minus15

ACGT

minus14

GTCA

minus13

minus12

GTA

minus11

GTA

minus10

GACT

minus9CTAG

minus8

TCGA

minus7

GCTA

minus6

TCA

minus5

TA

minus4

CT

minus3

TAG

minus2

CGA

minus1

CGT

0

TA

1

GCA

2

GCT

3

ATC

4

GA

5

AT

6

GAT

7

GCTA

8

AGCT

9

GATC

10

CTGA

11

CAT

12

CAT

13 14

CAGT

15

TGCA

16

CTGA

17

GCAT

18

GCAT

19 20

CATG

3prime

0

1

2

bit

s

5prime

minus20

CGTA

minus19

minus18

minus17

GTCA

minus16

GCAT

minus15

GCTA

minus14

GTA

minus13

GACT

minus12

CAGT

minus11

CGA

minus10

GCAT

minus9

TA

minus

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

CA

minus4

GAT

minus3

TA

minus2

GCA

minus1

GCT

0 1

CGA

2

CGT

3

AT

4

CTA

5

GT

6

GTC

7

GCTA

8CAT

9AT10

CGTA

11

GACT

12

GTCA

13

TCGA

14

CAT

15

CGAT

16

CGTA

17

CAGT

18 19 20

GCAT

3prime

0

1

2

bit

s

5prime minus20

GCA

minus19

GCAT

minus18

GCTA

minus

minus17

GCTA

minus16

GCAT

minus15

CAGT

minus14

TGCA

minus13

GTAC

minus12

GCTA

minus11

GTA

minus10

GACT

minus9

CAG

minus8

CTGA

minus7

GCAT

minus6

A

minus5

TA

minus4

CT

minus3

TCAG

minus2

CGA

minus1

AGT

0

TA

1

TCA

2

GCT

3

GATC

4

GA

5

AT

6

T7

GCTA

8

GACT

9

GATC

10

TCGA

11

CAT

12

GCAT

13

CTGA

14

ACGT

15

GTCA

16

GTCA

17

GCAT

18

GCAT

19

CGTA

20

CAGT

3prime

Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos

Nucleic Acids Research 2007 Vol 35 No 20 6765

The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best

represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo

A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window

Confirmation of 13 predicted sites and model refinement

Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)

McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur

To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)

Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M122

walkers (bits)Number of M92

walkers (bits)Number of M62

walkers (bits)

F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)

1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5

6766 Nucleic Acids Research 2007 Vol 35 No 20

As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable

P aeruginosa and B subtilis Fur models

To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments

(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur

binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos

A

B

fepB binds 2 Fur dimers

entS binds 3 Fur dimers

C

D

feuA binds 1 Fur dimer

dhbA binds 2 Fur dimers

10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3

M12 157 bits

M9 109 bits

M9 82 bits

M6 152 bits

10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3

M12 156 bits

M12 172 bits

M9 75 bits

M9 172 bits

M9 95 bits

M6 83 bits

M6 122 bits

10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3

M9 223 bits

M6 103 bits

M6 81 bits

10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3

M9 256 bits

M9 299 bits

M6 149 bits

M6 208 bits

M6 171 bits

Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7

Nucleic Acids Research 2007 Vol 35 No 20 6767

For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa

Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species

Scans of footprinted regions

Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated

Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and

93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)

There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content

A

B

yoeA fepD-entS gpmA ryhB fhuA nohA oppA

bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus +

gspC garP yahA fadD ygaC exbB exbD fhuF

bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus + minus +

Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift

6768 Nucleic Acids Research 2007 Vol 35 No 20

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 4: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the

sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12

A

B

C

M12208 bits(minus12 +12)

M9181 bits(minus9 +9)

M6154 bits(minus6 +6)

===========gtolt===========

========gtolt========

=====gtolt=====

========gtolt================gtolt========

=====gtolt==========gtolt=====

0

1

2

bit

s

5prime

minus20

GTAC

minus19

minus18

CGTA

minus17

CGTA

minus16

GACT

minus15

ACGT

minus14

GTCA

minus13

minus12

GTA

minus11

GTA

minus10

GACT

minus9CTAG

minus8

TCGA

minus7

GCTA

minus6

TCA

minus5

TA

minus4

CT

minus3

TAG

minus2

CGA

minus1

CGT

0

TA

1

GCA

2

GCT

3

ATC

4

GA

5

AT

6

GAT

7

GCTA

8

AGCT

9

GATC

10

CTGA

11

CAT

12

CAT

13 14

CAGT

15

TGCA

16

CTGA

17

GCAT

18

GCAT

19 20

CATG

3prime

0

1

2

bit

s

5prime

minus20

CGTA

minus19

minus18

minus17

GTCA

minus16

GCAT

minus15

GCTA

minus14

GTA

minus13

GACT

minus12

CAGT

minus11

CGA

minus10

GCAT

minus9

TA

minus

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

CA

minus4

GAT

minus3

TA

minus2

GCA

minus1

GCT

0 1

CGA

2

CGT

3

AT

4

CTA

5

GT

6

GTC

7

GCTA

8CAT

9AT10

CGTA

11

GACT

12

GTCA

13

TCGA

14

CAT

15

CGAT

16

CGTA

17

CAGT

18 19 20

GCAT

3prime

0

1

2

bit

s

5prime minus20

GCA

minus19

GCAT

minus18

GCTA

minus

minus17

GCTA

minus16

GCAT

minus15

CAGT

minus14

TGCA

minus13

GTAC

minus12

GCTA

minus11

GTA

minus10

GACT

minus9

CAG

minus8

CTGA

minus7

GCAT

minus6

A

minus5

TA

minus4

CT

minus3

TCAG

minus2

CGA

minus1

AGT

0

TA

1

TCA

2

GCT

3

GATC

4

GA

5

AT

6

T7

GCTA

8

GACT

9

GATC

10

TCGA

11

CAT

12

GCAT

13

CTGA

14

ACGT

15

GTCA

16

GTCA

17

GCAT

18

GCAT

19

CGTA

20

CAGT

3prime

Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos

Nucleic Acids Research 2007 Vol 35 No 20 6765

The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best

represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo

A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window

Confirmation of 13 predicted sites and model refinement

Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)

McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur

To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)

Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M122

walkers (bits)Number of M92

walkers (bits)Number of M62

walkers (bits)

F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)

1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5

6766 Nucleic Acids Research 2007 Vol 35 No 20

As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable

P aeruginosa and B subtilis Fur models

To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments

(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur

binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos

A

B

fepB binds 2 Fur dimers

entS binds 3 Fur dimers

C

D

feuA binds 1 Fur dimer

dhbA binds 2 Fur dimers

10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3

M12 157 bits

M9 109 bits

M9 82 bits

M6 152 bits

10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3

M12 156 bits

M12 172 bits

M9 75 bits

M9 172 bits

M9 95 bits

M6 83 bits

M6 122 bits

10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3

M9 223 bits

M6 103 bits

M6 81 bits

10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3

M9 256 bits

M9 299 bits

M6 149 bits

M6 208 bits

M6 171 bits

Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7

Nucleic Acids Research 2007 Vol 35 No 20 6767

For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa

Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species

Scans of footprinted regions

Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated

Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and

93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)

There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content

A

B

yoeA fepD-entS gpmA ryhB fhuA nohA oppA

bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus +

gspC garP yahA fadD ygaC exbB exbD fhuF

bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus + minus +

Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift

6768 Nucleic Acids Research 2007 Vol 35 No 20

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 5: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best

represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo

A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window

Confirmation of 13 predicted sites and model refinement

Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)

McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur

To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)

Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M122

walkers (bits)Number of M92

walkers (bits)Number of M62

walkers (bits)

F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)

1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5

6766 Nucleic Acids Research 2007 Vol 35 No 20

As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable

P aeruginosa and B subtilis Fur models

To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments

(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur

binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos

A

B

fepB binds 2 Fur dimers

entS binds 3 Fur dimers

C

D

feuA binds 1 Fur dimer

dhbA binds 2 Fur dimers

10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3

M12 157 bits

M9 109 bits

M9 82 bits

M6 152 bits

10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3

M12 156 bits

M12 172 bits

M9 75 bits

M9 172 bits

M9 95 bits

M6 83 bits

M6 122 bits

10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3

M9 223 bits

M6 103 bits

M6 81 bits

10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3

M9 256 bits

M9 299 bits

M6 149 bits

M6 208 bits

M6 171 bits

Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7

Nucleic Acids Research 2007 Vol 35 No 20 6767

For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa

Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species

Scans of footprinted regions

Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated

Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and

93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)

There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content

A

B

yoeA fepD-entS gpmA ryhB fhuA nohA oppA

bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus +

gspC garP yahA fadD ygaC exbB exbD fhuF

bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus + minus +

Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift

6768 Nucleic Acids Research 2007 Vol 35 No 20

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 6: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable

P aeruginosa and B subtilis Fur models

To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments

(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur

binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos

A

B

fepB binds 2 Fur dimers

entS binds 3 Fur dimers

C

D

feuA binds 1 Fur dimer

dhbA binds 2 Fur dimers

10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3

M12 157 bits

M9 109 bits

M9 82 bits

M6 152 bits

10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3

M12 156 bits

M12 172 bits

M9 75 bits

M9 172 bits

M9 95 bits

M6 83 bits

M6 122 bits

10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3

M9 223 bits

M6 103 bits

M6 81 bits

10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3

M9 256 bits

M9 299 bits

M6 149 bits

M6 208 bits

M6 171 bits

Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7

Nucleic Acids Research 2007 Vol 35 No 20 6767

For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa

Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species

Scans of footprinted regions

Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated

Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and

93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)

There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content

A

B

yoeA fepD-entS gpmA ryhB fhuA nohA oppA

bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus +

gspC garP yahA fadD ygaC exbB exbD fhuF

bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus + minus +

Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift

6768 Nucleic Acids Research 2007 Vol 35 No 20

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 7: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa

Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species

Scans of footprinted regions

Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated

Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and

93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)

There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content

A

B

yoeA fepD-entS gpmA ryhB fhuA nohA oppA

bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus +

gspC garP yahA fadD ygaC exbB exbD fhuF

bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9

Fur minus + minus + minus + minus + minus + minus + minus + minus +

Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift

6768 Nucleic Acids Research 2007 Vol 35 No 20

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 8: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint

In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein

concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences

Whole genome scan

The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably

A

B

C

======gtolt======

======gtolt======

======gtolt======

24 E coli Fur binding sites 196 bits (minus9 +9)

0

1

2b

its

5prime

minus20

CTGA

minus19

minus18

minus17

minus16

GCAT

minus15

GCTA

minus14

CGTA

minus13

GACT

minus12

CAGT

minus11

CGTA

minus10

GCAT

minus9

GTA

minus8

GTA

minus7

GCAT

minus6

CAG

minus5

GTCA

minus4

GAT

minus3

TA

minus2

TGCA

minus1

GACT

0 1

TCGA

2

ACGT

3

AT

4

TCA

5

CGAT

6

GTC

7

CTGA

8

CAT

9

CAT

10

CGTA

11

GCAT

12

GTCA

13

CTGA

14

GCAT

15

CGAT

16

CGTA

17 18 19 20

GCAT

3prime

20 P aeruginosa Fur sites 186 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

minus18

minus17

minus16

minus15

minus14

CGTA

minus13

ACGT

minus12

minus11

minus10

CGTA

minus9

CGTA

minus8

CTA

minus7

CT

minus6

TAG

minus5

GTCA

minus4

CAGT

minus3

TA

minus2

CTA

minus1

CT

0

GC

1

GA

2

GAT

3AT

4GTCA

5

CAGT

6

ATC

7

GA

8

GAT

9

GCAT

10

GCAT

11 12 13

TGCA

14

GCAT

15 16 17 18 19 20 3prime

22 B subtilis Fur sites 229 bits (minus9 +9)

0

1

2

bit

s

5prime

minus20

minus19

GCAT

minus18

GCTA

minus17

GACT

minus16

CTGA

minus15

minus14

GCAT

minus13

GCAT

minus12

minus11

GCAT

minus10

GCTA

minus9

CGTA

minus8

CGTA

minus7

AT

minus6

C

G

minus5

GCA

minus4

CGAT

minus3

TGA

minus2

TGA

minus1

GCAT

0

TAGC

1

GCTA

2

CAT

3

CAT

4

GCTA

5

CGT

6

G

C7

TA

8

GCAT

9

GCAT

10

CGAT

11

CGTA

12 13

CGTA

14

CGTA

15 16

GACT

17

CTGA

18

CGAT

19

CGTA

20 3prime

Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)

Nucleic Acids Research 2007 Vol 35 No 20 6769

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 9: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50

around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap

with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)

may not be a common mechanism of Fur regulationin E coli

Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)

Scans of other proposed Fur-regulated genes

Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments

Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the

Table 2 Information theory test of B subtilis Fur models M9 and M6

Oligo1 Number of observed1

Fur dimers (nM)Number of M92

walkers (bits)Number of M62

walkers (bits)

6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)

1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7

6770 Nucleic Acids Research 2007 Vol 35 No 20

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 10: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

A B

L

H

H

L

H fhuF1

L fhuF2

yjjZ

4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3

4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3

Fur high affinity topFur high affinity bottom

M9 161 bits

M9 208 bits

M9 223 bits

M9 34 bits

fhuF

fhuF

4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo

Fur low affinity topFur low affinity bottom

M9 22 bits

M9 01 bits

M9 146 bits

M9 93 bits

Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)

Nucleic

Acid

sResea

rch2007V

ol3

5N

o20

6771

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 11: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus

sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101

genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli

Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes

McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)

The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes

DISCUSSION

Fur binding models

In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models

Table 3 Scan of footprinted Fur binding sites in E coli

Accession Gene Footprinted region References Number Ri Percentage

NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54

The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)

6772 Nucleic Acids Research 2007 Vol 35 No 20

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 12: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds

To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three

bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our

initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing

Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model

Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data

274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA

The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)

Nucleic Acids Research 2007 Vol 35 No 20 6773

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 13: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any

DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur

was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models

have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier

models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site

The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable

Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)

Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches

Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding

Fur Binding Site Clusters

The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown

Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6

Baichoo model 7-1-7

M9 model 9-1-9

GATAATGATAATCATTATC

========gtOlt========

=====gt=====gtOlt=====

=====gtOlt==========gtOlt=====

t=====gtOlt============gtOlt=====a

aat=====gtOlt================gtOlt=====att

Figure 6 Different Fur binding models to interpret the Fur box

6774 Nucleic Acids Research 2007 Vol 35 No 20

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 14: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)

It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species

Disentangling Fur models

Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation

Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel

Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution

A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters

SUPPLEMENTARY DATA

Supplementary data are available at NAR Online

ACKNOWLEDGEMENTS

We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute

Conflict of interest statement None declared

REFERENCES

1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545

2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820

3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764

Nucleic Acids Research 2007 Vol 35 No 20 6775

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 15: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938

5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547

6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477

7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630

8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569

9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915

10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219

11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023

12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259

13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486

14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml

15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York

16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986

17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign

18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper

19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri

20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998

21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206

22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev

23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786

24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur

25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo

26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430

27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579

28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs

29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like

30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island

31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr

32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052

33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490

34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546

35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955

36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63

37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714

38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954

39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682

40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832

41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414

42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629

43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070

44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847

6776 Nucleic Acids Research 2007 Vol 35 No 20

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777

Page 16: Discovery of Fur binding site clusters in Escherichia coli ...biitcomm/research/references... · using an fhuF:lacZ fusion and Fur consensus sequence-containing plasmid titrant on

45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582

46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808

47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom

48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs

49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582

50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126

51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182

52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909

53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474

54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518

55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202

56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336

57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625

58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995

59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090

60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971

61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950

62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651

63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254

64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74

65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237

66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen

67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745

68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673

69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610

70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064

71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728

72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149

73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078

74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813

75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234

76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805

77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275

78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230

79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995

80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806

81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip

Nucleic Acids Research 2007 Vol 35 No 20 6777