Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
6762ndash6777 Nucleic Acids Research 2007 Vol 35 No 20 Published online 5 October 2007doi101093nargkm631
Discovery of Fur binding site clusters inEscherichia coli by information theory modelsZehua Chen1 Karen A Lewis1 Ryan K Shultzaberger1 Ilya G Lyakhov12 Ming Zheng3
Bernard Doan34 Gisela Storz3 and Thomas D Schneider1
1National Cancer Institute at Frederick Center for Cancer Research Nanobiology Program 2Basic ResearchProgram SAIC-Frederick Inc National Cancer Institute at Frederick Frederick MD 21702-12013National Institute of Child Health and Human Development Cell Biology and Metabolism Branch and4Division of Extramural Activities Referral and Program Analysis Branch National Institute of Allergy andInfectious Diseases Bethesda MD 20892 USA
Received June 5 2007 Revised July 28 2007 Accepted July 31 2007
ABSTRACT
Fur is a DNA binding protein that represses bacterialiron uptake systems Eleven footprinted Escherichiacoli Fur binding sites were used to create an initialinformation theory model of Fur binding which wasthen refined by adding 13 experimentally confirmedsites When the refined model was scanned acrossall available footprinted sequences sequence walk-ers which are visual depictions of predicted bindingsites frequently appeared in clusters that fit thefootprints (raquo83 coverage) This indicated that themodel can accurately predict Fur binding Withinthe clusters individual walkers were separated fromtheir neighbors by exactly 3 or 6 bases consistentwith models in which Fur dimers bind on differentfaces of the DNA helix When the E coli genome wasscanned we found 363 unique clusters whichincludes all known Fur-repressed genes that areinvolved in iron metabolism In contrast only a fewof the known Fur-activated genes have predictedFur binding sites at their promoters These observa-tions suggest that Fur is either a direct repressor oran indirect activator The Pseudomonas aeruginosaand Bacillus subtilis Fur models are highly similar tothe E coli Fur model suggesting that the FurndashDNArecognition mechanism may be conserved for evendistantly related bacteria
INTRODUCTION
The protein Fur is the 168 kDa product of the fur (ferricuptake regulation) gene in Escherichia coli (1) so namedbecause it was first observed to repress the transcription ofgenes that code for components of ferric (Fe+3) uptakesystems found in the cell membrane Since then Fur alsohas been found to regulate other genes that are notdirectly related to iron transport such as those encodinghemolysin Shiga-like toxin and manganese superoxidedismutase (2ndash5)
Fur binds to DNA and represses transcription in thepresence of divalent metal ions The ion is thought to beFe+2 in vivo (6) however DNase I footprinting experi-ments have shown that Fur also binds to DNA in thepresence of Mn+2 Co+2 Cu+2 Cd+2 and Zn+2 (7)Recent studies have suggested that purified Fur containsat least one Zn+2 ion as a structural stabilizer (8) Fur hasbeen observed to bind to DNA as a dimer and in higherorder polymers (79) and electron microscopy has shownpolymerization of Fur on DNA under high concentrationsof protein and metal ions (2)
Numerous strategies have been employed to findnew Fur binding sites Various consensus sequenceshave been derived from both footprinted and non-footprinted Fur binding sites (3710) and these havebeen compared to sequences in the promoter region ofsuspected iron-regulated genes Putative Fur targets werethen investigated further through genetic and biochem-ical experiments Stojiljkovic et al created a successful
The authors wish it to be known that in their opinion the first two authors should be regarded as joint First AuthorsPresent addressKaren A Lewis Department of Physiology University of Texas Southwestern Medical Center at Dallas Dallas TX 75390-9040 USARyan K Shultzaberger Department of Molecular and Cell Biology University of California Berkeley CA 94720-3202 USAMing Zheng Dupont Central Research and Development Experimental Station E328-B31 Wilmington DE 19880-0328 USAZehua Chen Department of Biology Boston College MA 02467 USA
To whom correspondence should be addressed Tel +1 301 846 5581 Fax +1 301 846 5598 Email tomsncifcrfgov
2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (httpcreativecommonsorglicenses
by-nc20uk) which permits unrestricted non-commercial use distribution and reproduction in any medium provided the original work is properly cited
lsquoFur titration assayrsquo to locate new Fur binding sitesusing an fhuFlacZ fusion and Fur consensus sequence-containing plasmid titrant on MacConkey plates (1)Several new iron-regulated genes in E coli were discoveredusing this consensus sequence-based technique In addi-tion to the above studies have also been carried out usingE coli Fur for DNase I footprinting with non-E coliDNA (1112) Recently transcriptional profiles of E coligenes have been used to determine those that are regulatedby iron and Fur by evaluating mRNA levels in the absenceof iron or Fur protein (13)
Another method for finding Fur-regulated genes is touse molecular information theory to locate new bindingsites Using this approach classical information theory(1415) is applied to molecular biology (16) First a set ofbinding sites is aligned by maximizing the informationcontent (17) and then the average pattern at the sitesis represented by a computer graphic called a sequencelogo (18) Next the conservation of bases in the alignedset is used to create a weight matrix model that assignsa weight in bits to each base at each position according toits frequency in the data set (19) This can be displayedusing the sequence walker graphic (20)
In addition to displaying details of binding sitessequence logos can be used to understand the mechanismof binding In instances where factors bind in overlappingclusters it is difficult to assign the relative contribution ofa base in an overlapping region to the appropriate binderor to determine the range of the binding site Here wetested several Fur binding site models that were obtainedby multiply aligning Fur binding sequences using differentwindow sizes and identified the model that best representsbinding by a single Fur dimer
Information theory has previously been used to buildtwo models to evaluate and predict Fur binding sites(1321) Both models used ad hoc variations of informa-tion theory to assign scores to the predicted binding sitesrather than classical information content in bits In onecase the model was built using some sites that had notbeen footprinted by Fur and were probably not aligned tomaximize the information content (21)
The most rigorous approach to model building is to usea data set comprised of only footprinted binding sitesfrom one species By restricting the data set to experi-mentally proven sites one is certain that the model willreflect the binding characteristics of the protein the use ofa single species ensures that the protein and DNA bindingsequences evolved together and therefore correspondto one another (22) Many biases from previous modelsare thereby avoided but not all (23) The resultingexperimentally supported model is then scanned acrossthe entire genome of the species looking for sequencesthat contain a positive amount of information asevaluated by the weight matrix (19) Sequence walkerswhich are graphical representations of individual bindingsites then display probable binding sites on the genomebased on the model of proven sites (20) This method wassuccessfully used to discover that the OxyR transcriptionfactor controls the expression of the fur gene (24) toidentify additional sites for proteins such as Fis SoxSOxyR and PXRRXR (2325ndash27) to characterize splice
junctions (28) and to discover T7 islands a unique classof mobile genetic elements (2930) In this studyinformation theory has allowed us to identify new Furbinding sites 13 of which we confirmed experimentally
MATERIALS AND METHODS
Programs
Programs used in this study are available at httpwwwccrnpncifcrfgov~toms A web-based toolfor searching for Fur sites is available at httpwwwccrnpncifcrfgov~tomsdelilaserverhtml
Creating the Fur model
Eleven footprinted E coli Fur binding sequences wereextracted from the E coli K-12 genome (NC_000913)by the delila program (31) These sites are from thepromoters of the genes cir fecA fecI fur sodA tonB andiucA along with two bidirectional promoter regions forthe genes fepA-fes and fepB-entC (832ndash39) The promoterregion at fepB-entC has two distinctly protected regionseach region was included in the data set as individualsequences (fepB and entC) The promoter of iucA has anexceptionally long secondary footprint and so two regionswere used from the E coli plasmid ColV-K30 (M10930from 347 to 370 and from 365 to 393) (7) The comple-ment of each footprinted sequence was also includedsince Fur binds as a dimer (67940) The programmalign was used to obtain an alignment of the sequencesthat maximizes the information content within definedwindows of the alignment (17)As the initial model described above contains only
11 sites we then refined the model by including 13 moresites that were identified in the genome by a searchand confirmed by gel shift experiments in this study(see subsequently) The validity of the refined modelwas tested by scanning it across all available E coliFur footprinted regions using the programs scan andlister to create sequence walkers (1920) The model isverified if the walkers correspond to DNase I footprintregions For further verification the model was alsoscanned across synthetic oligonucleotides that containGATAAT repeats and which had been previouslyfootprinted (5)To compare Fur models of different bacteria we also
built Pseudomonas aeruginosa and Bacillus subtilis modelsby using footprinted Fur binding sequences from thesetwo species (4142)
Scanning for Fur binding sites
Fur in high concentration can bind weak sites while inlow concentration it binds only strong sites The affinity ofFur to its binding site also varies significantly with theavailability of metal Furthermore as demonstrated byfootprint data Fur binding can extend to flanking regionsthat bear little resemblance to the strong Fur binding sitesThis suggests that it may not be practical to use a fixedcutoff for a Fur binding site model under different
Nucleic Acids Research 2007 Vol 35 No 20 6763
conditions In our study we used two cutoff values fordifferent scansFor the whole E coli genome scan we used the lowest
information in the set of sites used to build the model asthe cutoff This should minimize false positives in ourpredictions as the model was built from experimentallyproven sites Groups of sites that were within 100 basesof each other were identified using the localbest programand the strongest one was selected to represent theregion This ensured that each region in the data set wasunique All sites with Ri values greater than 170 bits wereextracted (Table 4) This value was chosen simply to allowfor a manageable set of regions for further analysisAs Fur can extend binding to much weaker regions in
the footprints we should use a lower cutoff for scanningfootprinted sequences The second law of thermodynamicssets a theoretical lower bound for the information contentof a binding site (Ri) at zero bits (19) Therefore we useda zero-bit cutoff when scanning footprinted sequencesFur has been suggested to repress genes by binding to
their promoter elements (35 and 10 regions) therebyblocking the access of polymerase to the promoters(3343ndash46) To determine how many of the predictedFur binding clusters overlap with promoters we used aninformation theory-based flexible s70 binding model toscan the Fur clusters The s70 model was built from401 experimentally proven E coli promoters by uniformlytaking into account the information present in the 10the 35 and the uncertainty of the spacing between them(47) the same method has been successfully used to modelprokaryotic ribosome binding sites (48) The informationcontent (average conservation) of the promoter model islow (x frac14 65 bits SD=28 bits for the Ri distribution)and the individual information for a site in the modelranges from 03 to 126 bits The probability that a site isless than 4 bits is only 186 so we used 4 bits as a cutoffto predict reasonably strong non-activated promoters (47)
Gel mobility shift assays
Two sets of oligonucleotides containing predictedFur binding sites in E coli were designed andsynthesized (Oligos Etc) All oligonucleotides were self-complementary had a 50-GCTA-30 overhang on the50 end and contained a hairpin loop Oligos exbB exbDand fhuF contained a hairpin of the sequence 50-CGCGAAGC G-30 while the other 12 oligos [yoeAfepD-entS (formerly ybdA) gpmA ryhB fhuA nohAoppA gspC garP (formerly yhaU) yahA fadD and ygaC]contained a hairpin of the sequence 50-ACGATCGCGCGAAGC GCGATCGT-30 in the center Such loopsform a structure that is stabilized by base pairing betweenG3 and A5 of the central seven bases of each loop (49) andthey are convenient for use in DNA mobility shift assaysbecause of the exact equimolar concentrations of thecomplementary strands and their high stability (25)Three oligos containing the promoter regions for exbB
ygaC and the upstream region of exbD were created totest previously published consensus sequence predictions(5051) Eight oligos contained potential Fur-controlledpromoters identified using both an 11-site Fur model and
a s70 model as described above [yoeA fepD-entS (formerlyybdA) gpmA ryhB fhuA nohA oppA and fhuF] We werealso interested in whether Fur binds intragenically so fouroligos were synthesized that contained strong predictedsites found within gene coding regions (gspC garP yahAand fadD) These sequences can be found in SupplementaryTable S1 and Figure S1
Gel mobility shift assays were performed using the15 oligos (Figure 3) The oligonucleotides were labeled bya fill-in reaction with Taq DNA polymerase The 20 ml oflabeling mixture [20 pmol oligo 2 mM MgCl2 1PCRbuffer 5 units Taq polymerase (GibcoBRL) and 10 mMtetramethylrhodamine-6-dUTP (NEN)] was incubated at72C for 1 h 80 ml of ddH2O was added to the mixturefollowed by two phenol extractions a phenolchloroformextraction and a chloroform extraction The labeledoligonucleotides were diluted 15 in TE (10 mM TrisndashClpH 75 1 mM EDTA) boiled for 10 min and then placedon ice to prevent dimerization and trimerization
The labeled oligonucleotides (5 ml each 02 pmol) werethen incubated in Fur binding buffer (10 mM BisndashTrisndashHCl pH 75 5 mgml sonicated salmon sperm DNA 5glycerol 100 mM MnCl2 100 mgml BSA 1 mM MgCl240 mM KCl) with 150 nM Fur protein at 37C for 13 min(7) Fur was a gift from T OrsquoHalloran and C OuttenSamples were electrophoresed on a 5 polyacrylamidegel in Fur electrophoresis buffer (01 M BisndashTrisndashHClpH 75 10 mM MnCl2) at 120V for 1 h
Bands were visualized with an FMBIO II fluore-scent scanner (Hitachi) with an excitation wavelengthof 532 nm and a 585 nm filter for detection oftetramethylrhodamine
Footprinting
To generate the fhuF promoter construct (pGSO129) usedto test for Fur binding a 240-bp fragment amplified byPCR from genomic DNA (using the primers 50-GCGGCT GGA GAT GAA TTC GCC AGA TG and 50-GCCCTG CAA TCA GGG ATC CCG GCA GC) was clonedinto the BamHI and EcoRI sites of pUC18 (27) PurifiedFur protein generously provided by C Outten andT OrsquoHalloran was incubated with a Mn2+-containingbuffer according to de Lorenzo et al (7) DNase I foot-printing then was carried out as described previously (52)The 240-bp BamHIndashEcoRI fragment of pGSO129 waslabeled with [g-32P]ATP at either the BamHI site (topstrand) or the EcoRI site (bottom strand) The labeledfragments were incubated with 0 50 100 200 and 400 nMpurified Fur protein at room temperature for 5 minThe samples then were subjected to limited DNase Idigestion purified and separated on 8 polyacrylamidesequencing gels
RESULTS
E coli Fur binding model
Eleven footprinted Fur binding sites (see Table 3 andSupplementary Figure S1) (832ndash39) from E coliK-12 (53)were used to create an initial information theory model
6764 Nucleic Acids Research 2007 Vol 35 No 20
Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the
sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12
A
B
C
M12208 bits(minus12 +12)
M9181 bits(minus9 +9)
M6154 bits(minus6 +6)
===========gtolt===========
========gtolt========
=====gtolt=====
========gtolt================gtolt========
=====gtolt==========gtolt=====
0
1
2
bit
s
5prime
minus20
GTAC
minus19
minus18
CGTA
minus17
CGTA
minus16
GACT
minus15
ACGT
minus14
GTCA
minus13
minus12
GTA
minus11
GTA
minus10
GACT
minus9CTAG
minus8
TCGA
minus7
GCTA
minus6
TCA
minus5
TA
minus4
CT
minus3
TAG
minus2
CGA
minus1
CGT
0
TA
1
GCA
2
GCT
3
ATC
4
GA
5
AT
6
GAT
7
GCTA
8
AGCT
9
GATC
10
CTGA
11
CAT
12
CAT
13 14
CAGT
15
TGCA
16
CTGA
17
GCAT
18
GCAT
19 20
CATG
3prime
0
1
2
bit
s
5prime
minus20
CGTA
minus19
minus18
minus17
GTCA
minus16
GCAT
minus15
GCTA
minus14
GTA
minus13
GACT
minus12
CAGT
minus11
CGA
minus10
GCAT
minus9
TA
minus
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
CA
minus4
GAT
minus3
TA
minus2
GCA
minus1
GCT
0 1
CGA
2
CGT
3
AT
4
CTA
5
GT
6
GTC
7
GCTA
8CAT
9AT10
CGTA
11
GACT
12
GTCA
13
TCGA
14
CAT
15
CGAT
16
CGTA
17
CAGT
18 19 20
GCAT
3prime
0
1
2
bit
s
5prime minus20
GCA
minus19
GCAT
minus18
GCTA
minus
minus17
GCTA
minus16
GCAT
minus15
CAGT
minus14
TGCA
minus13
GTAC
minus12
GCTA
minus11
GTA
minus10
GACT
minus9
CAG
minus8
CTGA
minus7
GCAT
minus6
A
minus5
TA
minus4
CT
minus3
TCAG
minus2
CGA
minus1
AGT
0
TA
1
TCA
2
GCT
3
GATC
4
GA
5
AT
6
T7
GCTA
8
GACT
9
GATC
10
TCGA
11
CAT
12
GCAT
13
CTGA
14
ACGT
15
GTCA
16
GTCA
17
GCAT
18
GCAT
19
CGTA
20
CAGT
3prime
Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos
Nucleic Acids Research 2007 Vol 35 No 20 6765
The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best
represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo
A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window
Confirmation of 13 predicted sites and model refinement
Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)
McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur
To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)
Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M122
walkers (bits)Number of M92
walkers (bits)Number of M62
walkers (bits)
F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)
1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5
6766 Nucleic Acids Research 2007 Vol 35 No 20
As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable
P aeruginosa and B subtilis Fur models
To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments
(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur
binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos
A
B
fepB binds 2 Fur dimers
entS binds 3 Fur dimers
C
D
feuA binds 1 Fur dimer
dhbA binds 2 Fur dimers
10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3
M12 157 bits
M9 109 bits
M9 82 bits
M6 152 bits
10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3
M12 156 bits
M12 172 bits
M9 75 bits
M9 172 bits
M9 95 bits
M6 83 bits
M6 122 bits
10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3
M9 223 bits
M6 103 bits
M6 81 bits
10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3
M9 256 bits
M9 299 bits
M6 149 bits
M6 208 bits
M6 171 bits
Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7
Nucleic Acids Research 2007 Vol 35 No 20 6767
For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa
Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species
Scans of footprinted regions
Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated
Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and
93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)
There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content
A
B
yoeA fepD-entS gpmA ryhB fhuA nohA oppA
bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus +
gspC garP yahA fadD ygaC exbB exbD fhuF
bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus + minus +
Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift
6768 Nucleic Acids Research 2007 Vol 35 No 20
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
lsquoFur titration assayrsquo to locate new Fur binding sitesusing an fhuFlacZ fusion and Fur consensus sequence-containing plasmid titrant on MacConkey plates (1)Several new iron-regulated genes in E coli were discoveredusing this consensus sequence-based technique In addi-tion to the above studies have also been carried out usingE coli Fur for DNase I footprinting with non-E coliDNA (1112) Recently transcriptional profiles of E coligenes have been used to determine those that are regulatedby iron and Fur by evaluating mRNA levels in the absenceof iron or Fur protein (13)
Another method for finding Fur-regulated genes is touse molecular information theory to locate new bindingsites Using this approach classical information theory(1415) is applied to molecular biology (16) First a set ofbinding sites is aligned by maximizing the informationcontent (17) and then the average pattern at the sitesis represented by a computer graphic called a sequencelogo (18) Next the conservation of bases in the alignedset is used to create a weight matrix model that assignsa weight in bits to each base at each position according toits frequency in the data set (19) This can be displayedusing the sequence walker graphic (20)
In addition to displaying details of binding sitessequence logos can be used to understand the mechanismof binding In instances where factors bind in overlappingclusters it is difficult to assign the relative contribution ofa base in an overlapping region to the appropriate binderor to determine the range of the binding site Here wetested several Fur binding site models that were obtainedby multiply aligning Fur binding sequences using differentwindow sizes and identified the model that best representsbinding by a single Fur dimer
Information theory has previously been used to buildtwo models to evaluate and predict Fur binding sites(1321) Both models used ad hoc variations of informa-tion theory to assign scores to the predicted binding sitesrather than classical information content in bits In onecase the model was built using some sites that had notbeen footprinted by Fur and were probably not aligned tomaximize the information content (21)
The most rigorous approach to model building is to usea data set comprised of only footprinted binding sitesfrom one species By restricting the data set to experi-mentally proven sites one is certain that the model willreflect the binding characteristics of the protein the use ofa single species ensures that the protein and DNA bindingsequences evolved together and therefore correspondto one another (22) Many biases from previous modelsare thereby avoided but not all (23) The resultingexperimentally supported model is then scanned acrossthe entire genome of the species looking for sequencesthat contain a positive amount of information asevaluated by the weight matrix (19) Sequence walkerswhich are graphical representations of individual bindingsites then display probable binding sites on the genomebased on the model of proven sites (20) This method wassuccessfully used to discover that the OxyR transcriptionfactor controls the expression of the fur gene (24) toidentify additional sites for proteins such as Fis SoxSOxyR and PXRRXR (2325ndash27) to characterize splice
junctions (28) and to discover T7 islands a unique classof mobile genetic elements (2930) In this studyinformation theory has allowed us to identify new Furbinding sites 13 of which we confirmed experimentally
MATERIALS AND METHODS
Programs
Programs used in this study are available at httpwwwccrnpncifcrfgov~toms A web-based toolfor searching for Fur sites is available at httpwwwccrnpncifcrfgov~tomsdelilaserverhtml
Creating the Fur model
Eleven footprinted E coli Fur binding sequences wereextracted from the E coli K-12 genome (NC_000913)by the delila program (31) These sites are from thepromoters of the genes cir fecA fecI fur sodA tonB andiucA along with two bidirectional promoter regions forthe genes fepA-fes and fepB-entC (832ndash39) The promoterregion at fepB-entC has two distinctly protected regionseach region was included in the data set as individualsequences (fepB and entC) The promoter of iucA has anexceptionally long secondary footprint and so two regionswere used from the E coli plasmid ColV-K30 (M10930from 347 to 370 and from 365 to 393) (7) The comple-ment of each footprinted sequence was also includedsince Fur binds as a dimer (67940) The programmalign was used to obtain an alignment of the sequencesthat maximizes the information content within definedwindows of the alignment (17)As the initial model described above contains only
11 sites we then refined the model by including 13 moresites that were identified in the genome by a searchand confirmed by gel shift experiments in this study(see subsequently) The validity of the refined modelwas tested by scanning it across all available E coliFur footprinted regions using the programs scan andlister to create sequence walkers (1920) The model isverified if the walkers correspond to DNase I footprintregions For further verification the model was alsoscanned across synthetic oligonucleotides that containGATAAT repeats and which had been previouslyfootprinted (5)To compare Fur models of different bacteria we also
built Pseudomonas aeruginosa and Bacillus subtilis modelsby using footprinted Fur binding sequences from thesetwo species (4142)
Scanning for Fur binding sites
Fur in high concentration can bind weak sites while inlow concentration it binds only strong sites The affinity ofFur to its binding site also varies significantly with theavailability of metal Furthermore as demonstrated byfootprint data Fur binding can extend to flanking regionsthat bear little resemblance to the strong Fur binding sitesThis suggests that it may not be practical to use a fixedcutoff for a Fur binding site model under different
Nucleic Acids Research 2007 Vol 35 No 20 6763
conditions In our study we used two cutoff values fordifferent scansFor the whole E coli genome scan we used the lowest
information in the set of sites used to build the model asthe cutoff This should minimize false positives in ourpredictions as the model was built from experimentallyproven sites Groups of sites that were within 100 basesof each other were identified using the localbest programand the strongest one was selected to represent theregion This ensured that each region in the data set wasunique All sites with Ri values greater than 170 bits wereextracted (Table 4) This value was chosen simply to allowfor a manageable set of regions for further analysisAs Fur can extend binding to much weaker regions in
the footprints we should use a lower cutoff for scanningfootprinted sequences The second law of thermodynamicssets a theoretical lower bound for the information contentof a binding site (Ri) at zero bits (19) Therefore we useda zero-bit cutoff when scanning footprinted sequencesFur has been suggested to repress genes by binding to
their promoter elements (35 and 10 regions) therebyblocking the access of polymerase to the promoters(3343ndash46) To determine how many of the predictedFur binding clusters overlap with promoters we used aninformation theory-based flexible s70 binding model toscan the Fur clusters The s70 model was built from401 experimentally proven E coli promoters by uniformlytaking into account the information present in the 10the 35 and the uncertainty of the spacing between them(47) the same method has been successfully used to modelprokaryotic ribosome binding sites (48) The informationcontent (average conservation) of the promoter model islow (x frac14 65 bits SD=28 bits for the Ri distribution)and the individual information for a site in the modelranges from 03 to 126 bits The probability that a site isless than 4 bits is only 186 so we used 4 bits as a cutoffto predict reasonably strong non-activated promoters (47)
Gel mobility shift assays
Two sets of oligonucleotides containing predictedFur binding sites in E coli were designed andsynthesized (Oligos Etc) All oligonucleotides were self-complementary had a 50-GCTA-30 overhang on the50 end and contained a hairpin loop Oligos exbB exbDand fhuF contained a hairpin of the sequence 50-CGCGAAGC G-30 while the other 12 oligos [yoeAfepD-entS (formerly ybdA) gpmA ryhB fhuA nohAoppA gspC garP (formerly yhaU) yahA fadD and ygaC]contained a hairpin of the sequence 50-ACGATCGCGCGAAGC GCGATCGT-30 in the center Such loopsform a structure that is stabilized by base pairing betweenG3 and A5 of the central seven bases of each loop (49) andthey are convenient for use in DNA mobility shift assaysbecause of the exact equimolar concentrations of thecomplementary strands and their high stability (25)Three oligos containing the promoter regions for exbB
ygaC and the upstream region of exbD were created totest previously published consensus sequence predictions(5051) Eight oligos contained potential Fur-controlledpromoters identified using both an 11-site Fur model and
a s70 model as described above [yoeA fepD-entS (formerlyybdA) gpmA ryhB fhuA nohA oppA and fhuF] We werealso interested in whether Fur binds intragenically so fouroligos were synthesized that contained strong predictedsites found within gene coding regions (gspC garP yahAand fadD) These sequences can be found in SupplementaryTable S1 and Figure S1
Gel mobility shift assays were performed using the15 oligos (Figure 3) The oligonucleotides were labeled bya fill-in reaction with Taq DNA polymerase The 20 ml oflabeling mixture [20 pmol oligo 2 mM MgCl2 1PCRbuffer 5 units Taq polymerase (GibcoBRL) and 10 mMtetramethylrhodamine-6-dUTP (NEN)] was incubated at72C for 1 h 80 ml of ddH2O was added to the mixturefollowed by two phenol extractions a phenolchloroformextraction and a chloroform extraction The labeledoligonucleotides were diluted 15 in TE (10 mM TrisndashClpH 75 1 mM EDTA) boiled for 10 min and then placedon ice to prevent dimerization and trimerization
The labeled oligonucleotides (5 ml each 02 pmol) werethen incubated in Fur binding buffer (10 mM BisndashTrisndashHCl pH 75 5 mgml sonicated salmon sperm DNA 5glycerol 100 mM MnCl2 100 mgml BSA 1 mM MgCl240 mM KCl) with 150 nM Fur protein at 37C for 13 min(7) Fur was a gift from T OrsquoHalloran and C OuttenSamples were electrophoresed on a 5 polyacrylamidegel in Fur electrophoresis buffer (01 M BisndashTrisndashHClpH 75 10 mM MnCl2) at 120V for 1 h
Bands were visualized with an FMBIO II fluore-scent scanner (Hitachi) with an excitation wavelengthof 532 nm and a 585 nm filter for detection oftetramethylrhodamine
Footprinting
To generate the fhuF promoter construct (pGSO129) usedto test for Fur binding a 240-bp fragment amplified byPCR from genomic DNA (using the primers 50-GCGGCT GGA GAT GAA TTC GCC AGA TG and 50-GCCCTG CAA TCA GGG ATC CCG GCA GC) was clonedinto the BamHI and EcoRI sites of pUC18 (27) PurifiedFur protein generously provided by C Outten andT OrsquoHalloran was incubated with a Mn2+-containingbuffer according to de Lorenzo et al (7) DNase I foot-printing then was carried out as described previously (52)The 240-bp BamHIndashEcoRI fragment of pGSO129 waslabeled with [g-32P]ATP at either the BamHI site (topstrand) or the EcoRI site (bottom strand) The labeledfragments were incubated with 0 50 100 200 and 400 nMpurified Fur protein at room temperature for 5 minThe samples then were subjected to limited DNase Idigestion purified and separated on 8 polyacrylamidesequencing gels
RESULTS
E coli Fur binding model
Eleven footprinted Fur binding sites (see Table 3 andSupplementary Figure S1) (832ndash39) from E coliK-12 (53)were used to create an initial information theory model
6764 Nucleic Acids Research 2007 Vol 35 No 20
Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the
sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12
A
B
C
M12208 bits(minus12 +12)
M9181 bits(minus9 +9)
M6154 bits(minus6 +6)
===========gtolt===========
========gtolt========
=====gtolt=====
========gtolt================gtolt========
=====gtolt==========gtolt=====
0
1
2
bit
s
5prime
minus20
GTAC
minus19
minus18
CGTA
minus17
CGTA
minus16
GACT
minus15
ACGT
minus14
GTCA
minus13
minus12
GTA
minus11
GTA
minus10
GACT
minus9CTAG
minus8
TCGA
minus7
GCTA
minus6
TCA
minus5
TA
minus4
CT
minus3
TAG
minus2
CGA
minus1
CGT
0
TA
1
GCA
2
GCT
3
ATC
4
GA
5
AT
6
GAT
7
GCTA
8
AGCT
9
GATC
10
CTGA
11
CAT
12
CAT
13 14
CAGT
15
TGCA
16
CTGA
17
GCAT
18
GCAT
19 20
CATG
3prime
0
1
2
bit
s
5prime
minus20
CGTA
minus19
minus18
minus17
GTCA
minus16
GCAT
minus15
GCTA
minus14
GTA
minus13
GACT
minus12
CAGT
minus11
CGA
minus10
GCAT
minus9
TA
minus
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
CA
minus4
GAT
minus3
TA
minus2
GCA
minus1
GCT
0 1
CGA
2
CGT
3
AT
4
CTA
5
GT
6
GTC
7
GCTA
8CAT
9AT10
CGTA
11
GACT
12
GTCA
13
TCGA
14
CAT
15
CGAT
16
CGTA
17
CAGT
18 19 20
GCAT
3prime
0
1
2
bit
s
5prime minus20
GCA
minus19
GCAT
minus18
GCTA
minus
minus17
GCTA
minus16
GCAT
minus15
CAGT
minus14
TGCA
minus13
GTAC
minus12
GCTA
minus11
GTA
minus10
GACT
minus9
CAG
minus8
CTGA
minus7
GCAT
minus6
A
minus5
TA
minus4
CT
minus3
TCAG
minus2
CGA
minus1
AGT
0
TA
1
TCA
2
GCT
3
GATC
4
GA
5
AT
6
T7
GCTA
8
GACT
9
GATC
10
TCGA
11
CAT
12
GCAT
13
CTGA
14
ACGT
15
GTCA
16
GTCA
17
GCAT
18
GCAT
19
CGTA
20
CAGT
3prime
Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos
Nucleic Acids Research 2007 Vol 35 No 20 6765
The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best
represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo
A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window
Confirmation of 13 predicted sites and model refinement
Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)
McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur
To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)
Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M122
walkers (bits)Number of M92
walkers (bits)Number of M62
walkers (bits)
F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)
1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5
6766 Nucleic Acids Research 2007 Vol 35 No 20
As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable
P aeruginosa and B subtilis Fur models
To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments
(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur
binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos
A
B
fepB binds 2 Fur dimers
entS binds 3 Fur dimers
C
D
feuA binds 1 Fur dimer
dhbA binds 2 Fur dimers
10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3
M12 157 bits
M9 109 bits
M9 82 bits
M6 152 bits
10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3
M12 156 bits
M12 172 bits
M9 75 bits
M9 172 bits
M9 95 bits
M6 83 bits
M6 122 bits
10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3
M9 223 bits
M6 103 bits
M6 81 bits
10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3
M9 256 bits
M9 299 bits
M6 149 bits
M6 208 bits
M6 171 bits
Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7
Nucleic Acids Research 2007 Vol 35 No 20 6767
For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa
Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species
Scans of footprinted regions
Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated
Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and
93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)
There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content
A
B
yoeA fepD-entS gpmA ryhB fhuA nohA oppA
bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus +
gspC garP yahA fadD ygaC exbB exbD fhuF
bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus + minus +
Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift
6768 Nucleic Acids Research 2007 Vol 35 No 20
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
conditions In our study we used two cutoff values fordifferent scansFor the whole E coli genome scan we used the lowest
information in the set of sites used to build the model asthe cutoff This should minimize false positives in ourpredictions as the model was built from experimentallyproven sites Groups of sites that were within 100 basesof each other were identified using the localbest programand the strongest one was selected to represent theregion This ensured that each region in the data set wasunique All sites with Ri values greater than 170 bits wereextracted (Table 4) This value was chosen simply to allowfor a manageable set of regions for further analysisAs Fur can extend binding to much weaker regions in
the footprints we should use a lower cutoff for scanningfootprinted sequences The second law of thermodynamicssets a theoretical lower bound for the information contentof a binding site (Ri) at zero bits (19) Therefore we useda zero-bit cutoff when scanning footprinted sequencesFur has been suggested to repress genes by binding to
their promoter elements (35 and 10 regions) therebyblocking the access of polymerase to the promoters(3343ndash46) To determine how many of the predictedFur binding clusters overlap with promoters we used aninformation theory-based flexible s70 binding model toscan the Fur clusters The s70 model was built from401 experimentally proven E coli promoters by uniformlytaking into account the information present in the 10the 35 and the uncertainty of the spacing between them(47) the same method has been successfully used to modelprokaryotic ribosome binding sites (48) The informationcontent (average conservation) of the promoter model islow (x frac14 65 bits SD=28 bits for the Ri distribution)and the individual information for a site in the modelranges from 03 to 126 bits The probability that a site isless than 4 bits is only 186 so we used 4 bits as a cutoffto predict reasonably strong non-activated promoters (47)
Gel mobility shift assays
Two sets of oligonucleotides containing predictedFur binding sites in E coli were designed andsynthesized (Oligos Etc) All oligonucleotides were self-complementary had a 50-GCTA-30 overhang on the50 end and contained a hairpin loop Oligos exbB exbDand fhuF contained a hairpin of the sequence 50-CGCGAAGC G-30 while the other 12 oligos [yoeAfepD-entS (formerly ybdA) gpmA ryhB fhuA nohAoppA gspC garP (formerly yhaU) yahA fadD and ygaC]contained a hairpin of the sequence 50-ACGATCGCGCGAAGC GCGATCGT-30 in the center Such loopsform a structure that is stabilized by base pairing betweenG3 and A5 of the central seven bases of each loop (49) andthey are convenient for use in DNA mobility shift assaysbecause of the exact equimolar concentrations of thecomplementary strands and their high stability (25)Three oligos containing the promoter regions for exbB
ygaC and the upstream region of exbD were created totest previously published consensus sequence predictions(5051) Eight oligos contained potential Fur-controlledpromoters identified using both an 11-site Fur model and
a s70 model as described above [yoeA fepD-entS (formerlyybdA) gpmA ryhB fhuA nohA oppA and fhuF] We werealso interested in whether Fur binds intragenically so fouroligos were synthesized that contained strong predictedsites found within gene coding regions (gspC garP yahAand fadD) These sequences can be found in SupplementaryTable S1 and Figure S1
Gel mobility shift assays were performed using the15 oligos (Figure 3) The oligonucleotides were labeled bya fill-in reaction with Taq DNA polymerase The 20 ml oflabeling mixture [20 pmol oligo 2 mM MgCl2 1PCRbuffer 5 units Taq polymerase (GibcoBRL) and 10 mMtetramethylrhodamine-6-dUTP (NEN)] was incubated at72C for 1 h 80 ml of ddH2O was added to the mixturefollowed by two phenol extractions a phenolchloroformextraction and a chloroform extraction The labeledoligonucleotides were diluted 15 in TE (10 mM TrisndashClpH 75 1 mM EDTA) boiled for 10 min and then placedon ice to prevent dimerization and trimerization
The labeled oligonucleotides (5 ml each 02 pmol) werethen incubated in Fur binding buffer (10 mM BisndashTrisndashHCl pH 75 5 mgml sonicated salmon sperm DNA 5glycerol 100 mM MnCl2 100 mgml BSA 1 mM MgCl240 mM KCl) with 150 nM Fur protein at 37C for 13 min(7) Fur was a gift from T OrsquoHalloran and C OuttenSamples were electrophoresed on a 5 polyacrylamidegel in Fur electrophoresis buffer (01 M BisndashTrisndashHClpH 75 10 mM MnCl2) at 120V for 1 h
Bands were visualized with an FMBIO II fluore-scent scanner (Hitachi) with an excitation wavelengthof 532 nm and a 585 nm filter for detection oftetramethylrhodamine
Footprinting
To generate the fhuF promoter construct (pGSO129) usedto test for Fur binding a 240-bp fragment amplified byPCR from genomic DNA (using the primers 50-GCGGCT GGA GAT GAA TTC GCC AGA TG and 50-GCCCTG CAA TCA GGG ATC CCG GCA GC) was clonedinto the BamHI and EcoRI sites of pUC18 (27) PurifiedFur protein generously provided by C Outten andT OrsquoHalloran was incubated with a Mn2+-containingbuffer according to de Lorenzo et al (7) DNase I foot-printing then was carried out as described previously (52)The 240-bp BamHIndashEcoRI fragment of pGSO129 waslabeled with [g-32P]ATP at either the BamHI site (topstrand) or the EcoRI site (bottom strand) The labeledfragments were incubated with 0 50 100 200 and 400 nMpurified Fur protein at room temperature for 5 minThe samples then were subjected to limited DNase Idigestion purified and separated on 8 polyacrylamidesequencing gels
RESULTS
E coli Fur binding model
Eleven footprinted Fur binding sites (see Table 3 andSupplementary Figure S1) (832ndash39) from E coliK-12 (53)were used to create an initial information theory model
6764 Nucleic Acids Research 2007 Vol 35 No 20
Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the
sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12
A
B
C
M12208 bits(minus12 +12)
M9181 bits(minus9 +9)
M6154 bits(minus6 +6)
===========gtolt===========
========gtolt========
=====gtolt=====
========gtolt================gtolt========
=====gtolt==========gtolt=====
0
1
2
bit
s
5prime
minus20
GTAC
minus19
minus18
CGTA
minus17
CGTA
minus16
GACT
minus15
ACGT
minus14
GTCA
minus13
minus12
GTA
minus11
GTA
minus10
GACT
minus9CTAG
minus8
TCGA
minus7
GCTA
minus6
TCA
minus5
TA
minus4
CT
minus3
TAG
minus2
CGA
minus1
CGT
0
TA
1
GCA
2
GCT
3
ATC
4
GA
5
AT
6
GAT
7
GCTA
8
AGCT
9
GATC
10
CTGA
11
CAT
12
CAT
13 14
CAGT
15
TGCA
16
CTGA
17
GCAT
18
GCAT
19 20
CATG
3prime
0
1
2
bit
s
5prime
minus20
CGTA
minus19
minus18
minus17
GTCA
minus16
GCAT
minus15
GCTA
minus14
GTA
minus13
GACT
minus12
CAGT
minus11
CGA
minus10
GCAT
minus9
TA
minus
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
CA
minus4
GAT
minus3
TA
minus2
GCA
minus1
GCT
0 1
CGA
2
CGT
3
AT
4
CTA
5
GT
6
GTC
7
GCTA
8CAT
9AT10
CGTA
11
GACT
12
GTCA
13
TCGA
14
CAT
15
CGAT
16
CGTA
17
CAGT
18 19 20
GCAT
3prime
0
1
2
bit
s
5prime minus20
GCA
minus19
GCAT
minus18
GCTA
minus
minus17
GCTA
minus16
GCAT
minus15
CAGT
minus14
TGCA
minus13
GTAC
minus12
GCTA
minus11
GTA
minus10
GACT
minus9
CAG
minus8
CTGA
minus7
GCAT
minus6
A
minus5
TA
minus4
CT
minus3
TCAG
minus2
CGA
minus1
AGT
0
TA
1
TCA
2
GCT
3
GATC
4
GA
5
AT
6
T7
GCTA
8
GACT
9
GATC
10
TCGA
11
CAT
12
GCAT
13
CTGA
14
ACGT
15
GTCA
16
GTCA
17
GCAT
18
GCAT
19
CGTA
20
CAGT
3prime
Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos
Nucleic Acids Research 2007 Vol 35 No 20 6765
The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best
represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo
A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window
Confirmation of 13 predicted sites and model refinement
Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)
McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur
To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)
Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M122
walkers (bits)Number of M92
walkers (bits)Number of M62
walkers (bits)
F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)
1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5
6766 Nucleic Acids Research 2007 Vol 35 No 20
As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable
P aeruginosa and B subtilis Fur models
To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments
(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur
binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos
A
B
fepB binds 2 Fur dimers
entS binds 3 Fur dimers
C
D
feuA binds 1 Fur dimer
dhbA binds 2 Fur dimers
10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3
M12 157 bits
M9 109 bits
M9 82 bits
M6 152 bits
10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3
M12 156 bits
M12 172 bits
M9 75 bits
M9 172 bits
M9 95 bits
M6 83 bits
M6 122 bits
10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3
M9 223 bits
M6 103 bits
M6 81 bits
10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3
M9 256 bits
M9 299 bits
M6 149 bits
M6 208 bits
M6 171 bits
Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7
Nucleic Acids Research 2007 Vol 35 No 20 6767
For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa
Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species
Scans of footprinted regions
Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated
Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and
93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)
There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content
A
B
yoeA fepD-entS gpmA ryhB fhuA nohA oppA
bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus +
gspC garP yahA fadD ygaC exbB exbD fhuF
bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus + minus +
Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift
6768 Nucleic Acids Research 2007 Vol 35 No 20
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
Because Fur binds as a dimer (954) the model includedboth the footprinted Fur sequences and their comple-ments The sequences were aligned using the programmalign which maximizes the information content within aregion of the alignment (a lsquowindowrsquo) by shuffling the
sequences back and forth (17) Multiple alignmentwith a window size of (12 thorn12) gave a sequencelogo that shows base conservation in the range of(12 thorn12) (Figure 1A) For convenience we named thislogo M12
A
B
C
M12208 bits(minus12 +12)
M9181 bits(minus9 +9)
M6154 bits(minus6 +6)
===========gtolt===========
========gtolt========
=====gtolt=====
========gtolt================gtolt========
=====gtolt==========gtolt=====
0
1
2
bit
s
5prime
minus20
GTAC
minus19
minus18
CGTA
minus17
CGTA
minus16
GACT
minus15
ACGT
minus14
GTCA
minus13
minus12
GTA
minus11
GTA
minus10
GACT
minus9CTAG
minus8
TCGA
minus7
GCTA
minus6
TCA
minus5
TA
minus4
CT
minus3
TAG
minus2
CGA
minus1
CGT
0
TA
1
GCA
2
GCT
3
ATC
4
GA
5
AT
6
GAT
7
GCTA
8
AGCT
9
GATC
10
CTGA
11
CAT
12
CAT
13 14
CAGT
15
TGCA
16
CTGA
17
GCAT
18
GCAT
19 20
CATG
3prime
0
1
2
bit
s
5prime
minus20
CGTA
minus19
minus18
minus17
GTCA
minus16
GCAT
minus15
GCTA
minus14
GTA
minus13
GACT
minus12
CAGT
minus11
CGA
minus10
GCAT
minus9
TA
minus
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
CA
minus4
GAT
minus3
TA
minus2
GCA
minus1
GCT
0 1
CGA
2
CGT
3
AT
4
CTA
5
GT
6
GTC
7
GCTA
8CAT
9AT10
CGTA
11
GACT
12
GTCA
13
TCGA
14
CAT
15
CGAT
16
CGTA
17
CAGT
18 19 20
GCAT
3prime
0
1
2
bit
s
5prime minus20
GCA
minus19
GCAT
minus18
GCTA
minus
minus17
GCTA
minus16
GCAT
minus15
CAGT
minus14
TGCA
minus13
GTAC
minus12
GCTA
minus11
GTA
minus10
GACT
minus9
CAG
minus8
CTGA
minus7
GCAT
minus6
A
minus5
TA
minus4
CT
minus3
TCAG
minus2
CGA
minus1
AGT
0
TA
1
TCA
2
GCT
3
GATC
4
GA
5
AT
6
T7
GCTA
8
GACT
9
GATC
10
TCGA
11
CAT
12
GCAT
13
CTGA
14
ACGT
15
GTCA
16
GTCA
17
GCAT
18
GCAT
19
CGTA
20
CAGT
3prime
Figure 1 Different alignments of footprinted E coli Fur binding sites Sequence alignments were done using the program malign with differentwindow sizes [from (15 +15) to (5 +5)] (17) on 11 footprinted E coli Fur binding sequences and their complements (see Supplementary FigureS1 for sequences) Three classes of alignments M12 (A) M9 (B) and M6 (C) were obtained by using window sizes of (12 +12) (9 +9) and(6 +6) respectively In these logos the height of each letter is proportional to the frequency of that base at each position and the height of theletter stack is the conservation in bits (18) The dashed sine wave on each logo represents the 106 base helical twist of B-form DNA (3181) Thedouble dashed arrows on the top of each logo mark the inverted repeats in each model Note that the M12 logo corresponds to two overlapping M9sites as indicated by the two red double dashed arrows below the M12 logo A similar relationship occurs between the M9 and M6 logos
Nucleic Acids Research 2007 Vol 35 No 20 6765
The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best
represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo
A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window
Confirmation of 13 predicted sites and model refinement
Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)
McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur
To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)
Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M122
walkers (bits)Number of M92
walkers (bits)Number of M62
walkers (bits)
F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)
1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5
6766 Nucleic Acids Research 2007 Vol 35 No 20
As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable
P aeruginosa and B subtilis Fur models
To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments
(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur
binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos
A
B
fepB binds 2 Fur dimers
entS binds 3 Fur dimers
C
D
feuA binds 1 Fur dimer
dhbA binds 2 Fur dimers
10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3
M12 157 bits
M9 109 bits
M9 82 bits
M6 152 bits
10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3
M12 156 bits
M12 172 bits
M9 75 bits
M9 172 bits
M9 95 bits
M6 83 bits
M6 122 bits
10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3
M9 223 bits
M6 103 bits
M6 81 bits
10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3
M9 256 bits
M9 299 bits
M6 149 bits
M6 208 bits
M6 171 bits
Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7
Nucleic Acids Research 2007 Vol 35 No 20 6767
For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa
Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species
Scans of footprinted regions
Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated
Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and
93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)
There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content
A
B
yoeA fepD-entS gpmA ryhB fhuA nohA oppA
bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus +
gspC garP yahA fadD ygaC exbB exbD fhuF
bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus + minus +
Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift
6768 Nucleic Acids Research 2007 Vol 35 No 20
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
The M12 logo appears to consist of two overlappingdirect repeats one is from 12 to thorn6 and the other from6 to thorn12 Because the multiple alignment process canincorporate one or the other of these repeats this suggeststhat different alignments may be obtained when otherwindows are used to align the sequences Because two Furdimers can bind overlapping sites with 6-base separation(4055) we then multiply aligned the sequences by usingwindow sizes ranging from (15 thorn15) to (5 thorn5) Thesesizes should cover the sites recognized by one or twoFur dimers (9) A total of seven different alignments wereobtained the corresponding sequence logos are shown inSupplementary Figure S2 According to the patternsshown in the logos the seven alignments that wereobtained can be further consolidated into and assignedto three basic alignment classes these alignment classesare represented by logos M12 M9 and M6 (Figure 1)Each of the three alignments itself is an inverted repeatthe M12 logo also appears to consist of two overlappingM9 sites (from 12 to thorn6 and from 6 to thorn12) and theM9 logo looks like two overlapping M6 sites (from 9to thorn3 and from 3 to thorn9) (Figure 1)To determine which of the three alignment classes best
represents a single-dimer binding model some publishedexperimental data was analyzed Lavrrar and McIntosh(55) used the Ferguson method (56) to determine howmany E coli Fur dimers can bind on five synthetic oligosand two natural sites fepB and entS We applied the threemodels to their sequences and used sequence walkersto show the predicted sites of each model A full list ofthe predictions (with walkers gt0 bits) is given inSupplementary Figure S3 The results show that themodels frequently found overlapping sites that areseparated by 3 or 6 bases The sites that are 6 basesapart can both be strong while 3-base separated siteshave one strong site and one weak site (SupplementaryFigure S3) Furthermore it has been shown that Furdimers can simultaneously bind two overlapping sites with6-bp spacing (40) but there is no experimental datashowing that 3-base separated overlapping sites can bebound at the same time Therefore to analyze the resultswe only counted major walkers (strong walkers that areseparated by 6 bases) for each model in each oligo
A summary of the results (with major walkers) is availablein Table 1 two examples of the prediction (with majorwalkers) are shown in Figure 2A and B The results showthat the M9 model made correct predictions for five ofthe seven oligos while the M12 and M6 models bothmade only two correct predictions Furthermore the M12model always predicted one less site than the M9 modelThese results suggest that the M9 model may be a single-dimer binding model the M12 model may representa two-dimer binding model while the M6 alignment maybe just a result of compression of the binding sequencesinto a small alignment window
Confirmation of 13 predicted sites and model refinement
Since our E coli Fur model had only 11 sites we wantedto refine the model by including more sites A genome scanwith the 11-site M9 model revealed 389 sites above 11 bits(the lowest information of a site in the model is 111 bits)Within these 13 sites (from 12 to 26 bits) were selectedfor experimental confirmation (Figure 3 SupplementaryTable S1 and Figure S1) These include eight sites locatedin promoter regions [yoeA fepD-entS (43) gpmA (= pgm)(51) ryhB (= yhhX-yhhY) (5157) fhuA (13) nohA (51)oppA (13) and fhuF (51)] and four sites inside genes(gspC garP yahA and fadD) We also included two sitesthat were predicted by consensus models to be in ygaCand exbD (5051) since our model only detected a weaksite (14 bits) in ygaC and a negative site (50 bits) inexbD (Figure 3) Gel mobility shift assays showed that all13 sites predicted by the M9 model shifted when incubatedwith Fur protein while the exbD site did not shift and theygaC site only showed extremely weak binding (Figure 3)
McHugh et al (13) found that both exbB and exbDwere regulated by Fur but their model did not detect abinding site in the exbB promoter while our informationtheory model successfully predicted that Fur binding siteand our gel shift results confirmed this (Figure 3)The exbD gene is located only seven bases downstreamof the exbB gene Both genes have the same orientationso they may belong to a single operon cotranscribed fromthe exbB promoter and coregulated by Fur
To strengthen our model we added the 13 confirmedsites to the 11 previously footprinted sites (Figure 4A)
Table 1 Information theory test of 11-site E coli Fur models M12 M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M122
walkers (bits)Number of M92
walkers (bits)Number of M62
walkers (bits)
F-F 1 (25000) f 1 (55) 2 (90 94) f 1 (115)F-F-F 2 (3610) 1 (148) f 2 (118 153) f 2 (119 99)F-F-F-F 4 (630) 2 (166 192) 3 (118 189 153) 3 (119 119 99)F-F-x-R 2 (095) f 2 (221 43) f 2 (177 176) 1 (204)F-F-x-R-R 3 (078) 2 (296 227) f 3 (177 275 108) 2 (204 164)fepB 2 (860) 1 (157) f 2 (109 82) 1 (152)entS 3 (301) 2 (156 172) f 3 (75 172 95) 2 (83 122)
1Oligos and observed Fur dimers are from (55) The apparent Kd is also given (nM in parentheses) for each oligo2Number of sequence walkers of each model followed by the information (bits) of each walker are shown for each sequence The predictions thatmatch the number of observed Fur dimers are marked with a solid circle while those that do not are marked with an open circle Only majorwalkers (strong walkers with 6-bp separation) are counted (see the text) A full depiction of all walkers (gt0 bits) can be found in SupplementaryFigures S3 and S5
6766 Nucleic Acids Research 2007 Vol 35 No 20
As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable
P aeruginosa and B subtilis Fur models
To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments
(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur
binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos
A
B
fepB binds 2 Fur dimers
entS binds 3 Fur dimers
C
D
feuA binds 1 Fur dimer
dhbA binds 2 Fur dimers
10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3
M12 157 bits
M9 109 bits
M9 82 bits
M6 152 bits
10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3
M12 156 bits
M12 172 bits
M9 75 bits
M9 172 bits
M9 95 bits
M6 83 bits
M6 122 bits
10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3
M9 223 bits
M6 103 bits
M6 81 bits
10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3
M9 256 bits
M9 299 bits
M6 149 bits
M6 208 bits
M6 171 bits
Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7
Nucleic Acids Research 2007 Vol 35 No 20 6767
For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa
Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species
Scans of footprinted regions
Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated
Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and
93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)
There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content
A
B
yoeA fepD-entS gpmA ryhB fhuA nohA oppA
bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus +
gspC garP yahA fadD ygaC exbB exbD fhuF
bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus + minus +
Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift
6768 Nucleic Acids Research 2007 Vol 35 No 20
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
As with the 11 sites multiple alignment of the 24 sitesusing different window sizes also gave three classes ofalignments M12 M9 and M6 (Supplementary Figure S4)When these models were tested against the same set ofsequences as that used to test the 11-site models (Table 1Supplementary Figure S3) similar but cleaner resultswere obtained ie the weak walkers that were not countedpreviously become weaker (some are even below 0 bits)but the strong walkers still have similar Ri values(Supplementary Figure S5) This confirms that ourcounting of major walkers predicted by the initial 11-sitemodels was reasonable
P aeruginosa and B subtilis Fur models
To compare E coli Fur models with other bacterialFur models we also built P aeruginosa and B subtilis Furbinding site models As with E coli Fur binding siteswe applied the same method to align footprinted Furbinding sites from P aeruginosa (41) and B subtilis (42)For both bacteria we obtained two classes of alignments
(Supplementary Figure S6) which correspond to theE coli M9 and M6 logos (Figure 1B and C) respectivelyNo M12 alignments were obtained for these two bacteriaprobably because some sites used to make the modelsare single-dimer binding sites (see Discussion Section)Both P aeruginosa and B subtilis M9 models arehighly similar to the E coli M9 model (Figure 4) TheP aeruginosa M6 logo looks almost identical to theE coli M6 logo while the B subtilis M6 logo is lesssimilar to the E coli M6 logo (Figure 1C SupplementaryFigure S6B and D)As with Lavrrar and McIntoshrsquos work on E coli Fur
binding sites (55) Baichoo and Helmann (40) performedsimilar experiments on eight synthetic oligos and twonatural sites (feuA and dhbA) with B subtilis Fur anddetermined how many Fur dimers can bind each of theseoligos We tested the B subtilis M9 and M6 modelswith these 10 sequences (Table 2 Figure 2C and Dand Supplementary Figure S7) The results show that theM9 model made correct predictions for all 10 sequenceswhile the M6 model was correct for only four oligos
A
B
fepB binds 2 Fur dimers
entS binds 3 Fur dimers
C
D
feuA binds 1 Fur dimer
dhbA binds 2 Fur dimers
10 20 30 5 g g a a t t c g a a a a t g a g a a g c a t t a t t g g a t c c c g 3
M12 157 bits
M9 109 bits
M9 82 bits
M6 152 bits
10 20 30 40 5 g g a a t t c g a t a a t g a a a t t a a t t a t c g t t a t c g g a t c c c g 3
M12 156 bits
M12 172 bits
M9 75 bits
M9 172 bits
M9 95 bits
M6 83 bits
M6 122 bits
10 20 30 5 a t t c c a a t t g a t a a t a g t t a t c a a t t g a a c a 3
M9 223 bits
M6 103 bits
M6 81 bits
10 20 30 5 a t t g a t a a t g a t a a t c a t t a t c a a t a g a t t g 3
M9 256 bits
M9 299 bits
M6 149 bits
M6 208 bits
M6 171 bits
Figure 2 Sequence walkers to test E coli (A and B) and B subtilis (C and D) Fur models (A and B) The three E coli Fur models (M12 M9 andM6) were tested with two E coli Fur binding sites fepB and entS (55) Sequence walkers of each model (green M12 red M9 blue M6) are shownfor each site (C and D) Two B subtilis Fur models (M9 and M6) were tested with two B subtilis Fur binding sites feuA and dhbA (40) Gray barsindicate natural sequence surrounded by synthetic DNA Different colored rectangles indicate different Fur models (by hue green M12 red M9blue M6) and their strengths (by saturation) (30) Only major walkers are shown in these figures for a full list of walkers (Rigt0 bits) please seeSupplementary Figures S3 S5 and S7
Nucleic Acids Research 2007 Vol 35 No 20 6767
For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa
Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species
Scans of footprinted regions
Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated
Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and
93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)
There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content
A
B
yoeA fepD-entS gpmA ryhB fhuA nohA oppA
bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus +
gspC garP yahA fadD ygaC exbB exbD fhuF
bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus + minus +
Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift
6768 Nucleic Acids Research 2007 Vol 35 No 20
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
For most of the oligos the M6 model predicted one moresite than the M9 model did Similar to our observationsin E coli these results strongly suggest that the M9 modelis a single-dimer binding model while the M6 alignment isdue to compression of the binding sequencesRelative to E coli Fur (NP_415209) the P aeruginosa
Fur (NP_253452) is moderately diverged (58 identical inprotein sequence) and the B subtilis Fur (NP_390233) isdistantly diverged (33 identity) However a comparisonof the three M9 models from these three species showsthat these models are highly similar to each other(Figure 4) suggesting that the FurndashDNA recognitionmechanism is conserved in even distantly related species
Scans of footprinted regions
Gel shift experiments cannot give precise Fur bindingregions Originally using the 11-site M9 model and laterusing the 24-site M9 E coli Fur model we predicted Fursites in the fhuF promoter region and then to validate ourmodel we performed DNase I footprinting on that regionTo further confirm the 24-site M9 model we also scannedit across all other available footprinted E coli Fur bindingregions and correlated our predictions with the publishedfootprints Fur has been documented to exhibit lsquosecondaryfootprintingrsquo protecting extended regions of DNA underhigher protein concentrations (Supplementary Figure S8)(333646) To reveal the extended regions we used a cutoffat zero bits since that is the theoretical lowest informationfor binding (19)The E coli 24-site M9 model predicts two separated
Fur binding clusters in the fhuF promoter region onecluster contains three strong sites of 161 208 and 223bits and the other contains two weaker sites of 146 and
93 bits (Figure 5A) The fhuF oligo used in the gel shiftexperiments in Figure 3 only contains the strong Furcluster To determine if both predicted clusters are pro-tected by Fur we performed DNase I footprinting on alarger fhuF promoter region (see Materials and MethodsSection) that contains both predicted Fur binding clustersThe results show that there are indeed two Fur-protectedregions in the fhuF promoter one has high affinity withFur (protected by 50 nM of Fur) and the other hasa lower affinity (weakly protected by 50 nM of Fur)(Figure 5B) The predicted strong Fur binding clustercovers 91 of the high-affinity footprinted region and theweak Fur cluster covers 66 of the low-affinity region(Table 3 Figure 5A)
There are 11 other footprinted E coli Fur bindingregions including the 10 regions used to build the initial11-site models and the fepD-entS promoter region (58)When these footprints were scanned similar coveragewas obtained (Table 3 Supplementary Figure S8)All 11 footprinted sequences displayed clusters of multipleoverlapping Fur walkers in these clusters the majorityof the walkers were separated by six bases some wereseparated by three bases (Table 3 SupplementaryFigure S8) For 7 of the 11 regions (fepA-fes fepB entCfur tonB cir and iucA) sequence walkers cover 83(72ndash93) of the footprints No sequence walkers coverthe secondary footprints in the fecA fepD-entS and sodAregions resulting in low coverage of these three footprints(66 54 and 56 respectively) The secondary sodAfootprint was only protected with a high concentration ofpurified Fur (200 and 500 nM) it was not protected bycrude extracts containing overproduced Fur (36) In theother region fecI several low-information content
A
B
yoeA fepD-entS gpmA ryhB fhuA nohA oppA
bits 257 172 243 220 161 205 187 11-site M9274 194 274 252 195 223 215 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus +
gspC garP yahA fadD ygaC exbB exbD fhuF
bits 122 122 163 197 14 175 minus50 214 11-site M9144 138 186 217 minus34 191 minus71 223 24-site M9
Fur minus + minus + minus + minus + minus + minus + minus + minus +
Figure 3 Gel mobility shift assays to test predicted Fur binding sites Oligonucleotides containing predicted Fur binding sites were incubated without() or with (+) 150 nM E coli Fur protein and gel electrophoresed to test the 11-site M9 model (Figure 1B) Below each set of lanes is the strengthin bits of the strongest sequence walker found on each oligo by the 11-site M9 model Predictions by the 24-site M9 model which includes 13 of thesites tested here (not ygaC or exbD) are also given for comparison (A) The first set of oligos contain seven predicted Fur sites located in promoterregions (yoeA fepD-entS gpmA ryhB fhuA nohA and oppA) (B) The second set of oligos contain four predicted Fur sites located within genes(gspC garP yahA and fadD See Supplementary Figures S10ndashS13 for sequence walkers) two sites located in promoter regions (exbB and fhuF)and two consensus-predicted sites (ygaC and exbD) (5051) As expected all oligos that contain predicted sites (by the 11-site M9 model) exhibit oneor more mobility shifts following incubation with 150 nM Fur the oligo ygaC (14 bits) shows extremely weak binding and exbD (50 bits)does not shift
6768 Nucleic Acids Research 2007 Vol 35 No 20
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
walkers (02 and 08 bits) extend past the range of thefootprint resulting in 144 coverage of the footprint
In a previously published study Escolar et al synthe-sized oligonucleotides that contained repeats of thesequence GATAAT and determined Fur binding byfootprint experiments (5) No footprint was seen withone insert correspondingly no sequence walkers wereobserved in that sequence (Supplementary Figure S9)For the oligos with three four and five inserts sequencewalkers cover 83ndash90 of the footprints The 2-insertsequence had a weak interaction with Fur at high protein
concentrations and two overlapping walkers of 30 and42 bits appeared which cover the repeats (5) Theseresults demonstrate that the Fur model accurately predictsFur binding on both natural and synthetic sequences
Whole genome scan
The lowest information of any site in the 24-site M9 modelis 101 bits (Supplementary Figure S1) Since all sites inthe model are proven binding sites we used 10 bits asthe cutoff for a whole genome scan This presumably
A
B
C
======gtolt======
======gtolt======
======gtolt======
24 E coli Fur binding sites 196 bits (minus9 +9)
0
1
2b
its
5prime
minus20
CTGA
minus19
minus18
minus17
minus16
GCAT
minus15
GCTA
minus14
CGTA
minus13
GACT
minus12
CAGT
minus11
CGTA
minus10
GCAT
minus9
GTA
minus8
GTA
minus7
GCAT
minus6
CAG
minus5
GTCA
minus4
GAT
minus3
TA
minus2
TGCA
minus1
GACT
0 1
TCGA
2
ACGT
3
AT
4
TCA
5
CGAT
6
GTC
7
CTGA
8
CAT
9
CAT
10
CGTA
11
GCAT
12
GTCA
13
CTGA
14
GCAT
15
CGAT
16
CGTA
17 18 19 20
GCAT
3prime
20 P aeruginosa Fur sites 186 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
minus18
minus17
minus16
minus15
minus14
CGTA
minus13
ACGT
minus12
minus11
minus10
CGTA
minus9
CGTA
minus8
CTA
minus7
CT
minus6
TAG
minus5
GTCA
minus4
CAGT
minus3
TA
minus2
CTA
minus1
CT
0
GC
1
GA
2
GAT
3AT
4GTCA
5
CAGT
6
ATC
7
GA
8
GAT
9
GCAT
10
GCAT
11 12 13
TGCA
14
GCAT
15 16 17 18 19 20 3prime
22 B subtilis Fur sites 229 bits (minus9 +9)
0
1
2
bit
s
5prime
minus20
minus19
GCAT
minus18
GCTA
minus17
GACT
minus16
CTGA
minus15
minus14
GCAT
minus13
GCAT
minus12
minus11
GCAT
minus10
GCTA
minus9
CGTA
minus8
CGTA
minus7
AT
minus6
C
G
minus5
GCA
minus4
CGAT
minus3
TGA
minus2
TGA
minus1
GCAT
0
TAGC
1
GCTA
2
CAT
3
CAT
4
GCTA
5
CGT
6
G
C7
TA
8
GCAT
9
GCAT
10
CGAT
11
CGTA
12 13
CGTA
14
CGTA
15 16
GACT
17
CTGA
18
CGAT
19
CGTA
20 3prime
Figure 4 Comparison of M9 Fur models of three bacterial species Experimentally proven Fur binding sites from E coli (A) P aeruginosa (B) andB subtilis (C) were used to build the models for these three bacteria The marked region from 7 to +7 in each logo indicates the 7-1-7 model thatwas proposed by Baichoo and Helmann (40)
Nucleic Acids Research 2007 Vol 35 No 20 6769
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
minimizes false positives in our predictions When themodel was scanned across the E coli K-12 genome(NC_000913) a total of 412 sites were found above10 bits These sites belong to 363 unique clusters A list ofthe clusters is available at httpwwwccrnpncifcrfgovtomspapersfurThe 363 regions were extracted from 50 to thorn50
around the strongest site in each region and scanned withthe 24-site M9 model using a cutoff of 0 bits We thendetermined the exact region of each cluster that is coveredby walkers greater than 0 bits The results show that theseclusters contain 1ndash14 walkers covering a region from19ndash142 bases (350 bases on average) This is comparableto the known footprints (30ndash103 bases) (Table 3 Supple-mentary Figure S8) Of these clusters 155 are in inter-genic regions 143 are entirely inside gene coding regionsand 65 overlap the boundary between coding and inter-genic regionsTo determine how many of the Fur clusters overlap
with reasonably strong promoters we scanned these 363Fur clusters with a s70 promoter model (47) using a 4-bitcutoff The results showed that 303 clusters overlap withone or more strong promoters (Rigt4 bits) The other60 clusters do not overlap with a strong promoterWhen we scanned the flanking regions of these 60 clusters(50 to 0 of the left edge and 0 to thorn50 of the right edgeof each cluster) only 23 clusters were found to havestrong promoters (Ri gt 4 bits) in the flanking regionsWe examined the locations of Fur sites relative toknown promoter elements (transcriptional start site 10and 35) and found that they are more tightly distributedaround the promoter elements than observed with theDNA binding proteins Fis H-NS and IHF (47) (data notshown) These results suggest that Fur may mainly act as adirect repressor of genes by binding to and blockingthe promoters and that direct activation by Fur throughbinding to the region upstream of a promoter assuggested in Neisseria meningitidis by Delany et al (59)
may not be a common mechanism of Fur regulationin E coli
Out of the 363 clusters 42 were found containingat least one predicted Fur binding site above 170 bits(our arbitrary cutoff used to locate significant regions)(Table 4) Of the 42 clusters 39 were found to overlap withone or more strong s70 promoters (gt4 bits) The otherthree clusters fepD-entS yddA and fecA do not overlapScanning the flanking regions of these three clusters withthe s70 promoter model revealed that these Fur clustersare located between the respective promoters and transla-tional starts There may be only weak repression of thesegenes by Fur as none of them were significantly repressedby Fur according to microarray analyses (1360)
Scans of other proposed Fur-regulated genes
Many genes have been proposed to be Fur-regulated bycomparing conventional consensus sequences to promoterregions and also by homology to systems in other organ-isms Kammler et al annotated two Fur binding sites inthe promoter region of feoA in E coli by comparison toa consensus sequence (10) The 24-site M9 model predictstwo clusters of sites in the same region with each clustercovering one of the marked Fur boxes One cluster hasone major site of 105 bits (at 3538006) the other clusterhas two major overlapping sites of 109 (at 3538059) and134 bits (at 3538065) respectively The same authors haveconfirmed that Fur does bind to this region in vivo but theexact Fur binding regions have not been determined byfootprint experiments
Vassinova and Kozyrev used an in vivo selection tolocate Fur sites on Sau3A fragments from the E coligenome (51) The five regions from Figure 2 of their paperwere analyzed using sequence walkers Using an unidenti-fied consensus sequence yhhX (from 3578828 to 3577791)was predicted by Vassinova and Kozyrev to have two Furbinding sites In the same region we found one strongsite of 252 bits (at 3579054) (Table 4) overlapping twoweaker sites (82 and 83 bits) This cluster of Fur sites isactually located right in front of an sRNA gene ryhB(from 3579039 to 3578946) (Table 4) (6162) which hasbeen shown to be repressed by Fur (57) The promoterregion of gpmA (from 786818 to 786066 named pgm inreference 51) was predicted by consensus to contain twosites One of the highest sites (274 bits) was found in thisregion (at 786853) overlapping two weaker sites of 75and 56 bits The consensus sequence-predicted Fur site byVassinova and Kozyrev (51) is located about 600 basesupstream of the ygaC gene In the same region our 11-siteM9 model only found a 14-bit site (Figure 3) and the24-site M9 model found a negative site of 34 bits(Supplementary Figure S1) Instead in the nearby regionsthe 24-site M9 model found a 74-bit site (11 basesupstream of ygaC) and a 101-bit site (728 bases upstreamof ygaC) For nohA (from 1634391 to 1633822)two overlapping sites of 223 and 126 bits were found at1634624 and 1634630 (Table 4) The fifth region was fhuF(from 4603686 to 4602898) Figure 5 shows the DNase Ifootprints and predicted walkers in this region Onlythe high-affinity region (fhuF1) was identified by the
Table 2 Information theory test of B subtilis Fur models M9 and M6
Oligo1 Number of observed1
Fur dimers (nM)Number of M92
walkers (bits)Number of M62
walkers (bits)
6-mer 0 f 0 f 06-6 1 (500) f 1 (21) 2 (67 54)6-1-6 1 (200) f 1 (97) 2 (54 74)7-1-7 1 (100) f 1 (207) 2 (107 128)8-1-7 2 (100 1000) f 2 (76 202) f 2 (174 108)8-1-8 2 (100 1000) f 2 (76 230) f 2 (174 160)9-1-9 2 (50 100) f 2 (119 301) f 2 (208 194)2 (7-1-7) 2 (10 100) f 2 (166 257) 3 (61 208 143)feuA 1 (10) f 1 (223) 2 (103 81)dhbA 2 (10 100) f 2 (256 299) 3 (149 208 171)
1Oligos and observed Fur dimers are from (40) The Fur concentration(nM) at which one or two Fur dimer-DNA complexes appeared isgiven in parentheses2Number of sequence walkers of each model followed by theinformation (bits) of each walker are shown for each sequence Thepredictions that match the number of observed Fur dimers are markedwith a solid circle while those that do not are marked with an opencircle Only major walkers are counted A full depiction of all walkerscan be found in Supplementary Figure S7
6770 Nucleic Acids Research 2007 Vol 35 No 20
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
A B
L
H
H
L
H fhuF1
L fhuF2
yjjZ
4603880 4603870 4603860 4603850 4603840 4603830 4603820 5 a t c a g c a a t c c c g g c a g c a a c a c t c c c c a g c c a c t g c c c a g c g t a c g t t g c a a c a t g a t t t c a t c t c 3
4603810 4603800 4603790 4603780 4603770 4603760 4603750 5 t t t c a t t g a t a a t g a t a a c c a a t a t c a t a t g a t a a t t t t t a t c a t t t g c a a g c c a g a t a a a t c c c t t g c t a t c g g 3
Fur high affinity topFur high affinity bottom
M9 161 bits
M9 208 bits
M9 223 bits
M9 34 bits
fhuF
fhuF
4603740 4603730 4603720 4603710 4603700 4603690 4603680 5 g t a a a c c t a t c g c t a t g a t t a g c a a t c a t t a t c a t t t a g a t t a c t a t c c c g a t t a t g g c c t a t c g t t c 3rsquo
Fur low affinity topFur low affinity bottom
M9 22 bits
M9 01 bits
M9 146 bits
M9 93 bits
Figure 5 Fur model scan and DNase I footprints of the fhuF promoter region (A) The promoter region was scanned with the 24-site M9 model and two clusters of sequence walkers (Rigt 0bits) were detected with each closely corresponding to one of the two Fur protected regions (gray bars fhuF1 and fhuF2) The solid black arrow with two tails above the DNA at coordinates4603717 and 4603716 indicates the fhuF transcription starts The open arrows starting at coordinates 4603827 and 4603686 indicate the yjjZ and fhuF translation starts respectively Thesaturation of colored rectangles behind each sequence walker is proportional to the information of that site For reference colored lines cycle with 6-base periodicity (B) DNase I footprinting onthe fhuF promoter by Fur showed two regions protected by the protein marked in the figure by brackets The high-affinity and low-affinity regions are marked with H and L respectively Thefootprinting samples were run in parallel with MaxamndashGilbert sequencing ladders (marked by GA)
Nucleic
Acid
sResea
rch2007V
ol3
5N
o20
6771
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
consensus sequence Thus four of the five Sau3Afragments (yhhX gpmA nohA and fhuF) selected to haveFur binding sites in vivo were identified in our genomescan (Table 4) and our gel shift results confirmed this(Figure 3) the remaining site ygaC had a weak walker of14 bits (by the 11-site M9 model) and showed extremelyweak binding (Figure 3)Eick-Helmerich and Braun matched a Fur consensus
sequence to the promoters of exbB and exbD (50)One strong walker of 191 bits was found in the exbBpromoter region overlapping the lsquoFur boxrsquo The regionupstream of exbD showed no walkers with Ri greater than0 bits even though it had also been predicted to containa Fur binding site using the same consensus sequenceas used in the exbB promoter The highest walker thatwe could detect in the region upstream of the exbDgene is 71 bits (by the 24-site M9 model Figure 3 andSupplementary Figure S1) significantly lower than thetheoretical lower bound of a binding site (0 bits) (19) Thisindicates that Fur should not bind this region and our gelshift results confirmed this (Figure 3)A microarray analysis by McHugh et al found 101
genes to be regulated by Fur 53 of which were repressedand 48 activated (13) The 53 Fur-repressed genes belongto 32 transcription units (individual genes or operons)18 of which are involved in iron metabolism 14 in energymetabolism and other functions Using the 24-site M9model we predicted strong Fur binding clusters (higherthan 13 bits) for all 18 iron metabolism-related transcrip-tion units Of the other 14 transcription units onlytwo have a Fur binding site higher than 10 bits (nrdHIEF101 bits fimE 108 bits) The 48 Fur-activated genesbelong to 34 transcription units only three of these(garPLRK ynaE and ydfK) have a predicted Fur bindingcluster The garPLRK operon has a weak Fur clusterthat contains one major site of 104 bits The ynaE andydfK genes are almost identical in both promoter andcoding regions and the two Fur clusters (171 bitscentered at thorn8 from the translational start) are the same(Table 4) These results strongly suggest that in E coli
Fur is a direct repressor of iron metabolism genes and anindirect repressor or activator of other genes
McHugh et al (13) used an altered information theoryapproach for modeling Fur binding (63) to identify whichgenes are under direct Fur control Our model identifiedsites in all genes that their model did as well as fourothers fhuE (124 bits at 1160880) exbB (191 bits at3150076) fimE (108 bits at 4539697) and ydfK (171 bitsat 1631071)
The indirect activation by Fur requires an intermediateregulator which could directly target Fur-activated genesA small regulatory RNA gene ryhB is one such importantplayer Earlier study revealed several genes that aredirectly repressed by ryhB (57) Recently a microarrayanalysis of ryhB control by Masse et al (60) expanded thisdirect target set to 18 operons They also identified anadditional 10 indirectly down-regulated operons and 10up-regulated operons The 10 ryhB indirectly repressedoperons are directly repressed by Fur and for all ofthese operons we found strong Fur binding clusters(higher than 13 bits) in the corresponding promoterregions In contrast only a few of the other 28 ryhB-controlled operons (18 directly repressed operons and10 up-regulated operons) have a predicted Fur bindingcluster above 10 bits These include acnA (176 bits at1333484) ydhD (109 bits at 1732367) and oppA (215 bitsat 1298973) The acnA and oppA Fur clusters are bothstrong the former cluster has three overlapping sites of176 98 and 142 bits and the latter has two overlapp-ing sites of 215 and 96 bits (Table 4) Howevermicroarray analysis did not find these two genes to berepressed by Fur (1360) suggesting that the Fur bindingclusters may not be the primary control elements ofthese genes
DISCUSSION
Fur binding models
In this study we used experimentally proven sites to createinformation theory models of Fur binding Our models
Table 3 Scan of footprinted Fur binding sites in E coli
Accession Gene Footprinted region References Number Ri Percentage
NC_000913 fhuF1 4603763 ndash 4603815 This work 4 223 91NC_000913 fhuF2 4603680 ndash 4603733 This work 4 146 66NC_000913 fepA-fes 611868 ndash 611915 (35) 4 187 77NC_000913 fepB 623939 ndash 623969 (39) 2 167 81NC_000913 entC 624054 ndash 624102 (39) 4 231 76NC_000913 fur 709931 ndash 709960 (34) 2 123 83NC_000913 tonB 1309040 ndash 1309072 (838) 3 115 88NC_000913 cir 2244957 ndash 2244999 (32) 3 207 72NC_000913 sodA 4098726 ndash 4098780 (36) 3 233 56NC_000913 fecA 4514752 ndash 4514789 (33) 2 176 66NC_000913 fecI 4516300 ndash 4516340 (33) 7 231 144M10930 iucA 291 ndash 393 (37) 13 203 93NC_000913 fepD-entS 621394 ndash 621462 (58) 4 201 54
The footprinted regions were scanned with the 24-site M9 model (Rigt0) to test the modelrsquos validity This table lists the GenBank accession numberthe gene promoter that was footprinted the coordinates of the footprinted region the references for the footprints the number of walkers in theregion the strongest Ri value in the same region and the percent of the footprint covered by sequence walkers This table is a summary of Figure 5Aand Supplementary Figure S8 Sites in the promoters that are marked by an asterisk were used to build the 11-site models (Figure 1 SupplementaryFigure S1) in the case of iucA two sites were used in the model (see Materials and Methods Section)
6772 Nucleic Acids Research 2007 Vol 35 No 20
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
especially the 24-site M9 model approximate the bindingcharacteristics of Fur more fully than previous modelsthat depended on conventional consensus sequences datafrom multiple species and sequences that were not foot-printed (13251) The rigorous approach used hererevealed new binding sites disproved two sites predictedby a consensus sequence and clarified the manner inwhich the protein binds
To produce these models binding sequences weremultiply aligned to maximize the information content indifferent window sizes Because multiple Fur sites overlapour analysis of proven Fur binding sequences from three
bacterial species gave two or three different alignmentsdepending on the species (Figure 1 SupplementaryFigure S6) By testing these models against publisheddata we showed clearly that the M9 model representssingle-dimer binding sites the M12 model is for two-dimerbinding while the compact M6 alignment was caused bycompression of the binding sequences into a small window(Tables 1 and 2 Figures 1 and 2 Supplementary FiguresS3 S5 and S7)As evaluated by their respective M9 models our
initial set of 11 footprinted E coli Fur binding sitesall contain overlapping sites of 6-base spacing
Table 4 Strong Fur clusters in the E coli genome predicted by the 24-site M9 model
Ri Number Size Coordinate Genes Experimental Position References(bits) (bp) data
274 12 95 2066611 yoeA G 48 NA274 5 31 786853 gpmA G 35 (1351)253 4 28 579824 nohB 133 (1)252 4 31 3579054 ryhB G 15 (57)233 3 31 4098746 sodA F 87 (6069)231 7 59 4516309 fecI F 51 (1360)231 4 37 624072 entC F 36 (39)228 2 25 1903412 yebN 300 NA223 4 48 4603779 fhuFa FG 93 (135160)223 3 25 1634624 nohA G 233 (51)217 2 25 1886651 fadD G +1119b NA215 2 25 1298973 oppA G 233 (7071)213 3 31 1762463 sufA 53 (1360)207 3 31 2244980 cir F 189 (1372)206 5 40 1118358 yceJ 89 NA202 3 31 3465053 bfd 40 (1360)201 10 53 1787602 ydiE 35 (13)201 2 35 1080513 ycdN 57 (1360)199 8 106 3214572 yqjH 59 (13)199 7 103 1752756 ydhY 255 NA199 2 25 2510784 mntH 56 (7374)196 3 25 2231892 yeiT 163 NA196 2 25 1577358 yddA +8c (13)195 3 31 167439 fhuA G 45 (1360)194 4 37 621437 fepD ndash entS FG 25 86 (1375)193 3 52 675768 ybeQ +2c NA191 3 31 3150076 exbB G 70 (136072)187 8 96 2784036 ygaQ 383 NA187 4 37 611889 fepA ndash fes F 172 149 (13)186 7 38 3266404 yhaC 33 NA186 4 28 331901 yahA G +306e NA183 4 61 331085 yahA 510e NA176 6 41 1333484 acnA 371 (5776ndash78)176 2 25 4514779 fecA F 79 (1360)176 1 19 2254041 yeiL +5d NA175 3 25 490057 priC ndash ybaN 21 49 (7980)173 2 29 2454142 yfcV 474 NA172 8 41 3582772 yrhB 10 NA172 4 37 840877 fiu 123 (13)171 3 40 1631071 ydfK +8c (13)171 3 40 1432273 ynaE +8c (13)171 2 25 4570225 yjiT 212 NA
The strongest individual information value (Ri bits) in each cluster is shown followed by the number of walkers (Rigt 0 bits) the size of the clusterthe coordinate of the strongest site the genes that are presumably controlled experimental data footprinted (F) andor gel shifted (G) sites thelocation of the strongest Fur site relative to the gene start or stop codon (as noted) and references about Fur regulation of the genes (NA means noreference is available)aThis cluster corresponds to the high-affinity region (fhuF1) in the fhuF promoter (Figure 5)bThe coordinate is high because Fur cluster is inside the fadD coding regioncThese Fur clusters overlap with translational startsdThis Fur cluster overlaps with the stop codoneThe yahA gene has two Fur clusters one is inside the coding region another is 510 bases upstream of the translational start The Fur cluster insidethe gene was tested by gel shift experiments (Figure 3)
Nucleic Acids Research 2007 Vol 35 No 20 6773
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
(Supplementary Figure S8) while only 7 of the 20P aeruginosa Fur binding sites and 6 of the 22 B subtilisFur binding sites are single-dimer binding sites (data notshown) This scarcity of multiple sites explains why anM12 alignment can be obtained for E coli but not forP aeruginosa or B subtilis When we performed multiplealignment by only using overlapping sites of 6-bp spacingwe obtained M12 alignments for both P aeruginosa andB subtilis (data not shown) These observations alsosupport the use of M9 as a single-dimer modelOur multiple alignment method should apply to any
DNA binding sites that contain internal direct repeats asrepetition within binding sites may cause alternativealignments of the sites In the RegulonDB (httpregulondbccgunammx) (64) a Fur binding modelwas created by multiply aligning 47 Fur sites althoughit is not clear what these 47 sites are and how thealignment was made Based on their alignment (httpregulondbccgunammxhtmlmatrix_Alignmentjsp) we made a sequence logo (data notshown) which we found to be similar to our M6 logosAlthough the M6 models appear to be able to predictsome sites (eg Tables 1 and 2 Supplementary Figures S3S5 and S7) the models only represent the internal partof overlapping Fur binding sites ie from 6 to thorn6 of theM12 model (Figure 1A)A 18 A crystal structure for P aeruginosa Fur
was obtained by Pohl et al and based on this theauthors proposed a single-dimer binding model (9) TheP aeruginosa crystal structure contains a putative DNAbinding a-helix H4 and a loop between helices H1 and H2both of which were proposed to be involved in bindingDNA through contacting bases in the major groove Thusa single dimer would protect two consecutive majorgrooves on the same face of DNA and the minimalrecognition unit of Fur should be close to 20 bp (9) OurM9 binding site models (19 bp a 9-1-9 inverted repeat) fitwith this single-dimer binding model closely Pohl et alalso proposed a two-dimer binding model in whichtwo Fur dimers bind two overlapping sites that are 5-bpapart (9) However in our whole genome scans andfootprint scans as well we did not observe any case of5-bp separated overlapping sites above 5 bits Instead wemostly found 3- or 6-bp separated overlapping sitesAbove five bits we found 95 cases of 3-bp spacing and92 cases of 6-bp spacingSeveral other consensus-based Fur binding site models
have been proposed to interpret a 19-bp consensuslsquoFur boxrsquo (Figure 6) (65) Within these two earlier
models the classical model and hexamer model(573754) interpreted the lsquoFur boxrsquo as a single recogni-tion unit of a Fur dimer Later Lavrrar and McIntoshsuggested that a 13-bp inverted repeat (6-1-6) is theminimal unit recognized by a single Fur dimer and thattwo overlapping lsquo6-1-6rsquo motifs correspond to the Fur boxwhich is required for high-affinity binding of Fur (5558)Baichoo and Helmann reinterpreted the Fur box as twooverlapping 7-1-7 inverted repeats with a 6-bp spacingand suggested that a 7-1-7 site but not 6-1-6 representsthe minimal recognition unit of Fur (40) Baichoo andHelmann also demonstrated that high-affinity bindingby Fur can happen on a single-dimer binding site ie aninverted 7-1-7 site
The 7-1-7 model basically agrees with our M9 models(Figure 6) The main difference is that the 7-1-7 model didnot count the weakly conserved bases at positions 9 8thorn8 and thorn9 in the B subtilis Fur binding site alignment(Figure 4C) As the E coli and P aeruginosa M9 modelshave highly conserved bases at these four positions(Figure 4A and B) and also the Fur binding mechanismsfor these three species are similar (9) we suggest thata minimal Fur binding unit of 19 bp (9-1-9) as representedby our M9 models is more reasonable
Relative to our M9 models the Fur box only representsthe internal part of two overlapping sites that areseparated by 6 bases ie from 9 to thorn9 of the M12logo (Figures 1A and 6) thus any predictions based on theFur box may not be accurate Two sites (exbD and ygaC)that were predicted by matching to the Fur box wereshown to be non-binding (Figure 3)
Many difficulties in understanding Fur binding sites canbe attributed to the choice of consensus sequences asa model (66) In contrast to the individual informationweight matrix model the consensus sequence methodignores the varying importance of bases by treatingmismatches equivalently In addition the consensusmethod does not have a criterion for an acceptablenumber of mismatches
Sequence walkers predicted by the 24-site M9 modelcover 83 of the footprints (Table 3 Figure 5Supplementary Figure S8) Similar results were obtainedfor both the P aeruginosa and B subtilis M9 models(data not shown) These results strongly suggest that ourM9 models can accurately predict Fur binding
Fur Binding Site Clusters
The whole genome scan with the 24-site M9model revealed363 Fur binding clusters in E coli most of which containmultiple walkers (up to 14) covering regions up to about140 bases This is comparable to the size of knownfootprints which ranges from 30 to 103 bases (Table 3Supplementary Figure S8) Most of the clusters (303 outof 363 83) were found to overlap with one or morestrong s70 promoters Indeed the four Fur clusters inthe middle of coding regions gspC garP yahA and fadDthat we tested by gel mobility shift assays (Figure 3) allhave one or more potential s70 sites (Ri gt 4 bits) exactlyoverlapping them (see Figures S10ndashS13) The function ofthese control elements is unknown
Fur boxClassical model 9-1-9Hexamer model 6-6-1-6Lavrrar model 6-1-6
Baichoo model 7-1-7
M9 model 9-1-9
GATAATGATAATCATTATC
========gtOlt========
=====gt=====gtOlt=====
=====gtOlt==========gtOlt=====
t=====gtOlt============gtOlt=====a
aat=====gtOlt================gtOlt=====att
Figure 6 Different Fur binding models to interpret the Fur box
6774 Nucleic Acids Research 2007 Vol 35 No 20
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
We correlated our whole genome scan with publishedmicroarray data about Fur in E coli (1360) The resultsshowed that all operons that are involved in ironmetabolism have a predicted strong Fur binding cluster(higher than 13 bits) suggesting direct repression ofthese genes by Fur In contrast only a few of otherFur-activated or repressed operons have a predicted Furcluster higher than 10 bits suggesting indirect control ofthese genes by Fur It has been shown that RNA generyhB directly targets and represses a set of other genes andryhB itself is directly repressed by Fur thus ryhB playsan important role in indirect activation of genes by Furin E coli (135760)
It has been suggested that in N meningitidis Fur canalso directly activate genes by binding to the regionupstream of the promoters (59) However our wholegenome analysis did not support this activation mechan-ism in E coli The E coli ryhB gene is located betweengenes yhhX and yhhY No corresponding homologs of thegenes ryhB yhhX or yhhY can be found in N meningitidisThis suggests that Fur activation mechanisms may bedifferent in different species
Disentangling Fur models
Each binding site that we have studied in detail withinformation theory has had unique properties andchallenges for analysis For example ribosome bindingsites were initially modeled as rigid objects (16) but whenenough sites became available it was clear that theShinendashDalgarno sequence was being lost because it has avariable distance to the initiation codon (67) and a flexiblemodel was required (48) The flexible model was latersuccessfully applied to s70 promoters (47) and that wasused to study Fur sites Likewise information theoryanalysis revealed that Fis binding sites are commonlyfound to overlap each other (68) leading to someuncertainty as to how much nearby sites influence thesites being studied in the alignments Overlapping Fursites that are bound by Fur protein dimers create an evenmore complex situation
Although the 24-site M9 model was quite successful themodel is comprised of information from multiple Fur sitesbecause the Fur dimers overlap on the DNA Since theclusters frequently display 6-base separation of sitespositions 3 to thorn9 of the model contain data frompositions 9 to thorn3 of the adjacent Fur sites eg see thelogo of Figure 4 and the M9 walkers of Figure 2AIn addition information from sites separated by 3 basesand the complements of all of these also contribute to theoverall information content of the model It is not obvioushow to disentangle these effects We suggest that the M9model can be scanned across the genome to locate strongsingle-dimer binding sites which could then be experi-mentally confirmed and used to construct a disentangledmodel
Unfortunately the distribution of the individual infor-mation in these sites would be determined by the modelused to locate them and this could introduce biases Forexample the mean of the distribution can be altered justby selecting different subsets of sites from the distribution
A similar bias has occurred in binding site databaseswhere the consensus sequence was used to locate newsites (23) Thus although the sequence logo would nolonger contain self-overlapping data it would not neces-sarily represent the natural distribution of site strengthsTo approach such an ultimate model may require furtherscanning of footprinted regions Incorporation of theweaker sites in the footprints would of course re-entanglethe model so such scans might be best used only todetermine whether the model can detect the weaker sitesA viable solution may be to obtain all single-dimerbinding sites in the genome that have more than zero bitsand can be confirmed experimentally If the resultingmodel matches the observed footprints it may be consi-dered complete On the other hand if the resultingmodel does not match the footprints the mode of bindingby Fur could be different between single dimers andclusters
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online
ACKNOWLEDGEMENTS
We thank Tom OrsquoHalloran and Caryn Outten for Furprotein Pete Rogan for helping to develop the local bestconcept Elaine Bucheimer for writing the localbestprogram and Lakshmanan Iyer Brent Jewett DanielleNeedle Xiao Ma Shu Ouyang Denise Rubens andBruce Shapiro for their helpful comments and discussionsKAL also thanks the National Cancer Institute atFrederick for sponsoring the Werner H Kirsten StudentIntern Program This publication has been funded in partwith Federal funds from the National Cancer InstituteNational Institutes of Health under contract NO1-CO-12400 The content of this publication does not necessarilyreflect the views or policies of the Department of Healthand Human Services nor does mention of trade namescommercial products or organizations imply endorsementby the US Government This research was supported inpart by the Intramural Research Program of the NIHNational Cancer Institute Center for Cancer ResearchFunding to pay the Open Access publication charges forthis article was provided by National Cancer Institute
Conflict of interest statement None declared
REFERENCES
1 StojiljkovicI BaumlerAJ and HantkeK (1994) Fur regulon ingram-negative bacteria Identification and characterization of newiron-regulated Escherichia coli genes by a fur titration assay[published erratum appears in J Mol Biol 1994 Jul 15240(3)271]J Mol Biol 236 531ndash545
2 Le CamE FrechonD BarrayM FourcadeA and DelainE(1994) Observation of binding and polymerization of Fur repressoronto operator-containing DNA with electron and atomic forcemicroscopes Proc Natl Acad Sci USA 91 11816ndash11820
3 CalderwoodSB and MekalanosJJ (1987) Iron regulation ofShiga-like toxin expression in Escherichia coli is mediated by thefur locus J Bacteriol 169 4759ndash4764
Nucleic Acids Research 2007 Vol 35 No 20 6775
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
4 NiederhofferEC NaranjoCM BradleyKL and FeeJA (1990)Control of Escherichia coli superoxide dismutase (sodA and sodB)genes by the ferric uptake regulation (fur) locus J Bacteriol 1721930ndash1938
5 EscolarL Perez-MartinJ and de LorenzoV (1998) Binding of theFur (Ferric Uptake Regulator) repressor of Escherichia coli toarrays of the GATAAT sequence J Mol Biol 283 537ndash547
6 BaggA and NeilandsJB (1987) Ferric uptake regulation proteinacts as a repressor employing iron (II) as a cofactor to bind theoperator of an iron transport operon in Escherichia coliBiochemistry 26 5471ndash5477
7 de LorenzoV WeeS HerreroM and NeilandsJB (1987)Operator sequences of the aerobactin operon of plasmid ColV-K30binding the ferric uptake regulation (fur) repressor J Bacteriol169 2624ndash2630
8 AlthausEW OuttenCE OlsonKE CaoH andOrsquoHalloranTV (1999) The ferric uptake regulation (Fur) repressoris a zinc metalloprotein Biochemistry 38 6559ndash6569
9 PohlE HallerJC MijovilovichA Meyer-KlauckeWGarmanE and VasilML (2003) Architecture of a protein centralto iron homeostasis crystal structure and spectroscopic analysis ofthe ferric uptake regulator Mol Microbiol 47 903ndash915
10 KammlerM SchonC and HantkeK (1993) Characterization ofthe ferrous iron uptake system of Escherichia coli J Bacteriol 1756212ndash6219
11 DesaiPJ AngererA and GencoCA (1996) Analysis of Furbinding to operator sequences within the Neisseria gonorrhoeae fbpApromoter J Bacteriol 178 5020ndash5023
12 HeidrichC HantkeK BierbaumG and SahlHG (1996)Identification and analysis of a gene encoding a Fur-likeprotein of Staphylococcus epidermidis FEMS Microbiol Lett 140253ndash259
13 McHughJP Rodriguez-QuinonesF Abdul-TehraniHSvistunenkoDA PooleRK CooperCE and AndrewsSC(2003) Global iron-dependent gene regulation in Escherichia coliA new mechanism for iron homeostasis J Biol Chem 27829478ndash29486
14 ShannonCE (1948) A mathematical theory of communicationBell Syst Tech J 27 379ndash423 623ndash656 httpcmbell-labscomcmmswhatshannondaypaperhtml
15 PierceJR (1980) An Introduction to Information Theory SymbolsSignals and Noise 2nd edn Dover Publications Inc New York
16 SchneiderTD StormoGD GoldL and EhrenfeuchtA (1986)Information content of binding sites on nucleotide sequencesJ Mol Biol 188 415ndash431 httpwwwccrnpncifcrfgov~tomspaperschneider1986
17 SchneiderTD and MastronardeD (1996) Fast multiple alignmentof ungapped DNA sequences using information theory and arelaxation method Discrete Appli Math 71 259ndash268 httpwwwccrnpncifcrfgov~tomspapermalign
18 SchneiderTD and StephensRM (1990) Sequence logos a newway to display consensus sequences Nucleic Acids Res 186097ndash6100 httpwwwccrnpncifcrfgov~tomspaperlogopaper
19 SchneiderTD (1997) Information content of individual geneticsequences J Theor Biol 189 427ndash441 httpwwwccrnpncifcrfgov~tomspaperri
20 SchneiderTD (1997) Sequence walkers a graphical method todisplay how binding proteins interact with DNA or RNAsequences Nucleic Acids Res 25 4408ndash4415 httpwwwccrnpncifcrfgov~tomspaperwalker erratum NAR 26(4)1135 1998
21 PaninaEM MironovAA and GelfandMS (2001) Comparativeanalysis of FUR regulons in gamma-proteobacteria Nucleic AcidsRes 29 5195ndash5206
22 SchneiderTD (2000) Evolution of biological informationNucleic Acids Res 28 2794ndash2799 httpwwwccrnpncifcrfgov~tomspaperev
23 VyhlidalCA RoganPK and LeederJS (2004) Development andrefinement of pregnane X receptor (PXR) DNA binding site modelusing information theory insights into PXR-mediated gene regula-tion J Biol Chem 279 46779ndash46786
24 ZhengM DoanB SchneiderTD and StorzG (1999) OxyR andSoxRS regulation of fur J Bacteriol 181 4639ndash4643 httpwwwccrnpncifcrfgov~tomspaperoxyrfur
25 HengenPN BartramSL StewartLE and SchneiderTD (1997)Information analysis of Fis binding sites Nucleic Acids Res 254994ndash5002 httpwwwccrnpncifcrfgov~tomspaperfisinfo
26 WoodTI GriffithKL FawcettWP Jair K-W SchneiderTDand WolfRE (1999) Interdependence of the position andorientation of SoxS binding sites in the transcriptional activation ofthe class I subset of Escherichia coli superoxide-inducible promotersMol Microbiol 34 414ndash430
27 ZhengM WangX DoanB LewisKA SchneiderTD andStorzG (2001) Computation-directed identification of OxyR-DNAbinding sites in Escherichia coli J Bacteriol 183 4571ndash4579
28 RoganPK FauxBM and SchneiderTD (1998) Informationanalysis of human splice site mutations Hum Mutat 12 153ndash171Erratum in Hum Mutat 199913(1)82 httpwwwccrnpncifcrfgov~tomspaperrfs
29 ChenZ and SchneiderTD (2005) Information theory basedT7-like promoter models classification of bacteriophages anddifferential evolution of promoters and their polymerasesNucleic Acids Res 33 6172ndash6187 httpwwwccrnpncifcrfgov~tomspaperst7like
30 ChenZ and SchneiderTD (2006) Comparative analysis of tandemT7-like promoter containing regions in enterobacterial genomesreveals a novel group of genetic islands Nucleic Acids Res 341133ndash1147 httpwwwccrnpncifcrfgov~tomspaperst7island
31 SchneiderTD (1996) Reading of DNA sequence logos predictionof major groove binding by information theory Meth Enzymol274 445ndash455 httpwwwccrnpncifcrfgov~tomspaperoxyr
32 GriggsDW and KoniskyJ (1989) Mechanism for iron-regulatedtranscription of the Escherichia coli cir gene metal-dependentbinding of Fur protein to the promoters J Bacteriol 1711048ndash1052
33 AngererA and BraunV (1998) Iron regulates transcription of theEscherichia coli ferric citrate transport genes directly and throughthe transcription initiation proteins Arch Microbiol 169 483ndash490
34 de LorenzoV HerreroM GiovanniniF and NeilandsJB (1988)Fur (ferric uptake regulation) protein and CAP (catabolite-activatorprotein) modulate transcription of fur gene in Escherichia coliEur J Biochem 173 537ndash546
35 HuntMD PettisGS and McIntoshMA (1994) Promoter andoperator determinants for fur-mediated iron regulation in thebidirectional fepA-fes control region of the Escherichia colienterobactin gene system J Bacteriol 176 3944ndash3955
36 TardatB and TouatiD (1993) Iron and oxygen regulation ofEscherichia coli MnSOD expression competition between the globalregulators Fur and ArcA for binding to DNA Mol Microbiol 953ndash63
37 EscolarL Perez-MartinJ and de LorenzoV (2000) Evidence ofan unusually long operator for the fur repressor in the aerobactinpromoter of Escherichia coli J Biol Chem 275 24709ndash24714
38 YoungGM and PostleK (1994) Repression of tonB transcriptionduring anaerobic growth requires Fur binding at the promoterand a second factor binding upstream Mol Microbiol 11943ndash954
39 BrickmanTJ OzenbergerBA and McIntoshMA (1990)Regulation of divergent transcription from the iron-responsivefepB-entC promoter-operator regions in Escherichia coliJ Mol Biol 212 669ndash682
40 BaichooN and HelmannJD (2002) Recognition of DNA by Fura reinterpretation of the Fur box consensus sequence J Bacteriol184 5826ndash5832
41 OchsnerUA and VasilML (1996) Gene repression by the ferricuptake regulator in Pseudomonas aeruginosa cycle selection of iron-regulated genes Proc Natl Acad Sci USA 93 4409ndash4414
42 BaichooN WangT YeR and HelmannJD (2002) Globalanalysis of the Bacillus subtilis Fur regulon and the iron starvationstimulon Mol Microbiol 45 1613ndash1629
43 ChristoffersenCA BrickmanTJ Hook-BarnardI andMcIntoshMA (2001) Regulatory architecture of the iron-regulatedfepD-ybdA bidirectional promoter region in Escherichia coliJ Bacteriol 183 2059ndash2070
44 ChaiS WelchTJ and CrosaJH (1998) Characterization of theinteraction between Fur and the iron transport promoter of thevirulence plasmid in Vibrio anguillarum J Biol Chem 27333841ndash33847
6776 Nucleic Acids Research 2007 Vol 35 No 20
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777
45 EscolarL Perez-MartinJ and de LorenzoV (1998) Coordinatedrepression in vitro of the divergent fepA-fes promoters ofEscherichia coli by the iron uptake regulation (Fur) proteinJ Bacteriol 180 2579ndash2582
46 EscolarL de LorenzoV and Perez-MartinJ (1997)Metalloregulation in vitro of the aerobactin promoter of Escherichiacoli by the Fur (ferric uptake regulation) protein Mol Microbiol26 799ndash808
47 ShultzabergerRK ChenZ LewisKA and SchneiderTD (2007)Anatomy of Escherichia coli s70 promoters Nucleic Acids Res 35771ndash788 httpwwwccrnpncifcrfgov~tomspaperflexprom
48 ShultzabergerRK BucheimerRE RuddKE andSchneiderTD (2001) Anatomy of Escherichia coli ribosomebinding sites J Mol Biol 313 215ndash228 httpwwwccrnpncifcrfgov~tomspaperflexrbs
49 HiraoI KawaiG YoshizawaS NishimuraY IshidoYWatanabeK and MiuraK (1994) Most compact hairpin-turnstructure exerted by a short DNA fragment d(GCGAAGC) insolution an extraordinarily stable structure resistant to nucleasesand heat Nucleic Acids Res 22 576ndash582
50 Eick-HelmerichK and BraunV (1989) Import of biopolymers intoEscherichia coli nucleotide sequences of the exbB and exbD genesare homologous to those of the tolQ and tolR genes respectivelyJ Bacteriol 171 5117ndash5126
51 VassinovaN and KozyrevD (2000) A method for direct cloning ofFur-regulated genes identification of seven new Fur-regulated lociin Escherichia coli Microbiology 146 3171ndash3182
52 ToledanoMB KullikI TrinhF BairdPT SchneiderTD andStorzG (1994) Redox-dependent shift of OxyR-DNA contactsalong an extended DNA binding site a mechanism for differentialpromoter selection Cell 78 897ndash909
53 BlattnerFR PlunkettG III BlochCA PernaNT BurlandVRileyM Collado-VidesJ GlasnerJD RodeCK et al (1997)The complete genome sequence of Escherichia coli K-12 Science277 1453ndash1474
54 BaggA and NeilandsJB (1987) Molecular mechanism of regula-tion of siderophore-mediated iron assimilation Microbiol Rev 51509ndash518
55 LavrrarJL and McIntoshMA (2003) Architecture of a Furbinding site a comparative analysis J Bacteriol 185 2194ndash2202
56 OrchardK and MayGE (1993) An EMSA-based method fordetermining the molecular weight of a protein ndash DNA complexNucleic Acids Res 21 3335ndash3336
57 MasseE and GottesmanS (2002) A small RNA regulates theexpression of genes involved in iron metabolism in Escherichia coliProc Natl Acad Sci USA 99 4620ndash4625
58 LavrrarJL ChristoffersenCA and McIntoshMA (2002)FurndashDNA interactions at the bidirectional fepDGC-entS promoterregion in Escherichia coli J Mol Biol 322 983ndash995
59 DelanyI RappuoliR and ScarlatoV (2004) Fur functions as anactivator and as a repressor of putative virulence genes in Neisseriameningitidis Mol Microbiol 52 1081ndash1090
60 MasseE VanderpoolCK and GottesmanS (2005) Effect ofRyhB small RNA on global iron use in Escherichia coliJ Bacteriol 187 6962ndash6971
61 ArgamanL HershbergR VogelJ BejeranoG WagnerEGMargalitH and AltuviaS (2001) Novel small RNA-encodinggenes in the intergenic regions of Escherichia coli Curr Biol 11941ndash950
62 WassarmanKM RepoilaF RosenowC StorzG andGottesmanS (2001) Identification of novel small RNAs usingcomparative genomics and microarrays Genes Dev 15 1637ndash1651
63 RobisonK McGuireAM and ChurchGM (1998) A compre-hensive library of DNA-binding site matrices for 55 proteins appliedto the complete Escherichia coli K-12 genome J Mol Biol 284241ndash254
64 SalgadoH Santos-ZavaletaA Gama-CastroS Millan-ZarateDDiaz-PeredoE Sanchez-SolanoF Perez-RuedaEBonavides-MartinezC and Collado-VidesJ (2001) RegulonDB(version 32) transcriptional regulation and operon organization inEscherichia coli K-12 Nucleic Acids Res 29 72ndash74
65 AndrewsSC RobinsonAK and Rodriguez-QuinonesF (2003)Bacterial iron homeostasis FEMS Microbiol Rev 27 215ndash237
66 SchneiderTD (2002) Consensus Sequence Zen ApplBioinformatics 1 111ndash119 httpwwwccrnpncifcrfgov~tomspaperszen
67 RuddKE and SchneiderTD (1992) Compilation of E coliribosome binding sites In MillerJH (ed) A Short Course inBacterial Genetics A Laboratory Manual and Handbook forEscherichia coli and Related Bacteria Cold Spring HarborCold Spring Harbor Laboratory Press New York pp 1719ndash1745
68 HengenPN LyakhovIG StewartLE and SchneiderTD(2003) Molecular flip-flops formed by overlapping Fis sitesNucleic Acids Res 31 6663ndash6673
69 FeeJA (1991) Regulation of sod genes in Escherichia colirelevance to superoxide dismutase function Mol Microbiol 52599ndash2610
70 IgarashiK SaishoT YuguchiM and KashiwagiK (1997)Molecular mechanism of polyamine stimulation of thesynthesis of oligopeptide-binding protein J Biol Chem 2724058ndash4064
71 YoshidaM MeksuriyenD KashiwagiK KawaiG andIgarashiK (1999) Polyamine stimulation of the synthesis ofoligopeptide-binding protein (OppA) Involvement of a structuralchange of the Shine-Dalgarno sequence and the initiation codonaug in oppa mRNA J Biol Chem 274 22723ndash22728
72 LitwinCM and CalderwoodSB (1993) Role of iron in regulationof virulence genes Clin Microbiol Rev 6 137ndash149
73 MakuiH RoigE ColeST HelmannJD GrosP andCellierMF (2000) Identification of the Escherichia coli K-12Nramp orthologue (MntH) as a selective divalent metal iontransporter Mol Microbiol 35 1065ndash1078
74 PatzerSI and HantkeK (2001) Dual repression by Fe(2+)-Furand Mn(2+)-MntR of the mntH gene encoding an NRAMP-likeMn(2+) transporter in Escherichia coli J Bacteriol 1834806ndash4813
75 FurrerJL SandersDN Hook-BarnardIG and McIntoshMA(2002) Export of the siderophore enterobactin in Escherichia coliinvolvement of a 43 kDa membrane exporter Mol Microbiol 441225ndash1234
76 CunninghamL GruerMJ and GuestJR (1997) Transcriptionalregulation of the aconitase genes (acnA and acnB) of Escherichiacoli Microbiology 143 ( Pt 12) 3795ndash3805
77 FuentesAM Diaz-MejiaJJ Maldonado-RodriguezR andAmabile-CuevasCF (2001) Differential activities of theSoxR protein of Escherichia coli SoxS is not required for geneactivation under iron deprivation FEMS Microbiol Lett 201271ndash275
78 VargheseS TangY and ImlayJA (2003) Contrasting sensitivitiesof Escherichia coli aconitases A and B to oxidation and irondepletion J Bacteriol 185 221ndash230
79 ZavitzKH DiGateRJ and MariansKJ (1991) The PriB andPriC replication proteins of Escherichia coli Genes DNAsequence overexpression and purification J Biol Chem 26613988ndash13995
80 SandlerSJ (2005) Requirements for replication restart proteinsduring constitutive stable DNA replication in Escherichia coli K-12Genetics 169 1799ndash1806
81 SchneiderTD (2001) Strong minor groove base conservation insequence logos implies DNA distortion or base flipping duringreplication and transcription initiation Nucleic Acids Res 294881ndash4891 httpwwwccrnpncifcrfgov~tomspaperbaseflip
Nucleic Acids Research 2007 Vol 35 No 20 6777