

Eur. J. Biochem. 135, 519-527 (1983) © FEBS 1983

Nucleotide sequence of the lipoamide dehydrogenase gene of Escherichia coli K12

Paul E. STEPHENS, Hilary M. LEWIS, Mark G. DARLISON, and John R. GUEST

Department of Microbiology, University of Sheffield

(Received March 14/May 20, 1983) - EJB 83 0244

The nucleotide sequence of a 1980-base-pair segment of DNA, containing the lpd gene encoding the lipoamide dehydrogenase component (E3) of the pyruvate dehydrogenase complex of Escherichia coli K12, has been determined by the dideoxy chain-termination method. The lpd structural gene comprises 1419 base pairs (473 codons, excluding the initiating AUG codon). It is preceded by a good promoter and an excellent ribosome binding site and it ends with a typical rho-independent terminator sequence. The results confirm that the lpd gene is an independent gene linked to, but not part of, the ace operon that encodes the E1 and E2 components of the pyruvate dehydrogenase complex. The location and transcriptional polarity of the lpd gene, relative to the restriction map of the corresponding region of DNA, are completely consistent with previous genetic and post-infection labelling studies. The composition, Mr (50554, or 51274 if the FAD cofactor is included), amino-terminal sequence and carboxy-terminal sequence predicted from the nucleotide sequence are in excellent agreement with previous studies on the purified enzyme. The enzyme also exhibits a remarkable degree of sequence homology with peptides of the pig heart enzyme and with other pyridine nucleotide disulphide oxidoreductases whose sequences have been defined: human erythrocyte glutathione reductase and plasmid-encoded mercuric reductase.

Lipoamide dehydrogenase is the flavoprotein component (E3) of the pyruvate and 2-oxoglutarate dehydrogenase multienzyme complexes [1-3]. These complexes catalyse the oxidative decarboxylation of pyruvate and 2-oxoglutarate with the formation of acetyl-CoA and succinyl-CoA, respectively:

Pyruvate + NAD+ + CoA → Acetyl-CoA + CO2 + NADH + H+

2-Oxoglutarate + NAD+ + CoA → Succinyl-CoA + CO2 + NADH + H+.

In Escherichia coli the complexes contain multiple copies of three types of subunit: the pyruvate or 2-oxoglutarate dehydrogenase (E1), the dihydrolipoamide acetyltransferase or succinyltransferase (E2) and lipoamide dehydrogenase (E3). The lipoamide dehydrogenase component is a dimer containing identical subunits of Mr = 52000-59000 [4-7]. It catalyses the reoxidation of the dihydrolipoyl groups bound by amide linkage to lysine residues of the acyltransferases. These groups are reduced during the oxidative decarboxylation of 2-oxo acids to acyl-CoA, and their reoxidation enables the cycle of reactions to continue:

R-CO-COOH + E1-TPP → E1-TPP-CHOH-R + CO2

E1-TPP-CHOH-R + E2-LipS2 + CoA → E1-TPP + R-CO-CoA + E2-Lip(SH)2

E2-Lip(SH)2 + NAD+ → E2-LipS2 + NADH + H+ (E3)

where TPP is thiamin pyrophosphate.

Enzymes. Lipoamide dehydrogenase (EC 1.8.1.4); dihydrolipoamide acetyltransferase (EC 2.3.1.12); dihydrolipoamide succinyltransferase (EC 2.3.1.61); pyruvate dehydrogenase (EC 1.2.4.1); 2-oxoglutarate dehydrogenase (EC 1.2.4.2); glutathione reductase (EC 1.6.4.2); mercuric reductase (EC 1.6.4.-); restriction endonucleases: AccI (EC 3.1.23.47), BamHI (EC 3.1.23.6), BclI (EC 3.1.23.8), EcoRI (EC 3.1.23.13), HindIII (EC 3.1.23.21), MspI (EC 3.1.23.24), Sau3A (EC 3.1.23.27), TaqI (EC 3.1.23.39).

Lipoamide dehydrogenases have been isolated from many sources, both prokaryotic and eukaryotic: they are remarkably resistant to heat inactivation and proteolysis, and they each contain a flavin (FAD) coenzyme and an active disulphide bond. They belong to a family of pyridine nucleotide oxidoreductases that includes glutathione reductase and thioredoxin reductase [3].

In E. coli, genetic studies with mutants deficient in lipoamide dehydrogenase have established that the E3 components of the pyruvate and 2-oxoglutarate complexes are encoded by a single gene, lpd [8-10]. This confirmed earlier findings that the E3 components of the two complexes are identical with respect to various physical, enzymatic and immunochemical criteria [11], although without the genetic evidence the existence of two similar and functionally interchangeable enzymes contributing to a common pool of components remained a formal possibility. The lipoamide dehydrogenase gene (lpd) is located at 2.6 min in the E. coli linkage map, very close to the aceE and aceF genes that encode the respective dehydrogenase (E1) and acetyltransferase (E2) components of the pyruvate dehydrogenase complex [8-10,12,13]. The ace genes constitute an operon with aceE to aceF polarity and, although the lpd gene is situated at the distal end of the ace operon, studies with ace deletion mutants [14] and with λlpd transducing phages [15] have shown that the lpd gene can be expressed from its own promoter situated between the aceF and lpd genes (Fig. 1). The existence of a single gene for lipoamide dehydrogenase raises the interesting question of how its expression is regulated in order to supply components for assembly into the two complexes, which in turn appear to be independently regulated. A simple model for coupling expression of the ace and suc operons to lpd expression, in which uncomplexed lipoamide dehydrogenase functions as a repressor of the lpd gene, has been proposed [2,14].

The ace-lpd region of the E. coli chromosome has been cloned in phage and plasmid vectors and the approximate positions of the ace and lpd genes have been defined [15-17].

Fig. 1. Organization and expression of the pyruvate dehydrogenase complex genes of E. coli and summary of nucleotide sequence data obtained from M13 clones. A scale diagram of the segment of the E. coli linkage map at 2.6 min containing the aceE, aceF and lpd genes and an unidentified gene (geneA) is shown aligned with the restriction map. The sizes and positions of the genes are derived from the nucleotide sequences [18,19] or the Mr of the gene products. The left to right orientation corresponds to clockwise in the E. coli linkage map and the positions of the promoters for the ace and lpd genes are indicated. The restriction targets for HindIII (H) and EcoRI (R) are defined by subscripts according to Guest et al. [17] and the sizes of the fundamental fragments are shown. The nucleotide coordinates correspond to the number of base pairs starting at the first base of the AccI target (Acs) designated X above [17-19]. The expanded section shows the relevant BclI sites (B) and the restriction targets identified by 'shot-gun' cloning (S, Sau3A; T, TaqI; M, MspI): the arrows show the positions and extents of DNA sequence obtained from the M13 clones. The sequence is fully overlapped and most of it was obtained from both DNA strands

An overall strategy for sequencing a 9900-base-pair segment of DNA containing these genes has been devised [18] and the primary structures of the pyruvate dehydrogenase and dihydrolipoamide acetyltransferase components have already been deduced from the nucleotide sequence of a 5760-base-pair segment containing the aceE and aceF genes [18,19]. This paper reports the nucleotide sequence of a further segment of 1980 base pairs, which contains the lpd structural gene. The primary structure of lipoamide dehydrogenase has been deduced and preliminary structural comparisons made with other flavoproteins and pyridine nucleotide disulphide oxidoreductases.

MATERIALS AND METHODS

Sources of DNA

The fundamental 5400-base-pair HindIII-EcoRI fragment (H4-R3; Fig. 1), which contains the whole of the lpd gene, was obtained from plasmid pGS20 [17] for 'shot-gun' cloning in M13 vectors. Some sequence was also obtained using DNA from plasmid pGS41 [17], which contains the 8300-base-pair fragment R1-R3 (Fig. 1). This plasmid was isolated from Escherichia coli strain GM242, a dam-3 recA1 mutant [20], to make the DNA susceptible to digestion with BclI. Plasmids were prepared as described previously [17] and restriction fragments were separated by electrophoresis in agarose gels, extracted by electroelution [21] and purified by DEAE-cellulose (DE52) chromatography [22].

Cloning in M13

The overall strategy for sequencing the ace and lpd genes has been described previously [18]. For the lpd gene, the 5400-base-pair fragment (H4-R3, Fig. 1) was digested with three restriction enzymes for 'shot-gun' cloning into appropriate M13 vectors: TaqI and MspI fragments were cloned into the AccI site of M13mp701 (a derivative of M13mp7, D. Bentley, unpublished observations) and Sau3A fragments were cloned into the BamHI site of M13mWJ43 [23]. In addition, BclI and BclI-HindIII fragments of pGS41 were cloned into the BamHI and HindIII sites of M13mp9 [24]. Transfection of E. coli JM101 [Δ(lac-pro) supE thi/F' traD36 proAB lacIq ZΔM15] was performed according to published procedures [25], as were clone reversal and hybridization analysis [26].

Nucleotide Sequence Analysis

Single-stranded M13 DNA templates were prepared and sequenced by the dideoxy chain-termination method using a 17-nucleotide synthetic primer [25,27,28]. All the clones were screened initially by 'A-tracking', to avoid generating redundant data, and the nucleotide sequences were compiled and analysed using the Staden computer programs [29-33].

Materials

The sources of all the materials have been described previously [18,19].

RESULTS AND DISCUSSION

The Nucleotide Sequence

The organization of the genes encoding the pyruvate dehydrogenase complex of Escherichia coli is shown in Fig. 1. The positions of the ace and lpd genes relative to the physical map of the region are based on genetic studies with recombinant lambda phages and sequence analysis of the aceE and aceF genes [16-19]. The overall strategy adopted for sequencing the ace and lpd genes has involved a combination of 'shot-gun' and directed cloning of segments of the 9900-base-pair region of bacterial DNA illustrated in Fig. 1. The lipoamide dehydrogenase gene (lpd) is located within the 5400-base-pair HindIII-EcoRI fragment (H4-R3) and a summary of most of the M13 clones used to sequence the gene is given in Fig. 1. The clones used to confirm the sequence from positions 6160 to 6409 were obtained from pGS41 after digestion at the HindIII (H4) and at the two BclI sites (positions 6278 and 6404; Fig. 1). The complete and unambiguous sequence of a 2100-base-pair region containing the lpd structural gene is presented in Fig. 2. All of the sequence was obtained from at least two independent clones; it was fully overlapped and 75% (98% of the coding region) was derived from both DNA strands. The sequence extends previous data by 1980 base pairs; the distal end of the aceF gene and part of the intergenic region between aceF and lpd have already been published [19]. There remains a considerable amount of unedited sequence from a large collection of M13 clones containing segments of the 2160-base-pair region that extends from the end of the sequence shown in Fig. 2 to the terminal EcoRI site (R3 in Fig. 1).

Location of Coding Regions

The coding regions were identified using the computer program FRAMESCAN [32]. Only one large open reading frame of 1425 base pairs (positions 5998-7422) was found. It exhibits a consistently high score with respect to preferred codon usage and occurs in the DNA strand previously identified as the lpd coding strand [15,16]. No significant stretches of open reading frame were found in the complementary strand.
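The search for a single large open reading frame can be illustrated with a minimal sketch. This is not the FRAMESCAN program, which scores codon preference; it is a plain ATG-to-stop scan over both strands, and the function names and the toy test sequence are hypothetical. The last two lines simply check the coordinate arithmetic quoted in the text.

```python
# Toy open-reading-frame scan: a plain ATG-to-stop search, NOT FRAMESCAN.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq):
    """Yield 0-based half-open (start, end) spans from ATG through the stop codon."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                yield start, i + 3
                start = None


def longest_orf(seq):
    """Return (strand, start, end) of the longest ORF found on either strand."""
    seq = seq.upper()
    best = ("+", 0, 0)
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for start, end in orfs_in_strand(s):
            if end - start > best[2] - best[1]:
                best = (strand, start, end)
    return best


def span_length(first, last):
    """Length in base pairs of an inclusive, 1-based coordinate range."""
    return last - first + 1


toy = "CC" + "ATG" + "AAA" * 4 + "TAA" + "GG"   # hypothetical test sequence
print(longest_orf(toy))                  # ('+', 2, 20)
print(span_length(5998, 7422))           # 1425, the reading frame in the text
print(span_length(6001, 7422) // 3)      # 474 codons from the second AUG
```

A real analysis would also weigh codon preference, as the authors did, since a long open frame alone does not show that a region is translated.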

Features of the Nucleotide Sequence

The lpd coding region is marked by two AUG initiation codons (positions 5998-6003; Fig. 2). It is preceded by a potential ribosome binding site, d(T-A-T-A-G-A-G-G-T) (positions 5988-5996), that has five consecutive bases complementary to the 3'-terminal sequence of 16S ribosomal RNA [33,34]. The very close proximity of this sequence to the first AUG codon indicates that the second AUG codon is functional in initiating translation. Furthermore, the sequence around the second AUG codon is essentially consistent with the rules of Stormo et al. [35] and the observations of Atkins [36], the first upstream stop codon being the opal codon (UGA) that overlaps the two AUG codons. The DNA sequence flanking the first AUG does not conform to these rules. Thus it is predicted that the lpd structural gene extends for a total of 474 codons, from position 6001 to 7422, where it is terminated by an ochre (UAA) codon (Fig. 2).
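The statement that the ribosome binding site has five consecutive bases complementary to the 3' end of 16S rRNA can be checked with a short sketch. The rRNA tail used below (GAUCACCUCCUUA, written 5' to 3') is an assumption taken from general knowledge of E. coli 16S rRNA, not from this paper, and only Watson-Crick pairs are counted (G-U wobble pairs are ignored).

```python
# Assumption: 3'-terminal bases of E. coli 16S rRNA, written 5'->3'.
RRNA_3PRIME_TAIL = "GAUCACCUCCUUA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pairing_partner(rna_tail):
    """Sequence (DNA alphabet, 5'->3') that pairs base-for-base with the rRNA tail."""
    dna = rna_tail.replace("U", "T")
    return dna.translate(COMPLEMENT)[::-1]


def longest_shared_run(a, b):
    """Longest substring common to a and b (simple dynamic programming)."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


site = "TATAGAGGT"   # d(T-A-T-A-G-A-G-G-T), positions 5988-5996 in the text
run = longest_shared_run(site, pairing_partner(RRNA_3PRIME_TAIL))
print(run, len(run))   # GAGGT 5
```

Under these assumptions the longest uninterrupted match is GAGGT, in line with the five consecutive complementary bases reported.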

The best putative promoter for the lpd gene is almost 200 base pairs upstream of the translation initiation site (Fig. 2). It comprises a Pribnow box (positions 5794-5799) and a cluster of potential RNA polymerase recognition sites (-35 regions), the best of these being at positions 5769-5780 [37]. There is also an excellent -35 region within the promoter region just defined: the sequence d(T-G-T-T-A-A-C-A-A-T-T-T) (positions 5780-5791) is almost identical to the canonical sequence [37], but it is presumed to be non-functional because there is no accompanying Pribnow box.

The nucleotide sequence flanking the translational stop codon (UAA) contains three regions of hyphenated dyad symmetry (Fig. 2 and 3). The first of these includes codons for the five carboxy-terminal residues of lipoamide dehydrogenase and places the termination codon in the loop of a stem-and-loop structure that could be formed in the mRNA transcript (positions 7409-7435; a in Fig. 3). However, the significance of this region of dyad symmetry is uncertain because the stem-and-loop structure would not be very stable (ΔG = -8.4 kJ, -2.0 kcal [38]). Free energy calculations [38] for the mRNA transcripts of the other regions of dyad symmetry suggest that they could form very stable stem-and-loop structures: b, positions 7434-7452 (ΔG = -70.2 kJ, -16.8 kcal) and c, positions 7455-7483 (ΔG = -89.4 kJ, -21.4 kcal), as illustrated in Fig. 3. The more stable structure (c) has a (dG + dC)-rich sequence and a run of dT residues like many rho-independent transcription terminators [37] and this is presumably the terminator for the lpd gene. The function of the less stable structure (b) is unclear but it could also participate in the process of transcription termination.
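Dyad symmetry means that a stretch of DNA reads the same as its own reverse complement; in the hyphenated form the two arms are separated by unpaired bases that would form the loop. The sketch below tests both cases. The first input is the 18-base unhyphenated sequence quoted in the text for region b of the aceF-lpd intergenic region; the second is a hypothetical hairpin made up for illustration, and the function names are likewise illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_perfect_dyad(seq):
    """True when the sequence equals its reverse complement (unhyphenated symmetry)."""
    return seq == revcomp(seq)


def outer_stem_length(seq):
    """Count complementary base pairs working inwards from both ends."""
    pairs = 0
    for i in range(len(seq) // 2):
        if seq[i].translate(COMPLEMENT) != seq[-1 - i]:
            break
        pairs += 1
    return pairs


region_b = "AAAATTGTTAACAATTTT"   # intergenic region b, positions 5775-5792
hairpin = "GGGCAAAGCCC"           # hypothetical hyphenated example

print(is_perfect_dyad(region_b), outer_stem_length(region_b))   # True 9
print(is_perfect_dyad(hairpin), outer_stem_length(hairpin))     # False 4
```

Stability estimates such as the ΔG values quoted above need a nearest-neighbour energy model [38]; this sketch only locates the symmetry.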

Codon Usage.

The pattern of codon usage for the lpd gene (Table 1) is non-random and similar to that observed for many E. coli genes that are strongly expressed [39,40]. The pattern is very similar to those of the aceE and aceF genes [18,19], and the G + C contents in the third positions of the 32 quartet codons are also similar, 56% (lpd), 54% (aceE) and 56% (aceF), indicating that all three pyruvate dehydrogenase complex genes are translated at the same rate.
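As a rough illustration of how the third-position G + C content of the quartet codons can be obtained, the sketch below sums the counts for the eight four-codon families as transcribed from Table 1. The counting convention is an assumption: each codon is weighted by its number of occurrences, and the authors' exact procedure is not described in this section.

```python
# Codon counts for the eight quartet families, transcribed from Table 1.
# Keys are the first two bases; inner keys are the third base.
QUARTET_COUNTS = {
    "CU": {"U": 2, "C": 1, "A": 0, "G": 30},     # Leu
    "GU": {"U": 16, "C": 8, "A": 8, "G": 13},    # Val
    "UC": {"U": 5, "C": 4, "A": 1, "G": 0},      # Ser
    "CC": {"U": 1, "C": 2, "A": 2, "G": 16},     # Pro
    "AC": {"U": 6, "C": 19, "A": 0, "G": 1},     # Thr
    "GC": {"U": 14, "C": 8, "A": 15, "G": 13},   # Ala
    "CG": {"U": 11, "C": 4, "A": 0, "G": 0},     # Arg
    "GG": {"U": 31, "C": 18, "A": 0, "G": 2},    # Gly
}


def third_position_gc(counts):
    """Return (codons ending in G or C, all codons) over the quartet families."""
    gc = sum(fam["G"] + fam["C"] for fam in counts.values())
    total = sum(sum(fam.values()) for fam in counts.values())
    return gc, total


gc, total = third_position_gc(QUARTET_COUNTS)
print(gc, total, round(100 * gc / total, 1))   # 139 251 55.4
```

With this weighting the figure comes out near 55%, within a point of the 56% quoted for lpd; the small difference presumably reflects the counting convention and is left unexplained here.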

The Intergenic Region (aceF-lpd) and Expression of the lpd Gene

The intergenic region between the aceF and lpd structural genes comprises 324 base pairs, excluding the translational termination and initiation codons (positions 5677-6000; Fig. 2). It is characterized by many regions of hyphenated or unhyphenated dyad symmetry. The most significant of these are indicated by the letters a-e in Fig. 2. The first (a, 5679-5708) is thought to be the transcriptional terminator of the aceF gene [19]. Then there is a sequence of unhyphenated dyad symmetry, d(A-A-A-A-T-T-G-T-T-A-A-C-A-A-T-T-T-T) (b, 5775-5792), situated between the -35 and -10 regions of the sequence designated as the putative promoter for the lpd gene (Fig. 2); it is not clear whether this is significant or not. If the promoter has been identified correctly, the other regions of dyad symmetry (c, d and e) could form stable stem-and-loop structures in the mRNA transcript. The free energy values for these are: c (5829-5852), ΔG = -49.3 kJ, -11.8 kcal; d (5892-5922), ΔG = -86.9 kJ, -20.8 kcal; e (5941-5951), ΔG = -42.6 kJ, -10.2 kcal [38]. Their significance is unknown but they could be involved in controlling expression of the lpd gene.

It is not known whether the lpd gene is ever transcribed from the ace promoter, nor have the ace or lpd transcripts been identified by direct methods. Since it is known that the lpd gene can be expressed independently of the ace operon, the simplest interpretation of the intergenic sequence data would be that the lpd gene is always expressed from its own promoter in an independent manner. This is supported by the presence of both a potential rho-independent terminator for the ace operon and a putative lpd promoter. The large size and complexity of the sequence upstream of the lpd structural

670 674 Asr,ThrLeSerAspIleArgArgLeuVdlM~~*** a AACACGCTGTCTGACATTCGCGTCTGGTGATGTAAGTUGAGCCGGCCCAACMXCG

5650 5660 5670 5680 5690/ \5700 GCTTTTTTCTGGTAATCTCATGAATGTATTGAGGTTATTAGCGAATAGAC~TCGGTTG

5710 5720 5730 5740 5750 5760 - b -35 -10

CCGTTTGTTGTTTAAAAATTGTTAACAATTTTGTAAAATACCGACGGATAGAACGACCCG 5770 5780 '' 5790 5800 5810 5820

C

d GCGCCAGAAT~~AGCTTACATAAGTAAGTGACTWTGAGGGCWGAAGCTAAC

5890 5 9 5 59fO"5- 5930 5940

e Met

5960 5970 5980 5990 6000 GCCGCTGCGGCCTGAAAGACGACGGGTATGACCGCCGGAGATAAATATAT-ATG

lpd I E3Component 1 0 MetSerThrGluI1eLysThrGlnValValVdlLuGl~AlaGlyProAlaG1~rSer ATGAGTACTGAAATCUCTCA~TCGTGGTACTT~AGGCCCCGCAGGTTACTCC

6010 6020 6070 6040 6050 6060

20 70 A 1 aAl aPheArgCys Al aAspLeuGlyLeuGluThrValIleValGl~r~ AsrlThr GCTGCCTTCCGTTGCGCTGATTTAGGTCTGGAMCCWAATCGTAGAACGTTACAACACC

6070 6080 6090 6100 6110 61 20

40 50 LeuGl,yGlyVal~sLe~nValGlyCysIleProSerL~sAl akuLeuHisValAl a CTTGGCGGTGTTTGCCTGAACGTCGGCTGTATCCCTTCTAAAGCACTGCTGCACGTAGCA

6130 6140 61 50 61 60 6170 6180

60 70 LysValIleGluGluAlaLysAl aLeuAl aGluHi sGlyIleValPheGlyGluFr oLys A A A G T T A T C G A A G A A G C C A A A G C G C T G G C T G A A C A C G G

61% 6200 6210 6220 6230 6240

8n 4n Thr AspIlePspLysIle ArgThrTr ~ysGi~ysVdlI1eAsriGlr~euThrGlyGly A G C G A T A T C G A C A A G A T T C G T A C C T G G A A A G A G A A A G

6250 6260 6270 6280 6290 6700

100 110 LeuAl aGlyXetAl aLysGlyArgLysValLysValValAsnGlyLeuGlyLysPheThr CTGGCTGGTATGGCGAAAGGCCGCAMGTCAAAGTGGTC~CGGTCTGGGTAAATTCACC

6710 6320 6770 6740 6350 6360

170 1 30

6370 6380 6790 6400 6410 6420

1 40 150 AlaIleIleAlaAl&lySerAr@roIleGlrJkuProPheIleProHisGluAspPro GCGATCATTGCAGCGGGTTCTCGCCCGATCCAACTGCCGTTTATTCCGCATGAAGATCCG

6430 6440 6450 6460 6470 64&C

160 1 70 ArgIleTrpAspSerThrAspAlaLeuGluLevLysGluVdlPr&l~r~~uVal CGTATCTGGGACTCCACTGACGCGCTGGAACTGAAAGAAGTACCAGAACGCCTGCTGGTA

6490 6500 6510 6520 6570 6540

1 8 n 1% . _ - M~~Gl~l~lyIleI1eGlyLeuGluMetGl~hrValTyrHisAl aLeuGlySerGln AT~TGGCGGTATCATCGCTGGAAATGGXACCGTTTACCACGCGCTGGGTTCACAG

6550 6560 6570 6580 6590 6600

200 21 0 IleAspV~ValGluMetPheAspGl~ValIleProAlaAlailspLysAspIleValLys ATTGACWGGTTGAAATGTTCGACCAGGTTATCCCGGCAGCTGACAAAGACATCGTTAAA

6610 6620 6630 6640 6650 6660

220 270 ValPheThrLysArgIleSerLysLysPheAsnLeutLeuGluIChrLysVdlThrAla GTCTTCACCAAGCGTATCAGCAAGAAATTCAACCTGATGCTGGAAACCAAAGTTACCGCC

6670 6680 6690 6700 6710 6720 240 250 ValGluAlaLysGluAspGlyIle~rVdlThrMetGluGlyLysLysAlaProAl aGlu G T T G A A G C G A A A G A A G A C G G C A T T T A T G T G A C G A T G G A A A

6770 6740 6750 6760 6770 6780 7 h0 270 ~.

%GlnArgTyrAspAl aValLeuValA1 aIleGlyArgValPr oAsr.GlyLysAsr&u CCGCAGCGTTACGACGCCGTGCTGGTAGCGATTGGTCG'l'GTGCCGMCWAAAAACCTC

6790 6800 6810 6820 6870 6840 780 2% AspAl aGlyLysAlaGlyV~GluValAsp~pAr~ly~eI1eArgValAspLysGlr. GACGCAGGCAAAGCAGGCCGGAAGTTGACGACCGTGGTTTCATCCGCGTTGACAAACAG

6850 6860 6870 6880 6890 6900 700 31 0 LeuArgThr AsriValPr oHi s I1 ePheAl a11 eGl yAs PI1 eValGl yGlr.Pr oMetLeu CTGCGTACCMCGTACCGCACATCTTTGCTATCGGCGATATCGTCGGTCAACCGATGCTG

6910 6920 6970 6940 6950 6960

320 330 A1 &isLysGlyValHi sGluGlyHi s V a l g & aGluValIleAlaGlyLysLysHi s GCACACAAAGGTGTTCACGMGGTCACGTTGCCGCTGAAGTTATCGCCGGTAAGAAACAC

6970 6980 6990 7000 7010 7020

740 350 Tyr PheAspProLysValIleProSer IleAl a'QrThrGluFr&luValAlaTrpV& TACTTCGATCCGAAAGTTATCCCGTCCATCGCCTATACCGAACCAGAAGTTGCAT~G

7030 7040 7050 7060 7070 7080

360 770 GlyLeu!Phr-GluLysGluAl aLysGl~ysGly I leSer 'Pyr GluThr Al aThrPhePro GGTCTGACTGAGAAAGAAGCGAAAGAGAAAGGCATCAGCTATGAAACCGCCACCTTCCCG

7100 71 10 71 20 71 30 7090 71 40

380 390 TrpAlaAlaSerGlyArgAlaIleAlaSerAs~sAlaAspGlyMetThrLysLeuIle TGGGCTGCTTCTGGTCGTGCTATCGCTTCCGACTGCGCAGACGGTATGACCAAGCTGATT

7150 7160 7170 7180 71% 72cO

400 41 0 Pheks~ysGluSerHisArgValIleGlyGly~aI1eVdlGl~hrAsnGlyG1yGlu TTCGACAAAGAATCTCACCGTGTGATCGGTGGTGCGATTGTCGGTACTAACGGCGGCGAG

7210 7220 7230 7240 7250 7260

420 430 L e ~ u G l ~ l ~ ~ ~ l y L e u A l a I l e G l ~ e t G l ~ s A s p ~ aCluAspIleAl dLeu CTGCTGGGTGAAATCGGCCTGGCAATCGAAATGGGTTGTGATGCTGAAGACATCGCACTG

7270 7280 7293 7700 7310 7720

440 450 Thr I l e H i sAl aIIi sProThrLeuHi sGluSerValGlyLeuAl aAl aGluValPheGlu ACCATCCACGCGCACCCGACTCTGCACGAG~CTGTGGGCCTGGCGGCAGAAGTGTTCGAA

7770 7340 7350 7360 7770 7380

460 470 477 GlySerIleThrAspLeuProAsriProLysAlaLysLysLys*** GGTAGCATTACCGACCTGCCGAACCCGAAA~GAAGAAGAAWAATTTTTCGTTTGCCGG

7390 7400 74-20 \7470 7440

AACATCCGGCAATTAAAAAAGCGGCTAACCACGCCGC'TTTTTTTACGTCTCCAATTTACC 7450 7460 ' 7470 ' 7480 7490 7500

TTTCCAGTCTTCTTGCTCCACGT'TCAGAGAGACGTTCGCATACTGCTGACCGTTGCCTCG 7510 7520 7570 7540 7550 7560

TTATTCAGCCTGACAGTATGTACTCCGTTTAGACGTTGT~GGCTCTCCTGAACT 7570 7580 7590 7600 7610 7620

TTCTCCCGAAAAACCTGACGTTGTTCAGGTGATGCCGATTGAACAGCTGGCGGGCGTTAT 7670 7 6 4 r 7 6 5 0 7660 7670 7680

CACGTTGCTGTTGATTCAGTGGGCGCTGCTGTACTTTTTCCTTAAACACCTGGCGCTGCT 7690 7700 7710 7720 7770 7740

Fig. 2. Nucleotide sequence of the lpd gene and primary structure of lipoamide dehydrogenase. The nucleotide sequence of 2100 base pairs containing the non-coding (sense strand) of the lpd gene is shown in the 5'→3' direction. The 'd' representing deoxy and the hyphens representing phosphodiester links have been omitted. The distal end of the aceF gene plus the aceF-lpd intergenic region are included. The nucleotide coordinates have been assigned relative to the first base of the AccI site (Acs) and the data extend the previously published sequence by 1980 base pairs [18,19]. The primary structure of the 473 amino acids comprising lipoamide dehydrogenase is shown above the nucleotide sequence and the active disulphide bridge and flavin adenine binding sequences are underlined. The proposed ribosome binding site (Shine-Dalgarno sequence) for the lpd gene is boxed and the -35 and -10 (Pribnow) sites of the proposed lpd promoter are indicated by lines above the nucleotide sequence. Relevant stop codons are denoted by asterisks thus: ***. Regions of dyad symmetry are underlined by converging arrows; those in the aceF-lpd intergenic region are denoted by letters, a-e (see text)

Fig. 3. Nucleotide sequence at the end of the lpd structural gene. The nucleotide sequence encoding the carboxy terminus of lipoamide dehydrogenase is redrawn to highlight significant features. The 'd' representing deoxy and hyphens representing phosphodiester bonds have been omitted. Two regions of hyphenated dyad symmetry (b and c) are shown as stem-and-loop structures with the free energies of the corresponding transcripts (b = -70.2 kJ, -16.8 kcal; c = -89.4 kJ, -21.4 kcal) [38]. Another region of dyad symmetry (a) involving the carboxy-terminal amino acid codons is indicated by converging arrows: it has a calculated free energy of -8.4 kJ, -2.0 kcal [38]

Table 1. Codon usage in the lpd gene
The AUG initiation codon is not included with the methionine codons

First     Second residue
residue   U             C             A               G

U         UUU Phe  2    UCU Ser  5    UAU Tyr    3    UGU Cys  2
          UUC Phe 12    UCC Ser  4    UAC Tyr    5    UGC Cys  3
          UUA Leu  1    UCA Ser  1    UAA Ochre  1    UGA Opal 0
          UUG Leu  0    UCG Ser  0    UAG Amber  0    UGG Trp  4

C         CUU Leu  2    CCU Pro  1    CAU His    1    CGU Arg 11
          CUC Leu  1    CCC Pro  2    CAC His   12    CGC Arg  4
          CUA Leu  0    CCA Pro  2    CAA Gln    2    CGA Arg  0
          CUG Leu 30    CCG Pro 16    CAG Gln    6    CGG Arg  0

A         AUU Ile  9    ACU Thr  6    AAU Asn    1    AGU Ser  1
          AUC Ile 30    ACC Thr 19    AAC Asn   13    AGC Ser  3
          AUA Ile  0    ACA Thr  0    AAA Lys   31    AGA Arg  0
          AUG Met  9    ACG Thr  1    AAG Lys    8    AGG Arg  0

G         GUU Val 16    GCU Ala 14    GAU Asp    6    GGU Gly 31
          GUC Val  8    GCC Ala  8    GAC Asp   19    GGC Gly 18
          GUA Val  8    GCA Ala 15    GAA Glu   33    GGA Gly  0
          GUG Val 13    GCG Ala 13    GAG Glu    6    GGG Gly  2

gene may thus be connected with the need to gear lpd expression to the expression of two operons (ace and suc), which in turn are independently regulated and respond to different physiological stimuli. However, read-through transcription from ace to lpd has not been excluded, so the putative terminator (a) could function as an intercistronic attenuator responding to translation of the ace region. This could mediate the disproportionate synthesis of the E3 component (demanded by some estimates of the subunit stoichiometry of the pyruvate dehydrogenase complex) and it would mean that the observed lowering of lpd expression in ace nonsense mutants could be a direct consequence of translational polarity rather than a pseudo-polar effect [2,9,12,13].

Primary Structure and Composition of Lipoamide Dehydrogenase

The primary structure of the lipoamide dehydrogenase component (E3) of the pyruvate and 2-oxoglutarate dehydrogenase complexes of E. coli is presented in Fig. 2.

It is assumed that the initiating formylmethionine is removed post-translationally so that serine becomes the amino-terminal residue and this has been designated residue 1. This agrees with the amino-terminal residue reported for the lipoamide dehydrogenase of E. coli B [41]. Furthermore, a 13-residue amino-terminal sequence has recently been determined by automated Edman degradation for the E. coli B enzyme (C. H. Williams Jr, personal communication). This is identical to that deduced from the nucleotide sequence of the E. coli K12 gene except that glutamate (residue 3) is replaced by glycine in E. coli B.

The carboxy-terminal sequence of the E3 component, derived from the nucleotide sequence, is -Pro-Lys-Ala-Lys-Lys-Lys-COOH (Fig. 2). It would therefore be predicted that digestion with carboxypeptidases A and B should release lysine and alanine in the ratio 4:1. This is what was observed by Burleigh and Williams [41] for the E. coli B enzyme although the data were interpreted in favour of the sequence -Ala-Lys-Lys-COOH. Likewise, Vogel and Henning [6] found that lysine (2 mol/mol of protein) was released and they concluded that the carboxy terminus of the E. coli K12 enzyme is -Lys-Lys-COOH.
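The predicted release ratio can be checked with a short sketch. It assumes, as the argument above implies but does not state, that the carboxypeptidases remove residues one at a time from the carboxy terminus and stop when proline is reached; the function name is purely illustrative.

```python
from collections import Counter

# Carboxy-terminal hexapeptide deduced from the nucleotide sequence (Fig. 2).
C_TERMINUS = ["Pro", "Lys", "Ala", "Lys", "Lys", "Lys"]


def released_residues(peptide, blocking=("Pro",)):
    """Count residues removed from the C-terminus until a blocking residue.

    Assumption (not from the paper): digestion is strictly sequential and
    halts at the first blocking residue, which is not itself released.
    """
    released = Counter()
    for residue in reversed(peptide):
        if residue in blocking:
            break
        released[residue] += 1
    return released


counts = released_residues(C_TERMINUS)
print(dict(counts))  # numbers of lysine and alanine residues released
```

Under this assumption four lysine residues and one alanine residue are released, which is the 4:1 ratio predicted in the text.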

The primary structure translated from the nucleotide sequence contains 473 amino acid residues (Fig. 2) that correspond to a protein of Mr = 50554 (51274 including the FAD cofactor). These Mr values are somewhat lower than previous estimates, 52000-59000 [4-7], but come closest to the values reported by Williams, 52000-53000 [3,4]. The amino acid composition derived from the nucleotide sequence corresponds very closely to the compositions obtained previously by hydrolysis of the purified protein (Table 2).

Table 2. Amino acid composition of the lipoamide dehydrogenase component (E3) of the pyruvate dehydrogenase complex of E. coli
The amino acid composition derived from the nucleotide sequence is compared with the compositions obtained from the amino acid analyses of the purified protein determined by Williams et al. [4] and Vogel and Henning [6]. For ease of comparison the contents of each amino acid are expressed as a percentage of the total number of amino acids. The initiating methionine residue is not included in the DNA-derived composition. n.d. = not determined

Amino acid   No. of residues      Composition (mol/100 mol) from
             from DNA sequence    DNA     [4]     [6]

Asp          25                   5.2     8.5     8.6     (Asp + Asn)
Asn          14                   2.9
Thr          26                   5.5     5.6     5.4
Ser          14                   2.9     3.1     3.6
Glu          39                   8.2     10.1    10.1    (Glu + Gln)
Gln           8                   1.7
Pro          21                   4.4     4.3     5.0
Gly          51                   10.7    10.7    11.1
Ala          50                   10.5    10.7    11.3
Val          45                   9.5     9.3     9.4
Met           9                   2.3     1.9     1.9
Ile          39                   8.2     7.9     8.0
Leu          34                   7.2     7.4     7.3
Tyr           8                   1.7     1.6     1.9
Phe          14                   2.9     3.1     3.1
Lys          39                   8.2     8.7     7.8
His          13                   2.7     2.1     2.5
Arg          15                   3.2     3.3     3.1
Cys           5                   1.0     1.0     n.d.
Trp           4                   0.8     n.d.    n.d.

Total        473                          486     522
Mr           50554                        52000   56000

Table 3. Adenine binding site homologies

The calculated polarity of the E3 subunit is 41% [42]. The protein contains several regions of relatively high hydrophobicity (hydrophobicity indices 1.33-2.36 [43]): residues 6-22, ~2-104, 177-187, 302-310, 375-384, 407-418 and 437-448 (Fig. 2). These include the adenine binding site (see below), reflecting the non-polar nature of the FAD coenzyme. However, the active disulphide region does not appear to be very hydrophobic even though it would be expected to interact with the non-polar lipoyl residue(s) of the E2 subunit.
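Both the Mr quoted above and the polarity can be recomputed from the residue counts of Table 2. The sketch below is illustrative only: the average residue masses are standard textbook values rather than figures taken from the paper, and the set of residues counted as polar (Asp, Asn, Glu, Gln, Lys, Ser, Arg, Thr, His) is an assumption about the definition used in [42].

```python
# Residue counts deduced from the DNA sequence (Table 2).
COUNTS = {
    "Asp": 25, "Asn": 14, "Thr": 26, "Ser": 14, "Glu": 39, "Gln": 8,
    "Pro": 21, "Gly": 51, "Ala": 50, "Val": 45, "Met": 9, "Ile": 39,
    "Leu": 34, "Tyr": 8, "Phe": 14, "Lys": 39, "His": 13, "Arg": 15,
    "Cys": 5, "Trp": 4,
}

# Average masses of amino acid residues (free amino acid minus water).
# Standard reference values, not taken from the paper.
RESIDUE_MASS = {
    "Asp": 115.0886, "Asn": 114.1038, "Thr": 101.1051, "Ser": 87.0782,
    "Glu": 129.1155, "Gln": 128.1307, "Pro": 97.1167, "Gly": 57.0519,
    "Ala": 71.0788, "Val": 99.1326, "Met": 131.1926, "Ile": 113.1594,
    "Leu": 113.1594, "Tyr": 163.1760, "Phe": 147.1766, "Lys": 128.1741,
    "His": 137.1411, "Arg": 156.1875, "Cys": 103.1388, "Trp": 186.2132,
}
WATER = 18.015  # one water per chain for the free termini

# Assumed polar set for the polarity index (assumption, see lead-in).
POLAR = {"Asp", "Asn", "Glu", "Gln", "Lys", "Ser", "Arg", "Thr", "His"}

total = sum(COUNTS.values())
m_r = sum(n * RESIDUE_MASS[aa] for aa, n in COUNTS.items()) + WATER
polarity = 100.0 * sum(COUNTS[aa] for aa in POLAR) / total
percent = {aa: 100.0 * n / total for aa, n in COUNTS.items()}

print(f"residues: {total}")
print(f"Mr (apoprotein): {m_r:.0f}")
print(f"polarity: {polarity:.1f}%")
```

Under these assumptions the counts sum to 473 residues, the calculated Mr falls within a few daltons of the 50554 quoted above, and the polarity comes out at about 41%.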

The Active Site Disulphide Bridge and Adenine Binding Site

The amino acid sequence of the redox-active disulphide bridge region (residues 37-53; Fig. 2) is exactly as reported for the E3 component of the 2-oxoglutarate dehydrogenase complex of the Crookes' strain of E. coli [44] and the lipoamide dehydrogenase of E. coli B [41]. This region is in fact highly conserved, not only in other lipoamide dehydrogenases, e.g. from pig heart [3] and Bacillus stearothermophilus [45], but also in the glutathione reductases from yeast [3], E. coli [46] and human erythrocytes [47] and the mercuric reductase encoded by Tn501 [48].

Arscott et al. [49] have recently pointed out the homologies in the adenine (FAD) binding sites of several flavoproteins. These are situated near the amino termini and the sequence Gly-Xaa-Gly-Xaa-Xaa-Gly-Xaa-Xaa-Xaa-Ala is highly conserved. In Table 3 the region of the lipoamide dehydrogenase of E. coli containing the adenine binding site is compared with the analogous regions of several representative flavoproteins.
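The conserved pattern can be written as a regular expression and located in the sequences of Table 3. The following is a minimal sketch: only the first 20 residues of each Table 3 entry are used, and the dictionary keys are labels chosen here for convenience.

```python
import re

# Gly-Xaa-Gly-Xaa-Xaa-Gly-Xaa-Xaa-Xaa-Ala in one-letter code.
MOTIF = re.compile(r"G.G..G...A")

# First 20 residues of each sequence as shown in Table 3.
SEQUENCES = {
    "LipDH E. coli": "IKTQVVVLGAGPAGYSAAFR",
    "LipDH pig heart": "IDADVTVIGSGPGGYVAAIK",
    "GR human erythrocyte": "ASYDYLVIGGGSGGLASARR",
    "LDH dogfish": "SYNKITVVGVGAVGMACAIS",
    "NADHDH E. coli": "PLKKIVIVGGGAGGLEMATQ",
    "FRD E. coli": "FQADLAIVGAGGAGLRAAIA",
}

# 1-based position of the first motif residue within each displayed segment.
starts = {}
for name, seq in SEQUENCES.items():
    match = MOTIF.search(seq)
    starts[name] = match.start() + 1 if match else None
    print(f"{name:22s} motif starts at position {starts[name]}")
```

In every segment the motif begins at the ninth residue shown, so for the E. coli enzyme (first residue shown is residue 4) the first conserved glycine is residue 12.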

Homologies between Disulphide Oxidoreductases

The primary structures of two other disulphide oxidoreductases have recently been elucidated: GR, human erythrocyte glutathione reductase [47] and MR, plasmid Tn501-encoded mercuric reductase [48]. These have been compared with the lipoamide dehydrogenase of E. coli K12 using the proportional matching option of the interactive graphics program DIAGON [31]. This incorporates a scoring system based on MDM78, the mutation data matrix found to be the most powerful score matrix for detecting distant relationships between amino acid sequences [52]. The diagonals in the comparison matrices correspond to highly significant regions of homology and it is clear that all three sequences are essentially colinear and exhibit a remarkable degree of mutual homology (Fig. 4). The homologies are not just confined to the adenine binding sites and the active disulphide bridge regions; they also occur both within and across the other domains that have been identified for glutathione reductase [46,47,49]. Mercuric reductase possesses a large 80-90-residue amino-terminal sequence that has no equivalent in the other proteins. Nevertheless, the results indicate that this enzyme is more closely related to lipoamide dehydrogenase and glutathione reductase than the latter enzymes are to each other (Fig. 4).

The adenine binding site of the lipoamide dehydrogenase (LipDH) of E. coli is compared with several representative flavoproteins: pig heart lipoamide dehydrogenase (LipDH), glutathione reductase (GR) and lactate dehydrogenase (LDH) [49], and the NADH dehydrogenase (NADHDH) and fumarate reductase (FRD) of E. coli [50,51]. The number preceding the sequence is the residue number of the first amino acid shown relative to the amino terminus of the protein

Enzyme    Source               Sequence

LipDH     E. coli              (4)  IKTQVVVLGAGPAGYSAAFRCADLG--LETVIVER
LipDH     pig heart            (5)  IDADVTVIGSGPGGYVAAIKAAQLG--FKTVCIEK
GR        human erythrocyte    (19) ASYDYLVIGGGSGGLASARRAAELG--ARAAVVES
LDH       dogfish              (19) SYNKITVVGVGAVGMACAISILMKDLADEVALVDV
NADHDH    E. coli              (3)  PLKKIVIVGGGAGGLEMATQLGHKLGRKKKAKITL
FRD       E. coli              (3)  FQADLAIVGAGGAGLRAAIAAAQANPNAKIALISK

Conserved residues  * * * * * * *

Fig. 4. Amino acid sequence comparisons for pyridine nucleotide disulphide oxidoreductases. Sequence comparison matrices are shown for: (A) the E. coli lipoamide dehydrogenase (LipDH) with human erythrocyte glutathione reductase (GR); (B) LipDH with Tn501-encoded mercuric reductase (MR); and (C) GR with MR. The computer program DIAGON [31] was used and the points correspond to the midpoints of each span of 25 residues giving a score equivalent to a double matching probability of < 0.001. The domains identified in glutathione reductase are denoted by the lettered boxes: a, FAD-1; b, NADPH; c, FAD-2; d, interface [46,47,49].
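The windowed comparison behind Fig. 4 can be illustrated with a much simplified dot-matrix sketch. It is not the DIAGON algorithm: identity scoring stands in for the MDM78-based proportional matching, the span is shortened from 25 to 11 residues to suit the short test sequences (pig heart peptide c of Table 4 and its E. coli counterpart), and the function name and threshold are hypothetical.

```python
def dot_matrix(seq_a, seq_b, span=11, min_score=8):
    """Return (mid_a, mid_b, score) for every pair of windows that match.

    Each window of `span` residues in seq_a is compared with each window
    of the same length in seq_b; the score is the number of identical
    aligned residues. Midpoints are 0-based indices of the central
    residue of each window, by analogy with the points plotted in Fig. 4.
    """
    half = span // 2
    hits = []
    for i in range(len(seq_a) - span + 1):
        window_a = seq_a[i:i + span]
        for j in range(len(seq_b) - span + 1):
            window_b = seq_b[j:j + span]
            score = sum(a == b for a, b in zip(window_a, window_b))
            if score >= min_score:
                hits.append((i + half, j + half, score))
    return hits


PIG_C = "TVCIEKNETLGGTCLNVGCIPSK"    # pig heart peptide c (Table 4)
ECOLI_C = "TVIVERYNTLGGVCLNVGCIPSK"  # E. coli residues 31-53 (Table 4)

hits = dot_matrix(PIG_C, ECOLI_C)
for mid_a, mid_b, score in hits:
    print(mid_a, mid_b, score)
```

Points with equal midpoints in both sequences lie on the main diagonal, which is how colinear homology appears in a comparison matrix; a substitution matrix and a probability-based threshold would be needed to approach the sensitivity described in the text.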

Considerable sequence homology between human erythrocyte glutathione reductase and nine tryptic peptides from pig heart lipoamide dehydrogenase has been reported [46,49]. In fact, the peptides (a-i) were placed in three different domains according to the homologies: a-b-c, d (FAD-1); e (NADPH); f, g (FAD-2); h, i (interface). These peptides exhibit even greater degrees of homology with sequences in the E. coli lipoamide dehydrogenase and this has permitted a refinement of the analysis (Table 4). Thus, peptide e is better placed adjacent to peptide h in the interface (h-e, i) rather than in the NADPH/NADH domain, and peptide d is homologous to the sequence adjacent to peptide c in the FAD-1 domain (a-b-c-d). The structural and functional implications of the homologies between some of the disulphide oxidoreductases will be treated more fully in a subsequent publication.
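The degree of homology between a pig heart peptide and the corresponding E. coli segment can be expressed as a percentage identity. The sketch below assumes a simple ungapped, position-by-position comparison of the sequences as read from Table 4; whether the published identity figures were scored in exactly this way is an assumption.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions with identical residues (no gaps allowed)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# (pig heart peptide, homologous E. coli segment), as read from Table 4.
PAIRS = {
    "f": ("PPTQNLGLEELGIELR", "PNGKNLDAGKAGVEVD"),
    "g": ("IPNIAAIGDVVAGP", "VPHIFAIGDIVGQP"),
    "i": ("VLGAHIIGPGAGEMINEAALALEYGASCEDIAR",
          "VIGGAIVGTNGGELLGEIGLAIEMGCDAEDIAL"),
}

identities = {name: percent_identity(pig, ecoli)
              for name, (pig, ecoli) in PAIRS.items()}

for name, value in identities.items():
    print(f"peptide {name}: {value:.0f}% identity")
```

For peptides f, g and i this gives roughly 31%, 57% and 45% identity respectively.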

The work described in this paper completes the nucleotide sequence of the three genes (aceE, aceF and lpd) encoding the pyruvate dehydrogenase complex and an unidentified gene (geneA). Present work is aimed at (a) identifying the corresponding mRNA transcripts and defining the regulatory mechanisms controlling the expression and coupling of the ace and lpd genes, (b) defining the function of geneA and (c) sequencing the sucA and sucB genes that encode the analogous dehydrogenase and succinyltransferase components of the 2-oxoglutarate dehydrogenase complex for comparison with the ace genes and gene products.


Table 4. Sequences of tryptic peptides from pig heart lipoamide dehydrogenase and homologous regions of the E. coli enzyme
The tryptic peptides (a-i) from the pig heart enzyme [46] are placed above the sequences in the E. coli enzyme as defined by residue numbers in Fig. 2. The domains are those defined for the analogous glutathione reductase [46,47]. The asterisk denotes that a revised sequence for peptide d is used (C. H. Williams, personal communication)

Pig heart peptide         Amino acid sequence                    Identity (%)   Domain
(position in E. coli)

a (1-23)                  ADQPIDADVTVIGSGPGGYVAAIK               > 50           FAD-1
                           STEIKTQVVVLGAGPAGYSAAFR
b (24-30)                 AAQLGFK
                          CADLGLE
c (31-53)                 TVCIEKNETLGGTCLNVGCIPSK
                          TVIVERYNTLGGVCLNVGCIPSK
d* (54-68)                ALLNNSHYYHMAHGK
                          ALLHVAKVIEEAKAL

f (274-289)               PPTQNLGLEELGIELR                       31             FAD-2
                          PNGKNLDAGKAGVEVD
g (304-317)               IPNIAAIGDVVAGP                         57
                          VPHIFAIGDIVGQP

h (345-361)               CVPSVIYTHPEVAWVGK                                     interface
                          VIPSIAYTEPEVAWVGL
e (362-374)               SEEQLKEEGIEYK
                          TEKEAKEKGISYE
i (407-439)               VLGAHIIGPGAGEMINEAALALEYGASCEDIAR      45
                          VIGGAIVGTNGGELLGEIGLAIEMGCDAEDIAL

We are greatly indebted to F. Sanger and A. Coulson for introducing us to the dideoxy-sequencing method and for a generous gift of primer. We also wish to thank R. Staden and I. K. Duckenfield for providing programs and assisting with computing, N. L. Brown for supplying the sequence of the mercuric reductase gene prior to publication, and C. H. Williams Jr for the amino-terminal sequence of E. coli B lipoamide dehydrogenase and sequences of some peptides of the pig heart enzyme. Support from the Science and Engineering Research Council by project grant GR/B35543 (J.R.G.) and studentship (P.E.S.) is gratefully acknowledged.

REFERENCES

1. Reed, L. J. (1974) Acc. Chem. Res. 7, 40-46.
2. Guest, J. R. (1978) Adv. Neurol. 22, 219-244.
3. Williams, C. H., Jr (1976) in The Enzymes (Boyer, P. D., ed.) vol. 13, pp. 89-173, Academic Press, New York.
4. Williams, C. H., Jr, Zanetti, G., Arscott, L. D. & McAllister, J. K. (1967) J. Biol. Chem. 242, 5226-5231.
5. Perham, R. N. & Thomas, J. O. (1971) FEBS Lett. 15, 8-12.
6. Vogel, O. & Henning, U. (1973) Eur. J. Biochem. 35, 307-310.
7. Vogel, O. (1977) Biochem. Biophys. Res. Commun. 74, 1235-1241.
8. Guest, J. R. & Creaghan, I. T. (1972) Biochem. J. 130, 8P.
9. Guest, J. R. & Creaghan, I. T. (1973) J. Gen. Microbiol. 75, 197-210.
10. Alwine, J. C., Russell, F. M. & Murray, K. N. (1973) J. Bacteriol. 115, 1-8.
11. Pettit, F. H. & Reed, L. J. (1967) Proc. Natl Acad. Sci. USA, 58, 1126-1130.
12. Guest, J. R. (1974) J. Gen. Microbiol. 80, 523-532.
13. Guest, J. R. & Creaghan, I. T. (1974) J. Gen. Microbiol. 81, 237-245.
14. Langley, D. & Guest, J. R. (1978) J. Gen. Microbiol. 106, 103-117.
15. Guest, J. R., Cole, S. T. & Jeyaseelan, K. (1981) J. Gen. Microbiol. 127, 65-79.
16. Guest, J. R. & Stephens, P. E. (1980) J. Gen. Microbiol. 121, 277-292.
17. Guest, J. R., Roberts, R. E. & Stephens, P. E. (1983) J. Gen. Microbiol. 129, 671-680.
18. Stephens, P. E., Darlison, M. G., Lewis, H. M. & Guest, J. R. (1983) Eur. J. Biochem. 133, 155-162.
19. Stephens, P. E., Darlison, M. G., Lewis, H. M. & Guest, J. R. (1983) Eur. J. Biochem. 133, 481-489.
20. Marinus, M. G. & Morris, N. R. (1973) J. Bacteriol. 114, 1143-1150.
21. McDonell, M. W., Simon, M. N. & Studier, F. W. (1977) J. Mol. Biol. 110, 119-146.
22. Smith, H. O. (1980) Methods Enzymol. 65, 371-380.
23. Rothstein, R. & Wu, R. (1981) Gene, 15, 167-176.
24. Messing, J. & Vieira, J. (1982) Gene, 19, 269-276.
25. Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H. & Roe, B. A. (1980) J. Mol. Biol. 143, 161-178.
26. Winter, G. & Fields, S. (1980) Nucleic Acid. Res. 8, 1965-1974.
27. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA, 74, 5463-5467.
28. Duckworth, M. L., Gait, M. J., Goelet, P., Hong, G. F., Singh, M. & Titmas, R. C. (1981) Nucleic Acid. Res. 9, 1691-1706.
29. Staden, R. (1979) Nucleic Acid. Res. 6, 2601-2610.
30. Staden, R. (1980) Nucleic Acid. Res. 8, 3673-3694.
31. Staden, R. (1982) Nucleic Acid. Res. 10, 2951-2961.
32. Staden, R. & McLachlan, A. D. (1982) Nucleic Acid. Res. 10, 141-156.
33. Shine, J. & Dalgarno, L. (1974) Proc. Natl Acad. Sci. USA, 71, 1342-1346.
34. Gold, L., Pribnow, D., Schneider, T., Shinedling, S., Singer, B. S. & Stormo, G. D. (1981) Annu. Rev. Microbiol. 35, 365-403.
35. Stormo, G. D., Schneider, T. D. & Gold, L. (1982) Nucleic Acid. Res. 10, 2971-2996.
36. Atkins, J. F. (1979) Nucleic Acid. Res. 7, 1035-1041.
37. Rosenberg, M. & Court, D. (1979) Annu. Rev. Genet. 13, 319-353.
38. Tinoco, J., Jr, Borer, P. N., Dengler, B., Levine, M. D., Uhlenbeck, O. C., Crothers, D. M. & Gralla, J. (1973) Nat. New Biol. 246, 40-41.
39. Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. & Mercier, R. (1981) Nucleic Acid. Res. 9, r43-r74.
40. Grosjean, J. & Fiers, W. (1982) Gene, 18, 199-209.
41. Burleigh, B. D., Jr & Williams, C. H., Jr (1972) J. Biol. Chem. 247, 2077-2082.
42. Capaldi, R. A. & Vanderkooi, G. (1972) Proc. Natl Acad. Sci. USA, 69, 930-932.
43. Segrest, J. P. & Feldmann, R. J. (1974) J. Mol. Biol. 87, 853-858.
44. Brown, J. P. & Perham, R. N. (1972) FEBS Lett. 26, 221-224.
45. Packman, L. C. & Perham, R. N. (1982) FEBS Lett. 139, 155-158.
46. Williams, C. H., Jr, Arscott, L. D. & Schultz, G. E. (1982) Proc. Natl Acad. Sci. USA, 79, 2199-2207.
47. Krauth-Siegel, R. L., Blatterspiel, R., Saleh, M., Schiltz, E., Schirmer, R. H. & Untucht-Grau, R. (1982) Eur. J. Biochem. 121, 259-267.
48. Brown, N. L., Ford, S. J., Pridmore, R. D. & Fritzinger, D. C. (1983) Biochemistry, 22, in the press.
49. Arscott, L. D., Williams, C. H., Jr & Schulz, G. E. (1982) in Flavins and Flavoproteins (Massey, V. & Williams, C. H., Jr, eds) vol. 21, pp. 44-48, Elsevier/North Holland Biomedical Press, Amsterdam.
50. Young, I. G., Rogers, B. L., Campbell, H. D., Jaworowski, A. & Shaw, D. C. (1981) Eur. J. Biochem. 116, 165-170.
51. Cole, S. T. (1982) Eur. J. Biochem. 122, 479-484.
52. Schwartz, R. M. & Dayhoff, M. O. (1978) Atlas of Protein Sequence and Structure, vol. 5, pp. 353-358, National Biomedical Research Foundation, Washington DC.

P. E. Stephens, Celltech, 244-250 Bath Road, Slough, Berkshire, Great Britain SL1 4DY

H. M. Lewis, M. G. Darlison, and J. R. Guest, Department of Microbiology, University of Sheffield, Western Bank, Sheffield, South Yorkshire, Great Britain S10 2TN