

Eur. J. Biochem. 141, 351-359 (1984) FEBS 1984

Nucleotide sequence of the sucA gene encoding the 2-oxoglutarate dehydrogenase of Escherichia coli K12

Mark G. DARLISON, Margaret E. SPENCER, and John R. GUEST Department of Microbiology, University of Sheffield

(Received August 4, 1983/February 6, 1984) - EJB 83 1317

The nucleotide sequence of a 3180-base-pair segment of DNA, containing the sucA gene encoding the 2-oxoglutarate dehydrogenase component (E1o) of the 2-oxoglutarate dehydrogenase complex of Escherichia coli, has been determined by the dideoxy chain-termination method. The sucA structural gene contains 2796 base pairs (932 codons, excluding the initiation codon AUG) and encodes a polypeptide having a glutamine residue at the amino terminus, a glutamate residue at the carboxy terminus and a calculated Mr = 104905. The predicted amino acid composition is in good agreement with published information obtained by hydrolysis of the purified enzyme. There is a striking lack of sequence homology between the 2-oxoglutarate dehydrogenase (E1o) and the corresponding pyruvate dehydrogenase (E1p), which suggests that the two components are not closely related in evolutionary terms.

The location and polarity of the sucA gene, relative to the restriction map of the corresponding segment of DNA, are consistent with it being the proximal gene of the suc operon, as defined in previous genetic and post-infection labelling studies, but it could also form part of a more complex regulatory unit. The sucA gene is preceded by a segment of DNA that contains many substantial regions of hyphenated dyad symmetry including an IS-like sequence of the type that is thought to function as an intercistronic regulatory element. This segment also contains three putative RNA polymerase binding sites and a good ribosome binding site.

The 2-oxoglutarate dehydrogenase complex of Escherichia coli catalyses the oxidative decarboxylation of 2-oxoglutarate to succinyl-CoA and carbon dioxide [1, 2]:

2-Oxoglutarate + CoA + NAD+ → Succinyl-CoA + CO2 + NADH + H+.

It comprises multiple copies of three enzymic components: 2-oxoglutarate dehydrogenase (E1o), dihydrolipoamide succinyltransferase (E2o) and lipoamide dehydrogenase (E3), with reported stoichiometries of approximately 1.0:2.0:1.0 (E1o:E2o:E3) [3]. The 2-oxoglutarate dehydrogenase component specifically catalyses the initial decarboxylation and oxidation of 2-oxoglutarate with the production of a reduced and succinylated form of the dihydrolipoamide succinyltransferase component:

2-Oxoglutarate + TPP-E1o → CO2 + hydroxybutyryl-TPP-E1o

Hydroxybutyryl-TPP-E1o + lipoyl-E2o → Succinyl-dihydrolipoyl-E2o + TPP-E1o

where TPP = thiamin diphosphate.

Abbreviation. cAMP, adenosine 3’,5’-monophosphate.

Enzymes. Calf intestinal alkaline phosphatase (EC 3.1.3.1); citrate synthase (EC 4.1.3.7); dihydrolipoamide acetyltransferase (EC 2.3.1.12); dihydrolipoamide succinyltransferase (EC 2.3.1.61); DNA polymerase (Klenow fragment) (EC 2.7.7.7); lipoamide dehydrogenase (EC 1.8.1.4); 2-oxoglutarate dehydrogenase (EC 1.2.4.2); phage-T4 DNA polymerase (EC 2.7.7.7); pyruvate dehydrogenase (EC 1.2.4.1); restriction endonucleases: AccI (EC 3.2.23.47), BamHI (EC 3.1.23.6), BglII (EC 3.1.23.10), MspI (EC 3.1.23.24), PstI (EC 3.1.23.31), SalI (EC 3.1.23.37), Sau3A (EC 3.1.23.27), SmaI (EC 3.1.23.44), SstI (EC 3.1.21.34), TaqI (EC 3.1.23.39) and XhoI (EC 3.1.23.42); succinate dehydrogenase (EC 1.3.99.1).

The 2-oxoglutarate dehydrogenase component is a dimer, containing identical polypeptides with Mr = 95 000 per chain, estimated by sedimentation equilibrium in guanidine·HCl [3], or Mr = 94 000-101 000 per chain, from electrophoretic mobility in sodium dodecyl sulphate/polyacrylamide gel [3-6].

The three components of the 2-oxoglutarate dehydrogenase complex are encoded by the sucA (E1o), sucB (E2o) and lpd (E3) genes. The suc genes form an operon with sucAB polarity, which is situated at 16.7 min in the E. coli linkage map, very close to the genes encoding some other tricarboxylic acid cycle enzymes, gltA (citrate synthase) and sdhA,B (succinate dehydrogenase; large and small subunits), as shown in Fig. 1 [2,6-8]. The lpd gene supplies E3 components for assembly into both the pyruvate and 2-oxoglutarate dehydrogenase complexes and it is situated at 2.8 min in the E. coli linkage map, very close to the ace operon that encodes the E1p (aceE) and E2p (aceF) components of the pyruvate dehydrogenase complex [2]. The two complexes appear to be independently expressed: the pyruvate complex is induced during growth on pyruvate and weakly repressed on glucose or acetate and during anaerobic growth, whereas the 2-oxoglutarate complex is induced on acetate (2-oxoglutarate), repressed by glucose and severely repressed during anaerobic growth [2,9,10].

The ace and lpd genes have recently been cloned [11, 12] and the primary structures of the three components of the pyruvate dehydrogenase complex have been deduced from the nucleotide sequence of the corresponding 7740-base-pair segment of DNA [13-15]. Studies with the cloned genes established that although the lpd gene is adjacent to the ace operon and transcribed with the same polarity, it can be expressed from its own promoter.


In a parallel study the gltA-sucB region has been cloned in phage and plasmid vectors [6,8] and the approximate positions of the genes and their transcriptional polarities relative to the restriction map have been defined as shown in Fig. 1. This region, which contains several Krebs cycle genes, is of particular interest for (a) making detailed structural and functional comparisons of several pairs of related enzymes, viz. succinate dehydrogenase and fumarate reductase, and the E1 and E2 components of the 2-oxoglutarate and pyruvate dehydrogenase complexes; (b) defining the molecular mechanisms which control the expression of the Krebs cycle genes; and (c) investigating potential evolutionary relationships between the sdh-frd and ace-suc gene-protein systems.

This paper reports the complete nucleotide sequence of a 3180-base-pair segment of DNA and the location and identification of the sucA structural gene. The primary structure of the 2-oxoglutarate dehydrogenase (E1o) component, derived from the nucleotide sequence, is also reported and compared with the structure of the analogous pyruvate dehydrogenase (E1p) component that was deduced in the same way from the nucleotide sequence of the aceE gene [13].

MATERIALS AND METHODS

Sources of DNA

The segment of DNA to be sequenced was originally isolated in the λgltA sdh sucA sucB transducing phages, λG117 and λG118 [6] (Fig. 1). Several sub-fragments were transferred from λG118 DNA to pBR322, to provide useful sources of DNA for detailed restriction mapping (M. E. Spencer, unpublished work) and sequence analysis. These sub-fragments included the 4600-base-pair BamHI fragment (B1-B2) of pGS65 and the 5800-base-pair BamHI-SalI fragment (B3-Sa2) of pGS64 (Fig. 1). Plasmid pGS64 was found to complement the lesions of sucA recA and sucB recA mutants (JRG1500 and JRG1501), possibly as a consequence of an ‘in-phase’ fusion between the tet gene of the vector and the suc operon (see below).

The primary sources of DNA fragments for ‘shot-gun’ cloning into M13 vectors were: pGS65 for the 1300-base-pair X1-P1 fragment and pGS64 for the 5800-base-pair B3-Sa2 fragment (Fig. 1). Other specific fragments, including the 179-base-pair B2-B3 fragment and the overlapping P1-P2 fragment (Fig. 1), were cloned directly from digests of λG117 DNA or pGS64. Phage and plasmid DNA were prepared as described previously [11,16]. Restriction fragments were isolated after agarose gel electrophoresis by dissolving gel slices in saturated KI and recovering the DNA by chromatography on hydroxyapatite [17]. Alternatively, DNA fragments were recovered directly from low-melting-point agarose by extraction with phenol [18].

Cloning in M13 and transfection

The isolation of replicative forms of M13 vectors and the subsequent cloning of DNA fragments into specific sites were according to standard methods [19-21]. Hybrid M13 phages were identified as described previously [13].

The products of MspI-digested X1-P1 fragment were cloned into the AccI site of M13mp8 [22] and the products of Sau3A-digested B3-Sa2 fragment were cloned into the BamHI site of M13mp9 [22]. Random fragments of the B3-Sa2 segment, generated by ultrasonic treatment [23], were incubated with T4-DNA polymerase to create blunt ends and then cloned into SmaI-digested and phosphatase-treated M13mp8 [23].

Specific fragments involving single (BamHI, BglII, PstI) or double (BamHI plus BglII or PstI) digestion were cloned into the relevant sites of M13mp8 and/or M13mp9. Other specific fragments involving double (BamHI plus SstI or XhoI, and SstI plus PstI or XhoI) digestion were cloned into the corresponding sites of M13mp10 and/or M13mp11 (J. Messing, unpublished). A clone containing the A,-P, fragment (Fig. 2), required for clarifying the sequence in a region of ambiguity, was obtained from pGS64 by first isolating the P,-P, fragment. The AccI-digested fragment was treated with T4-DNA polymerase and the desired clone obtained using SmaI-digested and phosphatase-treated M13mp8. Where necessary, clones were analysed by hybridization according to the method of Winter and Fields [19].

Nucleotide sequence analysis

Single-stranded M13 templates were prepared [21] and sequenced using the dideoxy chain-termination method of Sanger et al. [24]. All clones were initially screened by ‘A-tracking’ to avoid generating redundant data. Nucleotide sequences were compiled and analysed using the computer programs devised by Staden [25-27].

Amino acid sequence comparisons

Amino acid sequences were compared using the proportional matching option of the interactive graphics program DIAGON [28]. This incorporates a scoring system based on MDM78, calculated from accepted point mutations of 71 families of related proteins [29].

Fig. 1. Organization and expression of the 2-oxoglutarate dehydrogenase and dihydrolipoamide succinyltransferase genes of E. coli. The segment of the linkage map at 16.4-16.9 min containing the gltA, sdh and suc genes is shown aligned with the restriction map. The approximate locations and polarities of the genes and their sizes, based on the sizes of the corresponding products, are indicated. Left to right corresponds to clockwise in the E. coli linkage map. The restriction targets for BamHI (B), BglII (Bg), EcoRI (R), HindIII (H), SalI (Sa) and XhoI (X) are those defined by Spencer and Guest [6], except that the two BamHI targets, B2 and B3 (separated by 179 base pairs), were previously defined as a single site (B2) and the positions of three PstI (P) sites have been included. The segments of DNA cloned in λG117 and λG118, and into the BamHI and SalI sites of pBR322 to generate the plasmids used in this work, are indicated by the open bars. Fragment sizes are shown in base pairs and the nucleotide sequence has been defined for the region between the vertical lines.

Fig. 2. Summary of the DNA sequence data obtained from M13 clones. The restriction map drawn to scale shows sites identified in ‘shot-gun’ cloning (MspI and Sau3A) and those used for directed cloning (AccI, A; BamHI, B; BglII, Bg; PstI, P and SstI, St) numbered as in Fig. 1. The site denoted Pp refers to the unique PstI target in pBR322, used to construct a PstI-PstI clone from pGS64. The nucleotide positions, in base pairs, are numbered from the first base of a TaqI target (T1) situated between X1 and P1 in Fig. 1. The arrows above and below the restriction map show the positions and directions of nucleotide sequence data obtained from the two DNA strands by either ‘shot-gun’ cloning (MspI, Sau3A and ultrasonic fragmentation) or directed cloning. The sequence was fully overlapped and most of it (87%) was obtained from both DNA strands. The left to right orientation corresponds to clockwise in the E. coli linkage map.
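The windowed comparison described under ‘Amino acid sequence comparisons’ can be illustrated with a minimal sketch. It is not the DIAGON program: the function names, the window span and the threshold are hypothetical, and a simple identity score stands in for the MDM78 matrix, which is not reproduced here.

```python
def identity_score(x, y):
    """Stand-in scoring function (assumption): 1 for identical residues, else 0.
    A substitution matrix such as MDM78 could be passed in its place."""
    return 1 if x == y else 0


def dot_matrix(seq_a, seq_b, span=5, threshold=4, score=identity_score):
    """Return (i, j) window centres whose summed score reaches the threshold.

    span is assumed to be odd, so each window is centred on one residue.
    """
    half = span // 2
    points = []
    for i in range(half, len(seq_a) - half):
        for j in range(half, len(seq_b) - half):
            total = sum(score(seq_a[i + k], seq_b[j + k])
                        for k in range(-half, half + 1))
            if total >= threshold:
                points.append((i, j))
    return points


# Hypothetical toy peptide, not taken from the paper.
toy = "MKTAYIAKQR"
print(dot_matrix(toy, toy, span=5, threshold=5))
```

Comparing a sequence with itself gives an unbroken main diagonal. Two unrelated sequences give no sustained diagonal, which is the kind of pattern the abstract summarizes as a lack of homology between E1o and E1p.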

Materials

The sources of most of the materials have been described previously [13]. The sources of other materials are as follows: T4-DNA polymerase from Bethesda Research Laboratories Inc.; calf intestinal alkaline phosphatase from Boehringer Corporation Ltd; 17-nucleotide synthetic primer from Collaborative Research Inc.; and the replicative forms of phages M13mp10 and M13mp11 were purchased from P-L Biochemicals, Inc.

RESULTS AND DISCUSSION

Sequencing strategy

The size of the coding region for the 2-oxoglutarate dehydrogenase (E1o) and dihydrolipoamide succinyltransferase (E2o) components based on the highest reported Mr values (E1o, Mr = 101 000; E2o, Mr = 54 000; [6]) is ≈4400 base pairs. There is also evidence from the properties of the λsucAB transducing phages [6] that the sucB gene terminates in the BglII-SalI segment, Bg,-Sa, (Fig. 1). Thus it would be predicted that the left-most limit for the start of the suc operon should be midway between the XhoI and PstI sites, X1-P1 (Fig. 1), assuming that the sucA-sucB intergenic region is small. This means that the entire suc operon should be included in the 7100-base-pair segment of DNA between the XhoI and SalI sites (X1-Sa2; Fig. 1).
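The size estimate above converts polypeptide Mr values into base pairs. A minimal sketch of that arithmetic follows. The average residue mass is an assumption, since the text does not say which value was used, and the function name is hypothetical.

```python
def coding_base_pairs(mr, avg_residue_mass=110.0):
    """Estimate coding length in base pairs from a polypeptide Mr.

    avg_residue_mass (Da per amino acid residue) is an assumed value;
    the source does not state the figure behind its estimate.
    """
    residues = mr / avg_residue_mass
    return round(residues * 3)  # three bases per codon


# Highest reported Mr values quoted in the text for E1o and E2o.
for mass in (110.0, 106.0):
    total = coding_base_pairs(101_000, mass) + coding_base_pairs(54_000, mass)
    print(f"{mass:.0f} Da per residue: about {total} bp")
```

With 110 Da per residue the total is a little over 4200 bp, and a value nearer 106 Da comes close to the ≈4400 bp quoted, so the published figure is reproduced only approximately under these assumptions.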

The overall strategy devised for sequencing the X1-Sa2 fragment, containing the suc genes, involved a combination of the ‘shot-gun’ or random method for two major fragments (X1-P1 and B3-Sa2; Fig. 1) and directed or forced cloning of specific fragments for (a) analysing and overlapping the region between the major fragments, particularly with fragments B2-B3, St,-B2 and P1-P2 (Fig. 1 and 2), and for (b) providing additional sequence data around specific sites. These fragments were chosen by referring to detailed restriction maps of the relevant phages and plasmids [6] (and M. E. Spencer, unpublished work).

The complete nucleotide sequence of ≈5300 base pairs extending rightwards (clockwise) from the XhoI target (X1) has now been determined. This paper reports the nucleotide sequence of a 3180-base-pair segment containing the 2-oxoglutarate dehydrogenase gene (sucA). The sequence of the dihydrolipoamide succinyltransferase gene (sucB) is reported in the following paper [30]; the sequence of the ≈700-base-pair segment closest to the XhoI site, which contains most of the sdhB structural gene encoding the iron-sulphur protein subunit (small subunit) of succinate dehydrogenase, will be reported separately (M. G. Darlison and J. R. Guest, unpublished work).

A map summarizing the M13 clones used to generate the nucleotide sequence and showing the directions and extents of sequencing is presented in Fig. 2. The complete and unambiguous sequence of a 3180-base-pair segment containing the sucA gene is shown in Fig. 3. The nucleotides are numbered from the first base of a convenient TaqI site, T1 in Fig. 2. All of the sequence was obtained from at least two independent clones, it was fully overlapped and most of it (87%) derives from both DNA strands.

Location of the coding region

The computer program FRAMESCAN [27], which predicts coding regions by a statistical analysis of the codon usage in all three reading frames, was used to detect open reading frames in both DNA strands. A large open reading frame of 2799 base pairs (nucleotide positions 327-3125 in Fig. 3) was found in the strand previously identified as the suc coding strand in studies correlating polarity of transcription with the restriction map [6]. This coding region starts with AUG as the initiation codon, it ends with an ochre codon (UAA) and it corresponds in both size and position to that predicted for the structural gene (sucA) of the dehydrogenase component (E1o) of the 2-oxoglutarate dehydrogenase complex. It is interesting to note that this open reading frame starts 547 base pairs to the left of the BamHI site of the BamHI-SalI fragment (B3-Sa2) cloned in the suc+ plasmid, pGS64. However, inspection of the vector (pBR322) shows that insertion of this fragment could result in an ‘in-phase’ fusion product, expressed from the tet promoter, in which the first 181 residues of E1o are replaced by the 95 amino-terminal residues of the tet product [31]. The Suc+ phenotype conferred by pGS64 appears to be due to the hybrid protein retaining 2-oxoglutarate dehydrogenase activity.

[Fig. 3. Nucleotide sequence of the 3180-base-pair segment containing the sucA gene, with the derived amino acid sequence of E1o; sequence listing not reproduced.]
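The coordinates quoted for the open reading frame can be checked against the codon count given in the abstract. The sketch below is only bookkeeping on the published numbers, and the function name is hypothetical.

```python
def orf_summary(first, last):
    """Summarize an open reading frame from inclusive nucleotide positions."""
    length_bp = last - first + 1
    if length_bp % 3 != 0:
        raise ValueError("coordinates do not span whole codons")
    codons = length_bp // 3
    return {
        "length_bp": length_bp,
        "codons_with_start": codons,         # counts the initiating AUG
        "codons_without_start": codons - 1,  # convention used in the abstract
    }


# Positions 327-3125 are the values quoted in the text (Fig. 3 numbering).
print(orf_summary(327, 3125))

# Offset of the BamHI site from the first base of the AUG, as quoted.
whole_codons, extra_bases = divmod(547, 3)
print(whole_codons - 1, extra_bases)  # codons after the AUG, leftover bases
```

Positions 327-3125 span 2799 base pairs, that is the AUG plus 932 codons, or 2796 base pairs without the initiation codon, as in the abstract. On one reading, the 547-base-pair offset covers the AUG, 181 further codons and one extra base, which is consistent with the statement that the first 181 residues of E1o are replaced in the fusion product.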

Features of the nucleotide sequences flanking the sucA gene

The sucA structural gene is flanked by the sdhB gene (upstream) and the sucB gene (downstream) with intergenic regions of 302 and 17 base pairs, respectively (Fig. 3). The intergenic (intercistronic) region between the two suc genes is typical of those found between genes in the same operon; it resembles the intercistronic region between the ace genes [14] and is discussed in detail in the following paper describing the sucB gene [30].

The sucA structural gene is preceded by a well-placed ribosome binding site [32], d(T-A-A-G-G-G-A-T-C) (positions 315-323), having five consecutive bases that are complementary to the 3’-terminal sequence of the 16S ribosomal RNA (Fig. 3). It is interesting, but probably coincidental, that the aceE gene has the same ribosome binding site [13]. The translation initiation region conforms to rules 1-5 of Stormo et al. [33] and it satisfies Atkins’ rule [34], the nearest stop codon upstream of the initiation codon being an ochre codon (UAA) at positions 315-317.
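The complementarity noted for the ribosome binding site can be checked with a short sketch. The 16S rRNA 3’ tail used here (5’-GAUCACCUCCUUA-3’, written in the DNA alphabet) is the commonly cited E. coli sequence and is an assumption, since the text does not reproduce it. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def longest_common_run(a, b):
    """Longest contiguous substring shared by a and b (dynamic programming)."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i, ca in enumerate(a, 1):
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


RBS = "TAAGGGATC"                  # d(T-A-A-G-G-G-A-T-C), positions 315-323
RRNA_3PRIME_TAIL = "GATCACCTCCTTA"  # assumed 16S rRNA 3' tail, DNA alphabet

# A stretch of the mRNA pairs with the rRNA where it equals the reverse
# complement of the rRNA tail.
run = longest_common_run(RBS, reverse_complement(RRNA_3PRIME_TAIL))
print(run, len(run))
```

Under that assumption the longest perfectly paired stretch is five bases long, in line with the five consecutive complementary bases reported.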

Since the sucA gene is thought to be the proximal gene of the suc operon, the sequence preceding the structural gene has been scanned for potential promoters by looking for homology with the canonical -35 and -10 (Pribnow box) sequences [35]. Three putative promoters were detected: A, 10-37; B, 125-154 and C, 185-215 (Fig. 3). Two of these (A and B) lack the so-called ‘invariant’ dT and one (C) has a spacing of 19 base pairs between the -35 and -10 regions. However, the ‘invariant’ dT is not essential for promoter function and functional promoters with 19 base-pair spacings have been reported [35]. Promoter A overlaps the sdhB coding region, but this situation is not unique in Escherichia coli, because the ampC promoter overlaps the frdD structural gene [36]. Putative promoter A is also located upstream of a large intercistronic regulatory element or IS-like sequence [37], which in turn overlaps promoter B (see below). Consequently, C is the preferred potential promoter sequence and it should be noted that it is associated with a substantial region of hyphenated dyad symmetry (ee’, see below). However, further studies, including transcript mapping, will be required in order to identify the functional promoter.
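
A scan of this kind can be sketched in a few lines of Python. The example below is a simplified illustration and not the procedure used for Fig. 3: the consensus hexamers TTGACA and TATAAT, the match threshold and the spacer window of 15-21 base pairs (chosen so that 19 base-pair spacings are included) are assumptions, the function names are hypothetical, and the test sequence is synthetic.

```python
from typing import List, Tuple

MINUS_35 = "TTGACA"   # assumed consensus for the -35 region
MINUS_10 = "TATAAT"   # assumed consensus for the -10 (Pribnow) region

def matches(hexamer: str, consensus: str) -> int:
    """Number of positions at which the hexamer agrees with the consensus."""
    return sum(1 for a, b in zip(hexamer, consensus) if a == b)

def scan_promoters(seq: str, min_matches: int = 9,
                   spacers: range = range(15, 22)) -> List[Tuple[int, int, int, int, bool]]:
    """Return (start, spacer, m35, m10, has_final_T) per candidate; start is 0-based."""
    hits = []
    for start in range(len(seq) - 5):
        m35 = matches(seq[start:start + 6], MINUS_35)
        for spacer in spacers:
            pos10 = start + 6 + spacer
            if pos10 + 6 > len(seq):
                break  # longer spacers would run off the end of the sequence
            hexamer = seq[pos10:pos10 + 6]
            m10 = matches(hexamer, MINUS_10)
            if m35 + m10 >= min_matches:
                # Assumption: the 'invariant' dT is taken as the last base of the -10 hexamer.
                hits.append((start, spacer, m35, m10, hexamer[5] == "T"))
    return hits

# Synthetic test sequence (not taken from Fig. 3): perfect hexamers 17 bp apart.
toy = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GGGG"
perfect = scan_promoters(toy, min_matches=12)
print(perfect)
```

Lowering `min_matches` admits weaker candidates, which is how sequences lacking the ‘invariant’ dT or having atypical spacings would be picked up by such a scan.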

The sdhB-sucA intergenic region is particularly complex with respect to regions of hyphenated dyad symmetry (Fig. 3). The most striking feature is the presence of an intercistronic regulatory element or IS-like sequence analogous to those described by Higgins et al. [37]. It comprises three regions of hyphenated dyad symmetry designated: dd’, 35-66; cc’, 74-102 and bb’, 123-147 (Fig. 3). If transcribed these could form very stable stem-and-loop structures having free energies of ΔG = -104.5 kJ, -98.6 kJ and -91.1 kJ (-25.0 kcal, -23.6 kcal and -21.8 kcal), respectively [38]. As with the sequences described previously, larger stem-and-loop structures could be formed in the mRNA as follows: d’d-c’c (ΔG = -215.3 kJ, -51.5 kcal), c’c-b’b (ΔG = -242.0 kJ, -57.9 kcal) and, in this case, d’dc’-cb’b (ΔG = -293.0 kJ, -70.1 kcal). Another region of hyphenated dyad symmetry (aa’ in Fig. 3) occurs within the intercistronic regulatory element and although it could form a stable stem-and-loop structure in mRNA, its free energy value (ΔG = -53.5 kJ, -12.8 kcal) suggests that this would be less stable than those involving dd’ and cc’. The significance of the intercistronic regulatory element with respect to the expression of the suc genes will be discussed later (see below) and a more detailed analysis of this complex region will be presented in the following paper describing the sucB gene [30], because an analogous region has been discovered immediately downstream of the sucB coding region.
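
The paired kJ and kcal values quoted above can be cross-checked numerically. The Python sketch below is illustrative only; the conversion factor of 4.18 kJ per kcal is an assumption inferred from the printed numbers, since the text does not state which factor was used.

```python
# (label, kJ, kcal) exactly as quoted in the text.
QUOTED = [
    ("dd'", -104.5, -25.0),
    ("cc'", -98.6, -23.6),
    ("bb'", -91.1, -21.8),
    ("d'd-c'c", -215.3, -51.5),
    ("c'c-b'b", -242.0, -57.9),
    ("d'dc'-cb'b", -293.0, -70.1),
    ("aa'", -53.5, -12.8),
]

# Assumption: a rounded factor of 4.18 kJ per kcal; not stated in the text.
KJ_PER_KCAL = 4.18

def to_kj(kcal: float) -> float:
    """Convert kcal to kJ and round to one decimal place, as printed."""
    return round(kcal * KJ_PER_KCAL, 1)

# Collect any entry whose converted value differs from the printed kJ value.
mismatches = [(label, kj, to_kj(kcal)) for label, kj, kcal in QUOTED
              if abs(to_kj(kcal) - kj) > 1e-9]
print(mismatches)
```

With this factor every pair is reproduced to the printed precision; the thermochemical value of 4.184 would instead give -104.6 kJ for the first entry.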

Further regions of hyphenated dyad symmetry, situated between the intercistronic regulatory element and the sucA gene, that could form stable stem-and-loop structures after transcription, are defined by letters ee’ to ii’ in Fig. 3. They are redrawn in Fig. 4 to highlight their potential stem-and-loop configurations and stabilities (free energy values calculated according to Tinoco et al. [38]). The largest (ee’) contains a putative promoter (C), so it would not be formed if the promoter is functional, and another (ii’) overlaps the proposed ribosome binding site (Fig. 4). The role of these structures is not known but it is tempting to speculate that they may be important in mediating the effects of the diversity of factors that influence suc gene expression.

The codon usage for the sucA gene is similar to the patterns observed for many Escherichia coli genes that are strongly or moderately expressed [39]. In this respect it resembles the aceE gene, but there are several differences which suggest that the sucA gene may be less strongly expressed than aceE. This will be discussed in a subsequent paper in which the codon usages

Fig. 3. The nucleotide sequence of the sucA region. The nucleotide sequence (3180 base pairs) of the non-coding strand (sense strand) containing the sucA gene is shown in the 5’→3’ direction starting at the first base of a convenient TaqI target near the 3’ end of the neighbouring sdhB gene. The respective carboxy-terminal and amino-terminal segments of the sdhB and sucB gene products are included at the beginning and end of the sequence. The amino acid sequence of the product of the sucA gene (nucleotide positions 327-3125) is shown directly above the nucleotide sequence. The d representing deoxy and the hyphens indicating both phosphodiester links and peptide links have been omitted. Significant regions of dyad symmetry are indicated by converging arrows labelled aa’ to ii’ (dashes within the converging arrows denote nucleotides that do not base-pair). The intercistronic regulatory element or IS-like sequence [37] corresponds to the region containing dd’, cc’ and bb’. A potential ribosome binding site (Shine-Dalgarno sequence) is boxed and possible promoter sequences, -35 and -10 (Pribnow), are indicated by letters (A, B, C) and bars above the nucleotide sequence connected by broken lines. Relevant stop codons are indicated thus: * * *

Fig. 4. Nucleotide sequence flanking the sucA gene. The nucleotide sequence at the start of the sucA gene is redrawn to highlight significant features. Four regions of hyphenated dyad symmetry are shown as stem-and-loop structures. The free energies of the corresponding transcripts, calculated according to Tinoco et al. [38], are indicated (ee’, -13.0 kcal = -54.3 kJ; ff’, -11.6 kcal = -48.5 kJ; gg’, -13.2 kcal = -55.2 kJ; ii’, -7.2 kcal = -30.1 kJ). An additional region of dyad symmetry (hh’), overlapping the structure designated gg’, is denoted by converging arrows (ΔG = -7.6 kcal = -31.8 kJ). The potential ribosome binding site is boxed and the amino-terminal sequence of the 2-oxoglutarate dehydrogenase (E1o) is shown above the nucleotide sequence. One of the potential sucA promoter sequences (C), -35 and -10 (Pribnow), overlapping the potential stem-and-loop structure, ee’, is indicated by lines above the nucleotide sequence

of the five genes encoding components of the multienzyme complexes will be compared.

Primary structure and composition of the 2-oxoglutarate dehydrogenase component (E1o)

The primary structure of the E1o component of the 2-oxoglutarate dehydrogenase complex, translated from the nucleotide sequence, is presented in Fig. 3. If it is assumed that the initiating formylmethionine residue is removed post-translationally, then the amino-terminal residue would be the glutamine residue designated residue 1. The predicted carboxy-terminal residue is glutamate. The 932 amino acid residues (excluding the initiating methionine) correspond to a polypeptide of Mr = 104905. This value agrees with that derived from the electrophoretic mobility of the labelled sucA gene product generated by phage-directed protein synthesis, Mr = 101000 [6], but it is somewhat higher than the values reported for the purified protein, Mr = 94000-97000 [3-5]. In Table 1 the amino acid composition derived from the nucleotide sequence is compared with the composition obtained by amino acid analysis of the purified protein [3] and with the composition of the analogous pyruvate dehydrogenase component (E1p) derived from the nucleotide sequence of the aceE gene [13]. The predicted composition of the 2-oxoglutarate dehydrogenase component is in very good agreement with that determined experimentally. The main discrepancies involve the serine, cysteine and tryptophan contents, which are somewhat lower in the DNA-derived composition. Comparisons between the compositions deduced for the two dehydrogenase components indicate that E1o is relatively rich in serine, glutamine, valine and histidine residues and relatively low in glycine, isoleucine and tyrosine. The calculated polarity of E1o is 47.4% [40]. This is similar to the value of 46.2% reported for E1p [13] and it is within the normal range for soluble proteins. Several stretches of high hydrophobicity have been detected in the polypeptide chain: residues 308-322 (hydrophobicity index, 1.70); 324-333 (2.25); 413-423 (1.86); 524-538 (1.73); 710-721 (2.13) and 793-803 (1.77). The sequence has also been examined for the presence of potential DNA binding sites, which could indicate that the E1o component controls suc transcription in an autoregulatory manner analogous to that proposed for ace and lpd expression [2,41]. However, no sites comparable to those detected in e.g. crp, lacI, fnr were detected.
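
The molar percentages used in this comparison follow directly from the residue counts. The Python sketch below is an illustration restricted to five of the residues singled out above, using the E1o counts and E1p percentages listed in Table 1; the function and variable names are hypothetical.

```python
# E1o residue counts (from the sucA sequence) and E1p mol% values, as listed in Table 1.
E1O_COUNTS = {"Ser": 58, "Val": 69, "His": 34, "Gly": 67, "Ile": 33}
E1P_MOL_PERCENT = {"Ser": 4.9, "Val": 5.8, "His": 2.6, "Gly": 8.9, "Ile": 6.3}
E1O_TOTAL_RESIDUES = 932   # initiating methionine excluded

def mol_percent(count: int, total: int) -> float:
    """Residue content as a molar percentage, rounded to one decimal place."""
    return round(100.0 * count / total, 1)

e1o_mol_percent = {aa: mol_percent(n, E1O_TOTAL_RESIDUES) for aa, n in E1O_COUNTS.items()}
richer_in_e1o = sorted(aa for aa in E1O_COUNTS if e1o_mol_percent[aa] > E1P_MOL_PERCENT[aa])
poorer_in_e1o = sorted(aa for aa in E1O_COUNTS if e1o_mol_percent[aa] < E1P_MOL_PERCENT[aa])
print(e1o_mol_percent, richer_in_e1o, poorer_in_e1o)
```

For this subset the calculation reproduces the tabulated E1o percentages and the direction of the differences described in the text.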

It is interesting that within E1o are two sequences, Gln-Val-Gly-Phe-Thr-Thr-Ser-Asn-Pro-Leu-Asp-Ala-Arg-Ser (residues 390-403) and Thr-Phe-Lys-Arg-Asp-Val-Phe-Ile-Asp-Leu-Val-Ser-Tyr-Arg (residues 442-455), which have features in common with a tryptic phosphopeptide isolated from mammalian pyruvate dehydrogenase, Tyr-His-Gly-His-Ser(P)-Met-Ser-Asn-Pro-Gly-Val-Ser(P)-Tyr-Arg [42] [phosphorylated residues are denoted thus (P) and similar residues are printed in bold-face type]. Although the 2-oxoglutarate dehydrogenase complex of E. coli is not thought to be regulated by a phosphorylation-dephosphorylation mechanism, these homologies may have some evolutionary significance. No comparable homologies were found between the mammalian and E. coli pyruvate dehydrogenases [13].

The amino acid sequence of E1o has been analysed for internal homologies and compared with E1p using DIAGON [28]. This is a comparison program that incorporates a scoring matrix, MDM78 [29], which has been found to be very powerful for detecting distant relationships between amino acid sequences. No significant internal homologies were

Table 1. Amino acid composition of the 2-oxoglutarate dehydrogenase component (E1o) of the 2-oxoglutarate dehydrogenase complex of E. coli
The amino acid composition of 2-oxoglutarate dehydrogenase (E1o) derived from the nucleotide sequence of the sucA gene is compared with the composition determined by amino acid analysis of the purified protein by Pettit et al. [3] and with the amino acid composition of the analogous pyruvate dehydrogenase (E1p) derived from the nucleotide sequence of the aceE gene by Stephens et al. [13]. For ease of comparison the contents of each amino acid are expressed as a molar percentage of the total number of amino acids. The initiating methionine residues are not included in the DNA-derived compositions

Amino   No. of residues    Content (mol/100 mol)
acid    in E1o (from DNA   E1o        E1o                E1p
        sequence)          from DNA   from [3]           from [13]

Asp     52                 5.6        9.2 (Asp + Asn)
Asn     36                 3.9
Thr     42                 4.5
Ser     58                 6.2        7.0                4.9
Glu     66                 7.1        12.9 (Glu + Gln)
Gln     54                 5.8
Pro     44                 4.7        4.5                4.0
Gly     67                 7.2        7.6                8.9
Ala     77                 8.3        8.4                8.1
Val     69                 7.4        6.9                5.8
Met     24                 2.6        2.6                2.4
Ile     33                 3.5        3.6                6.3
Leu     83                 8.9        8.7                7.9
Tyr     31                 3.3        3.1                4.7
Phe     39                 4.2        4.0                3.8
Lys     44                 4.7        4.7                5.4
His     34                 3.6        3.3                2.6
Arg     56                 6.0        5.5                5.6
Cys      9                 1.0        1.2                0.7
Trp     14                 1.5        1.8                1.2

Total residues             932        849                885
Mr                         104905     95000              99474

Values whose column position is not recoverable (Thr in E1o from [3]; Asp, Asn, Thr, Glu and Gln in E1p): 4.3, 5.9, 4.9, 4.7, 4.6.

detected, nor was there any striking homology between the two E1 components (Fig. 5). It had been anticipated that this analysis would have revealed highly significant and well-aligned homologies between the E1 components, consistent with a close evolutionary relationship and structural conservation in regions associated with their common catalytic functions, e.g. binding sites for thiamin diphosphate and the lipoyl cofactor. The sequences of three segments that are aligned on or close to the diagonal (a, b and c in Fig. 5) are compared in Fig. 6. Segment b gives the highest score, equivalent to a double matching probability of 3 x and segment c exhibits a high degree of sequence conservation close to a histidine residue, which may have functional significance. However, it is not known whether any of these segments are associated with common functions. Indeed, regions of common function need not necessarily be aligned on the diagonal and some of the other small regions of homology may be as important. By changing the span and significance parameters more extensive, but still weak and discontinuous, homologies

Fig. 5. Sequence comparison matrix of the 2-oxoglutarate dehydrogenase (E1o) and pyruvate dehydrogenase (E1p) components. The comparison matrix was obtained by DIAGON analysis [28], which incorporates the MDM78 scoring system [29]. Each point corresponds to the mid-point of spans of 35 residues that give a score equivalent to a double matching probability of < 0.001. The letters a, b and c denote regions of homology located on, or close to, the diagonal, and d-e and f-g denote lines where more homology becomes apparent at a span of 55 residues

[Fig. 6: aligned segments a (E1o 244-294, E1p 231-281), b (residues 539-569 and 463-493) and c (E1o 748-798, E1p 701-751); sequence alignment not reproduced.]

Fig. 6. Alignments for amino acid sequences of the 2-oxoglutarate and pyruvate dehydrogenase components (E1o and E1p). Details of three regions of diagonally related homology (a, b and c in Fig. 5) are shown. Identical residues are denoted by asterisks (*) and conservative changes with elements > 0.12 in MDM78 [29] are indicated by vertical bars. The numbers denote the position of the amino acid residues in the sequences of the corresponding proteins

build up along the lines d-e and f-g (Fig. 5). This could be signalling a very remote relationship between the two proteins based on a rearrangement of the amino-terminal and carboxy-terminal segments.
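
The principle of a span-based comparison matrix can be sketched in Python as follows. This is a deliberately simplified illustration and not the DIAGON algorithm: windows are scored by counting identical residues rather than with the MDM78 matrix and a double matching probability, the function name and threshold are hypothetical, and the test sequences are synthetic.

```python
from typing import List, Tuple

def dot_matrix(seq_a: str, seq_b: str, span: int, min_identities: int) -> List[Tuple[int, int]]:
    """Return (i, j) mid-points (0-based) of ungapped windows of length `span`
    in which at least `min_identities` residues are identical."""
    if span % 2 == 0:
        raise ValueError("span must be odd so that each window has a mid-point")
    half = span // 2
    points = []
    for i in range(len(seq_a) - span + 1):
        for j in range(len(seq_b) - span + 1):
            identities = sum(1 for k in range(span) if seq_a[i + k] == seq_b[j + k])
            if identities >= min_identities:
                points.append((i + half, j + half))
    return points

# Synthetic sequences (not E1o or E1p) sharing one 7-residue block at different offsets.
toy_a = "MKTAYIAKQRLLG"
toy_b = "GGAYIAKQRWW"
hits = dot_matrix(toy_a, toy_b, span=7, min_identities=7)
print(hits)
```

Increasing the span while relaxing the threshold corresponds, in spirit, to the change of span and significance parameters described above, which lets weak but extended similarities accumulate along a line.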

The lack of sequence homology between the E1 components contrasts with the striking homologies that have been found for the E2 components [30]. It appears that despite their analogous catalytic functions, molecular interactions and

genetic organization, the E1o and E1p components do not share a close evolutionary ancestor and that the ace and suc operons have not arisen by simple duplication of an ancestral operon. It is possible that the 2-oxoglutarate dehydrogenase components of prokaryotes (gram-positive and gram-negative) and eukaryotes will prove to be more closely related to each other than are the E1p and E1o components of E. coli and this would provide some interesting insights into the paths of evolution and expression of the Krebs cycle in different groups of organisms.

Expression of the suc operon

The structure of the sdhB-sucA intergenic region, particularly with respect to the presence of an intercistronic regulatory element or IS-like sequence [37], raises interesting questions concerning the nature of the suc operon and its expression. These sequences have been found not only in intercistronic regions, where they decrease the expression of the distal gene(s), but also in front of potential terminators at the ends of independent genes [37] (and C. F. Higgins, personal communication). The view that the sucA and sucB genes constitute an operon is based on several pieces of evidence including the close proximity of sucA and sucB mutations [43], the polar effects of sucA amber mutants [44] and the independent expression of the suc genes in post-infection labelling studies [6]. In the latter, a λsucAsucB transducing phage (λG136), isolated following spontaneous deletion of a λgltAsdhsucAsucB transducing phage, was shown to direct the synthesis of the E1o and E2o components in a lysogenic host. This indicated that the suc genes possessed an independent promoter, although the possibility that the suc genes were being expressed from a promoter situated to the left of the gltA gene was not excluded. That some functionally unrelated genes (supG and lysT) had been located between the sdh and suc genes [45] and that sdh amber mutants [46] show no requirement for exogenous succinate for growth on glucose (Suc⁻ phenotype) were also consistent with the suc genes being expressed as an independent transcriptional unit. The evidence of the nucleotide sequence supports this view in part, but some features demand that the situation be reassessed. There is no obvious transcriptional terminator-like sequence between the sdhB and sucA genes but, if the IS-like sequence functions as a terminator or if this region contains an atypical terminator, then there are potential promoters (B and C) that could mediate independent expression of the suc genes.
On the other hand, the IS-like sequence could be functioning as an intercistronic regulatory element decreasing or attenuating the rate of transcription initiated from an upstream promoter, possibly the sdh promoter. If so, the suc genes would not constitute an independent operon but would be the distal genes of a larger multicistronic unit containing the sdh and suc genes. The evidence already presented does not rule out this possibility because the unrelated genes (supG and lysT) are not found in the sequence of the intergenic region (Fig. 3); furthermore, they have recently been relocated between gltA and sdh [7]. In addition, it is now known that sdh mutations suppress the succinate requirement of suc (lpd and lip) mutants [47], so polar sdh mutations, which could potentially affect suc expression, would not be expected to generate a Suc⁻ nutritional phenotype. It has also been observed that sdh mutations invariably decrease the synthesis of the 2-oxoglutarate dehydrogenase complex [47], but this may not be directly related to transcriptional polarity because suc mutations similarly affect succinate dehydrogenase synthesis. In this connexion it is important to note that succinate dehydrogenase and the 2-oxoglutarate dehydrogenase complex are similarly regulated with respect to catabolite repression and repression during anaerobic growth [2,9,10,48,49]. Thus, cotranscription of sdh and suc could avoid unnecessary duplication of regulatory sequences. Interestingly, no CRP-binding site, having the consensus proposed by Valentin-Hansen [50], could be detected in the sdhB-sucA region, but there may be one upstream of the sdh genes, serving both the sdh and suc genes. Another possibility that has not been excluded is that there may be two mechanisms for suc gene expression, cotranscription with the sdh genes and independent transcription from a suc promoter. Clearly, further experiments are required in order to elucidate the transcriptional organization of the sdh-suc region and to define the functions of the IS-like sequence and the other potential stem-and-loop structures.

The nucleotide sequence of the sucB gene has now been completed and is reported in the following paper [30]. The sequences of four structural genes encoding succinate dehydrogenase and the corresponding regulatory region have also been completed (M. G. Darlison, D. Wood, R. J. Wilde, and J. R. Guest, unpublished). Current work is directed towards (a) transcript mapping in the sdh-suc region and elucidating the molecular basis of suc and sdh gene expression and (b) investigating structural and regulatory relationships of the pyruvate and 2-oxoglutarate dehydrogenase complexes.

We thank Elizabeth P. Hull for providing some of the M13 clones and Ian K. Duckenfield for assisting with computing. We also acknowledge support from the Science and Engineering Research Council by project grant GR/B35543 to J. R. Guest.

REFERENCES

1. Reed, L. J. (1974) Acc. Chem. Res. 7, 40-46.
2. Guest, J. R. (1978) Adv. Neurol. 21, 219-244.
3. Pettit, F. H., Hamilton, L., Munk, P., Namihira, G., Eley, M. H., Willms, C. R. & Reed, L. J. (1973) J. Biol. Chem. 248, 5282-5290.
4. Perham, R. N. & Thomas, J. O. (1971) FEBS Lett. 15, 8-12.
5. Phillips, T. A., Bloch, P. L. & Neidhardt, F. C. (1980) J. Bacteriol. 144, 1024-1033.
6. Spencer, M. E. & Guest, J. R. (1982) J. Bacteriol. 151, 542-552.
7. Bachmann, B. J. (1983) Microbiol. Rev. 47, 180-230.
8. Hull, E. P., Spencer, M. E., Wood, D. & Guest, J. R. (1983) FEBS Lett. 156, 366-370.
9. Langley, D. & Guest, J. R. (1978) J. Gen. Microbiol. 106, 103-117.
10. Amarasingham, C. R. & Davis, B. D. (1965) J. Biol. Chem. 240, 3664-3668.
11. Guest, J. R. & Stephens, P. E. (1980) J. Gen. Microbiol. 121, 277-292.
12. Guest, J. R., Cole, S. T. & Jeyaseelan, K. (1981) J. Gen. Microbiol. 127, 65-79.
13. Stephens, P. E., Darlison, M. G., Lewis, H. M. & Guest, J. R. (1983) Eur. J. Biochem. 133, 155-162.
14. Stephens, P. E., Darlison, M. G., Lewis, H. M. & Guest, J. R. (1983) Eur. J. Biochem. 133, 481-489.
15. Stephens, P. E., Lewis, H. M., Darlison, M. G. & Guest, J. R. (1983) Eur. J. Biochem. 135, 519-527.
16. Guest, J. R., Roberts, R. E. & Stephens, P. E. (1983) J. Gen. Microbiol. 129, 671-680.
17. Smith, H. O. (1980) Methods Enzymol. 65, 371-380.
18. Kühn, S., Fritz, H.-J. & Starlinger, P. (1979) Mol. Gen. Genet. 167, 235-241.
19. Winter, G. & Fields, S. (1980) Nucleic Acids Res. 8, 1965-1974.
20. Shaw, D. J. & Guest, J. R. (1982) J. Gen. Microbiol. 128, 2221-2228.
21. Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H. & Roe, B. A. (1980) J. Mol. Biol. 143, 161-178.
22. Messing, J. & Vieira, J. (1982) Gene 19, 269-276.
23. Deininger, P. L. (1983) Anal. Biochem. 129, 216-223.
24. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA 74, 5463-5467.
25. Staden, R. (1979) Nucleic Acids Res. 6, 2601-2610.
26. Staden, R. (1980) Nucleic Acids Res. 8, 3673-3694.
27. Staden, R. & McLachlan, A. D. (1982) Nucleic Acids Res. 10, 141-156.
28. Staden, R. (1982) Nucleic Acids Res. 10, 2951-2961.
29. Schwartz, R. M. & Dayhoff, M. O. (1978) in Atlas of Protein Sequence and Structure, vol. 5, suppl. 3, pp. 353-358, National Biomedical Research Foundation, Washington, DC.
30. Spencer, M. E., Darlison, M. G., Stephens, P. E., Duckenfield, I. K. & Guest, J. R. (1984) Eur. J. Biochem. 141, 361-374.
31. Peden, K. W. C. (1983) Gene 22, 277-280.
32. Shine, J. & Dalgarno, L. (1974) Proc. Natl Acad. Sci. USA 71, 1342-1346.
33. Stormo, G. D., Schneider, T. D. & Gold, L. M. (1982) Nucleic Acids Res. 10, 2971-2996.
34. Atkins, J. F. (1979) Nucleic Acids Res. 7, 1035-1041.
35. Hawley, D. K. & McClure, W. R. (1983) Nucleic Acids Res. 11, 2237-2255.
36. Grundström, T. & Jaurin, B. (1982) Proc. Natl Acad. Sci. USA 79, 1111-1115.
37. Higgins, C. F., Ames, G. F.-L., Barnes, W. M., Clement, J. M. & Hofnung, M. (1982) Nature (Lond.) 298, 760-762.
38. Tinoco, I., Jr, Borer, P. N., Dengler, B., Levine, M. D., Uhlenbeck, O. C., Crothers, D. M. & Gralla, J. (1973) Nat. New Biol. 246, 40-41.
39. Grosjean, H. & Fiers, W. (1982) Gene 18, 199-209.
40. Capaldi, R. A. & Vanderkooi, G. (1972) Proc. Natl Acad. Sci. USA 69, 930-932.
41. Henning, U., Vogel, O., Busch, W. & Flatgaard, J. E. (1972) in Protein-Protein Interactions (Jaenicke, R. & Helmreich, E., eds) pp. 343-361, Springer Publ., New York.
42. Yeaman, S. J., Hutcheson, E. T., Roche, T. E., Pettit, F. H., Brown, J. R., Reed, L. J., Watson, D. C. & Dixon, G. H. (1978) Biochemistry 17, 2364-2370.
43. Herbert, A. A. & Guest, J. R. (1969) Mol. Gen. Genet. 105, 182-190.
44. Creaghan, I. T. & Guest, J. R. (1972) J. Gen. Microbiol. 71, 207-220.
45. Bachmann, B. J. & Low, K. B. (1980) Microbiol. Rev. 44, 1-56.
46. Spencer, M. E. & Guest, J. R. (1974) J. Bacteriol. 117, 947-953.
47. Creaghan, I. T. & Guest, J. R. (1978) J. Gen. Microbiol. 107, 1-13.
48. Ruiz-Herrera, J. & Garcia, L. G. (1972) J. Gen. Microbiol. 72, 29-35.
49. Spencer, M. E. & Guest, J. R. (1972) J. Bacteriol. 114, 563-570.
50. Valentin-Hansen, P. (1982) EMBO J. 1, 1049-1054.

M. G. Darlison, M. E. Spencer, and J. R. Guest, Department of Microbiology, University of Sheffield, Western Bank, Sheffield, South Yorkshire, England, S10 2TN