
Eur. J. Biochem. 133, 481-489 (1983) © FEBS 1983

The Pyruvate Dehydrogenase Complex of Escherichia coli K12: Nucleotide Sequence Encoding the Dihydrolipoamide Acetyltransferase Component

Paul E. STEPHENS, Mark G. DARLISON, Hilary M. LEWIS, and John R. GUEST

Department of Microbiology, University of Sheffield

(Received March 2, 1983) - EJB 830198

The nucleotide sequence of the aceF gene, which encodes the dihydrolipoamide acetyltransferase component (E2) of the pyruvate dehydrogenase complex of Escherichia coli K12, has been determined using the dideoxy chain-termination method. The aceF gene comprises 1887 base pairs (629 codons excluding the initiation codon AUG); it is preceded by a short intercistronic segment of 14 base pairs containing a good ribosomal binding site, and it is followed closely by a potential rho-independent terminator. The results extend by 1980 base pairs the previously sequenced segment of 3780 base pairs containing the structural gene (aceE) of the pyruvate dehydrogenase component (E1), and they confirm that aceE and aceF are the proximal and distal genes of the ace operon.

The amino terminus, carboxy-terminal sequence and amino acid composition of the acetyltransferase subunit predicted from the nucleotide sequence are in excellent agreement with previous studies with the purified protein. The predicted molecular weight (Mr = 65959) confirms experimental values derived from sedimentation equilibrium analysis and indicates that the higher values (78000-89000) that have been reported are due to unusual features of the protein that lead to anomalous mobilities during sodium dodecyl sulphate/polyacrylamide gel electrophoresis and in gel filtration. The primary structure fully supports conclusions, based on limited tryptic proteolysis, that the acetyltransferase subunit possesses two heterologous domains: the lipoyl domain and the subunit binding and catalytic domain. The lipoyl domain corresponds to the amino-terminal segment of the protein. It is acidic and contains three remarkably homologous repeating units of approximately 100 amino acids, each possessing a potential lipoyl binding site and a region that is characteristically rich in alanine and proline residues. The subunit binding and catalytic domain occupies most of the residual polypeptide in the carboxy-terminal segment.

The pyruvate dehydrogenase complex catalyses the oxidative decarboxylation of pyruvate to acetyl-CoA and CO2 via a series of enzyme-bound intermediates [1, 2]:

$$\text{Pyruvate} + \text{NAD}^+ + \text{CoA} \rightarrow \text{Acetyl-CoA} + \text{CO}_2 + \text{NADH} + \text{H}^+$$

It contains multiple copies of three enzymic components: pyruvate dehydrogenase (E1), dihydrolipoamide acetyltransferase (E2) and lipoamide dehydrogenase (E3). In Escherichia coli the acetyltransferase component contains 24 identical polypeptide chains forming the structural core of the complex, to which the E1 and E3 subunits are independently bound. The E2 component contains covalently bound lipoyl cofactors, and it participates in the generation of acetyl groups from hydroxyethyl-TPP-E1 and their transfer to coenzyme A:

$$\text{Hydroxyethyl-TPP-E1} + \text{lipoyl-E2} \rightarrow \text{TPP-E1} + \text{acetyl-dihydrolipoyl-E2}$$

$$\text{Acetyl-dihydrolipoyl-E2} + \text{CoA} \rightarrow \text{Acetyl-CoA} + \text{dihydrolipoyl-E2}$$

where TPP is thiamin pyrophosphate.

The apparent molecular weight of the E2 subunit has been estimated as 78000-89000 by sodium dodecyl sulphate/polyacrylamide gel electrophoresis or gel filtration in guanidine HCl, using either purified protein [3-9] or specifically labelled products of phage-directed protein synthesis [10]. However, values of 60000-65000 have been obtained by sedimentation equilibrium [3, 9]. Sometimes the E2 subunit appears as two or more components (Mr in the range 80000-89000) on sodium dodecyl sulphate/polyacrylamide gels, and these may also be accompanied by smaller components of Mr = 35000-38000 [4, 6, 8], which suggests that this component is liable to modification or cleavage.

Enzymes. Pyruvate dehydrogenase (EC 1.2.4.1); dihydrolipoamide acetyltransferase (EC 2.3.1.12); lipoamide dehydrogenase (EC 1.6.4.3); DNA polymerase, Klenow fragment (EC 2.7.7.7); restriction endonucleases: AccI (EC 3.1.23.47), AvaI (EC 3.1.23.3), BamHI (EC 3.1.23.6), BglII (EC 3.1.23.10), EcoRI (EC 3.1.23.13), HindIII (EC 3.1.23.22), MspI (EC 3.1.23.24), PstI (EC 3.1.23.32), Sau3A (EC 3.1.23.27), TaqI (EC 3.1.23.39).

The E2 subunit possesses one or more regions that are highly susceptible to trypsin cleavage, and two models for the structure of the subunit have been proposed, based on limited tryptic proteolysis of the native complex and the acetyltransferase. Hale and Perham [11] and Gebhardt et al. [8] have suggested that it contains two homologous folding domains of Mr = 35000-39000, which could have resulted from gene duplication and fusion. In contrast, Bleile et al. [9] have proposed that the acetyltransferase subunit comprises two non-identical domains: a compact subunit binding and catalytic domain (tryptic fragment D) and a flexible extension, the lipoyl domain (tryptic fragment A), that may be connected to a trypsin-sensitive hinge region containing two closely spaced trypsin-sensitive bonds. The apparent molecular weights of these fragments vary markedly according to the method of determination, gel electrophoresis giving consistently higher values (fragment A, 46000-48000; fragment D, 36000-39000) than sedimentation equilibrium (fragment A, 31600; fragment D, 29600) [9]. Bleile et al. [9] have suggested that the previous authors [8, 11] may have overlooked the presence of fragment A because its acidic nature makes it resistant to staining, and that the doublet of Mr = 36000-39000 corresponds to the subunit binding domain (fragment D) rather than to two homologous folding domains. Support for the model of Bleile et al. [9] is provided by electron micrographs of the pyruvate dehydrogenase complex indicating that fragment D provides the subunit binding sites, because the complex retains its integrity under conditions of tryptic digestion that remove fragment A. Furthermore, high-resolution electron micrographs of the acetyltransferase indicate the presence of fibre-like extensions that make up a ‘fuzz’ of lipoyl domains surrounding the octahedral core [9]. It is also evident from recent NMR spectroscopic studies of the pyruvate dehydrogenase complex that large segments of the polypeptide chain encompassing the lipoic acid residues exhibit a high degree of conformational mobility [12]. This greatly increases the effective radius of the lipoyl-lysyl swinging arms, which carry substrate between the catalytic sites of the three enzymic components and between different acetyltransferase subunits in the complex.

The lipoic acid cofactors are bound in amide linkage to the ε-amino group of lysyl residues, and each acetyltransferase chain is thought to contain two [13-16] or possibly three [17] lipoyl groups. However, only one sequence of up to 13 amino acid residues, Ile-Thr-Val-Glu-Gly-Asp-Lys(Lip)-Ala-Ser-Met-Glu-Val-Pro, has been detected [1, 18], suggesting that this sequence is repeated twice or even three times in the E2 polypeptide chain.
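The inference above (one detected lipoyl peptide, but two or three lipoyl groups per chain) amounts to asking how often a single motif recurs in the E2 chain. A minimal Python sketch of such a search follows; the three-letter to one-letter table is the standard IUPAC code, while the helper names and the toy chain are hypothetical and are not the E2 sequence.

```python
from __future__ import annotations

import re

# Standard IUPAC three-letter -> one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

def to_one_letter(peptide: str) -> str:
    """Convert e.g. 'Asp-Lys(Lip)-Ala' to 'DKA', dropping modification tags such as (Lip)."""
    residues = re.sub(r"\([^)]*\)", "", peptide).split("-")
    return "".join(THREE_TO_ONE[r.strip()] for r in residues)

def find_all(protein: str, motif: str) -> list[int]:
    """Return the 1-based start of every (possibly overlapping) occurrence of motif."""
    hits = []
    i = protein.find(motif)
    while i != -1:
        hits.append(i + 1)
        i = protein.find(motif, i + 1)
    return hits

LIPOYL_PEPTIDE = "Ile-Thr-Val-Glu-Gly-Asp-Lys(Lip)-Ala-Ser-Met-Glu-Val-Pro"
motif = to_one_letter(LIPOYL_PEPTIDE)
print(motif)  # ITVEGDKASMEVP

# Hypothetical toy chain with the motif embedded twice (NOT the real E2 sequence).
toy = "MA" + motif + "PAAAPA" + motif + "GK"
print(find_all(toy, motif))  # [3, 22]
```

An exact-match search is used for simplicity; homologous but non-identical repeats would need an alignment-based comparison instead.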

The pyruvate dehydrogenase complex is encoded by three linked genes, aceE (E1 component), aceF (E2 component) and lpd (E3 component), situated at 2.6 min in the E. coli linkage map [2]. These genes have been cloned in phage and plasmid vectors, and their approximate positions relative to the restriction sites in the corresponding segment of DNA are illustrated in Fig. 1 [10, 19, 20]. The ace genes constitute an operon with aceEF polarity, and although it has been established that the lpd gene can be transcribed with the same polarity from an independent promoter, it is conceivable that the lpd gene could also be transcribed as the distal gene of the ace operon.

The nucleotide sequence of a 3780-base-pair segment of DNA containing an unidentified gene (geneA) and the pyruvate dehydrogenase gene (aceE) has been determined [21]. This paper reports the extension of this nucleotide sequence by 1980 base pairs to include the dihydrolipoamide acetyltransferase gene (aceF). The aceF coding region was identified and translated into the primary structure of the dihydrolipoamide acetyltransferase subunit (E2) of the E. coli pyruvate dehydrogenase complex. The results answer several questions concerning the size, structural organization and number of potential lipoyl-binding sites of this component.

MATERIALS AND METHODS

Sources of DNA

The original source of the DNA used for sequencing most of the aceF gene was a λ aceEF lpd transducing phage (λG91) [19]. The fundamental fragments (Fig. 1) used for ‘shot-gun’ cloning in M13 were obtained from the phage and plasmid derivatives described previously [20]: H3-H4 (2000 base pairs from λG107 and λG108) and H4-R3 (5400 base pairs from pGS20). Other specific fragments shown in Fig. 1 are the 670-base-pair fragment (H4-P4) and the 4730-base-pair fragment (P4-R3), which were obtained from the 5400-base-pair fragment (H4-R3). A specific Sau3A fragment, needed to overlap the sequence at a critical HindIII site (H4), was obtained by cloning from a Sau3A digest of an 899-base-pair AccI-PstI fragment, Ac,-P, [20], which was in turn obtained from the 8300-base-pair EcoRI fragment (R,-R,) of a λ lpd transducing phage (λG83L) [20]. The methods used for preparing phage and plasmid DNA and for isolating specific restriction fragments have been listed previously [21].

Cloning in M13

Most of the nucleotide sequence was derived by analysing the products of ‘shot-gun’ cloning Sau3A and TaqI subfragments of the fundamental fragments into the BamHI and AccI sites of M13mWJ43 [22] and M13mp701 (D. Bentley, unpublished), respectively. Additional information was obtained from MspI subfragments of the 5400-base-pair fragment (H4-R3) cloned into the AccI site of M13mp701. The HindIII fragment (H,-H,, Fig. 1) was cloned in M13mp5 [23], the PstI-EcoRI fragment (P4-R3, Fig. 1) and the Sau3A digestion products of the Ac,-P, fragment were cloned in M13mp8 [24], and the HindIII-PstI fragment (H4-P4) was cloned in both orientations using appropriate sites in M13mp8 and M13mp9 [24].

Recombinant phages were identified as colourless plaques on indicator plates containing 5-bromo-4-chloro-3-indolyl-β-D-galactoside after transfecting E. coli strain JM101 [Δ(lac-pro) supE thi/F′ traD36 proAB lacIq ZΔM15] using published methods [25]. Clone reversal and hybridization were as outlined by Winter and Fields [26].
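‘Shot-gun’ cloning relies on fragmenting each fundamental fragment at the recognition sites of frequently cutting enzymes. The Python sketch below models only where the top strand is cut for the three enzymes named above, using their standard specificities (Sau3A ^GATC, TaqI T^CGA, MspI C^CGG); cohesive ends, the complementary strand and the cloning step itself are ignored, and the `digest` helper is hypothetical.

```python
from __future__ import annotations

# Recognition site and top-strand cut offset within the site:
# Sau3A ^GATC, TaqI T^CGA, MspI C^CGG.
ENZYMES = {
    "Sau3A": ("GATC", 0),
    "TaqI": ("TCGA", 1),
    "MspI": ("CCGG", 1),
}

def digest(seq: str, enzyme: str) -> list[str]:
    """Cut a linear top-strand sequence at every site of `enzyme`; return fragments in order."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    # Skip empty pieces, e.g. when a Sau3A site starts at position 0.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]

for name in ENZYMES:
    print(name, digest("AAGATCTTCCGGTCGAAA", name))
```

In a real shot-gun experiment the resulting pieces would be ligated into the vector at random, so each clone samples one fragment in either orientation.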

Nucleotide Sequence Analysis

Single-stranded DNA templates were prepared from recombinant M13 phages [25] and sequenced by the dideoxy chain-termination method [27] using a 17-nucleotide synthetic primer [28]. Nucleotide sequences were compiled and analysed using the Staden computer programs [29-33].
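The principle of the dideoxy chain-termination method can be illustrated with an idealised simulation: each ddNTP lane collects the chains that stop opposite the complementary template base, and reading the bands from shortest to longest spells out the newly synthesised strand. The Python sketch below assumes every termination product is present exactly once and ignores band intensities, compressions and the radiolabel; the function names are hypothetical.

```python
from __future__ import annotations

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def dideoxy_lanes(template: str) -> dict[str, list[int]]:
    """Idealised termination products, one lane per ddNTP.

    `template` lists template bases in the order they are copied (3'->5' on the
    template, starting next to the primer). A chain of length k (primer not counted)
    ends in the lane of the dideoxynucleotide complementary to template[k-1].
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for k, base in enumerate(template.upper(), start=1):
        lanes[COMPLEMENT[base]].append(k)
    return lanes

def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest, giving the new strand 5'->3'."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

lanes = dideoxy_lanes("TACGGA")
print(lanes)            # {'A': [1], 'C': [4, 5], 'G': [3], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCT
```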

Materials

Restriction endonucleases were purchased from Bethesda Research Laboratories Inc., New England Biolabs, Boehringer Corporation Ltd and Miles Laboratories Inc. Phage T4 DNA ligase was either a gift from K. and N. E. Murray or purchased from Bethesda Research Laboratories Inc. The 17-nucleotide synthetic primer was generously provided by A. Coulson. DNA polymerase (Klenow fragment) was obtained from the Boehringer Corporation Ltd. Dideoxynucleotide triphosphates were purchased from Collaborative Research Inc. and 2′-deoxyadenosine 5′-[α-32P]triphosphate was from Amersham International.

RESULTS AND DISCUSSION

Sequencing Strategy

The organization of the genes encoding the pyruvate dehydrogenase complex of E. coli is illustrated in Fig. 1. The positions of the ace and lpd genes relative to the restriction map are based on genetic studies with recombinant lambda phages [19, 20]. These indicated that the dihydrolipoamide acetyltransferase gene (aceF) is encoded by two fundamental fragments, H3-H4 and H4-R3, flanking the HindIII site (H4). An overall strategy for sequencing the ace and lpd genes by combining the ‘shot-gun’ method for the three fundamental fragments (Fig. 1) and the directed approach to provide overlaps has been outlined in a previous paper [21], which described the sequence of a 3780-base-pair segment containing the pyruvate dehydrogenase gene, aceE. In the present paper the sequence of a further 1980 base pairs encoding the dihydrolipoamide acetyltransferase gene (aceF) is reported. The nucleotide sequence containing the lipoamide dehydrogenase gene (lpd) will be published later.

Fig. 1. Organisation of the pyruvate dehydrogenase complex genes of E. coli and summary of DNA sequence data obtained from M13 clones. The region of the linkage map at 2.6 min containing the aceE, aceF and lpd genes and an unidentified gene (geneA) is shown aligned with the restriction map. The polarities (clockwise) of ace and lpd transcription are indicated, and the restriction targets for AccI (Ac), HindIII (H), EcoRI (R) and PstI (P) are defined by subscripts according to Guest et al. [20]. The maps are drawn to scale and the sizes of the fundamental fragments, H,-H,, H3-H4 and H4-R3, are shown in base pairs. The expanded section containing the aceF gene summarises the sites identified by ‘shot-gun’ cloning (Sau3A, S; TaqI, T; and MspI, M) and those used in directed cloning. The directions and extents of the sequences obtained from the M13 clones are indicated. The asterisk (*) denotes the Sau3A clone obtained from the Ac,-P, fragment to overlap the HindIII site, H4 (see text). The scale corresponds to numbers of base pairs starting at the AccI site (Ac); see Stephens et al. [21]. The sequence is fully overlapped and most of it (82%) was obtained from both DNA strands.

A restriction map summarising the M13 clones and sequence data used to analyse the aceF gene is shown in Fig. 1. The complete and unambiguous sequence of the region containing the aceF gene is presented in Fig. 2. All of the sequence was obtained from at least two independent clones; it was fully overlapped, and 82% was obtained from both DNA strands.
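Two quality criteria are quoted here: every position read from at least two independent clones, and the fraction of the sequence read on both strands. Assuming each clone's read is recorded as a 1-based inclusive interval plus the strand it was read from (a hypothetical data layout, with one read per independent clone), both figures can be computed as in the Python sketch below; the read intervals in the example are invented.

```python
from __future__ import annotations

def coverage_summary(length: int, reads: list[tuple[int, int, str]]) -> tuple[int, float]:
    """Return (minimum read depth over all positions, fraction of positions read on both strands).

    Each read is (start, end, strand) with 1-based inclusive coordinates and strand '+' or '-'.
    """
    depth = [0] * (length + 1)  # index 0 unused
    plus = [False] * (length + 1)
    minus = [False] * (length + 1)
    for start, end, strand in reads:
        track = plus if strand == "+" else minus
        for pos in range(max(1, start), min(length, end) + 1):
            depth[pos] += 1
            track[pos] = True
    min_depth = min(depth[1:])
    both = sum(1 for pos in range(1, length + 1) if plus[pos] and minus[pos])
    return min_depth, both / length

reads = [(1, 60, "+"), (50, 100, "+"), (1, 40, "-"), (30, 70, "-")]
print(coverage_summary(100, reads))  # (1, 0.7)
```

A minimum depth of at least 2 would correspond to the "two independent clones" criterion; the toy data above fail it at positions 71-100.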

Location of Coding Regions

The coding regions were identified using the computer program FRAMESCAN [33]. Only one large open reading frame, possessing a consistently high score for preferred codon usage, was found. It contains 1890 base pairs and is situated between the triplet for the AUG initiation codon (position 3784, Fig. 2) and that for the UAA stop codon (position 5674, Fig. 2) in the DNA strand previously identified as the ace coding strand [10, 19, 21]. Its size corresponds to that of the aceF structural gene predicted from the lowest estimates of the size of the dihydrolipoamide acetyltransferase subunit based on sedimentation equilibrium [9]. Other features confirm that this coding region is the aceF structural gene (see below), and no other significant open reading frames were detected in either the coding strand or the complementary strand.
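The open-reading-frame search can be sketched as follows. This is a plain longest-ORF scan in Python, not FRAMESCAN, which additionally scores codon preference; the helper names are hypothetical. The second function checks the coordinate arithmetic: the 1890 base pairs from the AUG at position 3784 to the UAA at position 5674 are 630 codons, and dropping the initiating AUG leaves the 629 codons (1887 base pairs) quoted for aceF.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_orf(seq: str) -> tuple[int, int] | None:
    """Longest ATG-initiated ORF on one strand, scanning all three frames.

    Returns 1-based (position of the A of ATG, first base of the stop codon), or None.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if best is None or i - start > best[1] - best[0]:
                    best = (start + 1, i + 1)
                start = None
    return best

def sense_codons(atg_pos: int, stop_pos: int) -> int:
    """Number of sense codons from the ATG up to, but not including, the stop codon."""
    span = stop_pos - atg_pos
    if span % 3:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3

print(longest_orf("CCATGAAATTTTAAGG"))  # (3, 12)
n = sense_codons(3784, 5674)             # AUG and UAA coordinates from Fig. 2
print(n, n * 3, n - 1, (n - 1) * 3)      # 630 1890 629 1887
```

To scan the complementary strand as well, the same function can be applied to the reverse complement of the sequence.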

The Intercistronic Region

The results support the view that the aceF gene is expressed from the ace promoter [2, 10, 21]. The distance between the stop codon of the aceE gene and the start of aceF is only 14 nucleotides (Fig. 2), and no independent aceF promoter sequence could be detected in this region or in the aceE structural gene [21]. The intercistronic region is extremely purine-rich and it contains a well-placed ribosome binding site [34, 35] having five consecutive residues, d(G-A-G-G-T) (positions 3770-3774), that are complementary to the 3'-terminal sequence of 16S ribosomal RNA (Fig. 2). The reading frame for the genes is not continuous across the intercistronic region, but this is not unusual. The translational initiation region satisfies rules 1-8 of Stormo et al. [36] and Atkins' rule [37], the nearest stop codon upstream of the initiation codon being an ochre codon (positions 3782-3784).
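The complementarity claimed for d(G-A-G-G-T) can be checked mechanically: the mRNA segment pairs antiparallel with the rRNA, so its reverse complement (written as RNA) must occur in the 16S rRNA 3' tail. The 3'-terminal sequence used below (5'-...GAUCACCUCCUUA-3') is the commonly cited E. coli sequence and is an input assumed here rather than data from this paper; G·U pairs are ignored and the helper names are hypothetical.

```python
# 3'-terminal nucleotides of E. coli 16S rRNA, written 5'->3' (assumed input).
RRNA_16S_3PRIME = "GAUCACCUCCUUA"

RNA_PAIR = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of a DNA or RNA segment, returned as RNA (5'->3')."""
    return "".join(RNA_PAIR[base] for base in reversed(seq.upper()))

def pairs_with_16s(sd: str, rrna_tail: str = RRNA_16S_3PRIME) -> bool:
    """True if `sd` could form a contiguous Watson-Crick duplex with the rRNA tail."""
    return reverse_complement_rna(sd) in rrna_tail

print(reverse_complement_rna("GAGGT"))  # ACCUC
print(pairs_with_16s("GAGGT"))          # True
```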

Post-infection labelling studies have indicated that the aceE and aceF genes are expressed at approximately equal rates, and this is consistent with a polypeptide stoichiometry of 1:1 for E1:E2 [9]. However, ratios of 1.3:1.0 and 1.8:1.0 (E1:E2) have been reported for the native complex [17, 38, 39], so it is possible that the aceE and aceF genes are translated at different rates. Although several operons are known to have intercistronic regions that resemble the aceE-F region in size and composition (e.g. trpC-B [40], galE-T [41], and atpF-A and atpA-H [42]), there are cases where the termination and initiation codons or even the coding sequences of adjacent genes are overlapping (e.g. trpE-D, trpB-A [40] and frdA-B [43]). In the latter examples, translational coupling appears to ensure that the subunits are produced in equimolar proportions. Thus, it is possible that the intercistronic region between two genes may allow disproportionate synthesis of the corresponding products, such as the E1 and E2 components of the pyruvate dehydrogenase complex.

Fig. 2. The nucleotide sequence of the aceF gene and primary structure of its product. The nucleotide sequence of 2040 base pairs of the non-coding (sense) strand of the aceF gene is shown in the 5'→3' direction. This extends by 1980 base pairs the previously sequenced segment of 3780 base pairs containing the aceE gene [21]. The nucleotide coordinates are numbered from the first base of the AccI restriction target, Ac, in Fig. 1. The primary structure of the 629 amino acids comprising the dihydrolipoamide acetyltransferase (aceF gene product) is shown directly above the nucleotide sequence. Significant regions of dyad symmetry are underlined by converging arrows, potential ribosome binding sites (Shine-Dalgarno sequences) are boxed and relevant stop codons are denoted thus: ***. Potential lipoyl binding sites containing the critical lysine residues (Lys) in three identical sequences of 18 amino acids are also underlined. Hyphens indicating both phosphodiester and peptidyl linkages and the d representing deoxy have been omitted.

Transcription Termination

Three residues downstream of the triplet for the translational termination codon d(T-A-A) (positions 5674-5676) is a region of hyphenated dyad symmetry (positions 5679-5708) capable of forming a very stable stem and loop structure (ΔG = -125 kJ (-30 kcal) [44]) in the mRNA transcript (Fig. 3). Furthermore, the presence of a (dG-dC)-rich sequence and the run of six thymidine residues, which are characteristic of rho-independent terminators [45], clearly indicate that this is likely to be the transcription termination region for the ace operon.
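A crude scan for the two features named above (a G+C-rich inverted repeat followed by a run of T residues) is sketched below in Python. The thresholds (a perfect 6-base stem, a 3-8 base loop, at least four T residues within three bases of the stem) are arbitrary assumptions, and unlike the analysis in the text the sketch does not estimate ΔG, so it is only a screening heuristic; `find_terminator` is a hypothetical helper.

```python
COMP = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMP)[::-1]

def find_terminator(seq, stem=6, loop_min=3, loop_max=8, t_run=4, max_gap=3, min_gc=0.5):
    """Return 0-based, half-open (hairpin_start, end_of_T_run) of the first candidate, or None.

    A candidate is a perfect inverted repeat of `stem` bases (G+C fraction >= min_gc)
    enclosing a loop of loop_min..loop_max bases, followed within `max_gap` bases by
    at least `t_run` consecutive T residues.
    """
    seq = seq.upper()
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        if sum(base in "GC" for base in left) / stem < min_gc:
            continue
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop
            if seq[j:j + stem] != revcomp(left):
                continue
            tail = seq[j + stem:j + stem + max_gap + t_run]
            k = tail.find("T" * t_run)
            if k != -1:
                return i, j + stem + k + t_run
    return None

print(find_terminator("GCCGCCGAAAGGCGGCTTTT"))  # (0, 20)
```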

Codon Usage

The codon usage for the aceF gene is shown in Table 1. It is non-random and typical of many E. coli genes [46]. The G + C content in the third positions of the 32 quartet codons is 56%. This value is similar to the 54% observed for the aceE gene [21], and together with the clear preference for codons having an intermediate interaction energy, it would appear that both genes of the ace operon are strongly expressed [46, 47].
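The third-position G + C content of the quartet codons is a ratio over the eight families in which the third base can vary without changing the amino acid (Leu CUN, Val GUN, Ser UCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN and Gly GGN). A minimal Python sketch of that calculation follows; codons are written with T rather than U, the function name is hypothetical, and the counts in the example are toy values, not the Table 1 data.

```python
from __future__ import annotations

# The eight four-fold degenerate ("quartet") families: 8 x 4 = 32 codons (DNA alphabet).
QUARTET_PREFIXES = {
    "CT": "Leu", "GT": "Val", "TC": "Ser", "CC": "Pro",
    "AC": "Thr", "GC": "Ala", "CG": "Arg", "GG": "Gly",
}

def quartet_gc3(counts: dict[str, int]) -> float:
    """Percentage of quartet-codon occurrences ending in G or C.

    `counts` maps codons (e.g. 'CTG') to occurrence counts; codons outside the
    quartet families are ignored and absent codons count as 0.
    """
    total = gc = 0
    for prefix in QUARTET_PREFIXES:
        for third in "ACGT":
            n = counts.get(prefix + third, 0)
            total += n
            if third in "GC":
                gc += n
    return 100.0 * gc / total

toy_counts = {"CTG": 28, "CTT": 2, "GCA": 10, "GCG": 10, "ATG": 5}
print(quartet_gc3(toy_counts))  # 76.0
```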

Primary Structure and Composition of the Acetyltransferase Component (E2)

The primary structure of the acetyltransferase component derived by translating the nucleotide sequence of the aceF gene

Fig. 3. Nucleotide sequences flanking the aceF gene. The nucleotide sequences flanking the aceF gene are redrawn to highlight features of the intercistronic region between the aceE and aceF genes and a potential rho-independent transcription terminator of the ace operon. The proposed ribosome-binding site is boxed; the translational start and stop codons, and the amino-terminal and carboxy-terminal residues of the dihydrolipoamide acetyltransferase subunit, are indicated. (The d representing deoxy and the hyphens for phosphodiester linkages have been omitted.) The stability of the stem and loop structure in the mRNA transcript is denoted by the free energy value calculated according to Tinoco et al. [44] (-30 kcal = -125 kJ).

Table 1. Codon usage in the aceF gene. The AUG initiation codon is not included with the methionine codons in this table.

UUU Phe  5 | UCU Ser 12 | UAU Tyr    1 | UGU Cys  0
UUC Phe 14 | UCC Ser  5 | UAC Tyr    2 | UGC Cys  1
UUA Leu  0 | UCA Ser  1 | UAA Ochre  1 | UGA Opal 0
UUG Leu  1 | UCG Ser  6 | UAG Amber  0 | UGG Trp  3

CUU Leu  2 | CCU Pro  6 | CAU His    1 | CGU Arg 12
CUC Leu  2 | CCC Pro  0 | CAC His    4 | CGC Arg  8
CUA Leu  0 | CCA Pro  3 | CAA Gln    0 | CGA Arg  0
CUG Leu 28 | CCG Pro 28 | CAG Gln   15 | CGG Arg  0

AUU Ile 10 | ACU Thr  9 | AAU Asn    2 | AGU Ser  1
AUC Ile 35 | ACC Thr 17 | AAC Asn   15 | AGC Ser  4
AUA Ile  0 | ACA Thr  0 | AAA Lys   40 | AGA Arg  0
AUG Met 16 | ACG Thr  1 | AAG Lys   13 | AGG Arg  0

GUU Val 27 | GCU Ala 28 | GAU Asp   11 | GGU Gly 20
GUC Val  8 | GCC Ala  9 | GAC Asp   22 | GGC Gly 30
GUA Val  6 | GCA Ala 30 | GAA Glu   47 | GGA Gly  0
GUG Val 21 | GCG Ala 29 | GAG Glu   11 | GGG Gly  1

Page 6: The Pyruvate Dehydrogenase Complex of Escherichia coli K12 : Nucleotide Sequence Encoding the Dihydrolipoamide Acetyltransferase Component

486

I u4

705 316 PAAAPAAKAEGK

3

Fig. 4. The lipoyl domain ofthe acetyltransferase component (E2) . The three 100-amino-acid repeating units (1,2 and 3) of the acetyltransferase chain are redrawn in single-letter amino-acid code and aligned for maximum homology. The potential lipoyl binding sites are denoted by the three lysine residues that are marked with asterisks (k). Residues that are identical in at least two of the three repeating units are enclosed in boxes. The residues are numbered from the alanine residue at the amino terminus of the acetyltransferase. Also shown are 12 additional residues that extend the sequence to the end of the postulated lipoyl domain (A1 in Table 3) corresponding to fragment A (M, = 31 600), the product oflimited tryptic proteolysis identified by Bleileet al. [9].

Table 2. Amino acid composition of the dihydrolipoamide acetyltransferase component (E2) of the pyruvate dehydrogenase complex of E. coli The amino acid composition and M, values derived from the nucleotide sequence of the aceFgene are compared with those reported for the purified protein by Vogel et at. [4], Eley et al. [3], and Bleile et at. [9]. For ease of comparison the content for each amino acid is expressed as a percentage of the total number of amino acids (mol %). The methionine initiation residue is not included in the DNA-derived composition; n.d. = not determined

Amino No. of Composition from acid residues ~

from DNA sequence DNA [4] [31 [91

ASP Asn Thr Ser Glu Gln Pro GlY Ala V a1 Met Ile Leu TYr Phe LY s His Arg CYS TrP

Total Mr

in excellent agreement with previous protein chemical analyses (Table 2). Thus it would appear that the size of this subunit is markedly over-estimated in sodium dodecyl sulphate/poly- acrylamide electrophoresis. It has been suggested that an abnormally low electrophoretic mobility may be due to the acidic nature of the protein [9] but the anomalous behaviour also extends to gel filtration in guanidine . HCI [7], indicating that some other structural feature is responsible. Uncertainty about the size of the E2 polypeptide has been an important factor in the continuing controversy over the subunit com- position of the native pyruvate dehydrogenase complex, and many of the stoichiometries will need to be reassessed now that the lower estimates for the unmodified E2 subunit have

mo1/100 mol received independent confirmation. ~- ____ ____

3.5 33 17 21 4.3 29 4.6

2.4 58 15 37 5.9 51 8.1 96 15.3 68 10.8 16 2.5 45 7.2 33 5.2 3 0.5

19 3.0 53 8.4

5 0.8 20 3.2

1 0.2 3 0.5

629

7.1 8.1 8.0

3.9 4.2 4.4 4.4 5.2 4.6

12.6 11.6 12.7

5.6 5.7 5.7 8.8 8.2 8.2

16.1 13.3 14.7 10.1 10.5 10.9 2.1 2.4 2.4 6.9 6.8 6.5 5.3 5.4 6.5 0.5 0.7 0.6 2.8 2.9 2.9 8.5 7.6 8.0 0.8 0.9 1.1 3.0 3.2 3.1 0.1 0.2 n.d. 0.8 0.6 0.8

769 669 613

The Lipoyl Binding Site

The sequence of 13 amino acid residues containing the lipoyl-lysine residue occurs three times in the amino acid sequence of the acetyltransferase at residues 34- 46, 137- 149 and 238 - 250, the critical lysines being residues 40,143 and 244 (Fig. 2). Further examination of the primary structure reveals that the three potential lipoyl-lysine residues are situated in much larger repeating units. In fact the first 304 amino acids contain three highly conserved repeats of approximately 100 residues. These are presented in Fig. 4 where the repeated regions are aligned to give maximum homology. It is im- mediately clear that the region of complete identity surround- ing the critical lysine residues extends to 18 amino acids. The overall degree of sequence conservation is remarkably high : the first and third repeating units share 76 and 80 identical residues with the second repeating unit and 69 with each other (Fig. 4). Furthermore, many of the non-identical residues represent conservative substitutions. It is also apparent that each repeat- ing: unit has a region that is extremlv rich in alanine and Droline

-

Y Y

65 959 80000 70000 64500 residues at the carboxy terminals. There is also evidence for

is shown in Fig. 2. It appears likely that the initiating for- mylmethionine residue is removed post-translationally so that the next residue, the alanine designated residue 1, corresponds to the amino-terminal alanine residue identified by Henney et al. [48] in studies on the purified protein from E. coli (Crookes strain). Likewise, the carboxy-terminal sequence agrees with the sequence proposed by Vogel et al. [4] from studies with the protein from E. coli K12: Arg-Arg-Leu-Val-Met.

The 629 amino acid residues encoded by the aceF gene correspond to a polypeptide of M , = 65959, which is in good agreement with the value of 60000-65000 obtained by sedimentation analysis [9]. The amino acid composition of the acetyltransferase derived from the nucleotide sequence is also

further internal duplickions within the repeating units. The significance of these will be discussed in a subsequent paper where the lipoyl binding sites of the acetyltransferase will be compared with the comparable region of the succinyltrans- ferase (Spencer, Stephens, Darlison, Duckenfield and Guest, unpublished).
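The identity counts quoted above (76, 80 and 69) come from scoring an existing alignment column by column. Below is a minimal sketch. It assumes the repeating units have already been aligned as in Fig. 4, with '-' marking gaps. The function name and the toy sequences are hypothetical.

```python
def count_identities(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences carry the same residue.

    Both sequences must already be aligned to the same length; '-' marks a
    gap and never counts as an identity.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


# Hypothetical aligned fragments (single-letter code), not taken from Fig. 4:
print(count_identities("AKV-E", "AKIGE"))  # 3
```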

The homologies are equally striking at the nucleotide level. Substitutions occur at only four positions in the 54-base-pair segments encoding the three invariant 18-amino-acid sequences. Pairwise comparisons of repeating units 1-2, 2-3 and 1-3 show 78, 57 and 86 base changes, respectively, in sequences of approximately 300 base pairs (Fig. 2). Of these, 29, 23 and 27 (respectively) are silent changes that affect the third base of degenerate codons. Most of the remaining changes involve multiple events in relatively few codons.
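The sketch below illustrates the classification used above. Two aligned, in-frame coding sequences are compared codon by codon and every base difference is counted. A codon whose only difference is at the third position, and which still encodes the same amino acid, is scored as a silent change. The standard genetic code is stored compactly in TCAG order, and U is accepted in place of T. The function names are hypothetical. The sketch also assumes gap-free alignments, which may not hold for every position of the ~300-base-pair repeats.

```python
BASES = "TCAG"
# Standard genetic code indexed as 16*first + 4*second + third, bases in
# TCAG order ('*' = stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # T__
    "LLLLPPPPHHQQRRRR"  # C__
    "IIIMTTTTNNKKSSRR"  # A__
    "VVVVAAAADDEEGGGG"  # G__
)


def translate_codon(codon: str) -> str:
    codon = codon.upper().replace("U", "T")
    i, j, k = (BASES.index(base) for base in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def compare_coding(seq1: str, seq2: str) -> tuple[int, int]:
    """Return (base changes, silent third-position changes) between two
    aligned, gap-free, in-frame coding sequences."""
    seq1 = seq1.upper().replace("U", "T")
    seq2 = seq2.upper().replace("U", "T")
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must have equal length, a multiple of 3")
    total = silent = 0
    for pos in range(0, len(seq1), 3):
        c1, c2 = seq1[pos:pos + 3], seq2[pos:pos + 3]
        diffs = [p for p in range(3) if c1[p] != c2[p]]
        total += len(diffs)
        # Silent here means: the only change is at the third base and the
        # codon still encodes the same amino acid.
        if diffs == [2] and translate_codon(c1) == translate_codon(c2):
            silent += 1
    return total, silent


# Hypothetical two-codon example: GCT->GCC (Ala) and AAA->AAG (Lys).
print(compare_coding("GCTAAA", "GCCAAG"))  # (2, 2)
```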


These homologies suggest that the lipoyl binding sequences arose by duplication of a common ancestral sequence. The relative homologies further suggest that repeating unit 2 may be closest to the ancestral sequence, and that repeating unit 1 may have diverged earlier than unit 3. In some respects the lipoyl binding regions resemble tethered forms of the small disulphydryl peptides thioredoxin and glutaredoxin. These mediate somewhat similar redox reactions in conjunction with disulphide-flavoprotein reductases (thioredoxin reductase and glutathione reductase), which are analogous to lipoamide dehydrogenase. Tethered, but presumably mobile, lipoyl domains have obvious advantages for the functional efficiency of the complex.
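The inference about repeating unit 2 can be expressed as a simple heuristic on the pairwise identity counts given earlier (76 for units 1-2, 80 for units 2-3, 69 for units 1-3). The unit with the largest summed identity to the others is treated as the most "central". The remaining units are then ranked by how much they share with it. This is an illustrative sketch of the reasoning, not a phylogenetic method, and the function names are hypothetical.

```python
def most_central_unit(identities: dict[tuple[int, int], int]) -> int:
    """Unit with the largest summed identity to all other units."""
    totals: dict[int, int] = {}
    for (a, b), n in identities.items():
        totals[a] = totals.get(a, 0) + n
        totals[b] = totals.get(b, 0) + n
    return max(totals, key=totals.get)


def ranked_by_similarity_to(center: int,
                            identities: dict[tuple[int, int], int]) -> list[int]:
    """Other units ordered from least to most similar to `center`."""
    shared: dict[int, int] = {}
    for (a, b), n in identities.items():
        if center in (a, b):
            shared[b if a == center else a] = n
    return sorted(shared, key=shared.get)


# Identical residues between repeating units, as quoted in the text.
IDENTITIES = {(1, 2): 76, (2, 3): 80, (1, 3): 69}
print(most_central_unit(IDENTITIES))           # 2
print(ranked_by_similarity_to(2, IDENTITIES))  # [1, 3]
```

Unit 2 has the largest summed identity (156, against 145 for unit 1 and 149 for unit 3). Unit 1 shares fewer residues with it than unit 3 does. This matches the suggestion that unit 1 diverged earlier.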

The primary structure derived from the nucleotide sequence clearly shows three potential lipoyl binding sites. It does not show how many sites are lipoylated or the degree of lipoylation at each potential site. Recent estimates of the lipoate content of the pyruvate dehydrogenase complex have given values of 10.2-13.8 nmol lipoate/mg of protein [16, 17]. These estimates were based on isotopic dilution studies and on radiochemical analyses of the complex isolated from [35S]sulphate-grown E. coli. They have been interpreted as confirming estimates of two lipoyl groups per acetyltransferase subunit obtained with the purified E2 component [16]. They have also been interpreted as favouring three lipoyl groups per polypeptide chain [17]. The discrepancy appears to arise largely from the different sizes and polypeptide stoichiometries ascribed to the respective preparations of complex. By contrast, it is generally agreed that two (or two net) lipoyl groups per E2 chain function in reductive acetylation, acetyl transfer and reoxidation [14, 15, 49]. Up to half of the lipoyl groups can be removed without serious loss of complex activity [9], and they can be modified more rapidly than catalytic activity is lost [49]. This is presumably because the number of lipoyl groups in the network of transacetylation sites that mediates active-site coupling is not rate-limiting for the overall reaction. It will therefore be of great interest to investigate the structural and functional consequences of selectively altering or deleting one or more of the three lipoyl binding sites, by mutagenesis and gene manipulation in vitro. Interestingly, the mutations affecting the aceF gene are clustered in the segment of DNA encoding the subunit binding and catalytic domain (see below) rather than in the lipoyl binding domain [20]. Presumably most mutations affecting the lipoyl domain are silent because the repeating units complement one another functionally. Loss of function may then only accompany large deletions or frameshift mutations.
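A short conversion shows why the "lipoyl groups per chain" figure depends on the assumed size and stoichiometry of the complex. This is a sketch only. The function name is hypothetical, and the complex Mr values and chain count in the example are purely illustrative round numbers, not values taken from refs [16] or [17].

```python
def lipoyl_groups_per_e2(nmol_lipoate_per_mg: float,
                         complex_mr: float,
                         e2_chains_per_complex: int) -> float:
    """Convert a lipoate content into lipoyl groups per E2 chain.

    1 nmol/mg equals 1e-6 mol per gram of protein, and one mole of complex
    weighs complex_mr grams, so lipoate per complex = content * 1e-6 * Mr.
    """
    lipoate_per_complex = nmol_lipoate_per_mg * 1e-6 * complex_mr
    return lipoate_per_complex / e2_chains_per_complex


# Purely hypothetical inputs: one measured content (12 nmol/mg) read against
# two different assumed complex sizes, each assumed to contain 24 E2 chains.
for assumed_mr in (4.0e6, 6.0e6):
    print(assumed_mr, round(lipoyl_groups_per_e2(12.0, assumed_mr, 24), 2))
```

With these illustrative inputs, the same measured content corresponds to 2 or 3 lipoyl groups per E2 chain, depending only on the assumed Mr of the complex.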

Domain Structure of the Acetyltransferase Component (E2)

The primary structure confirms almost all the predictions of Bleile et al. [9] about the domain structure of the E2 polypeptide. Those predictions were based on limited tryptic proteolysis of [3H]lipoyl-labelled complex and acetyltransferase core. The lipoyl domain (fragment A) clearly corresponds to the amino-terminal segment of the E2 polypeptide, and the remainder resembles the subunit binding domain (fragment D). The approximate position of the junction between the two domains can be deduced from their amino acid compositions. Fragment A of the Crookes strain differs from fragment D in having no arginine, histidine or tyrosine residues [9]. Since trypsin cuts after lysine or arginine, fragment A must be generated by cleavage at a highly susceptible lysine residue. In the E. coli K12 acetyltransferase, tyrosine, histidine and arginine first appear at residues 325, 327 and 333, respectively (Fig. 2). If no small connecting peptide is released, the susceptible residue should be the first lysine on the amino-terminal side of the tyrosine at 325, possibly the lysine at position 316 (Fig. 2). Table 3 compares the composition of fragment A with that of the first 316 residues (A1). The correspondence is extremely good, apart from slight excesses of alanine and lysine and a deficiency of glutamate plus glutamine in A1. Fragment A might instead be released by cleavage nearer the amino terminus, further from the first tyrosine, for example at the lysine at position 312 or 301. The predicted composition would then be reduced by Ala1, Glu1, Gly1, Lys1 or by Ala8, Glu2, Pro2, Gly1, Lys2, respectively. These reductions slightly improve the compositional fit but increase the Mr discrepancy. The composition of the first 301 residues is included as A2 in Table 3. However, fragment A is probably heterogeneous with respect to its carboxy terminus. It migrates as a doublet in sodium dodecyl sulphate/polyacrylamide gels, the minor component having an apparent Mr about 2000 larger than the major component [9, 11]. The two species could well arise by cleavage at positions 316 (larger component) and 301 (smaller component), giving a size difference of 1335. Residues 302-316 could therefore represent all or part of the postulated trypsin-sensitive hinge region between the lipoyl and subunit binding domains [9].
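The Mr values compared here are obtained by summing residue masses over a composition. Examples are the Mr of the whole E2 chain and the A1 - A2 difference in Table 3 (31605 - 30270). Below is a minimal sketch using standard average residue masses. The paper does not say which mass table was used, so agreement to within a few units, rather than exact equality, is expected. The names are hypothetical. The composition of residues 302-316 is obtained by subtracting the A2 column of Table 3 from the A1 column.

```python
# Average residue masses (amino acid minus H2O) as commonly tabulated,
# e.g. by ExPASy; an assumption, since the paper's mass table is not stated.
RESIDUE_MASS = {
    "Ala": 71.0788, "Arg": 156.1875, "Asn": 114.1038, "Asp": 115.0886,
    "Cys": 103.1388, "Gln": 128.1307, "Glu": 129.1155, "Gly": 57.0519,
    "His": 137.1411, "Ile": 113.1594, "Leu": 113.1594, "Lys": 128.1741,
    "Met": 131.1926, "Phe": 147.1766, "Pro": 97.1167, "Ser": 87.0782,
    "Thr": 101.1051, "Trp": 186.2132, "Tyr": 163.1760, "Val": 99.1326,
}
WATER = 18.01524  # one H2O for the free amino and carboxyl termini


def mr_from_composition(counts: dict[str, int], add_water: bool = True) -> float:
    """Approximate Mr of a polypeptide from its amino acid composition."""
    mass = sum(RESIDUE_MASS[aa] * n for aa, n in counts.items())
    return mass + WATER if add_water else mass


# DNA-derived composition of E2 (Table 2; initiator methionine excluded).
E2_COMPOSITION = {
    "Asp": 33, "Asn": 17, "Thr": 27, "Ser": 29, "Glu": 58, "Gln": 15,
    "Pro": 37, "Gly": 51, "Ala": 96, "Val": 68, "Met": 16, "Ile": 45,
    "Leu": 33, "Tyr": 3, "Phe": 19, "Lys": 53, "His": 5, "Arg": 20,
    "Cys": 1, "Trp": 3,
}
# Residues 302-316 (A1 minus A2 in Table 3); an internal segment, so no water.
RESIDUES_302_316 = {"Ala": 8, "Glu": 2, "Pro": 2, "Gly": 1, "Lys": 2}

print(round(mr_from_composition(E2_COMPOSITION)))
print(round(mr_from_composition(RESIDUES_302_316, add_water=False), 1))
```

With this table, the E2 composition comes out within a few units of 65959. Residues 302-316 sum to roughly 1334.5, in line with the difference between the A1 and A2 values in Table 3.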

The first 30 amino acid residues of fragment A from E. coli (Crookes strain) have recently been sequenced by automated Edman degradation (L. J. Reed, G. H. Dixon and D. C. Watson, personal communication). This sequence agrees well with the amino-terminal sequence of the acetyltransferase subunit (and of the lipoyl domain) deduced from the nucleotide sequence of the aceF gene of E. coli K12: Ala-Ile-Glu-Ile-Lys-Val-Pro-Asp-Ile-Gly-Ala-Asp-Glu-Val-Glu-Ile-Thr(Val)-Glu-Ile-Leu-Val-Lys-Val-Gly-Asp-Lys-Val-Glu-Ala(Gly)-Glu(Gln)-. Crookes strain residues are given in parentheses. The three residues that differ in the Crookes strain may simply reflect strain differences.

Subtracting the larger lipoyl domain (A1) from the E2 polypeptide leaves a residual fragment. Its composition resembles that of fragment D, the subunit binding domain, but the residual fragment is significantly larger (Table 3). However, its sequence contains two segments that approximate the composition of fragment D more closely: D1, residues 355-629, and D2, residues 317-591 (Table 3). Both involve removing 38 residues, either from the region between the two domains (317-354, D1) or from the carboxy terminus (592-629, D2). Two fragments corresponding to fragment D (Mr = 36000 and 39000) are often detected [9, 11]. Sequential cleavage at residues 316 and 354, or at 354 and 591, plausibly explains this. It is difficult to predict which alternative corresponds to the actual cleavage pattern. Moreover, the critical segments contain several trypsin-sensitive sites, so fragment D is likely to be heterogeneous at its amino and carboxy termini. Segment 317-354 could well form part of the trypsin-sensitive hinge region released from between the two domains, because it is particularly rich in potential tryptic cleavage sites (Fig. 2). Alternatively, a region susceptible to endogenous proteolysis near residue 591, releasing 38 amino acids from the E2 chain, could explain the two bands often seen with the E2 component [8, 10]. The nucleotide and amino acid sequences offer no other simple explanation for this heterogeneity of the E2 component, unless it stems from differences in the degree of lipoylation. Fig. 5 presents a domain structure for the acetyltransferase subunit that is consistent with the primary structure and the results of tryptic proteolysis.

Table 3. Amino acid compositions of the lipoyl and subunit binding domains of the dihydrolipoamide acetyltransferase component (E2). The compositions of the lipoyl and subunit binding domains (fragments A and D released by limited tryptic proteolysis [9]) are compared with those of potentially analogous segments of the primary structure: A1, residues 1-316; A2, residues 1-301; D1, residues 355-629; D2, residues 317-591. The composition of E2 minus the lipoyl domain (A1) is included for comparison with fragment D (D1 and D2). Mr values marked * were determined by sedimentation equilibrium [9]; the rest were calculated from the amino acid compositions. For fragments A and D, Asp + Asn and Glu + Gln are given as combined values. n.d. = not determined.

Amino acid   Lipoyl domain               E2 minus A1   Subunit binding domain
             A       A1        A2                      D       D1          D2
                     (1-316)   (1-301)                         (355-629)   (317-591)
Asp                  19        19        14                    13          10
Asn                  4         4         13                    11          11
Asp + Asn    24                                        24
Thr          11      11        11        16            14      14          14
Ser          13      13        13        16            14      15          13
Glu                  36        34        22                    19          22
Gln                  8         8         7                     7           7
Glu + Gln    48                                        30
Pro          23      22        20        15            15      14          14
Gly          24      25        24        26            23      22          24
Ala          61      67        59        29            27      24          27
Val          44      42        42        26            26      23          24
Met          7       7         7         9             7       9           7
Ile          18      19        19        26            20      25          20
Leu          10      9         9         24            19      21          19
Tyr          0       0         0         3             2       2           3
Phe          5       5         5         14            11      12          12
Lys          26      29        27        24            23      21          24
His          0       0         0         5             5       4           4
Arg          0       0         0         20            15      15          16
Cys          n.d.    0         0         1             n.d.    1           1
Trp          n.d.    0         0         3             n.d.    3           3
Total        314     316       301       313           275     275         275
Mr           31600*  31605     30270     34373         29600*  30163       30107

[Figure: schematic of the E2 chain showing the lipoyl domain, the hinge and the subunit binding domain, with fragments A1, A2, D1 and D2 and trypsin-sensitive sites a-d]

Fig. 5. Domain structure of the acetyltransferase subunit. The domain structure is shown with predicted positions for the trypsin-sensitive hinge region and the major fragments from limited tryptic proteolysis. The predictions are based on the primary structure (Fig. 2), the amino acid compositions (Tables 2 and 3) and proteolytic studies [9, 11]. Asterisks denote the three potential lipoyl-lysine residues at positions 40, 143 and 244. Letters mark putative trypsin-sensitive sites (see text): a, 301; b, 316; c, 354; d, 591.
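Potential tryptic cleavage sites, such as those discussed for the hinge region, can be located with a simplified specificity rule. The rule is that trypsin cuts on the carboxyl side of lysine or arginine except before proline. This is a common approximation, not a complete model of trypsin specificity. The function name is hypothetical. The example uses the 12 residues ending at residue 316 shown in Fig. 4 (PAAAPAAKAEGK), numbered on the assumption that they occupy positions 305-316.

```python
def tryptic_sites(sequence: str, first_residue: int = 1) -> list[int]:
    """Residue numbers of Lys/Arg residues after which trypsin could cut.

    Simplified rule: cleave on the carboxyl side of K or R unless the next
    residue is P. A K/R at the end of the given sequence is reported, since
    the residue that follows it is unknown here.
    """
    sites = []
    for i, residue in enumerate(sequence.upper()):
        next_residue = sequence[i + 1].upper() if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            sites.append(first_residue + i)
    return sites


# Residues assumed to be 305-316 (Fig. 4 extension of the lipoyl domain).
print(tryptic_sites("PAAAPAAKAEGK", first_residue=305))  # [312, 316]
```

Applied to that segment, the rule flags the lysines at 312 and 316, two of the candidate cleavage points discussed for the carboxy terminus of fragment A.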

The patterns of limited tryptic proteolysis [9, 11] contain other intermediates (Mr = 70000 and 56000). They also contain lipoylated products derived from fragment A: fragment B (Mr = 28000) and fragment C, designated a limit polypeptide of the lipoyl domain (Mr = 10000). All these fragments can be accommodated in plausible models of tryptic fragmentation based on the primary structure, provided allowance is made for the anomalous Mr values obtained by gel electrophoresis. It is therefore tempting to speculate that fragment C corresponds to the lipoyl binding regions, or to hybrid regions derived from adjacent repeating units. Fragment B may then contain two such regions. The presence of a non-lipoylated fragment of Mr = 56000 [9] raises important questions about the number and location of the lipoyl groups in the acetyltransferase component. The largest stretch of chain containing no potential lipoyl binding site has Mr = 41335 (residues 245-629). Such a segment is unlikely to have an apparent Mr as high as 56000 on sodium dodecyl sulphate/polyacrylamide gel electrophoresis. The 56000-Mr fragment therefore probably contains the third potential lipoyl binding site, and the results suggest that this site is not lipoylated. Now that the primary structure has been established, the degree of lipoylation at each potential binding site clearly needs to be reinvestigated in complexes labelled under different conditions. Different methods give large discrepancies in Mr values, especially for fragments containing parts of the lipoyl domain. These discrepancies are a complicating feature of the acetyltransferase component. Their significance undoubtedly lies in the unusual structure of the assembly of repeating units forming the lipoyl domain. That structure may be swollen or extended [9], random-coil [12], or three extended structures linked by random coil.
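Residues 245-629 are identified as the largest stretch of chain lacking a potential lipoyl binding site. This follows directly from the positions of the three lipoyl-lysines (40, 143 and 244) in a 629-residue chain. Below is a small sketch of that reasoning. The function name is hypothetical.

```python
def longest_site_free_segment(site_positions: list[int],
                              chain_length: int) -> tuple[int, int]:
    """Longest run of residues (start, end), 1-based and inclusive, that
    contains none of the given site positions."""
    best = (0, -1)  # empty segment
    start = 1
    # A sentinel one past the chain end closes the final segment.
    for pos in sorted(site_positions) + [chain_length + 1]:
        end = pos - 1
        if end - start > best[1] - best[0]:
            best = (start, end)
        start = pos + 1
    return best


# Lipoyl-lysine positions and chain length as given in the text.
print(longest_site_free_segment([40, 143, 244], 629))  # (245, 629)
```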

The polarity indices [50] of the lipoyl and subunit binding domains, calculated from the primary structure, are 38 and 39 (A1, A2) and 43 and 44 (D1, D2), compared with 41 for the E2 subunit. The lipoyl domain (A1 and A2) has a net charge of -26 at pH 7.5, whereas the subunit binding domain should have a net positive charge of +4 (D1) or +8 (D2). These values are consistent with the apparent isoelectric points of 4.0 and 8.6 determined for the A and D fragments, respectively [9]. The high acidity undoubtedly accounts for the weak Coomassie blue staining of the lipoyl domain, and it may contribute to its anomalous electrophoretic migration. The subunit binding domain contains two highly hydrophobic segments, residues 373-389 and 542-567, with hydrophobicity indices of 1.6 and 1.46, respectively [51]. The acetyltransferase subunit contains only one cysteine residue (residue 546, Fig. 2). It lies in the subunit binding and catalytic domain and could be an important active-site residue.
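The charge and polarity figures above can be recomputed from the Table 3 compositions with two simple counts, under stated assumptions. Net charge is taken as (Lys + Arg) - (Asp + Glu), with histidine treated as uncharged at pH 7.5 and the terminal amino and carboxyl groups cancelling. The polarity index is taken as the summed mol % of Asp, Asn, Glu, Gln, Lys, His, Arg, Ser and Thr. This is our reading of the Capaldi-Vanderkooi definition [50], not something stated in this paper. With these assumptions, the sketch reproduces the net charges quoted for A1, A2, D1 and D2 and the polarity indices quoted for A2, D1 and D2. The function names are hypothetical.

```python
# Residue types counted as polar (assumed Capaldi-Vanderkooi definition).
POLAR = ("Asp", "Asn", "Glu", "Gln", "Lys", "His", "Arg", "Ser", "Thr")


def net_charge(comp: dict[str, int]) -> int:
    """Lys and Arg count +1, Asp and Glu count -1; His and termini ignored."""
    return (comp.get("Lys", 0) + comp.get("Arg", 0)
            - comp.get("Asp", 0) - comp.get("Glu", 0))


def polarity_index(comp: dict[str, int]) -> float:
    """Summed mol % of the polar residue types in POLAR."""
    total = sum(comp.values())
    return 100.0 * sum(comp.get(aa, 0) for aa in POLAR) / total


# Compositions from Table 3 (zero counts omitted).
A1 = {"Asp": 19, "Asn": 4, "Thr": 11, "Ser": 13, "Glu": 36, "Gln": 8,
      "Pro": 22, "Gly": 25, "Ala": 67, "Val": 42, "Met": 7, "Ile": 19,
      "Leu": 9, "Phe": 5, "Lys": 29}
A2 = {"Asp": 19, "Asn": 4, "Thr": 11, "Ser": 13, "Glu": 34, "Gln": 8,
      "Pro": 20, "Gly": 24, "Ala": 59, "Val": 42, "Met": 7, "Ile": 19,
      "Leu": 9, "Phe": 5, "Lys": 27}
D1 = {"Asp": 13, "Asn": 11, "Thr": 14, "Ser": 15, "Glu": 19, "Gln": 7,
      "Pro": 14, "Gly": 22, "Ala": 24, "Val": 23, "Met": 9, "Ile": 25,
      "Leu": 21, "Tyr": 2, "Phe": 12, "Lys": 21, "His": 4, "Arg": 15,
      "Cys": 1, "Trp": 3}
D2 = {"Asp": 10, "Asn": 11, "Thr": 14, "Ser": 13, "Glu": 22, "Gln": 7,
      "Pro": 14, "Gly": 24, "Ala": 27, "Val": 24, "Met": 7, "Ile": 20,
      "Leu": 19, "Tyr": 3, "Phe": 12, "Lys": 24, "His": 4, "Arg": 16,
      "Cys": 1, "Trp": 3}

for name, comp in (("A1", A1), ("A2", A2), ("D1", D1), ("D2", D2)):
    print(name, net_charge(comp), round(polarity_index(comp)))
```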

The primary structure of the dihydrolipoamide acetyltransferase component, deduced from the nucleotide sequence of the aceF gene, shows that size estimates based on electrophoretic migration and gel filtration are unreliable. This is owing primarily to inherent features of the lipoyl domain. The lipoyl domain itself is remarkable in consisting of three tandemly repeated segments of approximately 100 amino acids, each with a very high degree of mutual homology. The results also confirm and extend predictions about the domain structure and organization of this subunit based on limited tryptic proteolysis.

We are greatly indebted to F. Sanger and A. Coulson (for introducing us to the dideoxy method, and for a generous gift of primer) and to R. Staden and I. K. Duckenfield (for providing computer programs and assisting with computing). We also acknowledge support from the Science and Engineering Research Council by project grant GR/B35543 (JRG) and studentship (PES).

REFERENCES

1. Reed, L. J. (1974) Acc. Chem. Res. 7, 40-46.
2. Guest, J. R. (1978) Adv. Neurol. 21, 219-244.
3. Eley, M. H., Namihira, G., Hamilton, L., Munk, P. & Reed, L. J. (1972) Arch. Biochem. Biophys. 152, 655-669.
4. Vogel, O., Beikirch, H., Muller, H. & Henning, U. (1971) Eur. J. Biochem. 20, 169-178.
5. Vogel, O. (1977) Biochem. Biophys. Res. Commun. 74, 1235-1241.
6. Perham, R. N. & Thomas, J. O. (1971) FEBS Lett. 15, 8-12.
7. Danson, M. J. & Porteous, C. F. (1981) FEBS Lett. 133, 212-214.
8. Gebhardt, C., Mecke, D. & Bisswanger, H. (1978) Biochem. Biophys. Res. Commun. 84, 508-514.
9. Bleile, D. M., Munk, P., Oliver, R. M. & Reed, L. J. (1979) Proc. Natl Acad. Sci. USA 76, 4385-4389.
10. Guest, J. R., Cole, S. T. & Jeyaseelan, K. (1981) J. Gen. Microbiol. 127, 65-79.
11. Hale, G. & Perham, R. N. (1979) Eur. J. Biochem. 94, 119-126.
12. Perham, R. N., Duckworth, H. W. & Roberts, G. C. K. (1981) Nature (Lond.) 292, 474-477.
13. Danson, M. J. & Perham, R. N. (1976) Biochem. J. 199, 505-511.
14. Speckhard, D. C., Ikeda, B. H., Wong, S. S. & Frey, P. A. (1977) Biochem. Biophys. Res. Commun. 77, 708-713.
15. Collins, J. H. & Reed, L. J. (1977) Proc. Natl Acad. Sci. USA 74, 4223-4227.
16. White, R. H., Bleile, D. M. & Reed, L. J. (1980) Biochem. Biophys. Res. Commun. 94, 78-84.
17. Hale, G. & Perham, R. N. (1979) Biochem. J. 177, 129-137.
18. Hale, G. & Perham, R. N. (1980) Biochem. J. 187, 905-908.
19. Guest, J. R. & Stephens, P. E. (1980) J. Gen. Microbiol. 121, 277-292.
20. Guest, J. R., Roberts, R. E. & Stephens, P. E. (1983) J. Gen. Microbiol. 129, 671-680.
21. Stephens, P. E., Darlison, M. G., Lewis, H. M. & Guest, J. R. (1983) Eur. J. Biochem. 133, 155-162.
22. Rothstein, R. J. & Wu, R. (1981) Gene 15, 167-176.
23. Messing, J. (1979) Recomb. DNA Tech. Bull. 2, 43-48.
24. Messing, J. & Vieira, J. (1982) Gene 19, 269-276.
25. Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H. & Roe, B. A. (1980) J. Mol. Biol. 143, 161-178.
26. Winter, G. & Fields, S. (1980) Nucleic Acids Res. 8, 1965-1974.
27. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA 74, 5463-5467.
28. Duckworth, M. L., Gait, M. J., Goelet, P., Hong, G. F., Singh, M. & Titmas, R. C. (1981) Nucleic Acids Res. 9, 1691-1706.
29. Staden, R. (1977) Nucleic Acids Res. 4, 4037-4051.
30. Staden, R. (1978) Nucleic Acids Res. 5, 1013-1015.
31. Staden, R. (1979) Nucleic Acids Res. 6, 2601-2610.
32. Staden, R. (1980) Nucleic Acids Res. 8, 3673-3694.
33. Staden, R. & McLachlan, A. D. (1982) Nucleic Acids Res. 10, 141-156.
34. Shine, J. & Dalgarno, L. (1974) Proc. Natl Acad. Sci. USA 71, 1342-1346.
35. Steitz, J. (1979) in Biological Regulation and Development (Goldberger, R. F., ed.) vol. 1, pp. 349-399, Plenum Press, New York.
36. Stormo, G. D., Schneider, T. D. & Gold, L. M. (1982) Nucleic Acids Res. 10, 2971-2996.
37. Atkins, J. F. (1979) Nucleic Acids Res. 7, 1035-1041.
38. Vogel, O., Hoehn, B. & Henning, U. (1972) Proc. Natl Acad. Sci. USA 69, 1615-1619.
39. Bates, D. L., Harrison, R. A. & Perham, R. N. (1975) FEBS Lett. 60, 427-430.
40. Yanofsky, C., Platt, T., Crawford, I. P., Nichols, B. P., Christie, G. E., Horowitz, H., Van Cleemput, M. & Wu, A. M. (1981) Nucleic Acids Res. 9, 6647-6668.
41. Grindley, N. D. F. (1978) Cell 13, 419-426.
42. Nielsen, J., Hansen, F. G., Hoppe, J., Friedl, P. & von Meyenburg, K. (1981) Mol. Gen. Genet. 184, 33-39.
43. Cole, S. T., Grundstrom, T., Jaurin, B., Robinson, J. J. & Weiner, J. H. (1982) Eur. J. Biochem. 126, 211-216.
44. Tinoco, I., Jr, Borer, P. N., Dengler, B., Levine, M. D., Uhlenbeck, O. C., Crothers, D. M. & Gralla, J. (1973) Nature (New Biol.) 246, 40-41.
45. Rosenberg, M. & Court, D. (1979) Annu. Rev. Genet. 13, 319-353.
46. Grantham, R., Gautier, C., Gouy, M., Jacobzone, M. & Mercier, R. (1981) Nucleic Acids Res. 9, r43-r74.
47. Grosjean, H. & Fiers, W. (1982) Gene 18, 199-209.
48. Henney, H. R., Willms, C. R., Muramatsu, T., Mukerjee, B. B. & Reed, L. J. (1967) J. Biol. Chem. 242, 898-901.
49. Danson, M. J., Hale, G. & Perham, R. N. (1981) Biochem. J. 199, 505-511.
50. Capaldi, R. A. & Vanderkooi, G. (1972) Proc. Natl Acad. Sci. USA 69, 930-932.
51. Segrest, G. B. & Feldmann, R. J. (1974) J. Mol. Biol. 87, 853-858.

P. E. Stephens, Celltech, 244-250 Bath Road, Slough, Berkshire, England SL1 4DY

M. G. Darlison, H. M. Lewis, and J. R. Guest, Department of Microbiology, University of Sheffield, Western Bank, Sheffield, South Yorkshire, England S10 2TN