
Nucleotide sequence of the gene encoding the hydrogenase from Desulfovibrio vulgaris (Hildenborough)


Eur. J. Biochem. 148, 515-520 (1985) © FEBS 1985

Gerrit VOORDOUW and Sydney BRENNER

Medical Research Council Laboratory of Molecular Biology, Cambridge

(Received November 19, 1984/February 6, 1985) - EJB 84 1215

The nucleotide sequence of the 4.7-kb SalI/EcoRI insert of plasmid pHV15 containing the hydrogenase gene from Desulfovibrio vulgaris (Hildenborough) has been determined with the dideoxy chain-termination method. The structural gene for hydrogenase encodes a protein product of molecular mass 45820 Da. The NH2-terminal sequence of the enzyme deduced from the nucleic acid sequence corresponds exactly to the amino acid sequence determined by Edman degradation. The nucleic acid sequence indicates that an N-formylmethionine residue precedes the NH2-terminal amino acid Ser-1. There is no evidence for a leader sequence.

The NH2-terminal part of the hydrogenase shows homology to the bacterial [8Fe-8S] ferredoxins. The sequence Cys-Ile-Xaa-Cys-Xaa-Xaa-Cys-Xaa-Xaa-Xaa-Cys-Pro-Xaa-Xaa-Ala-(Ile) occurs twice both in the hydrogenase and in [8Fe-8S] ferredoxins, where the Cys residues have been shown to coordinate two [4Fe-4S] clusters [Adman, E. T., Sieker, L. C. and Jensen, L. H. (1973) J. Biol. Chem. 248, 3987-3996]. These results, therefore, suggest that two electron-transferring ferredoxin-like [4Fe-4S] clusters are located in the NH2-terminal segment of the hydrogenase molecule. There are ten more Cys residues but it is not clear which four of these could participate in the formation of the third cluster, which is thought to be the hydrogen binding centre.

Another gene, encoding a protein of molecular mass 13 493 Da, was found immediately downstream from the gene for the 46-kDa hydrogenase. The nucleic acid sequence suggests that the hydrogenase and the 13.5-kDa protein belong to a single operon and are coordinately expressed. Since dodecylsulfate gel electrophoresis of purified hydrogenase indicates the presence of a 13.5-kDa polypeptide in addition to the 46-kDa component, it is proposed that the hydrogenase from D. vulgaris (Hildenborough) is a two-subunit enzyme.

The preceding paper reports the cloning of the gene encoding the 46-kDa hydrogenase from Desulfovibrio vulgaris (Hildenborough) on a 4.7-kb SalI/EcoRI restriction fragment [1]. The further characterisation of the hydrogenase gene by nucleic acid sequencing and analysis of the sequence is described here.

MATERIALS AND METHODS

Materials

Restriction endonucleases were obtained from the same suppliers as in [1]. DNA polymerase (Klenow fragment) was from Boehringer or Anglian Biotechnology Ltd (Colchester, UK). Labelled dATP (either [α-32P]dATP or [α-35S]dATP[S], both 400 Ci/mmol) was from Amersham. Deoxynucleoside triphosphates, dideoxynucleoside triphosphates, as well as all other reagents used for nucleic acid sequencing, were as described in [2].

DNA preparation for sequencing

Plasmid pHV15 was purified as described [1]. The 4.7-kb SalI/EcoRI insert of this plasmid was excised with EcoRI and HindIII (the PstI and HindIII sites of the multiple cloning sequence of pUC9 are 5' from the SalI site [3]). The 4.7-kb HindIII/EcoRI fragment was isolated by agarose gel electrophoresis and used for sequencing [2].

Cloning in M13

The purified 4.7-kb HindIII/EcoRI fragment was digested with either Sau3A, TaqI, AluI, HaeIII or RsaI and the digest cloned into the compatible site of the replicative form of the vectors M13 mp8 or mp9 [4]. Alternatively the 4.7-kb HindIII/EcoRI fragment was sonicated, end-repaired and size-fractionated by agarose gel electrophoresis. The 400-500-bp fraction was cloned into the SmaI site of M13 mp8 [2]. Following transfection of E. coli JM101, recombinant phages were identified as colourless plaques on indicator plates containing 5-bromo-4-chloro-3-indolyl-β-D-galactoside and isopropyl-β-D-thiogalactoside. Single-stranded DNA was prepared from these plaques using standard methods [2].

Shotgun nucleotide sequencing

Single-stranded M13 templates were sequenced at random with the dideoxy chain-termination procedure of Sanger et al. [5] using the universal sequencing primer (17-mer) as detailed in [2]. The data were compiled in a DEC VAX computer using the programmes of Staden [6, 7].

Correspondence to G. Voordouw, MRC Laboratory of Molecular Biology, Hills Road, Cambridge, England CB2 2QH

Abbreviations. bp, base pair; kb, 10^3 base pairs; IPTG, isopropyl-β-D-thiogalactoside.

Enzyme. Hydrogenase, hydrogen : cytochrome c3 oxidoreductase (EC 1.12.2.1).

RESULTS AND DISCUSSION

Sequencing strategy and location of the hydrogenase gene

The complete nucleotide sequence of the HindIII/EcoRI fragment isolated from pHV15 has been determined and is 4678 bp long. This includes the SalI/EcoRI insert of D. vulgaris (Hildenborough) DNA (4664 bp) and 14 bp of pUC9 sequence with the HindIII and PstI restriction sites. The M13 clones that have been isolated and the extent to which these have been sequenced are indicated in Fig. 1. The insert lengths of M13 clones obtained by cloning sonicated DNA were 400-500 bp, while those obtained by cloning restriction endonuclease digests can be deduced from the restriction map (Fig. 1), which was derived from the sequence. Several clones contained multiple restriction fragments joined during ligation. All sequences obtained for M13 clones with restriction fragment inserts were checked by a computer programme [6] for the presence of the corresponding restriction sequence. When a site was found the two sequences were disconnected and entered separately into the data base. This procedure is more cumbersome than the sequencing of random, appropriately size-fractionated, sonicated fragments [2]. However, the latter were found to be difficult to clone.

Fig. 1. (A) Survey of M13 clones that have been sequenced with the dideoxy chain-termination procedure. (B) Location of the hydrogenase gene on the 4678-bp HindIII/EcoRI fragment. (A) The clones were obtained from the 4678-bp HindIII/EcoRI fragment. The direction and extent to which they were sequenced are indicated. M13 clones were obtained by cloning restriction enzyme digests (Sau3A, TaqI, AluI, HaeIII, RsaI) or sonicated fragments (sonics). The restriction map shown for these enzymes was derived from the sequence. The HindIII site is on the left, the EcoRI site on the right. (B) The positions of the genes for the 46-kDa hydrogenase and the 13.5-kDa protein are indicated on the coding strand (5' → 3'). Base 1 is the 5' adenine of the HindIII site. Two SstII sites (positions 1785 and 1904) and a single PstI site (position 2467) are indicated. The nucleotide sequence from the furthest right SstII site to the furthest right Sau3A site (1904-3970) is shown in Fig. 2
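The check for internal restriction sites described above can be illustrated with a short sketch. It is not the programme of [6]; the function name, the example read and the choice to keep the recognition sequence on both pieces are assumptions made for illustration only. The recognition sequences are the standard ones for the five enzymes used.

```python
# Recognition sequences of the enzymes used to generate the M13 clones.
SITES = {
    "Sau3A": "GATC",
    "TaqI": "TCGA",
    "AluI": "AGCT",
    "HaeIII": "GGCC",
    "RsaI": "GTAC",
}

def split_at_internal_sites(read, enzyme):
    """Split a read wherever the cloning enzyme's site occurs internally.

    An internal site suggests that two restriction fragments were joined
    during ligation. Assumption: the site is kept on both pieces, since
    each original fragment ended with (part of) it.
    """
    site = SITES[enzyme]
    pieces = []
    start = 0
    pos = read.find(site, 1)  # ignore a site at the very start of the read
    while pos != -1 and pos + len(site) < len(read):
        pieces.append(read[start:pos + len(site)])
        start = pos
        pos = read.find(site, pos + 1)
    pieces.append(read[start:])
    return pieces

# Hypothetical read from a Sau3A clone containing one internal GATC.
pieces = split_at_internal_sites("AAGATCTTTT", "Sau3A")
print(pieces)
```

Each piece returned by such a routine would then be entered into the data base as a separate sequence.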

The final compiled sequence was fully overlapped and 90% of the sequence is derived from information on both strands. Each base was on average determined 7.2 times. The 4678-bp HindIII/EcoRI fragment has a high content of G·C base pairs (63.6%; an average of 65% G·C has been reported for D. vulgaris DNA [8]). The interpretation of eight stretches of sequence was difficult due to compressions on the gels. Four of these occurred in the coding regions indicated in Fig. 1. However, in all cases an unambiguous sequence could be derived by reading both strands. The sequences for these compressed regions were checked by substituting dITP for dGTP in the sequencing reactions [2] to prevent G·C base pair formation.
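As a rough illustration of the figures quoted above, the following sketch computes a G·C fraction for a short stretch of sequence and converts the reported percentage and redundancy into absolute numbers for the 4678-bp fragment. The function and variable names are hypothetical, and the 15-base example is the sequence upstream of the hydrogenase gene, not the whole fragment.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (one strand)."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)

# Short example only: 15 bases upstream of the hydrogenase coding region.
example = "GGAGGATTGCAGATG"
print(f"{100 * gc_fraction(example):.1f}% G+C in the example stretch")

# Figures reported in the text for the whole fragment.
fragment_length = 4678      # bp
reported_gc = 0.636         # 63.6% G.C base pairs
mean_redundancy = 7.2       # average number of times each base was read

expected_gc_pairs = round(reported_gc * fragment_length)
total_bases_read = mean_redundancy * fragment_length
print(expected_gc_pairs, "G.C base pairs expected in the fragment")
print(f"about {total_bases_read:.0f} bases read in total")
```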

The location of the coding sequence for the hydrogenase of D. vulgaris (Hildenborough) was established from the NH2-terminal amino acid sequence of the 46-kDa protein [1]. A 2067-bp part of the sequence of the 4678-bp HindIII/EcoRI fragment (bases 1904-3970) is shown in Fig. 2, together with the predicted amino acid sequence of the hydrogenase gene (bases 165-1430, taking the original base 1904 as number 1). The gene codes for a protein of 420 amino acids, molecular mass 45 820 Da, excluding the initiator methionine. The NH2-terminal sequence, from Ser-1 to Asp-20, shown in Fig. 2, is in complete agreement with that determined for the 46-kDa hydrogenase by sequential Edman degradation [1]. The amino acid composition determined for the carboxymethylated 46-kDa protein also agrees with that derived from the sequence (Table 1). These results confirm the expectation based on Western blotting experiments (Fig. 4 in [1]) that the insert of pHV15 contains the entire hydrogenase gene.
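The coordinates given above can be cross-checked with a little arithmetic. The sketch below assumes that the range 165-1430 includes both the initiator AUG and the stop codon, which is what reconciles it with 420 residues; the helper names are hypothetical.

```python
# Fig. 2 numbering starts at base 1904 of the 4678-bp fragment.
FIG2_OFFSET = 1904 - 1

def to_fragment(fig2_position):
    """Convert a Fig. 2 position into the numbering of the whole fragment."""
    return fig2_position + FIG2_OFFSET

def codon_count(start, end):
    """Number of codons in an inclusive range of bases."""
    length = end - start + 1
    assert length % 3 == 0, "range is not a whole number of codons"
    return length // 3

# Assumption: bases 165-1430 cover AUG + mature protein + stop codon.
hydrogenase_codons = codon_count(165, 1430)
residues = hydrogenase_codons - 2           # drop initiator Met and stop
mean_residue_mass = 45820 / residues        # Da per residue

print(residues, "residues;", f"{mean_residue_mass:.1f} Da per residue")
print("gene in fragment numbering:", to_fragment(165), "-", to_fragment(1430))
```

A mean residue mass close to 110 Da is what one would expect for an average protein, so the reported mass and length are mutually consistent.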

The nucleic acid sequence indicates that the first codon of the gene (AGC; Ser-1) is preceded by AUG (Met). Since a plausible ribosome binding site (GGAGGA) is positioned 12-7 bp upstream, this AUG codon then must code for an N-formyl-methionine, which is later removed. There is no evidence in the sequence for any preceding translation-initiation site further upstream and it must, therefore, be concluded that the 46-kDa hydrogenase lacks a leader sequence. This is surprising because this and related enzymes are thought to be located in the periplasm [9-11]. If it is indeed periplasmic then a special mechanism would be needed for translocation of the enzyme across the membrane. We prefer the possibility that the enzyme is cytoplasmic.
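The stated spacing of the ribosome binding site can be verified directly on the sequences immediately upstream from the two coding regions (Fig. 3). The following is a minimal sketch; the function name is hypothetical, and positions are counted so that the base just before the A of ATG is -1.

```python
def rbs_spacing(upstream, rbs="GGAGGA", start="ATG"):
    """Return (first, last) distance in bp of the RBS upstream of the start codon."""
    rbs_index = upstream.find(rbs)
    start_index = upstream.find(start, rbs_index + len(rbs))
    if rbs_index == -1 or start_index == -1:
        raise ValueError("RBS or start codon not found")
    first = start_index - rbs_index                   # 5' base of the RBS
    last = start_index - (rbs_index + len(rbs) - 1)   # 3' base of the RBS
    return first, last

# Sequences as given in Fig. 3 (RBS through the initiator ATG).
upstream_46k = "GGAGGATTGCAGATG"
upstream_13k = "GGAGGAAACGCCATG"

for name, seq in (("46 kDa", upstream_46k), ("13.5 kDa", upstream_13k)):
    first, last = rbs_spacing(seq)
    print(f"{name}: GGAGGA spans positions -{first} to -{last}")
```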

The 13.5-kDa protein gene

Immediately downstream from the gene encoding the 46-kDa hydrogenase another gene was found that codes for a protein of 122 amino acids, molecular mass 13493 Da, excluding the N-formyl-methionine. The reading frame was detected by two different methods.

(a) Gene search by signal. The gene is preceded by a plausible ribosome binding site (GGAGGA) located 12-7 bp upstream from the initiator codon, precisely as observed for the 46-kDa hydrogenase gene (Fig. 3). Like the hydrogenase gene, it has an amber stop codon (UAG).

(b) Gene search by content. The codon usage method has been chosen from several possibilities [12, 13]. The high G + C content of the sequence mentioned earlier leads to a strongly biased codon usage in the 46-kDa hydrogenase gene with a strong preference for G or C over A or T in the third position of codons (Table 2). Supplying the codon usage table for the hydrogenase gene as a standard in searching for other genes gives the results shown in Fig. 4. A clear peak is observed for the 13.5-kDa protein gene (B) in reading frame II, indicating similar codon usage. This is also demonstrated in Table 2. In addition a possible third gene (C) is indicated in frame III.
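The idea behind a gene search by content can be sketched as follows. This is a simplified log-odds score and not the algorithm of [12, 13]: codon frequencies from a reference gene are compared with a uniform background, and the mean score is computed in a sliding window for each of the three reading frames. The function names, the add-one smoothing, the window of 25 codons and the toy sequences are all assumptions for illustration.

```python
import math
from collections import Counter

def codon_model(reference_gene):
    """Smoothed codon frequencies from an in-frame reference gene."""
    codons = [reference_gene[i:i + 3]
              for i in range(0, len(reference_gene) - 2, 3)]
    counts = Counter(codons)
    total = len(codons)
    # Add-one smoothing over the 64 possible codons.
    return lambda codon: (counts[codon] + 1) / (total + 64)

def frame_scores(seq, freq, window=25):
    """Mean log-odds (reference usage vs uniform) per window, for frames 0-2."""
    scores = {}
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        log_odds = [math.log(64 * freq(c)) for c in codons]
        scores[frame] = [sum(log_odds[i:i + window]) / window
                         for i in range(len(log_odds) - window + 1)]
    return scores

# Toy data: a biased "reference gene" and a test sequence carrying the
# same codons shifted by one base, so that frame offset 1 is coding.
reference = "GCCAAGGGC" * 20
test_seq = "A" + "GCCAAGGGC" * 10

scores = frame_scores(test_seq, codon_model(reference))
best_frame = max(scores, key=lambda f: max(scores[f]))
print("highest-scoring frame offset:", best_frame)
```

With real data the reference would be the hydrogenase coding sequence, and a peak in one frame, as in Fig. 4, would point to a region with similar codon usage.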

Thus the presence of a second gene (B) coding for a 13.5-kDa protein appears well established. The two genes, hydrogenase and B, are very close together with only 14 bp between


- - - - Sst II -35 -10 C C G C G G G G C T G A C A G G A T G C T G C A A C A C A T A G G G C A G A A T C T C C G C A G G C A G A G C A A T G C C C T T C T G A T A T T A C A G A C A G A T A C ~ G C C G G G A C A T G C T C C C G G C A A C G G C A G C G A G G C A

10 20 30 40 50 6 0 70 80 90 100 -110 120

- M S R T V M E R I E Y C M H T P D P h A D P D h L H T A C C G C C C G C G C C T G C C A G A A C C T G T A A C G G A G G A T T G C A G A T G A G C C G T A C C G T C A T G G A G C G C A T C G A ~ T A T G A G A T G C R C A C T C C G G A C C C C A A G G C C G A T C C G G A C A A G C T C C

130 140 150 160 170 180 190 200 210 220 230 240

F V O I D E A ~ ~ I G O D T O S O Y O ~ T A A I F G E ~ G E F H S I P H I E ~ ~ A C T T C G T C C A G A T C G A C G A G G C 4 A A G T G C G A C A C C T G T T C G C A G T A C T G C C C C A C C G C C G C C A T C T T C G G C G A A A T G G G C G A A C C G C A C T C C A T T C C C C A C A T C G A G G C G T

250 260 270 280 290 300 310 320 330 340 350 360

I N O G R O L T H O P E N A I Y E A R S W V F E V E ~ ~ L ~ C I G ~ V K @ I A ~ ~ G C A T C A A C T G C G G C C A G T G C C T C A C G C A C T G C C C C G A G ~ ~ C G C C A T C T A C G A G G C A C A G T C G T G G G T G C C T G A ~ G T C G A G A A G A A G C T G ~ A G G A ~ G G C A A G G T G A A A T G C A T C G C C A T G C

370 380 390 400 410 420 430 440 450 460 470 480

A P A V K Y A L G D A F G M ~ V G S V T T G K M L A A L R K L G F A H @ W I ~ T ~ : C C G C C C C C G C C G T G C G C T A T G C A C T G G G C G A C G C C T T C G G C A T G C C C G T C G G T T C C G T C A C C A C C G G C A A G A T G C T C G C G G C C C T G C A G A A G C T C G G C T T C G C T C A T T G C T G G G A C A C C G

490 500 510 520 530 540 550 560 570 580 590 600

F T A n v T I w E E G s E F v E R L T k h s D M P L P R F T S @ @ F G w R K Y n A G T T C A C C G C T G A C G T G A C C A T C T G G G G A A G b G G G G T C C G h G T T C G T G G A A C G C C T C R C C A A G A A G A G C G A C A T G C C G C T G C C G C A G T T C A C C T C G T G C T G C C C C G G C T G G C A G A & G T A T G

610 620 630 640 650 660 670 680 690 700 710 720

E T Y Y P E L L P H F S T @ K S F I G M N G A L A K T Y G A E R M K Y D F K R V C C G A G A C C T A C T A C C C C G A A C T G C T G C C G C A C T T C T C C A C G T G C A A G T C G C C C A T C G G C A T G A A C l j G C G C A C T G G C G A A G A C C T A C G G C G C A G A G C G G A T G A A G T A C G A C C C C A A G C A G G

730 740 750 760 770 780 790 800 810 820 830 840

Y T V S I M P ~ I A K K Y E G L ~ P E L K S S G M R I l I I l A T L T T R E L A Y M TCTACACCGTCTCCATCATGCCCTGCATCGCAAAGAAGTACGRAGGGTTGCGTCCCGAACTGAAGTCCAGCGGCATGCGCGACATCGACGCCACGCTGACCACCCGTGAGCTGGCCTACA

850 860 870 880 890 900 910 920 930 940 950 960

I K ~ A G I ~ F A ~ L P I I G ~ H D S L ~ G E S T G G ~ T X F G ~ T G G V ~ E ~ ~ T G A T C A A G A A G G C C G G T A T C G A C T T C G C G A A A C T C C C C G A C G G C A A G C G T G A C A G C C T C A T G G G T G A A T C C A C C G G C G G T G C C A C C A T C T T C G G C G T C A C C G G C G G C G T C A T G G A A G C G G

970 980 990 1000 1010 1020 1030 1040 1050 1060 1070 1080

L R F A Y E A V T G K K P I l S W D F k A V R G L I l G I K E ~ T V N V G G T D V t i C A C T C C G C T T C G C C T A C G A A G C C G T C A C C G G C A A G A A G C C C G A C A G C T G G G A C T T C A A G G C C G T G C G C G G T C T T G A T G G C A T C A A G G A A G C C A C C G T C A A C G T C G G C G G T A C C G A C G T C A

1090 1100 1110 1120 1130 1140 1150 1160 1170 1180 1190 1200

V A V V H G A ~ R F K R V C D D V K A G K S P Y H F I E Y M A O P G G C V C G C A G G T C G C C G T G G T G C A C G G G G C C A A G C G G T T C A A G C A G G T C ~ C G A C G A T G T G A A G G C G G G C A A G T C G C C C T A ~ C A C T T C A T C G A A T 4 C A T G G C C T G C C C C G G C G G C ~ C G T C ~ T G G C G

1210 1220 1230 1240 1250 1260 1270 1280 1290 1300 1310 1320

G R F V ~ P G V L E A ~ D R T T T R L Y A G L ~ K R L A M A S A N K A ~ - GCGGTCAGCCCGTCATGCCCGGCGTGCTCGAAGCCATGGACCGCACC~CCACCCGCCTTTACGCGGGCCTGAAGAAGCGCCTCGCCATGGCGAGCGCCAACAAGGCATAGGAGGAAACGC

1330 1340 1350 1360 1370 1380 1390 1400 1410 1420 1430 1440

~ Q I A S I T R ~ ~ G F L K V A O ~ ~ T G A A L I G I R ~ T G ~ ~ A V A A ~ ~ ~ Q I ~ C A T G C A G A T A G C C A G C A T C A C C C G G C G C G G C T T C C T C A A G G T C G C C T G C G T C A C G A C G G G C G C A G C C C T C R T C G G C A T T C G C A T G A C C G G A A ~ G G C C G T T G C C G C C G T C A A G C ~ G A T C A A

1450 1460 1470 1480 1490 1500 1510 1520 1530 1540 1550 1560

D Y ~ L D K I N G V Y G A D A h F P V R A S O I I N T Q V h A L Y K S Y L E t i P L GGACTACATGCTTGACCGCATCAACGGCGTCTACGGGGCGGRTGCCRAGTTCCCCGTTCGCGCCTCGCAGGACAACACGCAGGTCAAGGCTCTCTACAAGAGCTACCTTGAGAAGCCTCT

1570 1580 1590 1600 1610 1620 1630 1640 1650 1660 1670 1680

G H h S H D L L H T H W F D h S K G V K E L T T A G h L P N ~ ~ A S E F E G P Y CGGTCACAAGTCGCACGACCTGCTGCACACGCACTGGTTCGACAAGTCCAAGGGCGTCAAGGAACTCACCACGGCAGGCRAGTTGCCCAACCCGCGTGCTTCCGAGTTCGAAGGTCCGTA

1690 1700 1710 1720 1730 1740 1750 1760 1770 1780 1790 1800

P Y E * CCCCTACGAATAGCGCCAGAACGTATACGGAAGGCATAAACGCACATCCGTGATGCCGGAACCGCCTGCGGCG~GGGCCTCTTGGCCTGACCTCAGGGCGATTCCTCCGGCAACGCAGAT

1810 1820 1830 1840 1850 1860 1870 1880 1890 1900 1910 1920

-A 6-

1930 1940 1950 1960 1970 1980 1990 2000 2030 2040

GCATCTGCCCCACGATCTCATGRCACGAAGACACTTAAAGTAATACATTACTGTTTTCGTGAAGCCTGTCCCTTGTGGCGCGGCGACAGAAAAGCCCCCCGACGCGCAGCGCAGGGGGGC 2 0 1 0 2070

S 2 A G G B C G T C A G C A C C T G T C C A C G A T C

2050 2060

Fig. 2. Nucleic acid sequence of a 2067-bp section of the 4678-bp HindIII/EcoRI fragment. (See legend to Fig. 1 and text for details). The coding regions of the genes for the 46-kDa hydrogenase (165-1427) and the 13.5-kDa protein (1442-1810) are indicated by means of translation using the single letter amino acid code. The Cys residues have been circled. Two possible ribosome binding sites (positions 153 and 1430) are indicated, as well as a (weak) E. coli promoter (-35, -10) and two inverted repeats that might serve as transcriptional terminators

them. A weak E. coli promoter is present 100-130 bp upstream from the translational start of the 46-kDa hydrogenase gene (Fig. 2: -35 and -10; Fig. 4: bottom), together with a plausible transcription terminator 200-250 bp downstream from the 13.5-kDa protein gene. These results strongly suggest that the two genes constitute a single unit of transcription. It was shown earlier that purified D. vulgaris hydrogenase comprises two polypeptide chains having masses as predicted from the two sequences (Fig. 1A in [1]). It appears that the hydrogenase from D. vulgaris Hildenborough is composed of two subunits tightly complexed in the oxidized form of the enzyme.

Table 1. Amino acid composition for the 46-kDa component of hydrogenase derived from the nucleic acid sequence compared with the experimentally determined value
The experimental value is the mean of a duplicate determination following 24 h of hydrolysis of carboxymethylated 46-kDa hydrogenase [1] at 110°C in 6 M HCl. Cys was determined as carboxymethylcysteine

Amino acid    Residues for
              sequence    experimental

Asp + Asn     28          28.9
Thr           28          21.2
Ser           18          19.6
Glu + Gln     38          33.9
Pro           26          24.1
Gly           39          41.8
Ala           42          44.1
Val           21          31.4
Met           18          14.6
Ile           19          18.9
Leu           24          24.3
Tyr           16          15.9
Phe           14          15.5
His            9           9.3
Lys           36          32.6
Arg           15          15.1
Trp            5          -
Cys           18          16.3

Total         420         414.1

46 kDa:   GGAGGATTGCAGATG
13.5 kDa: GGAGGAAACGCCATG

Fig. 3. Comparison of sequences immediately upstream from the coding regions for the 46-kDa and 13.5-kDa proteins

An upstream terminator

The hydrogenase gene is in the correct orientation for the lac promoter located 2 kb upstream to serve as a transcription initiation site. However, it is poorly expressed in log-phase E. coli cells transformed with pHV15 and is not enhanced by IPTG in liquid culture [1]. This may be explained by termination of transcripts from the lac promoter at the hairpin-forming structure located 70-80 bp upstream from the translational start of the hydrogenase gene (Fig. 2).

The 46-kDa hydrogenase

There is a remarkable pattern of cysteine residues in the NH2-terminal part of the 46-kDa hydrogenase (Fig. 2), which emphasizes the homology of this region of the sequence to bacterial [8Fe-8S] ferredoxins. The primary structures of some of these [14] are compared with the homologous part of the 46-kDa hydrogenase sequence in Fig. 5. The pattern Cys-Ile-Xaa-Cys-Xaa-Xaa-Cys-Xaa-Xaa-Xaa-Cys-Pro-Xaa-Xaa-Ala-(Ile) is found twice both in hydrogenase and in [8Fe-8S] ferredoxins. An X-ray crystallographic structure of P. aerogenes ferredoxin shows the eight cysteines to coordinate two [4Fe-4S] clusters [15]. Interestingly it was found that Cys-8, Cys-11, Cys-14 and Cys-45 (but not Cys-18) coordinate to one and Cys-35, Cys-38, Cys-41 and Cys-18 to the other cluster. The homology of the NH2-terminal part of hydrogenase to ferredoxins (Fig. 5) is significant and even extends a little beyond the immediate cluster regions. It may therefore be proposed that the hydrogenase from D. vulgaris (Hildenborough) will contain two ferredoxin-like [4Fe-4S] clusters in the NH2-terminal part of its sequence.
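The cysteine pattern can be written as a regular expression and searched for in the hydrogenase sequence. In the sketch below the segment for residues 25-84 is transcribed from the translation in Fig. 2, so any transcription error is ours; the variable and function names are hypothetical. Xaa is represented by `.` and the optional final Ile by `I?`.

```python
import re

# Residues 25-84 of the 46-kDa hydrogenase, transcribed from Fig. 2
# (an assumption of this sketch, not a verified database entry).
H2ASE_25_84 = "HFVQIDEAKCIGCDTCSQYCPTAAIFGEMGEPHSIPHIEACINCGQCLTHCPENAIYEAQ"

# Cys-Ile-Xaa-Cys-Xaa-Xaa-Cys-Xaa-Xaa-Xaa-Cys-Pro-Xaa-Xaa-Ala-(Ile)
MOTIF = re.compile(r"CI.C..C...CP..AI?")

def motif_positions(segment, first_residue):
    """Residue numbers at which the motif starts within the segment."""
    return [m.start() + first_residue for m in MOTIF.finditer(segment)]

positions = motif_positions(H2ASE_25_84, first_residue=25)
print("motif starts at residues:", positions)
```

The two start positions correspond to the first cysteine of each of the two proposed cluster-binding regions.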

In [8Fe-8S] ferredoxins the two clusters transfer a single electron each (E0' = -400 mV), while shuttling between the


Fig. 4. Gene search by the codon usage method. The probability that a stretch of sequence (window 25 bases) in either of three reading frames (I, II, III) is coding is plotted for the 4678-bp HindIII/EcoRI fragment as function of position [13]. The codon usage table (Table 2) of the gene for the 46-kDa hydrogenase (A) served as the standard. Two additional genes are indicated, one for the 13.5-kDa protein (B) in frame II and one coding for a 43 562-Da protein in frame III (C). Stop codons (50% level) and AUG codons (bottom level) are indicated for each reading frame. P is a prediction for possible E. coli promoter sequences


Table 2. Codon usage for the 46-kDa hydrogenase (A) and the 13.5-kDa protein (B) gene
The number of times each codon is used is tabulated; aa = amino acid (single-letter code is used); the asterisks denote stop codons

aa  Codon   A   B    aa  Codon   A   B    aa  Codon   A   B    aa  Codon   A   B

F   TTT     0   0    S   TCT     0   0    Y   TAT     4   0    C   TGT     2   0
F   TTC    14   4    S   TCC     7   2    Y   TAC    12   6    C   TGC    16   1
L   TTA     0   0    S   TCA     0   0    *   TAA     0   0    *   TGA     0   0
L   TTG     1   1    S   TCG     5   2    *   TAG     1   1    W   TGG     5   1
L   CTT     2   2    P   CCT     1   1    H   CAT     8   1    R   CGT     4   1
L   CTC    10   5    P   CCC    19   3    H   CAC     8   4    R   CGC     9   4
L   CTA     0   0    P   CCA     0   0    Q   CAA     0   0    R   CGA     0   0
L   CTG    11   2    P   CCG     6   2    Q   CAG    10   4    R   CGG     2   1
I   ATT     1   1    T   ACT     1   0    N   AAT     0   0    S   AGT     0   0
I   ATC    17   4    T   ACC    24   3    N   AAC     5   3    S   AGC     6   2
I   ATA     1   1    T   ACA     0   0    K   AAA     2   0    R   AGA     0   0
M   ATG    18   2    T   ACG     3   5    K   AAG    34  13    R   AGG     0   0
V   GTT     0   2    A   GCT     2   2    D   GAT     3   1    G   GGT     7   2
V   GTC    17   6    A   GCC    24   8    D   GAC    20   5    G   GGC    29   6
V   GTA     0   0    A   GCA     8   2    E   GAA    15   3    G   GGA     0   1
V   GTG    10   0    A   GCG     8   1    E   GAG    13   2    G   GGG     3   1

2 5 30 35 40 45 50 55 60 65 70 75 an H2ase H F V Q I D E A K C I G C D T C S Q Y C P T A A I F G E M G E P H S I P H S I P H I E A - - C I N C G Q C L T H C P ~ N A I Y E A Q * * * * * * * * * * * * * * * * * * * * C.B. A F V - I N D S - C V S C G A C A G E C P V S A I T Q G O T - - - - Q F V I D A D T C I D C G D C A ~ V C P V ~ A P ~ Q E * * * * * * * * * * * * * * * * * , * * * * * * * * * * * * * * * * * * * P.A. A Y V - I N D S - C I A C G A C K P E C P V N - I q Q G - S - - - - T Y A I E A D S C I D C G S C A S V C P V G A P D P E D * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * C.A. A Y V - I N E A - C I S C G A C D P E C P V D A I S Q G D S - - - - R Y V I D A D T C I D C G A C A G V C P V D A P V Q A * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * C.T. A H I - I T D E - C I S C G A C A A E C P V E A I H E G T G - - - - K Y Q V D A D T C I D C G A C Q A V C P T G A V ~ A E

1 5 10 15 20 25 30 35 40 45 50 55

Fig. 5. Comparison of part of the 46-kDa hydrogenase sequence (amino acids 25-84) with sequences for bacterial [8Fe-8S] ferredoxins taken from [14]. P. A. is Peptococcus aerogenes; C. B., Clostridium butyricum; C. A., C. acidi-urici; C. T., C. tartarivorum

oxidized (+2; 2Fe2+ and 2Fe3+) and reduced (+1; 3Fe2+ and 1Fe3+) states. We suggest that the two ferredoxin-like [4Fe-4S] clusters present in the NH2-terminal part of hydrogenase function similarly in the transfer of electrons from the third cluster, which is part of the hydrogen-binding center, to the electron acceptor cytochrome c3. This proposal is in agreement with the results of spectroscopic studies on the clusters [16, 17].

The distribution of cysteine residues along the polypeptide chain of the 46-kDa hydrogenase is also shown in Fig. 6. From the homology with bacterial [8Fe-8S] ferredoxins it may be proposed that Cys-34, Cys-37, Cys-40 and Cys-75 coordinate to one and Cys-65, Cys-68, Cys-71 and Cys-44 to the other cluster. It is not clear which four of the remaining cysteine residues coordinate to the third hydrogen-binding [4Fe-4S] cluster. Ten cysteine residues, available for this purpose, are distributed throughout the remainder of the polypeptide chain (Fig. 6). This part of the molecule does not show homology with the primary structures of other known redox carriers (e.g. the high-potential iron proteins or nitrogenase subunits). The gene encoding the hydrogenase from D. vulgaris (Hildenborough) may thus have originated by fusion of a gene for a unique hydrogen-oxidizing enzyme to that for an existing [8Fe-8S] ferredoxin electron carrier.
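The assignment of cysteine ligands by homology amounts to pairing the eight ferredoxin cysteines with the eight hydrogenase cysteines in order along the chain and carrying the cluster membership across. A minimal sketch, with hypothetical variable names and the residue numbers taken from the text:

```python
# Cluster ligands in P. aerogenes ferredoxin, from the X-ray structure [15].
FD_CLUSTERS = {
    "cluster 1": [8, 11, 14, 45],
    "cluster 2": [35, 38, 41, 18],
}

# The eight cysteines of each sequence in order along the chain.
FD_CYS = [8, 11, 14, 18, 35, 38, 41, 45]
H2ASE_CYS = [34, 37, 40, 44, 65, 68, 71, 75]

# Homologous positions are paired by rank.
fd_to_h2ase = dict(zip(FD_CYS, H2ASE_CYS))

h2ase_clusters = {
    name: [fd_to_h2ase[cys] for cys in ligands]
    for name, ligands in FD_CLUSTERS.items()
}

for name, ligands in h2ase_clusters.items():
    print(name, ":", ", ".join(f"Cys-{c}" for c in ligands))
```

The result reproduces the proposal made above: each cluster takes three cysteines from one motif and the fourth from the other.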

The distribution of positively charged (Lys, Arg) and negatively charged (Asp, Glu) residues is also represented in Fig. 6. It appears that the region binding the two ferredoxin-like [4Fe-4S] clusters is acidic. It is thus likely that there is also an excess of negative charges at the ferredoxin site of hydrogenase in the folded protein. This feature may promote interaction with the electron carrier cytochrome c3, which is a small basic protein with an isoelectric point pI = 10.5 [8, 18-20].

Fig. 6. (A) Distribution of charged residues along the 46-kDa polypeptide chain of hydrogenase. (B) Scale and distribution of Cys residues along the sequence. (A) Only Arg and Lys (pointing upwards) and Asp and Glu residues (pointing downwards) are represented. Two acidic regions of the sequence are indicated
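The acidic character of the ferredoxin-like region can be illustrated by counting charged residues in the segment compared in Fig. 5. As in Fig. 6, only Lys and Arg are counted as positive and Asp and Glu as negative. The segment is transcribed from the translation in Fig. 2 (an assumption of this sketch), and the function name is hypothetical.

```python
# Residues 25-84 of the 46-kDa hydrogenase, transcribed from Fig. 2.
H2ASE_25_84 = "HFVQIDEAKCIGCDTCSQYCPTAAIFGEMGEPHSIPHIEACINCGQCLTHCPENAIYEAQ"

def net_charge(segment):
    """Lys + Arg minus Asp + Glu; His and the chain termini are ignored."""
    positive = sum(segment.count(residue) for residue in "KR")
    negative = sum(segment.count(residue) for residue in "DE")
    return positive - negative

charge = net_charge(H2ASE_25_84)
print("net charge of residues 25-84:", charge)
```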

The 13.5-kDa protein

The primary structure of the 13.5-kDa protein, derived from the nucleic acid sequence, shows the presence of only a single cysteine residue (Cys-15). It is therefore unlikely to have either a cytochrome-c-type haem group (requiring the sequence Cys-Xaa-Xaa-Cys-His) or an iron-sulfur cluster by itself. From protein sequence comparisons using the comparison programme DIAGON [21] no obvious homologies were found. There are several possible functions for the 13.5-kDa protein in a 1:1 complex with the 46-kDa hydrogenase. It may serve for attachment of the 46-kDa component to a membrane surface, or it may protect the 46-kDa hydrogenase against oxygen-inactivation or it may serve a regulatory role.

There is an urgent need for establishing the role of the 13.5-kDa protein by the appropriate enzymological studies. It would also be interesting to check whether the two homologous enzymes from D. vulgaris Miyazaki K [9] and D. desulfuricans NRC49001 [11] which have been reported to consist of a single subunit (mass 46-55 kDa) also contain a small subunit. The hydrogenase from D. gigas has been reported to consist of two subunits of 62 kDa and 26 kDa [22]. Although this enzyme also contains three [4Fe-4S] clusters [22] it differs from the first group in containing nickel, in its enzymatic properties and its failure to cross-react antigenically.

The advice of Alan Coulson and Rodger Staden on their areas of expertise is gratefully acknowledged. Steve Powell is thanked for determining the amino acid composition of hydrogenase. G. V. enjoyed regular discussions with Homme Hellinga as well as the support of an EMBO long-term fellowship while on leave from the Department of Biochemistry, Agricultural University, Wageningen, The Netherlands.

REFERENCES

1. Voordouw, G., Walker, J. E. & Brenner, S. (1985) Eur. J. Biochem. 148, 509-514.
2. Bankier, A. T. & Barrell, B. G. (1983) in Techniques in nucleic acid biochemistry (Flavell, R. A., ed.) pp. 1-34, Elsevier Scientific Publishers, Ireland Ltd.
3. Vieira, J. & Messing, J. (1982) Gene 19, 259-268.
4. Messing, J. & Vieira, J. (1982) Gene 19, 269-276.
5. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA 74, 5463-5467.
6. Staden, R. (1982) Nucleic Acids Res. 10, 4731-4751.
7. Staden, R. (1984) Nucleic Acids Res. 12, 499-503.
8. Postgate, J. R. (1984) The sulphate-reducing bacteria, 2nd ed, Cambridge University Press, Cambridge, UK.
9. Aketagawa, J., Kobayashi, K. & Ishimoto, M. (1983) J. Biochem. (Tokyo) 93, 755-762.
10. Van der Westen, H. M., Mayhew, S. G. & Veeger, C. (1978) FEBS Lett. 86, 122-126.
11. Glick, B. R., Martin, W. G. & Martin, S. G. (1980) Can. J. Microbiol. 26, 1214-1223.
12. Staden, R. & McLachlan, A. D. (1982) Nucleic Acids Res. 10, 141-156.
13. Staden, R. (1984) Nucleic Acids Res. 12, 521-538.
14. Yasunobu, K. T. & Tanaka, M. (1973) in Iron-sulfur proteins II (Lovenberg, W., ed.) pp. 27-130, Academic Press, New York.
15. Adman, E. T., Sieker, L. C. & Jensen, L. H. (1973) J. Biol. Chem. 248, 3987-3996.
16. Grande, H. J., van Dijk, C., Dunham, W. R. & Veeger, C. (1982) in The biological chemistry of iron (Dunford, H. B. et al., eds) pp. 193-206, Reidel Publishing Co., Dordrecht.
17. Grande, H. J., Dunham, W. R., Averill, B., van Dijk, C. & Sands, R. H. (1983) Eur. J. Biochem. 136, 201-207.
18. Trousil, E. B. & Campbell, L. L. (1974) J. Biol. Chem. 249, 386-393.
19. Pierrot, M., Haser, R., Frey, M., Pagan, F. & Astier, J. (1982) J. Biol. Chem. 257, 14341-14348.
20. Higuchi, Y., Kusunoki, M., Matsuura, Y., Yasuoka, N. & Kakudo, M. (1984) J. Mol. Biol. 172, 109-139.
21. Staden, R. (1982) Nucleic Acids Res. 10, 2951-2961.
22. Hatchikian, E. C., Bruschi, M. & Le Gall, J. (1978) Biochem. Biophys. Res. Commun. 82, 451-461.