
Nucleotide sequence and characterization of a cDNA encoding the acetohydroxy acid isomeroreductase from Arabidopsis thaliana


Plant Molecular Biology 21:717-722, 1993. © 1993 Kluwer Academic Publishers. Printed in Belgium.

Update section

Short communication


Gilles Curien, Renaud Dumas* and Roland Douce

Unité Mixte CNRS/Rhône-Poulenc Agrochimie, 14-20 rue Pierre Baizet, 69263 Lyon Cedex 09, France (* author for correspondence)

Received 15 September 1992; accepted in revised form 18 November 1992

Key words: acetohydroxy acid isomeroreductase, Arabidopsis thaliana, cDNA sequence, transit peptide

Abstract

The primary structure of acetohydroxy acid isomeroreductase from Arabidopsis thaliana was deduced from two overlapping cDNAs. The full-length cDNA sequence predicts an amino acid sequence for the protein precursor of 591 residues, including a putative transit peptide of 67 amino acids. Comparison of the A. thaliana and spinach acetohydroxy acid isomeroreductases reveals that the sequences are conserved in the mature protein regions, but divergent in the transit peptides and around their putative processing sites.

Acetohydroxy acid isomeroreductase (AAIR, EC 1.1.1.86), the second enzyme in the parallel isoleucine/valine biosynthetic pathway, catalyses a two-step reaction in which the substrate, either 2-acetolactate or 2-aceto-2-hydroxybutyrate, is converted via an alkyl migration and an NADPH-dependent reduction to give 2,3-dihydroxy-3-methylbutyrate or 2,3-dihydroxy-3-methylvalerate, respectively. The demonstration that selective inhibitors [2, 13] of acetohydroxy acid isomeroreductase give rise to herbicidal effects has led to a renewed interest in the study of this enzyme. The protein was first isolated and characterized from Salmonella typhimurium [1, 8] and partially purified from Neurospora crassa [9], Saccharomyces cerevisiae [6] and Phaseolus radiatus [12]. cDNA sequences were then characterized in Escherichia coli [17] and S. cerevisiae [11]. Recently, we have purified AAIR from the stroma of spinach chloroplasts [3], isolated a full-length cDNA encoding this protein [4], and overexpressed the mature spinach protein in E. coli to investigate the kinetic properties of this enzyme [5]. Furthermore, we have shown that the acetohydroxy acid isomeroreductase gene in spinach is present in a single copy per haploid genome [4]. Here, as a preliminary approach to understanding the molecular regulation of the three enzymes (acetohydroxy acid synthase, EC 4.1.3.18; acetohydroxy acid isomeroreductase; and dihydroxy acid dehydratase, EC 4.2.1.9) involved in the common pathway of branched-chain amino acids, we have isolated and characterized a cDNA encoding AAIR from the model plant Arabidopsis thaliana.

The nucleotide sequence data reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number X68150 (acetohydroxy acid isomeroreductase from Arabidopsis thaliana).

[A1 terminus]

agtcaatttgctacaacttagaaa ATG GCG GCG GCT ACT TCA TCC ATC GCT 51
Met Ala Ala Ala Thr Ser Ser Ile Ala 9

CCT TCT CTT TCA TGC CCA TCT CCT TCT TCT TCA TCC AAA ACC CTT 96
Pro Ser Leu Ser Cys Pro Ser Pro Ser Ser Ser Ser Lys Thr Leu 24

TGG TCT TCC AAA GCC AGA ACC TTG GCT CTA CCC AAT ATC GGT TTC 141
Trp Ser Ser Lys Ala Arg Thr Leu Ala Leu Pro Asn Ile Gly Phe 39

CTC TCG TCT TCT TCC AAG TCT CTG AGG TCG CTT ACT GCC ACC GTC 186
Leu Ser Ser Ser Ser Lys Ser Leu Arg Ser Leu Thr Ala Thr Val 54

GCT GGA AAT GGC GCC ACT GGA TCC TCC CTT GCC GCT CGC ATG GTT 231
Ala Gly Asn Gly Ala Thr Gly Ser Ser Leu Ala Ala Arg Met Val 69

TCT TCG TCT GCG GTC AAA GCC CCT GTT TCT CTC GAT TTT GAG ACA 276
Ser Ser Ser Ala Val Lys Ala Pro Val Ser Leu Asp Phe Glu Thr 84

TCT GTC TTC AAA AAG GAG AAA GTT TCT CTT GCT GGT TAC GAA GAG 321
Ser Val Phe Lys Lys Glu Lys Val Ser Leu Ala Gly Tyr Glu Glu 99

TAC ATT GTG AGA GGA GGA AGA GAC TTG TTC AAG CAT CTC CCA GAT 366
Tyr Ile Val Arg Gly Gly Arg Asp Leu Phe Lys His Leu Pro Asp 114

GCT TTC AAG GGG ATT AAG CAG ATT GGT GTG ATT GGC TGG GGA TCT 411
Ala Phe Lys Gly Ile Lys Gln Ile Gly Val Ile Gly Trp Gly Ser 129

CAG GGA CCT GCC CAG GCT CAG AAT TTA AGG GAT TCA CTT GTG GAG 456
Gln Gly Pro Ala Gln Ala Gln Asn Leu Arg Asp Ser Leu Val Glu 144

GCA AAG TCT GAC ATT GTT GTC AAG ATT GGT CTC AGA AAA GGG TCT 501
Ala Lys Ser Asp Ile Val Val Lys Ile Gly Leu Arg Lys Gly Ser 159

CGC TCA TTT GAG GAG GCA CGT GCT GCT GGC TTC ACT GAA GAA AGT 546
Arg Ser Phe Glu Glu Ala Arg Ala Ala Gly Phe Thr Glu Glu Ser 174

GGT ACT TTG GGT GAT ATA TGG GAA ACT ATC GCT GGC AGT GAT CTT 591
Gly Thr Leu Gly Asp Ile Trp Glu Thr Ile Ala Gly Ser Asp Leu 189

[A2 terminus]

GTA TTG CTT TTG ATC TCT GAT GCT GCT CAA GCT GAT AAC TAT GAG 636
Val Leu Leu Leu Ile Ser Asp Ala Ala Gln Ala Asp Asn Tyr Glu 204

AAA ATA TTC TCT CAC ATG AAG CCA AAC AGC ATT CTT GGT TTA TCA 681
Lys Ile Phe Ser His Met Lys Pro Asn Ser Ile Leu Gly Leu Ser 219

CAC GGG TTT CTA CTA GGG CAT TTA CAG TCC TCG GGA CTC GAT TTC 726
His Gly Phe Leu Leu Gly His Leu Gln Ser Ser Gly Leu Asp Phe 234

CCA AAG AAC ATC AGT GTG GTC GCT GTA TGC CCT AAG GGA ATG GGT 771
Pro Lys Asn Ile Ser Val Val Ala Val Cys Pro Lys Gly Met Gly 249

CCT TCT GTG AGG AGG CTT TAC GTC CAA GGA AAA GAA ATT AAC GGT 816
Pro Ser Val Arg Arg Leu Tyr Val Gln Gly Lys Glu Ile Asn Gly 264

GCT GGA ATC AAC GCC AGT TTT GCA GTC CAC CAG GAT GTT GAC GGT 861
Ala Gly Ile Asn Ala Ser Phe Ala Val His Gln Asp Val Asp Gly 279

AGA GCC GCC GAT GTT GCA TTG GGA TGG TCA GTA GCA CTT GGT TCT 906
Arg Ala Ala Asp Val Ala Leu Gly Trp Ser Val Ala Leu Gly Ser 294

CCG TTT ACT TTT GCT ACT ACT CTT GAA CAG GAG TAC AGG AGT GAC 951
Pro Phe Thr Phe Ala Thr Thr Leu Glu Gln Glu Tyr Arg Ser Asp 309

ATC TTT GGA GAA AGA GGA ATT TTG CTT GGT GCT GTT CAC GGA ATC 996
Ile Phe Gly Glu Arg Gly Ile Leu Leu Gly Ala Val His Gly Ile 324

GTG GAG TCT CTG TTT AGA AGA TAC ACC GAA AAT GGG ATG AGT GAA 1041
Val Glu Ser Leu Phe Arg Arg Tyr Thr Glu Asn Gly Met Ser Glu 339

GAC TTG GCT TAC AAG AAC ACA GTA GAA TGC ATC ACA GGA ACA ATT 1086
Asp Leu Ala Tyr Lys Asn Thr Val Glu Cys Ile Thr Gly Thr Ile 354

TCA AGG ACT ATC TCT ACC CAG GGA ATG TTG GCT GTG TAC AAC TCC 1131
Ser Arg Thr Ile Ser Thr Gln Gly Met Leu Ala Val Tyr Asn Ser 369

TTG TCT GAA GAA GGT AAA AAA GAT TTT GAG ACT GCA TAC AGC GCA 1176
Leu Ser Glu Glu Gly Lys Lys Asp Phe Glu Thr Ala Tyr Ser Ala 384

TCC TTC TAT CCT TGT ATG GAG ATT CTC TAT GAA TGT TAC GAG GAT 1221
Ser Phe Tyr Pro Cys Met Glu Ile Leu Tyr Glu Cys Tyr Glu Asp 399

GTA CAA AGT GGC AGC GAA ATC CGA AGT GTT GTC TTA GCC GGT CGT 1266
Val Gln Ser Gly Ser Glu Ile Arg Ser Val Val Leu Ala Gly Arg 414

CGC TTC TAT GAA AAG GAA GGC TTG CCA GCA TTC CCT ATG GGA AAT 1311
Arg Phe Tyr Glu Lys Glu Gly Leu Pro Ala Phe Pro Met Gly Asn 429

ATT GAC CAG ACA AGA ATG TGG AAG GTG GGT GAA CGC GTC AGG AAG 1356
Ile Asp Gln Thr Arg Met Trp Lys Val Gly Glu Arg Val Arg Lys 444

TCC AGA CCT GCT GGT GAC TTG GGT CCA TTG TAT CCC TTC ACC GCT 1401
Ser Arg Pro Ala Gly Asp Leu Gly Pro Leu Tyr Pro Phe Thr Ala 459

GGA GTT TAT GTA GCA CTT ATG ATG GCT CAG ATT GAG ATC TTG AGG 1446
Gly Val Tyr Val Ala Leu Met Met Ala Gln Ile Glu Ile Leu Arg 474

[A1 terminus]

AAG AAA GGT CAC TCT TAC TCA GAA ATC ATC AAC GAG AGT GTG ATT 1491
Lys Lys Gly His Ser Tyr Ser Glu Ile Ile Asn Glu Ser Val Ile 489

GAA TCC GTG GAC TCT CTA AAC CCA TTT ATG CAC GCC AGG GGA GTG 1536
Glu Ser Val Asp Ser Leu Asn Pro Phe Met His Ala Arg Gly Val 504

TCC TTC ATG GTT GAC AAC TGC TCA ACC ACA GCA AGA TTG GGT TCG 1581
Ser Phe Met Val Asp Asn Cys Ser Thr Thr Ala Arg Leu Gly Ser 519

AGG AAA TGG GCG CCA CGG TTT GAC TAC ATC CTG ACC CAA CAA GCT 1626
Arg Lys Trp Ala Pro Arg Phe Asp Tyr Ile Leu Thr Gln Gln Ala 534

CTT GTG GCT GTG GAC AGT GGT GCA GCA ATC AAC AGA GAC CTA ATC 1671
Leu Val Ala Val Asp Ser Gly Ala Ala Ile Asn Arg Asp Leu Ile 549

AGT AAC TTC TTC TCT GAT CCA GTC CAT GGT GCC ATT GAG GTC TGC 1716
Ser Asn Phe Phe Ser Asp Pro Val His Gly Ala Ile Glu Val Cys 564

GCA CAG CTC AGG CCT ACC GTT GAT ATC TCT GTG CCT GCT GAT GTA 1761
Ala Gln Leu Arg Pro Thr Val Asp Ile Ser Val Pro Ala Asp Val 579

GAC TTT GTT CGA CCT GAG TTG CGT CAA TCT AGC AAC tgagtgaaggg 1808
Asp Phe Val Arg Pro Glu Leu Arg Gln Ser Ser Asn stop 591

ttgaaagtttgtcagtctcttatttgtaatcggagtattaagtcgagagtttgtgatgtt 1868

ttcttaggcgtgactgtttgttttgtttgaaggattatgtctctttgctttggtcttaaa 1928

[A2 terminus]

atctacttaaatcaataaaatagtttaacgaaaaaaaa 1966

Fig. 1. Nucleotide sequence of the cDNA clones A1 and A2 and their predicted amino acid sequence. These are overlapping clones with their termini indicated. The translation start site for the reading frame coding for acetohydroxy acid isomeroreductase and the putative polyadenylation signal are underlined. The arrow indicates the presumptive processing site between the transit peptide and the mature protein.
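As a reading aid for Fig. 1, the following minimal Python sketch translates the first row of the open reading frame with the standard genetic code. The codon string is transcribed by hand from the figure, and the function name `translate` and the compact construction of the codon table are illustrative choices, not part of the original work.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence into one-letter amino acid codes ('*' = stop)."""
    cds = cds.upper().replace(" ", "")
    if len(cds) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# First row of the open reading frame in Fig. 1 (nucleotides 25-51),
# transcribed by hand from the figure.
FIRST_ROW = "ATG GCG GCG GCT ACT TCA TCC ATC GCT"

print(translate(FIRST_ROW))  # MAAATSSIA
```

The output corresponds to Met Ala Ala Ala Thr Ser Ser Ile Ala, the first nine residues shown in the figure.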

A Clontech λgt11 cDNA library of A. thaliana cv. Columbia was screened using a Spinacia oleracea AAIR full-length cDNA [4] as the hybridization probe. Ten positive clones were isolated and the inserts subcloned into pUC19 for dideoxy sequencing on both strands. However, none of these clones was long enough to encode the entire AAIR sequence. Two distinct clones, A1 and A2, containing a 1 kb overlapping identical sequence, were used to determine the entire sequence of the cDNA (Fig. 1). Southern-blot analysis indicates that these clones derive from an identical gene (not shown).

[Figure 2: alignment of the A. thaliana and S. oleracea sequences with their consensus; the alignment rows are not recoverable from this copy.]

Fig. 2. Comparison of Arabidopsis thaliana and Spinacia oleracea deduced amino acid sequences. The arrow indicates the end of the presumptive transit peptides, and the beginning of the mature proteins. The putative NADPH-binding site is shaded.

The nucleotide sequence and deduced amino acid sequence are presented in Fig. 1. The AAIR cDNA sequence consists of 1773 bp of coding region, 24 bp of 5'-untranslated sequence and 169 bp of 3'-untranslated sequence. The stop codon (TAG) found in the 5'-untranslated region, three bases before the first ATG codon, provides evidence that this open reading frame represents the full length of the coding sequence. The nucleotide sequence around the first ATG codon (GAAAATGGC) is similar to the proposed consensus sequence (AACAATGGC) for translation initiation in plants [10]. The 3'-untranslated sequence contains a putative polyadenylation signal, AATAAA, at position 1942, followed after 11 bp by a stretch of 8 adenines. Starting with the first methionine residue, the open reading frame codes for a protein of 591 amino acid residues, giving a theoretical molecular mass of 63.9 kDa. The putative end of the A. thaliana transit peptide was deduced by homology with the beginning of the spinach mature protein [4].
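The lengths and positions quoted above can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the variable names are hypothetical, and the last row of Fig. 1 is copied by hand from the figure, so a transcription slip would change the result.

```python
# Lengths quoted in the text (bp).
FIVE_PRIME_UTR = 24
CODING_REGION = 1773
THREE_PRIME_UTR = 169

total_length = FIVE_PRIME_UTR + CODING_REGION + THREE_PRIME_UTR
codon_count = CODING_REGION // 3

# Initiation context: observed sequence vs. the proposed plant consensus [10].
OBSERVED = "GAAAATGGC"
CONSENSUS = "AACAATGGC"
matches = sum(a == b for a, b in zip(OBSERVED, CONSENSUS))

# Last row of Fig. 1, ending at nucleotide 1966 (hand transcription,
# assumed correct; spaces are extraction debris and are removed).
LAST_ROW = "atct actt aaatcaataaaat agtttaacgaaaaaaaa".replace(" ", "")
ROW_END = 1966
row_start = ROW_END - len(LAST_ROW) + 1

signal_index = LAST_ROW.find("aataaa")
signal_position = row_start + signal_index  # 1-based cDNA coordinate

after_signal = LAST_ROW[signal_index + len("aataaa"):]
poly_a_length = len(after_signal) - len(after_signal.rstrip("a"))
spacer_length = len(after_signal) - poly_a_length

print(total_length, codon_count)                       # 1966 591
print(f"{matches}/{len(OBSERVED)} context positions match")  # 7/9 context positions match
print(signal_position, spacer_length, poly_a_length)   # 1942 11 8
```

The three parts sum to the 1966 nucleotides numbered in Fig. 1, and 1773 bp corresponds to 591 codons, so the stop codon is counted within the 169 bp of 3'-untranslated sequence.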

Comparison of the A. thaliana and spinach acetohydroxy acid isomeroreductases reveals that the sequences are highly conserved in the mature protein regions (85%), but divergent in the transit peptides (30%) and around the putative processing sites (Fig. 2). However, the amino acid compositions of the A. thaliana and spinach transit peptides are closely related, since they exhibit a typical enrichment in alanine (16% and 21%, respectively) and serine (28% and 24%, respectively) and do not contain negatively charged residues (aspartic or glutamic acid), features that are characteristic of chloroplast transit peptides [7]. The first eighteen amino acids at the N-terminus of the mature proteins show only 33% homology, suggesting that these residues are not important for the structure or the activity of the enzyme, but may be involved in protein transport into chloroplasts. As shown earlier [15, 16], the transit peptide is not the only prerequisite for efficient interaction with the chloroplast, because deletion of the first amino acids of mature nuclear-encoded chloroplast proteins abolishes binding and subsequent import. It has therefore been suggested that the first amino acids of these mature proteins may be involved in the precursor protein conformation allowing the recognition of the transit peptide by the receptor complex in the chloroplast envelope [16]. In contrast to the high homology seen throughout the plant AAIRs, alignment with the E. coli and S. cerevisiae sequences reveals only a low overall degree of identity at the amino acid level (less than 30%) [4]. One of the most conserved regions shared by these four proteins is around the putative NADPH-binding site: a βαβ-fold centered around a highly conserved sequence Gly-Xaa-Gly-Xaa-Xaa-Ala (or Gly)-Xaa-Xaa-Xaa-Ala (or Gly) (where Xaa is any amino acid) [14, 18] (Fig. 2).
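The composition figures for the A. thaliana transit peptide and the dinucleotide-binding fingerprint discussed above can be checked against the sequence of Fig. 1. The Python sketch below is a hedged illustration: both peptide strings are hand transcriptions from the figure (residues 1-67 and 115-144), the regular expression is one straightforward reading of the Gly-Xaa-Gly-Xaa-Xaa-Ala/Gly-Xaa-Xaa-Xaa-Ala/Gly pattern, and the helper name `percent` is hypothetical.

```python
import re

# Residues 1-67 of the precursor (putative transit peptide), one-letter code,
# transcribed by hand from Fig. 1.
TRANSIT_PEPTIDE = (
    "MAAATSSIA"          # residues 1-9
    "PSLSCPSPSSSSKTL"    # residues 10-24
    "WSSKARTLALPNIGF"    # residues 25-39
    "LSSSSKSLRSLTATV"    # residues 40-54
    "AGNGATGSSLAAR"      # residues 55-67
)


def percent(sequence: str, residues: str) -> float:
    """Percentage of positions in `sequence` occupied by any of `residues`."""
    return 100.0 * sum(sequence.count(r) for r in residues) / len(sequence)


print(len(TRANSIT_PEPTIDE))                  # 67
print(round(percent(TRANSIT_PEPTIDE, "A")))  # 16
print(round(percent(TRANSIT_PEPTIDE, "S")))  # 28
print(percent(TRANSIT_PEPTIDE, "DE"))        # 0.0

# Fingerprint Gly-Xaa-Gly-Xaa-Xaa-(Ala|Gly)-Xaa-Xaa-Xaa-(Ala|Gly).
FINGERPRINT = re.compile(r"G.G..[AG]...[AG]")

# Residues 115-144 of the precursor, transcribed by hand from Fig. 1.
SEGMENT_START = 115
SEGMENT = "AFKGIKQIGVIGWGSQGPAQAQNLRDSLVE"

hit = FINGERPRINT.search(SEGMENT)
if hit:
    first = SEGMENT_START + hit.start()
    last = first + len(hit.group()) - 1
    print(hit.group(), first, last)          # GWGSQGPAQA 126 135
```

The pattern finds a single Gly-rich stretch in this segment; whether its boundaries coincide exactly with the shaded region of Fig. 2 cannot be confirmed from the text alone.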

The A1 and A2 cDNAs were further used to screen a Clontech EMBL-3 genomic library of A. thaliana cv. Columbia. One positive clone containing the entire sequence of the AAIR gene was isolated. Preliminary results in the sequencing of this gene indicate that the A. thaliana AAIR is encoded by an interrupted gene containing nine introns (not shown). Analysis of the fine structure of this gene is currently in progress so that detailed studies on the regulation of AAIR biogenesis in higher plants may be performed.

Acknowledgements

We would like to thank Rick DeRose, Michel Lebrun, Norbert Rolland, Bernard Dumas and Dominique Job for helpful discussions.

References

1. Arfin SM, Umbarger HE: Purification of the acetohydroxy acid isomeroreductase of Salmonella typhimurium. J Biol Chem 244:1118-1127 (1969).

2. Aulabaugh A, Schloss JV: Oxalylhydroxamates as reaction-intermediate analogs for ketol-acid reductoisomerase. Biochemistry 29:2824-2830 (1990).

3. Dumas R, Joyard J, Douce R: Purification and characterization of acetohydroxy acid reductoisomerase from spinach chloroplasts. Biochem J 262:971-976 (1989).

4. Dumas R, Lebrun M, Douce R: Isolation, characterization and sequence analysis of a full-length cDNA encoding acetohydroxy acid reductoisomerase from spinach chloroplasts. Biochem J 277:469-475 (1991).


5. Dumas R, Job D, Ortholand J-Y, Emeric G, Greiner A, Douce R: Isolation and kinetic properties of acetohydroxy acid isomeroreductase from spinach chloroplasts overexpressed in Escherichia coli. Biochem J 288:865-874 (1992).

6. Hawkes TR, Edwards LS: Inhibition of acetolactate isomeroreductase from Saccharomyces cerevisiae. In: Barak Z, Chipman DM, Schloss JV (eds) Biosynthesis of branched-chain amino acids, pp. 413-424. VCH Publishers, New York (1990).

7. von Heijne G, Steppuhn J, Herrmann R: Domain structure of mitochondrial and chloroplast targeting peptides. Eur J Biochem 180:535-545 (1989).

8. Hofler JG, Decedue CJ, Luginbuhl GH, Reynolds JA, Burns RO: The subunit structure of acetohydroxyacid isomeroreductase from Salmonella typhimurium. J Biol Chem 250:877-882 (1975).

9. Kiritani K, Narise S, Wagner RP: The reductoisomerase of Neurospora crassa. J Biol Chem 241:2047-2051 (1966).

10. Lütcke HA, Chow KC, Mickel FS, Moss KA, Kern HF, Scheele GA: Selection of AUG initiation codons differs in plants and animals. EMBO J 6:43-48 (1987).

11. Petersen JGL, Holmberg S: The ilv5 gene of Saccharomyces cerevisiae is highly expressed. Nucl Acids Res 14:9631-9651 (1986).

12. Satyanarayana T, Radhakrishnan AN: Biosynthesis of valine and isoleucine in plants III: Reductoisomerase of Phaseolus radiatus. Biochim Biophys Acta 110:380-388 (1965).

13. Schulz A, Spönemann P, Köcher H, Wengenmayer F: The herbicidally active experimental compound Hoe 704 is a potent inhibitor of the enzyme acetolactate reductoisomerase. FEBS Lett 238:375-378 (1988).

14. Scrutton NS, Berry A, Perham RN: Redesign of the coenzyme specificity of a dehydrogenase by protein engineering. Nature 343:38-43 (1990).

15. Smeekens S, Geerts D, Bauerle C, Weisbeek P: Essential function in chloroplast recognition of the ferredoxin transit peptide processing region. Mol Gen Genet 216:178-182 (1989).

16. Wasmann CC, Reiss B, Bartlett SG, Bohnert HJ: The importance of the transit peptide and the transported protein for protein import into chloroplasts. Mol Gen Genet 205:446-453 (1986).

17. Wek RC, Hatfield GW: Nucleotide sequence and in vivo expression of the ilvY and ilvC genes in Escherichia coli K12. J Biol Chem 261:2441-2450 (1986).

18. Wierenga RK, De Maeyer MCH, Hol WG: Interaction of pyrophosphate moieties with α-helixes in dinucleotide binding proteins. Biochemistry 24:1346-1357 (1985).