

Eur. J. Biochem. 217, 83-87 (1993) © FEBS 1993

Sorbitol dehydrogenase: Full-length cDNA sequencing reveals a mRNA coding for a protein containing an additional 42 amino acids at the N-terminal end

Yi WEN and Isaac BEKHOR

Laboratory for Molecular Genetics, Doheny Eye Institute, Los Angeles, USA
University of Southern California School of Dentistry, Los Angeles, USA

(Received June 2/July 20, 1993) - EJB 93 0799/1

A cDNA clone encoding rat sorbitol dehydrogenase (SDH) was isolated from a rat testis λZAP II cDNA library. The full-length cDNA insert contained 2277 base pairs (bp), starting 182 bp upstream from an ATG codon where translation to the active enzyme SDH is presumed to be initiated. A second ATG codon, however, was found 126 bp upstream, aligned in the same reading frame as that of the active enzyme. Therefore, the coding sequence for SDH can be translated into an additional 42-amino-acid polypeptide linked to the N-terminal amino acid of the enzyme, generating a pre-sorbitol dehydrogenase. The sequence data indicate that the nucleotide environment around this ATG codon is more favorable towards it being the actual start of the open reading frame (ORF) for a pre-SDH than the ATG codon preceding the nucleotide sequence for SDH. Since no known SDH starts with the additional 42 amino acids, it may be that post-translational removal of this polypeptide accompanies the release of the active enzyme. In addition, the 3' untranslated region of the cDNA comprised 1021 non-coding bp downstream from the TAA stop codon. This sequence included three putative poly(A) signals: one at nucleotides 1362-1367, the second at nucleotides 1465-1470, and the third at nucleotides 2212-2217 [17 bp away from the poly(A) tail]. In addition to the above findings, we also report a variance in one of the amino acids in the SDH cDNA sequence. This variance occurs at position 957-960, where threonine is coded for instead of aspartic acid; in the rat testis SDH cDNA, we find the sequence is ACG instead of GAC, as was reported for the rat liver SDH cDNA. Northern-blot hybridization analysis showed that SDH mRNA is a doublet, one band of 4 kb and the other of 2.3-2.4 kb, in both the rat liver and the rat lens, further confirming that the isolated SDH cDNA constituted a full-length cDNA.

Correspondence to I. Bekhor, University of Southern California, Gerontology Center, 3715 McClintock Avenue, Los Angeles, CA 90089-0191, USA

Fax: +1 213 740 0235.

Abbreviations. pre-SDH, pre-sorbitol dehydrogenase; SDH, sorbitol dehydrogenase; ORF, open reading frame; bp, base pair.

Enzymes. Sorbitol dehydrogenase (EC 1.1.1.14); RNA polymerase (EC 2.7.7.6); DNase (EC 3.1.21.1); restriction endonucleases (EC 3.1.21.4); T7 DNA polymerase (EC 2.7.7.7).

Note. The novel nucleotide sequence data published here have been deposited with the EMBL sequence data banks and are available under accession number X74593.

Sorbitol dehydrogenase (SDH) is an enzyme of the polyol pathway, which is thought to play a significant role in diabetes, in cataract formation, and in neuropathy, retinopathy and nephropathy [1-5]. The enzyme is a tetramer with one zinc atom/subunit [6] that, in general and except for the human liver enzyme [7], has no activity towards alcohol, and it is a member of the family of zinc-containing alcohol dehydrogenases [8]. Recent analysis of the sequence for the rat liver messenger RNA for SDH indicated that the coding sequence constituted 1094 bp flanked by short untranslated sequences at both the 5' and 3' ends of the molecule [9]. The 3' region included the classical poly(A) signal, AATAAA, located 105 bp downstream from the stop codon, TAA [9].

In the studies of Karlsson et al. [9] the sequence was deduced from two separate cDNA clones. In this study, we present sequence data obtained from a single full-length cDNA-containing clone isolated from a rat testis λZAP II cDNA library. The data now include the complete sequence of the mRNA at the 3' untranslated region, and also an additional sequence of 182 bp in the 5' region. The latter 5' sequence included an additional ATG start codon, found 126 bp upstream from the initial ATG codon reported by Karlsson et al. [9]. These findings suggest that the mature enzyme may be post-translationally processed, a result that has not been observed previously.

MATERIALS AND METHODS

Rat tissue

Lenses and livers were dissected out from four-week-old female Sprague Dawley rats (King Animals, Inc.) fed a diet containing ground Purina Rat Chow. Following dissection, tissues were immediately frozen in liquid nitrogen, and stored at -80°C until needed. Use of animals was in accordance with guidelines in the Declaration of Helsinki and The Guiding Principles in the Care and Use of Animals (DHEW publication, NIH 86-23).

Preparation of riboprobes

The sources of the cRNA probes were as follows: the SDH probe was either an isolate (designated pBS(rat testis) SDH) from a rat testis λZAP II cDNA library (Stratagene), or subcloned into the pBS transcription vector (Stratagene) from the λSDH1 rat liver cDNA clone (EcoRI fragment size, 600 bp; designated pBS(rat liver) SDH) obtained from Karlsson et al. [9]. The SDH clones were linearized with BamHI for synthesis of antisense cRNA with T7 RNA polymerase, and with XhoI for synthesis of sense cRNA with T3 RNA polymerase. Synthesis of 32P-labeled antisense and sense cRNAs was carried out by means of procedures described in a previous communication [13]. The 32P-labeled cRNA was recovered in a final volume of 50 µl diethyl-pyrocarbonate-treated sterile distilled water.

Northern-blot analysis

RNA was prepared from tissue by the guanidine thiocyanate procedures [10]. Each tissue RNA preparation was dissolved into 10 µl diethyl-pyrocarbonate-treated sterile distilled water. The RNA was denatured by glyoxalation in dimethylsulfoxide (Me2SO) as described by Thomas [11], and used for Northern-blot analysis by previous methods [13]. The target mRNA was localized by hybridization with the 32P-labeled riboprobes as described [13].

Screening the cDNA library

Rat testis λZAP II cDNA library (Stratagene), prepared following ligation of the cDNAs with EcoRI linkers, was probed for SDH-positive clones by immuno-detection methods following the preparation of monospecific anti-(sheep SDH) polyclonal sera by procedures previously reported [12]. The Bluescribe phagemids (pBS SK-) were later rescued from the SDH-positive plaques by procedures described by the manufacturer (Stratagene), and confirmed for SDH cDNA by hybridization analysis with 32P-labeled antisense RNA transcribed from pBS(rat liver) SDH. The cDNA insert was liberated from pBS by digestion with SacI (5' end) and XhoI (3' end), sites that are not found in the SDH cDNA [13].

Sequencing

DNA sequence analysis was carried out by the dideoxynucleotide chain-termination method [14] with T7 DNA polymerase, using Sequenase version 2 DNA sequencing kit (United States Biochemical), together with [α-32P]dATP (specific activity, 3000 Ci/mmol). The DNA was sequenced along both strands, initiated with primers complementary to either T7 or T3 RNA polymerase promoters, then continued in a step-wise fashion by using newly synthesized primers (this laboratory, using a DNA synthesizer) along the already sequenced portion of the molecule. The strategy for the sequencing procedure is shown in Fig. 1. Analysis of the sequences was performed with Hitachi DNASIS software (National Biosciences, Plymouth, Minnesota), and sequence comparison analyses were carried out by searching the GenBank database.

[Figure 1: schematic of the cDNA insert, showing the 5' and 3' EcoRI linker sites, the amino-acid-coding sequence, and the step-wise sequencing runs.]

Fig. 1. Strategy for full-length rat testis SDH cDNA sequence determination. Sequencing was initiated with primers complementary to the T3 and T7 RNA polymerase promoters by procedures as described in Materials and Methods. The filled area refers to the amino-acid-encoding region. Sites for the location of the EcoRI linkers are marked. Lines show the step-wise progression of sequencing through the use of newly synthesized 17-residue oligonucleotide primers complementary to a previously determined sequence. Arrows designate bidirectional sequencing initiated on opposite strands with a primer complementary either to the T3 RNA polymerase promoter at the 5' end, or to the T7 RNA polymerase promoter at the 3' end. Numbers refer to total number of bases found in that segment.

RESULTS

Two SDH cDNA clones were isolated; one contained the presumed full-length cDNA insert, and the other lacked a significant portion of the 5' region of the cDNA, as was confirmed by sequencing. The clone containing the full-length cDNA was sequenced on both strands (Fig. 2). It covered 2277 bp, flanked by the EcoRI linkers found at either end of the molecule (Fig. 2). This sequence encompasses 56 bp at the 5' untranslated region; 1197 bp at the coding sequence, incorporating an additional 126 bp which appears to code for an additional 42-amino-acid polypeptide at the N-terminal end of the protein; and 1024 bp at the 3' untranslated region. The sequence contains a second ATG codon at nucleotides 56-58 (Fig. 2), while the primary ATG codon is at nucleotides 183-185. The primary ATG codon is in agreement with the data of Karlsson et al. [9], where assembly of the amino acid sequence for the SDH enzyme (amino acid no. +1, Fig. 2) is initiated. The secondary ATG codon, at position 56, could be non-functional; however, the sequence surrounding the ATG codon at nucleotides 56-58 fits the consensus sequence of (CC)GCCATGG with greater confidence than the sequence surrounding the primary ATG codon [15]. In comparison to the published data [9], the 5' region now apparently represents the entire 5' sequence of the mRNA.
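The segment lengths quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis; the constant and function names are illustrative assumptions, and only the lengths stated in the text are used (no nucleotide coordinates).

```python
# Segment lengths in bp, as quoted in the text. Names are hypothetical.
FIVE_PRIME_UTR = 56            # 5' untranslated region
CODING_WITH_EXTENSION = 1197   # coding sequence including the upstream extension
N_TERMINAL_EXTENSION = 126     # extra coding sequence between the two ATG codons
THREE_PRIME_UTR = 1024         # 3' untranslated region


def summarize_segments():
    """Return simple consistency figures derived from the quoted lengths."""
    return {
        # The three segments should add up to the full insert length.
        "total_bp": FIVE_PRIME_UTR + CODING_WITH_EXTENSION + THREE_PRIME_UTR,
        # Distance from the start of the insert to the primary ATG codon.
        "bp_upstream_of_primary_atg": FIVE_PRIME_UTR + N_TERMINAL_EXTENSION,
        # An in-frame extension must be a whole number of codons.
        "extension_in_frame": N_TERMINAL_EXTENSION % 3 == 0,
        "extension_codons": N_TERMINAL_EXTENSION // 3,
        # Coding length counted from the primary ATG codon.
        "coding_bp_from_primary_atg": CODING_WITH_EXTENSION - N_TERMINAL_EXTENSION,
    }


if __name__ == "__main__":
    for key, value in summarize_segments().items():
        print(f"{key}: {value}")
```

Under these figures the segments sum to 2277 bp, the primary ATG codon lies 182 bp from the start of the insert, and the 126-bp extension corresponds to 42 codons, in line with the values given in the text.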

There are three polyadenylation consensus sequences: at nucleotides 1362-1367 [868 bp from the poly(A) tail], at 1465-1470 [765 bp from the poly(A) tail], and a third at 2212-2217 [17 bp from the poly(A) tail]. The nucleotide sequence ends with a poly(A) tail of 42 A residues. Therefore, according to our sequence data (Fig. 2), the 3' region appears to embody the entire 3' untranslated sequence of the mRNA.
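Locating candidate polyadenylation signals in a 3' untranslated sequence amounts to a simple hexamer scan. The sketch below is an illustration only, not the authors' procedure (sequence analysis was done with DNASIS). The function name is hypothetical, the hexamer set is limited to the two motifs named in this paper (AATAAA and AGTAAA), and the example string is a short illustrative fragment rather than a verified excerpt of the deposited sequence.

```python
import re

# Hexamers mentioned in the paper as putative poly(A) signals.
POLYA_SIGNALS = ("AATAAA", "AGTAAA")


def find_polya_signals(seq, signals=POLYA_SIGNALS):
    """Return sorted (start, end, motif) tuples using 1-based inclusive positions."""
    cleaned = seq.upper().replace(" ", "")
    hits = []
    for motif in signals:
        # A lookahead pattern reports overlapping occurrences as well.
        for match in re.finditer(f"(?={motif})", cleaned):
            start = match.start() + 1
            hits.append((start, start + len(motif) - 1, motif))
    return sorted(hits)


if __name__ == "__main__":
    fragment = "AATGGTAGGA ATAATAAACT CATAAGCCGA"  # illustrative fragment only
    for start, end, motif in find_polya_signals(fragment):
        print(f"{motif} at {start}-{end}")
```

Positions are reported relative to the string passed in, so scanning a full-length sequence would give coordinates directly comparable to the nucleotide numbering used in the text.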

The coding region translates into 399 amino acids, if we consider the secondary ATG codon (at position 56, Fig. 2) to be functional. If we consider only the primary ATG codon (at position 183, Fig. 2) to be functional, then the sequence translates into 356 amino acids, as was found by Karlsson et al. [9]; however, we are reporting a variance in amino acid 258 (nucleotides 957-960, Fig. 2), which appears to be threonine for the rat testis SDH cDNA instead of aspartic acid for the rat liver SDH cDNA. We find the sequence is ACG instead of GAC [9]. Fig. 3 shows the actual reading of the DNA sequencing gel to document occurrence of ATG at nucleotides 56-58 and at nucleotides 183-185.

[Figure 2: nucleotide sequence listing with the deduced amino acid sequence; the sequence is deposited under EMBL accession number X74593.]

Fig. 2. Nucleotide sequence of full-length rat testis sorbitol dehydrogenase cDNA and its deduced amino acid sequence. Numbering to the left refers to the amino acid, and numbering above the nucleotide sequence refers to the nucleotide. The primary ATG codon is at nucleotides 183-185, the secondary ATG codon is at nucleotides 56-58. Bold underlined amino acids are the presumed additional amino acids found at the N-terminal end of the enzyme. The amino acid threonine at nucleotides 957-959 is underlined. Consensus polyadenylation signals are underlined. Double-underlined sequences at both the 5' and the 3' ends refer to the location of the EcoRI linkers. Within the cDNA molecule there are two EcoRI sites, at nucleotides 496 and 1713.

Fig. 3. Sequencing gel of the cDNA at the 5' region showing the relative location of the ATG codons both under standard reaction conditions (A) and extended reaction conditions (B), performed as described by the Sequenase DNA sequencing kit (United States Biochemical).

Amino acid sequence comparisons for the rat testis SDH (Fig. 2, starting at nucleotide 186, amino acid no. 1) versus the corresponding rat liver SDH [9] show 99.7% positional identity, and 82% positional identity with the sheep and human liver enzyme [8]. The sequence starting at nucleotide 56 to nucleotide 182, corresponding to the 42 amino acids preceding amino acid no. 1 of the enzyme sequence, appears to be unique to SDH. Search of the GenBank database for that sequence proved negative. In this study, we refer to the 399-amino-acid protein as a pre-sorbitol dehydrogenase (pre-SDH), and the 356-amino-acid protein as sorbitol dehydrogenase.
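The positional identity figure for testis versus liver SDH follows from the single reported difference (threonine instead of aspartic acid at amino acid 258) over 356 residues. The sketch below shows that arithmetic, assuming equal-length, ungapped sequences; the function names are illustrative and are not taken from the software used in the study.

```python
def positional_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (no gaps handled)")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identity_from_counts(length, mismatches):
    """Percent identity when only the length and mismatch count are known."""
    return 100.0 * (length - mismatches) / length


if __name__ == "__main__":
    # One substitution in 356 residues, as reported for testis vs liver SDH.
    print(round(identity_from_counts(356, 1), 1))
```

With one mismatch in 356 positions the result rounds to 99.7%, matching the value quoted for the rat liver enzyme.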

Northern-blot analysis of mRNA for SDH is shown in Fig. 4. The data demonstrate that the mRNA for SDH appears to be a doublet in both the rat lens and the rat liver. One species is found to migrate as a 4-kb band and the other migrates as a 2.3-2.4-kb band. These data further indicate that the SDH clone that we have isolated and studied contains a full-length SDH cDNA, representing the 2.3-2.4-kb mRNA for SDH.

Fig. 4. Northern-blot size analysis of rat liver and rat lens sorbitol dehydrogenase mRNA determined by hybridization with 32P-labeled SDH cRNA probe prepared as described in Materials and Methods. SDH mRNA appears to be a doublet; one band migrates at about 4 kb and the other band at 2.3-2.4 kb. At times, the 4-kb band is found at higher intensities than the 2.4-kb band [13]. The data indicate that the 2.4-kb mRNA could be a processed 4-kb mRNA species. Lane A, standard RNA size markers; B, rat liver mRNA; C, rat lens mRNA.

DISCUSSION

The 2277 nucleotides comprising our presumed full-length rat testis SDH cDNA include the coding region for rat SDH (1071 bp), a secondary coding region upstream from the primary ATG codon (126 bp), a 5' non-coding region beyond the secondary ATG codon (56 bp), and a 3' non-coding region (1024 bp). The 3' non-coding sequence appears to be complete, since it includes both the putative poly(A) signal and the poly(A) tail. Although we find three poly(A) signals, the two immediately following the coding sequence (starting at nucleotides 1362 and 1465) are in agreement with the data of Karlsson et al. [9]. However, a third consensus sequence, 17 nucleotides upstream from the poly(A) sequence, indicates that the AATAAA and AGTAAA immediately following the coding sequence may be of an undetermined function. The third sequence is optimally located and therefore may be functional. Yet, more than one polyadenylation signal in the 3' non-coding region has also been found in several human alcohol dehydrogenase cDNAs [16].

Our data extend the results of Karlsson et al. [9] and update the sequence for the sorbitol dehydrogenase mRNA at several points. We have obtained the sequence of a full-length SDH cDNA; the 5' sequence contains a second ATG codon whose surrounding sequences are more in consensus with the ATG start codon, according to the study of Kozak [15], than the primary ATG sequence immediately preceding the codon for the SDH N-terminal amino acid; the 3' non-coding sequence, following the termination codon TAA, is actually 1021 nucleotides in length and not 261 as was reported by Karlsson et al. [9]; amino acid 258 is threonine rather than aspartic acid; and it is possible that sorbitol dehydrogenase is processed post-translationally, eliminating a polypeptide of 42 amino acids located at the N-terminal end of the nascent protein, prior to the release of SDH into the cell cytosol; however, the possibility still remains that the secondary ATG codon is non-functional. Based on the analysis of Kozak [15] of the mRNA sequences upstream from the ATG start sites, the surrounding sequence TTGGCC-ATG-G for the secondary ATG codon (in pre-SDH) appears to be 100% compatible with β-globin mRNA of Xenopus. The sequence NNNGCC-ATG-G is compatible with numerous mRNAs, including human pre-antithrombin III, rat pre-β-casein, mouse pre-α-fetoprotein, human pre-glycoprotein hormones, human pre-proinsulin, chicken pre-proalbumin, rat β-actin, and chicken histone H5 among others [15]. The sequence surrounding the primary ATG codon (in SDH) is found far less frequently when compared to the mRNAs examined by Kozak [15]; we could not find a single mRNA that contained the sequence AGCGAC-ATG-G, and the sequence NNNGAC-ATG-G was compatible with only one mRNA, coding for the chicken histone H2b [15]. In conclusion, the sequence data indicate that it is possible that SDH is a post-translationally processed protein. Corroborative evidence utilizing immuno-detection methods or transformation in cell culture may be required to confirm the occurrence of a pre-SDH.
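The comparison of the two start-codon contexts can be illustrated by counting matches to the GCC-ATG-G core cited in the text. This is a deliberately simplified sketch, not a reproduction of the survey in [15]: it scores only the three nucleotides immediately upstream of the ATG and the single nucleotide immediately downstream, and the function name and scoring scheme are assumptions made for illustration.

```python
# Core context cited in the text: GCC immediately 5' of ATG, G immediately 3'.
CORE_UPSTREAM = "GCC"
CORE_DOWNSTREAM = "G"


def core_context_matches(upstream, downstream):
    """Count matches (0-4) to the GCC-ATG-G core around a start codon.

    upstream: nucleotides immediately 5' of the ATG (last three are scored).
    downstream: nucleotides immediately 3' of the ATG (first one is scored).
    """
    last_three = upstream.upper()[-3:]
    score = sum(a == b for a, b in zip(last_three, CORE_UPSTREAM))
    if downstream.upper()[:1] == CORE_DOWNSTREAM:
        score += 1
    return score


if __name__ == "__main__":
    # Contexts as given in the text for the two ATG codons.
    print("secondary ATG (pre-SDH):", core_context_matches("TTGGCC", "G"))
    print("primary ATG (SDH):", core_context_matches("AGCGAC", "G"))
```

Under this simple count the secondary ATG context matches all four scored positions, while the primary ATG context differs at one upstream position, which is consistent with the qualitative argument above but says nothing about actual initiation efficiency.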

The authors are indebted to Drs Christina Karlsson and Hans Jörnvall of the Karolinska Institute for the cDNA clone for sorbitol dehydrogenase, pSDHf3, and to Dr Ling Liu for the initial screening of the λZAP II cDNA library. This work was supported by Public Health Service Grant EY-05406 to I. B.

REFERENCES

1. Varma, S. D. & Kinoshita, J. H. (1974) Biochim. Biophys. Acta 338, 632-640.
2. Kinoshita, J. H. & Nishimura, C. (1988) Diabetes Metab. Rev. 4, 323-331.
3. Robison, W. G. Jr., Kador, P. F. & Kinoshita, J. H. (1983) Science 221, 1177-1179.
4. Gabbay, K. H., Merola, L. O. & Field, R. A. (1966) Science 151, 209-210.
5. Gabbay, K. H. (1973) N. Engl. J. Med. 288, 831-836.
6. Jeffery, J., Chesters, J., Mills, C., Sadler, P. J. & Jörnvall, H. (1984) EMBO J. 3, 357-360.
7. Maret, W. & Auld, D. S. (1988) Biochemistry 27, 1622-1628.
8. Jörnvall, H., Persson, M. & Jeffery, J. (1981) Proc. Natl Acad. Sci. USA 78, 4226-4230.
9. Karlsson, C., Jörnvall, H. & Höög, J.-O. (1991) Eur. J. Biochem. 198, 761-765.
10. Chomczynski, P. & Sacchi, N. (1987) Anal. Biochem. 162, 156-159.
11. Thomas, P. S. (1980) Proc. Natl Acad. Sci. USA 77, 5201-5205.
12. Hentzen, P. C., Bessem, C. C., Sorgente, N. & Bekhor, I. (1984) Exp. Eye Res. 39, 51-60.
13. Wen, Y. & Bekhor, I. (1993) Curr. Eye Res. 12, 323-332.
14. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA 74, 5463-5467.
15. Kozak, M. (1984) Nucleic Acids Res. 12, 857-872.
16. Hedén, L.-O., Höög, J.-O., Larsson, K., Lake, M., Lagerholm, E., Holmgren, A., Vallee, B. L., Jörnvall, H. & von Bahr-Lindström, H. (1985) FEBS Lett. 194, 327-332.