6
Correction Proc. Natl. Acad. Sci. USA 83 (1986) 3567 Correction. In the article "Molecular cloning and nucleotide sequence of human glucocerebrosidase cDNA" by Joseph Sorge, Carol West, Beryl Westwood, and Ernest Beutler, which appeared in number 21, November 1985, of Proc. Natl. Acad. Sci. USA (82, 7289-7293), the authors request that the following be noted. Because of typographical errors in the laboratory, one of two consecutive GAT sequences was omitted from the glucocerebrosidase sequence shown in Fig. 2. The relevant portion of the sequence should read 625 630 635 CTGATGAT TT. Nucleotide 1616 in the original sequences should have been T instead of C. The correct length of the coding region is increased to 1548 nucleotides, representing 516 amino acids with a molecular weight of 55,498 for the mature protein.

Molecular cloning andnucleotide ofhuman glucocerebrosidase cDNA

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Molecular cloning andnucleotide ofhuman glucocerebrosidase cDNA

Correction Proc. Natl. Acad. Sci. USA 83 (1986) 3567

Correction. In the article "Molecular cloning and nucleotidesequence of human glucocerebrosidase cDNA" by JosephSorge, Carol West, Beryl Westwood, and Ernest Beutler,which appeared in number 21, November 1985, ofProc. Natl.Acad. Sci. USA (82, 7289-7293), the authors request that thefollowing be noted. Because of typographical errors in thelaboratory, one of two consecutive GAT sequences wasomitted from the glucocerebrosidase sequence shown in Fig.2. The relevant portion of the sequence should read

625 630 635CTGATGAT TT.

Nucleotide 1616 in the original sequences should have beenT instead of C. The correct length of the coding region isincreased to 1548 nucleotides, representing 516 amino acidswith a molecular weight of 55,498 for the mature protein.

Page 2: Molecular cloning andnucleotide ofhuman glucocerebrosidase cDNA

Proc. Nati. Acad. Sci. USAVol. 82, pp. 7289-7293, November 1985Biochemistry

Molecular cloning and nucleotide sequence of humanglucocerebrosidase cDNA

(expression vector/genedc deflciency/A library/cDNA done/chromosome localization)

JOSEPH SORGE, CAROL WEST, BERYL WESTWOOD, AND ERNEST BEUTLERDepartment of Basic and Clinical Research, Scripps Clinic and Research Foundation, 10666 North Torrey Pines Road, La Jolla, CA 92037

Contributed by Ernest Beutler, July 15, 1985

ABSTRACT Mutations in the human glucocerebrosidasegene cause Gaucher disease. A cDNA done containing theentire human glucocerebrosidase coding region from normalcells has been isolated using Xgtll expression libraries. Thecomplete nucleotide sequence, a restriction map, and ahydropathy profre are presented. Hybridization to chromo-some-specific DNA localizes the human glucocerebrosidasegene to chromosome 1. The likely precursor protein is 515amino acids long. The NH2-terminal 19 amino acids constitutea leader sequence that is cleaved from the mature protein. Thepredicted molecular weight of the mature protein is 55,384,without glycosylation or carboxyl-terminal processing.

Gaucher disease is caused by a hereditary deficiency in theenzyme glucocerebrosidase. It is an autosomal recessivedisorder that is inherited in a Mendelian fashion. Affectedindividuals cannot adequately catabolize glucocerebroside,which then accumulates in macrophages.

Cloning of the glucocerebrosidase gene has been reported(1); however, the restriction endonuclease map and nucleo-tide sequence were not disclosed. In an earlier publication,we presented the nucleotide sequence of a partial glucocer-ebrosidase cDNA clone that enabled us to study the geneticsof Gaucher disease (2). We now report the cloning of thefull-length cDNA for human glucocerebrosidase by using arandomly primed Xgtll cDNA expression library, show thecomplete nucleotide sequence of the cDNA and a restrictionmap, and demonstrate the localization ofthe cloned sequenceto chromosome 1.

MATERIALS AND METHODS

Cells and RNA. WI-38 human fibroblasts were obtainedfrom the American Type Culture Collection and grown inDulbecco's modified Eagle's medium supplemented with10% fetal bovine serum. Cellular RNA was prepared usingthe guanidinium thiocyanate/cesium chloride method (3).The poly(A)-rich fraction of RNA was prepared by twopurifications with oligo(dT)-Sepharose (4).

Synthesis of Randomly Primed cDNA. Five micrograms ofpoly(A)+ RNA was dissolved in 1 mM methylmercurichydroxide and incubated at room temperature for 5 min.2-Mercaptoethanol was then added to 0.5%. The reaction wascarried out in 0.4 ml of 50 mM Tris'HCl, pH 8.3/20 mMKCI/10 mM MgCl2/5 mM dithiothreitol/l mM dATP/dCTP/dGTP/dTTP containing 15- to 25-nucleotide-long random calfthymus oligonucleotides at 1 mg/ml, placental ribonucleaseinhibitor (Amersham) at 500 units/ml, [32P]dATP at 125gCi/ml (600 pgCi/mmol; 1 Ci = 37 GBq), and avianmyeloblastosis virus reverse transcriptase at 800 units/ml.

After incubation for 2 hr at 370C the randomly primed firststrand was separated from the unincorporated nucleotides ona Sephadex G-50 column. The cDNA was made 0.3 M withNaOH and incubated at 680C for 1 hr. The mixture was thenneutralized with acetic acid and the cDNA was precipitatedwith 2.5 volumes of ethanol. The second strand of the cDNAwas synthesized using reverse transcriptase followed by thelarge fragment of DNA polymerase as described (5). Thehairpins were nicked using S1 nuclease (Sigma) at 100units/ml and the ends were made blunt with the largefragment of DNA polymerase (5). The double-strandedcDNA was treated with EcoRI methylase (New EnglandBiolabs) according to the recommended procedure.Prephosphorylated EcoRI oligonucleotide linkers were thenligated to the cDNA and digested with EcoRI (5). The cDNAwas size fractionated on Sepharose 4B (Pharmacia).

Agtll Library. One hundred nanograms of cDNA, >1000base pairs long, was ligated to 1 pmg of alkaline phosphatase-treated Xgtll EcoRI arms (Vector Cloning Systems, SanDiego, CA). The DNA was packaged in vitro using ahigh-efficiency packaging extract (Vector Cloning Systems)and the recombinant phage were plated on strain Y1088 ofEscherichia coli (6). Approximately 3 x 106 plaques wereobtained with >90%o containing cDNA inserts. An amplifiedphage stock was made from these plates and was used forsubsequent screening.Antibody Purification. Rabbit antiserum raised against

purified human glucocerebrosidase was purified using twomethods. In the first method, an ammonium sulfate fractionwas allowed to react with an affinity column to which purifiedglucocerebrosidase had been bound using cyanogen bromide.The bound antibodies were then removed by elution with 0.1M glycine, pH 2.6. In the second method, the affinity columnwas prepared by first attaching a very-high-affinity anti-glucocerebrosidase monoclonal antibody to Sepharose withcyanogen bromide. Then, purified glucocerebrosidase wasbound to the monoclonal antibodies attached to the columnresin, and the column was washed extensively with 0.1 Mglycine, pH 2.6. Most of the glucocerebrosidase proteinremained bound to the column, as shown by measuringenzyme activity in the resin. A rabbit anti-glucocerebrosidaseserum was then allowed to react with the column, the resinwas washed extensively at neutral pH, and then the boundrabbit antibodies were removed by elution with 0.1 Mglycine, pH 2.6. This purified serum was found to be highlyspecific for human glucocerebrosidase on immunoblots ofhuman fibroblast proteins.

Library Screening. Approximately 1 x 106 recombinantphage from the amplified library were plated on E. coli strainY1090 (6) at a density of 5 x 104 phage per 150-mm plate (5).After 3 hr of incubation at 370C, the bacterial lawns werecovered with a 137-mm nitrocellulose filter (HATF 137,Millipore) that had been soaked in 5 mM isopropyl

Abbreviation: bp, base pair(s).

7289

The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertisement"in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Page 3: Molecular cloning andnucleotide ofhuman glucocerebrosidase cDNA

Proc. Natl. Acad. Sci. USA 82 (1985)

thiogalactoside (IPTG) and dried. Incubation at 37°C wascontinued for 5 hr, and then the filter was removed, a secondIPTG-treated filter was placed on the bacterial lawn, and theplate was incubated overnight at 37°C. The filters were driedat room temperature and then soaked first in phosphate-buffered saline (Pi/NaCl) for 10 min and then in Pi/NaClcontaining 3% bovine serum albumin for 30 min at roomtemperature. Finally, the filters were treated at room tem-perature for 2 hr with Pi/NaCl containing 3% bovine serumalbumin and affinity-purified anti-glucocerebrosidase anti-body at a dilution of 1:1000 at room temperature for 2 hr. Thefilters were washed at room temperature once with Pi/NaClfor 10 min and once with Pi/NaCl containing 0.1% TritonX-100 for 10 min and then with Pi/NaCl for 10 min. They werethen incubated for 1 hr with 125I-labeled protein A (specificactivity, 20 mCi/mg) in Pi/NaCl (5 x 105 cpm per filter),washed with Pi/NaCl and with Pi/NaCl containing Triton asabove, and air dried, and XAR-2 x-ray film was exposed tothe filters for 1-2 days at -80°C using an intensifying screen.Positive phage were picked and repeatedly rescreened atlower density until a pure population was obtained.DNA Subcloning and Sequencing. The clone G5A-1Y was

shown to contain an 1100-base-pair (bp) EcoRI insert thatwas subcloned into the EcoRI site of plasmid pBR322 (5).This plasmid, G5A-1Y, was then nick-translated and used asa probe to screen an oligo(dT)-primed human WI-38 XgtllcDNA library prepared as above, except that the first-strandcDNA was primed with oligo(dT) instead of with random calfthymus primers. Positive clones were identified at a frequen-cy of 1 in 600,000. Several plaques were purified and oneclone, ID9-bb, was found to have a 1500-bp insert. This insertwas subcloned into plasmid pBR322, yielding plasmid ID9-bb. Fig. 1 illustrates the portions of the glucocerebrosidasecDNA contained within plasmids G5A-1Y and ID9-bb. Afull-length cDNA clone was constructed by taking advantageof the unique Sca I site in the region where G5A-1Y andID-9bb overlap. The plasmid containing the full-length cloneis called GCS-2kb.The full-length clone was sequenced by the method of

Maxam and Gilbert (7). The complete DNA sequence wasanalyzed with computer programs from the University ofWisconsin. A hydrophobicity-hydrophilicity profile was gen-erated using the method of Kyte and Doolittle (8).Chromosome Localiation. The nick-translated 1100-bp

G5A-1Y probe was hybridized to chromosome-specific DNAattached to nitrocellulose filters according to publishedprocedures (9, 10).

ATGI ElE _

-'4m. ~n*i - -.

100

G5A -IY

C,Y L .\l/

Protein Sequencing Methods. Glucocerebrosidase was pu-rified from human placenta by the method of Dale et al. (11)and further purified by immunoaffinity purification over ananti-glucocerebrosidase monoclonal antibody column. Spe-cifically, 4 mg of an anti-glucocerebrosidase mouse mono-clonal antibody (16-B-3) was coupled to CNBr-activatedagarose. The partially purified glucocerebrosidase was ap-plied to the column in 50 mM NaCl/50mM citrate, pH 6, andthe column was washed extensively with this starting buffer.Glucocerebrosidase was removed by elution with 0.2 Mglycine, pH 2.6. The eluted protein was analyzed byNaDodSO4/PAGE and showed a single band.The NH2-terminal sequence was determined by solid-

phase Edman degradation with HPLC analysis of the phen-ylthiohydantoin-derivatized amino acid cleavage products.

RESULTS

Library Screening. Approximately 1 x 106 recombinantphage were screened with an affinity-purified anti-glucocerebrosidase antibody and a single, strongly positivephage was obtained. The same phage was positive with asecond, highly purified serum. The phage DNA contained an1100-bp EcoRI insert. When the same randomly primedlibrary was screened using this 1100-bp DNA insert as aprobe, 6 positives were found in approximately 1 x 106phage. This suggested an mRNA frequency of 0.0006% orroughly 1 or 2 mRNA molecules per cell. When the samelibrary was screened with a human /3-actin DNA probe (12),=0.7% of the phage were positive.Restriction Map and Sequence Analysis. The composite

restriction map ofhuman glucocerebrosidase cDNA is shownin Fig. 1 and the complete nucleotide sequence of the codingregion is shown in Fig. 2. Since the insert of the 5' clone,G5A-1Y, had been fused in frame to bacterial 3-galactosidasein Xgtll, it was necessary for an open reading frame to extendfrom the extreme left end of G5A-1Y. As shown in Fig. 2, anopen reading frame extends from nucleotide 1 to nucleotide1700.A hydrophobicity profile for the protein sequence predict-

ed from the full-length cDNA sequence is shown in Fig. 3.Chromosome Mapping. The 1100-bp G5A-1Y DNA insert

was nick-translated and hybridized with chromosome-speci-fic human DNA (Fig. 4). Hybridization was specific forchromosome 1, which has been shown to contain the gene forhuman glucocerebrosidase (14, 15).

_ oo&n a

U) LC .

0. 0a -O._

Oa O

IL (A...... ...................................................Y. 0 Ca z 0

UGA

-I

I D9bb

FIG. 1. Restriction map ofthe GCS-2kb insert. ATG represents the probable initiator methionine codon and UGA represents the translationalterminator codon. The boundaries of the partial cDNA clones G5A-1Y and ID9-bb are indicated with horizontal bars. We have previously usedclone G5A-1Y to perform restriction polymorphism analysis (2). Enzymes that do not digest the GCS-2kb insert include Xho I, Xba I, Cla I,EcoRI, EcoRV, Sma I, and Bgl II.

s

7290 Biochemistry: Sorge et al.

Page 4: Molecular cloning andnucleotide ofhuman glucocerebrosidase cDNA

Biochemistry: Sorge et al. Proc. Natl. Acad. Sci. USA 82 (1985) 7291

Banm HI1 TTCCTGCATCCTTGTTTTTGTTTAGTGGATCCTCTATCCTTCAGAGACTCTGGAACCCCTGTGGTCTTCTCTTCATCTAATGACCCTGAG 90

PheLeuHisProCysPheCysLeuValAspProLeuSerPheArgAspSerGlyThrProValValPheSerSerSerAsnAspProGlu

91 GGGATGGAGTTTTCAAGTCCTTCCAGAGAGGAATGTCCCAAGCCTTTGAGTAGGGTAAGCATCATGGCTGGCAGCCTCACAGGTTTGCTT 180GlyMetGluPheSerSerProSerArgGluGluCysProLysProLeuSerArgValSerlleMetAlaGlySerLeuThrGlyLeuLeu

.Hind m181 CTACTTCAGGCAGTGTCGTGGGCATCAGGTGCCCGCCCCTGCATCCCTAAAAGCTTCGGCTACAGCTCGGTGGTGTGTGTCTGCAATGCC 270

LeuLeuGlnAlaValSerTrpAlaSerGlyAlaArgProCysIleProLysSerPheGlyTyrSerSerValValCysValCysAsnAla. .t . KpnI . .

271 ACATACTGTGACTCCTTTGACCCCCCGACCTTTCCTGCCCTTGGTACCTTCAGCCGCTATGAGAGTACACGCAGTGGGCGACGGATGGAG 360ThrTyrCysAspSerPheAspProProThrPheProAlaLeuGlyThrPheSerArgTyrGluSerThrArgSerGlyArgArgMetGlu

StuI . PstII361 CTGAGTATGGGGCCCATCCAGGCTAATCACACGGGCACAGGCCTGCTACTGACCCTGCAGCCAGAACAGAAGTTCCAGAAAGTGAAGGGA 450

LeuSerMetGlyProIleGlnAlaAsnHisThrGlyThrGlyLeuLeuLeuThrLeuGlnProGluGlnLysPheGlnLysValLysGly

451 TTTGGAGGGGCCATGACAGATGCTGCTGCTCTCAACATCCTTGCCCTGTCACCCCCTGCCCAAAATTTGCTACTTAAATCGTACTTCTCT 540PheGlyGlyAlaMetThrAspAlaAlaAlaLeuAsnIleLeuAlaLeuSerProProAlaGlnAsnLeuLeuLeuLysSerTyrPheSer

.KpnI . Pvu II541 GAAGAAGGAATCGGATATAACATCATCCGGGTACCCATGGCCAGCTGTGACTTCTCCATCCGCACCTACACCTATGCAGACACCCCTGAT 630

GluGluGlyIleGlyTyrAsnIleIleArgValProMetAlaSerCysAspPheSerIleArgThrTyrThrTyrAlaAspThrProAsp.PstI

631 TTCCAGTTGCACAACTTCAGCCTCCCAGAGGAAGATACCAAGCTCAAGATACCCCTGATTCACCGAGCCCTGCAGTTGGCCCAGCGTCCC 720PheGlnLeuHisAsnPheSerLeuProGluGluAspThrLysLeuLysIleProLeulleHisArgAlaLeuGlnLeuAlaGlnArgPro

721 GTTTCACTCCTTGCCAGCCCCTGGACATCACCCACTTGGCTCAAGACCAATGGAGCGGTGAATGGGAAGGGGTCACTCAAGGGACAGCCC 810ValSerLeuLeuAlaSerProTrpThrSerProThrTrpLeuLysThrAsnGlyAlaValAsnGlyLysGlySerLeuLysGlyGlnPro

811 GGAGACATCTACCACCAGACCTGGGCCAGATACTTTGTGAAGTTCCTGGATGCCTATGCTGAGCACAAGTTACAGTTCTGGGCAGTGACA 900GlyAspIleTyrHisGlnThrTrpAlaArgTyrPheValLysPheLeuAspAlaTyrAlaGluHisLysLeuGlnPheTrpAlaValThrPvu *

901 GCTGAAAATGAGCCTTCTGCTGGGCTGTTGAGTGGATACCCCTTCCAGTGCCTGGGCTTCACCCCTGAACATCAGCGAGACTTCATTGCC 990AlaGluAsnGluProSerAlaGlyLeuLeuSerGlyTyrProPheGlnCysLeuGlyPheThrProGluHisGlnArgAspPheIleAla

Sca.I991 CGTGACCTAGGTCCTACCCTCGCCAACAGTACTCACCACAATGTCCGCCTACTCATGCTGGATGACCAACGCTTGCTGCTGCCCCACTGG 1080

ArgAspLeuGlyProThrLeuAlaAsnSerThrHisHisAsnValArgLeuLeuMetLeuAspAspGlnArgLeuLeuLeuProHisTrpKpnI

1081 GCAAAGGTGGTACTGACAGACCCAGAAGCAGCTAAATATGTTCATGGCATTGCTGTACATTGGTACCTGGACTTTCTGGCTCCAGCCAAA 1170AlaLysValValLeuThrAspProGluAlaAlaLysTyrValHisGlyIleAlaValHisTrpTyrLeuAspPheLeuAlaProAlaLys

Stul1171 GCCACCCTAGGGGAGACACACCGCCTGTTCCCCAACACCATGCTCTTTGCCTCAGAGGCCTGTGTGGGCTCCAAGTTCTGGGAGCAGAGT 1260

AlaThrLeuGlyGluThrHisArgLe4PheProAsnThrMetLeuPheAlaSerGluAlaCysValGlySe-rLysPheTrp~luGlnSer

1261 GTGCGGCTAGGCTCCTGGGATCGAGGGATGCAGTACAGCCACAGCATCATCACGAACCTCCTGTACCATGTGGTCGGCTGGACCGACTGG 1350ValArgLeuGlySerTrpAspArgGlyMetGlnTyrSerHisSerIleIleThrAsnLeuLeuTyrHisValValGlyTrpThrAspTrp

* . . . SailI . . .1351 AACCTTGCCCTGAACCCCGAAGGAGGACCCAATTGGGTGCGTAACTTTGTCGACAGTCCCATCATTGTAGACATCACCAAGGACACGTTT 1440

AsnLeuAlaLeuAsnProGluGlyGlyProAsnTrpValArgAsnPheValAspSerProIleIleValAspIleThrLysAspThrPhe* * . . *

1441 TACAAACAGCCCATGTTCTACCACCTTGGCCACTTCAGCAAGTTCATTCCTGAGGGCTCCCAGAGAGTGGGGCTGGTTGCCAGTCAGAAG 1530TyrLysGlnProMetPheTyrHisLeuGlyHisPheSerLysPheIleProGluGlySerGlnArgValGlyLeuValAlaSerGlnLys

* . NsiI, . . *1531 AACGACCTGGACGCAGTGGCACTGATGCATCCCGATGGCTCTGCTGTTGTGGTCGTGCTAAACCGCTCCTCTAAGGATGTGCCTCCTACC 1620

AsnAspLeuAspAlaValAlaLeuMetHisProAspGlySerAlaValValValValLeuAsnArgSerSerLysAspValProProThrBamnI . . .H

1621 ATCAAGGATCCTGCTGTGGGCTTCCTGGAGACAATCTCACCTGGCTACTCCATTCACACCTACCTGTGGCATCGCCAGTGATGGAGCAGA 1710IleLysAspProAlaValGlyPheLeuGluThrIleSerProGlyTyrSerIlelisThrTyrLeuTrpHisArgGlnEnd

1711 TACTCAAGGAGGCACTGGGCTCAGCCTGGGCATTAAAGGGACAGAGTCAGCTCACACGCTGTCTGTGACTAAAGAGGGCACAGCAGGGCC 1800

1801 AGTGTGAGCTTACAGCGACGTAAGCCCAGGGGCAATGGTTTGGGTGACTCACTTTCCCCTCTAGGTGGTGCCCAGGGCTGGAGGCCCCTA 1890

1891 GAAAAAGATCAGTAAGCCCCAGTGTCCCCCCAGCCCCCATGCTTATGTGAACATGCGCTGTGTGCTGCTTGCTTTGGAAACTXGCCTGGG 1980Stu I Sac I .

1981 TCCAGGCCTAGGGTOAGCTCACTGTCCGTACAAACACAAGATCAGGGCTGAGGGTAAGGAAAAGAAGAGACTAGG.AAAGCTGGGCCCAAA 2070

2071 ACTGGAGACTGTTTGTCTTTCCTAGAGATGCAGAACTGGGCCCGTGGAGCAGCAGTGTCAGCATCAGGGCGGAAGCCTTAAAGCAGCAGC 2160

2161 GGGTGTGCCCAGGCACCCAGATGATTCCTATGGCACCAGCCAGGAAAAATGGCAGCTCTTAAAGGGG 2227

FIG. 2. Nucleotide sequence and amino acid translation of GCS-2kb insert DNA. Nucleotide 1 represents the residue of the cDNA cloneGCS-2kb that would be 5'-most in the glucocerebrosidase mRNA molecule (although this may not be the 5' end of the mRNA). Some restrictionsites have been indicated. Clone G5A-1Y, used in our previous study (2), extends from nucleotide 1 to nucleotide 1038. The long open readingframe is indicated by the corresponding amino acids. The methionine at nucleotides 154-156 is probably the initiator of translation in vivo. Theend of the 19-amino acid leader sequence and beginning of the NH2-terminal sequence is indicated with an arrow. The sequence between theasterisks at nucleotides 1495 and 1590 has been identified by G. A. Grabowski (personal communication) as the active site of the enzyme byaffinity labeling with bromoconduritol B epoxide (13).

Confirmation with the Protein Sequence. The sequence of sequence. The probability of this occurring by chance isthe amino terminus of the glucocerebrosidase protein is as 1:160,000.follows:

Xaa-Xaa-Pro-Xaa-Ile-Pro-Xaa-Xaa-Phe. DISCUSSIONAlthough there are several unknown residues, all confirmed We have cloned and sequenced 2227 nucleotides of cDNAresidues match the amino acids predicted from the nucleotide containing the entire coding region of the human glucocere-

Page 5: Molecular cloning andnucleotide ofhuman glucocerebrosidase cDNA

Proc. Natl. Acad. Sci. USA 82 (1985)

poly -leucine

poly-methionine

poly-glycine

poly-proline

poly-glutomine

IJN V \/-- ~~~~~~~~~~~~~~~.1'f

I I I I I_

75 150 225 300 375 450 525Amino acid

FIG. 3. Hydropathy profile of the glucocerebrosidase proteinaccording to the method of Kyte and Doolittle (8). The horizontalaxis is amino acid number corresponding to Fig. 2; the vertical axisis the average hydropathy index (8) for 25 amino acids, 12 lying to theleft and 12 lying to the right of a given amino acid on the horizontalaxis. The proposed initiator methionine is amino acid 52.

brosidase gene. We initially attempted to clone the gene byusing an affinity-purified antiserum and an oligo(dT)-primedXgtll cDNA library, but we were unsuccessful, presumablybecause the cDNA was not long enough to contain sufficientprotein coding region to bind antibodies. We circumventedthis problem by constructing a Xgtll cDNA library withrandom primers to initiate first-strand cDNA synthesis. Sincethe random primers initiate cDNA synthesis at any pointalong the mRNA molecule, the random library does not havethe "3' bias" that conventional oligo(dT)-primed librarieshave. With the randomly primed library, we were able toidentify a positive clone on the first screening of 1 x I6plaques. Our first clone, GSA-1Y, extended from midcodingregion to a point near the 5' end of the mRNA molecule.

Although the left end of the open reading frame fromnucleotides 1-1700 could be part of the coding sequence ofthe glucocerebrosidase molecule, the sequence data (seebelow) suggest that the ATG at nucleotides 154-156 is theinitiator ATG for glucocerebrosidase. It is interesting, how'ever, that the 5' untranslated region of the glucocerebrosid-ase mRNA contains such a long open reading frame that isin-frame with the coding sequence for the protein. We havenot yet mapped the 5' end of the mRNA molecule. If anotherATG lies upstream of what we refer to as nucleotide 1, it ispossible that a larger protein precursor could be synthesizedas well (at least 6000 daltons larger). The ATG at nucleotides94-96 is also in the same open reading frame. However, the

1 . 2

8

3 4 5

9 10-12

19 20 22

13 1415

sequences surrounding this ATG do not match Kozak'sconsensus sequence for initiator ATGs (16). The sequencessurrounding the ATG at nucleotides 154-156 match Kozak'sconsensus sequence quite well.

If the ATG at positions 154-156 is the initiator codon, theproglucocerebrosidase protein is 515 amino acids long. Theprotein sequence data suggest that the first 19 amino acids areremoved from the mature protein. This conclusion has alsobeen reached by Tsuji et al. (17). Such cleavage of a leadersequence is common for secreted proteins; glucocerebrosid-ase is secreted into lysosomes. Thus the mature gluco-cerebrosidase protein would contain 496 amino acids. Theleader sequence (nucleotides 154-210) is the most hydropho-bic portion of the protein, followed closely in hydrophobicityby the carboxyl terminus, where the active site has beenshown to be located (G. A. Grabowski, personal communi-cation). The protein would have an unglycosylated molecularweight of 55,384. We do not yet know whether the carboxylterminus identified in the nucleotide sequence is present onthe mature protein or whether it also is cleaved.The fact that the gene we have cloned is glucocerebrosid-

ase is established by (i) amino-terminal protein sequencedata, (ii) chromosome localization, (iii) antibody reactivity,and (iv) the fact that amino acids 362-376 perfectly match apreviously published glucocerebrosidase oligopeptide se-qtlence (1).The isolation and characterization of a full-length cDNA

clone makes it possible to attempt expression experiments inhuman cells with the hope that these studies may eventuallylead us to an effective treatment of Gaucher disease throughgene therapy.

Note Added in Proof. We have now found that ourcDNA clone directsthe synthesis of functional glucocerebrosidase when inserted intomammalian cells.

We thank Y. W. Kan for his efforts in performing the Fhromo-some-specific hybridizations and L. Kedes and P. Gunning for theirhuman actin gene clone. This work was supported in part, Oy GrantCA36448 from the National Cancer Institute and Grant 1-955 fromthe March ofDimes Birth Defects Foundation. This is publication no.3874BCR from the Research Institute of Scripps Clinic.

1. Ginns, E. I., Choudary, P. V., Martin, B. M., Winfield, S.,Stubblefield, B., Mayor, J., Merkle-Lehman, D., Murray,G. J., Bowers, L. A. & Barranger, J. A. (1984) Biochem.Biophys. Res. Commun. 123, 574-580.

2. Sorge, J., Gelbart, T., West, C., Westwood, B. & Beutler, E.(1985) Proc. Nati. Acad. Sci. USA 82, 5442-5445.

3. Chirgwin, J. M., Pryzbyla, A. E., MacDonald, R. J. & Rutter,W. J. (1979) Biochemistry 18, 5294-5298.

6 7

18 17

x

16

21

FIG. 4. Chromosome localization. A G5A-1Y radioactive probe was hybridized to chromosome-specific DNA attached to nitrocellulosefilters. Only chromosome 1 was reactive.

7292 Biochemistry: Sorge et al.

Page 6: Molecular cloning andnucleotide ofhuman glucocerebrosidase cDNA

Biochemistry: Sorge et al.

4. Aviv, H. & Leder, P. (1972) Proc. Natl. Acad. Sci. USA 69,1408-1412.

5. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) MolecularCloning: A Laboratory Manual (Cold Spring Harbor Labora-tory, Cold Spring Harbor, NY).

6. Young, R. A. & Davis, R. W. (1983) Proc. Natl. Acad. Sci.USA 80, 1194-1198.

7. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65,499-560.

8. Kyte, J. & Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132.9. Lebo, R. V., Gorin, F., Fletterick, R. J., Kao, F.-T., Cheung,

M.-C., Bruce, B. D. & Kan, Y. W. (1984) Science 225, 57-59.10. Lebo, R. V. (1985) Hum. Genet. 69, 316-322.11. Dale, G. L., Beutler, E., Fournier, P., Blanc, P. & Liautaud,

J. (1980) in Enzyme Therapy in Genetic Diseases, ed. Desnick,R. J. (Liss, New York), Vol. 2, pp. 33-41.

12. Gunning, P., Ponte, P., Okayama, H., Engel, J., Blau, H. &

Proc. Natl. Acad. Sci. USA 82 (1985) 7293

Kedes, L. (1983) Mol. Cell. Biol. 3, 787-795.13. Gatt, S., Dinur, T. & Desnick, R. J. (1982) in Gaucher Dis-

ease: A Century of Delineation and Research, eds. Desnick,R. J., Gatt, S. & Grabowski, G. A. (Liss, New York), pp.315-331.

14. Shafit-Zagardo, B., Devine, E. A., Smith, M., Arredondo,G. A. F. & Desnick, R. J. (1981) Am. J. Hum. Genet. 33,564-575.

15. Barneveld, R. A., Keijzer, W., Tegelaers, F. P. W., Ginns,E. I., Geurts van Kessel, A., Brady, R. O., Barranger, J. A.,Tager, J. M., Galjaard, H., Westerveld, A. & Reuser, A. J. J.(1983) Hum. Genet. 64, 227-231.

16. Kozak, M. (1984) Nucleic Acid Res. 12, 857-872.17. Tsuji, S., Choudary, P. V., Barranger, J. A., Martin, B. &

Ginns, E. I. (1985) Fed. Proc. Fed. Am. Soc. Exp. Biol. 44,1612 (abstr.).