
Proc. Natl. Acad. Sci. USA Vol. 83, pp. 4859-4863, July 1986 Genetics

Human α-galactosidase A: Nucleotide sequence of a cDNA clone encoding the mature enzyme

(Fabry disease/lysosomal hydrolase/mRNA/N-glycosylation/poly(A) signals)

DAVID F. BISHOP*, DAVID H. CALHOUN†, HAROLD S. BERNSTEIN*, PETROS HANTZOPOULOS†, MERRIGENE QUINN†, AND ROBERT J. DESNICK*†

*Division of Medical Genetics, Department of Pediatrics, and †Department of Microbiology, Mount Sinai School of Medicine, New York, NY 10029

Communicated by Edwin D. Kilbourne, March 12, 1986

ABSTRACT The complete nucleotide sequence has been determined for a λgt11 cDNA clone (λAG18) containing the full-length coding region for the mature lysosomal form of human α-galactosidase A (α-Gal A; EC 3.2.1.22). The λAG18 insert contained a 1226-base-pair sequence with an open reading frame encoding 398 amino acids of the mature polypeptide (predicted Mr = 45,356) and the last 5 amino acids of the propeptide sequence. The poly(A) signals AATACA and ATTAAA occurred 28 and 11 nucleotides prior to the TAA stop codon, respectively. There was no 3' untranslated region as the poly(A) sequence immediately followed the TAA termination codon; a second independently cloned cDNA confirmed this finding. The predicted amino acid sequence was colinear with 86 nonoverlapping residues (22% of the mature subunit) determined by microsequencing amino-terminal, tryptic, and cyanogen bromide peptides of the purified mature enzyme. Four potential N-glycosylation sites were identified, all of which occurred at predicted β turns in hydrophilic regions of secondary structure. RNA transfer hybridization analysis of HeLa poly(A)+ RNA demonstrated a single 1.45-kilobase band whose signal was decreased by prior immunoabsorption of polysomes with monospecific α-Gal A antibodies. Searches of nucleic acid and protein data bases did not reveal significant homology even with the limited sequences available for mammalian lysosomal enzymes.

Human α-galactosidase A (α-D-galactoside galactohydrolase; EC 3.2.1.22; α-Gal A) is the lysosomal glycosidase responsible for the hydrolysis of terminal α-galactosidic linkages in various glycolipids (1-4). The enzyme, encoded by a structural gene on the long arm of the X chromosome, Xq21→q22 (5), is synthesized as a precursor that undergoes co- and posttranslational modifications, including cleavage of pre- and propeptide sequences and N-glycosylation (6). Studies of α-Gal A biosynthesis in Chang liver cells (6) and human fibroblasts (D.F.B., P. Lemansky, R.J.D., and K. von Figura, unpublished results) indicated that a Mr ≈ 55,000-58,000 glycosylated propeptide is processed to a mature subunit (Mr 49,800). The mature active enzyme purified from human tissues and plasma is a homodimer (Mr 101,000) that contains one or more asparagine-linked oligosaccharide chains (3, 4).

The α-Gal A gene is expressed in all cells of normal males and females at similar levels due to random X chromosomal inactivation (1). Recently, it has been shown that the locus can be reactivated and partially expressed following 5-azacytidine treatment of mouse-human hybrid clones retaining only the inactive human X chromosome (7). Mutations at the α-Gal A locus result in Fabry disease, an X-linked recessive disorder characterized by deposition of the glycosphingolipid globotriaosylceramide in vascular lysosomes and early demise of affected males due to occlusive disease of the heart, kidney, and/or brain (1).

Recently, we described the isolation of a human α-Gal A cDNA clone (λAG18) from a λgt11 expression library (8). In this communication, we report the nucleotide sequence of the λAG18 cDNA and the predicted amino acid sequence for the entire mature form of α-Gal A. RNA transfer hybridization analysis indicated that the processed message encoding the entire α-Gal A subunit is about 1.45 kilobases (kb).

MATERIALS AND METHODS

Materials. β-Cyanoethyl diisopropylphosphoramidites were purchased from American Bionuclear. Oligo(dT)-cellulose, type T3, was obtained from Collaborative Research (Waltham, MA). Restriction endonucleases, T4 DNA polymerase, subcloning primer RD29, and the 17-mer M13 universal sequencing primer were from International Biotechnologies. T4 ligase was purchased from New England Biolabs. Terminal deoxynucleotidyltransferase was from Bethesda Research Laboratories. A human lung cDNA λgt11 library was obtained from Clontech (Palo Alto, CA). Protein A-Sepharose was purchased from Pharmacia. Nitrocellulose filters (type HATF) and Zetabind nylon transfer membranes were from Millipore and AMF Cuno (Meriden, CT), respectively. [α-³²P]dNTPs (3000 Ci/mmol; 1 Ci = 37 GBq) were from Amersham.

Amino Acid Microsequencing. Homogeneous α-Gal A was purified from human lung, and cyanogen bromide and tryptic peptides were prepared, isolated, and microsequenced as described (4, 8).

Subcloning of the λAG18 Insert and Construction of M13mp18 Deletion Clones. The 1.2-kb EcoRI insert from λAG18 was subcloned into pBR322 [designated pAG18 (8)] and pUC8 (designated pUCAG18). The cDNA insert from pAG18 was recloned into M13mp18 (9) in both orientations for deletion subcloning (10). Twelve derivative clones containing deletions of different lengths from the message and complementary strands of pAG18 (designated mAG25.0 and mAG27.0, respectively) were selected for subsequent sequencing (Fig. 1).

Nucleotide Sequencing and Computer-Assisted Analyses. The M13mp18 deletion subclones were sequenced by the Sanger method using the 17-mer M13 primer (11). To facilitate sequencing of two internal regions, synthetic 17-mer oligonucleotide primers to λAG18 nucleotides (nt) 154-170 and 1062-1078 were constructed on a Biosearch Sam One DNA synthesizer using β-cyanoethyl diisopropylphosphoramidites. The deblocked oligonucleotides were used without purification as primers for Sanger sequencing at a concentration of 2.0 ng/ml.

Abbreviations: α-Gal A, α-galactosidase A; kb, kilobase(s); nt, nucleotide(s).
*To whom all correspondence should be addressed at: Division of Medical Genetics, Mount Sinai School of Medicine, Fifth Avenue at 100th Street, New York, NY 10029.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

Nucleotide sequence for the 3' region was obtained by using the 17-mer universal primer for dideoxy sequencing of the pUCAG18 cDNA insert. For sequencing the 5' region of pAG18, a previously synthesized 14-mer oligonucleotide mixture (probe 2B; ref. 8), corresponding to codons for the amino-terminal amino acids 19 to 23, was used as a primer for Sanger sequencing (11). In addition, Maxam and Gilbert chemistries (12) were used to sequence the 5' end and an internal EcoRI-Msp I fragment.

Overlapping nucleotide sequences were aligned on a microcomputer using the MicroGenie program (Beckman). Protein structural analyses and DNA and amino acid homology searches of the National Institutes of Health GenBank and the National Biomedical Research Foundation protein data base were performed in February, 1986, using IFIND/ALIGN programs on the Bionet Network (IntelliGenetics).

RNA Isolation and RNA Transfer Hybridization. Polysomes were isolated from 8 × 10⁹ HeLa cells (13). Rabbit anti-α-Gal A was purified from serum and rendered RNase-free by precipitation with ammonium sulfate, absorption to protein A-Sepharose, elution with 0.1 M glycine HCl (pH 2.5), and immediate neutralization with 0.5 M Tris base (14). An aliquot of the polysome preparation (50 ml; 17 A260 units/ml) was incubated with 5.0 mg of monospecific rabbit anti-human α-Gal A IgG (8) for 3 hr at 4°C and then was passed over a 5-ml column of protein A-Sepharose at 2 ml/hr. Total mRNA and α-Gal A-depleted mRNA [from the unbound polysomes (13)] were isolated by detergent dissociation, phenol extraction, and two cycles of purification on oligo(dT)-cellulose. RNA transfer hybridization analysis, using 1.0% agarose gels containing 1.9 M formaldehyde and 20 mM 4-morpholinepropanesulfonic acid buffer (15), was performed using nick-translated pAG18 cDNA insert.


RESULTS

Nucleotide Sequence Analysis. Fig. 1 shows the strategy for sequencing the α-Gal A cDNA subclones. The sequence was confirmed in its entirety on both strands from the deletion derivatives by using the Sanger method (11) with the exception of a short segment (nt 939-1004) on the message strand that was sequenced by primer extension using a synthetic 17-mer oligonucleotide. Other regions, including the 3' and 5' ends, were confirmed by enzymatic (11) or chemical (12) sequencing (Fig. 1).

The complete 1226-nt sequence of the λAG18 cDNA insert and its deduced amino acid sequence are shown in Fig. 2. This sequence contained an open reading frame from positions -15 to 1194 followed by a TAA termination codon. The 12-nt poly(A) tail immediately followed the termination codon. Two hexanucleotide poly(A) signals, AATACA and ATTAAA, were located 28 and 11 nt prior to the TAA stop codon, respectively. In addition, the putative U4 small nuclear ribonucleoprotein recognition sequence, CAGCT, which may be involved in polyadenylylation (16), was present 38 nt from the stop codon.

A second cDNA clone for α-Gal A designated λAGL4 was isolated from a human lung λgt11 cDNA library by using the nick-translated EcoRI insert from λAG18 as a hybridization probe. This 1.4-kb clone contained the identical 3' sequence as that observed for λAG18, including the absence of a 3' untranslated region; in addition, it contained a 120-nt poly(A) tail.

RNA Transfer Hybridization. Analysis of cell poly(A)+ RNA before (Fig. 3, lanes 1) and after (Fig. 3, lanes 2) polysome immunoabsorption with anti-α-Gal A IgG showed reduced hybridization of the immunodepleted α-Gal A mRNA fraction (Fig. 3). Based on the migration of the 18S and 28S human ribosomal subunits (Fig. 3, lanes 3), the α-Gal A mRNA was estimated to be about 1.45 kb (17).

[Figure 1: map of the message-strand (5'→3') and complementary-strand (3'→5') deletion subclones of pAG18 and pUC, drawn against a 1-1200 nt scale.]

FIG. 1. Strategy for sequencing the human α-Gal A cDNA clones. M13 clones mAG25.0 and mAG27.0 package the entire message and complementary strands, respectively, from the λAG18 EcoRI insert and were used to generate deletion subclones (e.g., 25.4, 27.2). The entire length of each subclone insert is shown; the heavy lines and arrows indicate the extent and direction of sequence determined. Arrows originating from short vertical lines indicate sequences determined by Sanger sequencing (11) using the universal primers that bind to the vector sequence; arrows originating from closed squares indicate sequences determined using as primers synthetic oligonucleotides that hybridize to the cDNA sequence; arrows originating from closed circles indicate sequences determined by the method of Maxam and Gilbert (12).


  -17  TC CCT GGG GCT AGA GCA  -1
   -5     Pro Gly Ala Arg Ala  -1

    1  CTG GAC AAT GGA TTG GCA AGG ACG CCT ACC ATG GGC TGG CTG CAC TGG GAG CGC TTC ATG TGC AAC CTT GAC TGC CAG GAA GAG CCA GAT  90
    1  Leu Asp Asn Gly Leu Ala Arg Thr Pro Thr Met Gly Trp Leu His Trp Glu Arg Phe Met Cys Asn Leu Asp Cys Gln Glu Glu Pro Asp  30
       [Ser Arg; N-Ter]

   91  TCC TGC ATC AGT GAG AAG CTC TTC ATG GAG ATG GCA GAG CTC ATG GTC TCA GAA GGC TGG AAG GAT GCA GGT TAT GAG TAC CTC TGC ATT  180
   31  Ser Cys Ile Ser Glu Lys Leu Phe Met Glu Met Ala Glu Leu Met Val Ser Glu Gly Trp Lys Asp Ala Gly Tyr Glu Tyr Leu Cys Ile  60
       [X X Ser]

  181  GAT GAC TGT TGG ATG GCT CCC CAA AGA GAT TCA GAA GGC AGA CTT CAG GCA GAC CCT CAG CGC TTT CCT CAT GGG ATT CGC CAG CTA GCT  270
   61  Asp Asp Cys Trp Met Ala Pro Gln Arg Asp Ser Glu Gly Arg Leu Gln Ala Asp Pro Gln Arg Phe Pro His Gly Ile Arg Gln Leu Ala  90

  271  AAT TAT GTT CAC AGC AAA GGA CTG AAG CTA GGG ATT TAT GCA GAT GTT GGA AAT AAA ACC TGC GCA GGC TTC CCT GGG AGT TTT GGA TAC  360
   91  Asn Tyr Val His Ser Lys Gly Leu Lys Leu Gly Ile Tyr Ala Asp Val Gly Asn Lys Thr Cys Ala Gly Phe Pro Gly Ser Phe Gly Tyr  120
       [CHO]

  361  TAC GAC ATT GAT GCC CAG ACC TTT GCT GAC TGG GGA GTA GAT CTG CTA AAA TTT GAT GGT TGT TAC TGT GAC AGT TTG GAA AAT TTG GCA  450
  121  Tyr Asp Ile Asp Ala Gln Thr Phe Ala Asp Trp Gly Val Asp Leu Leu Lys Phe Asp Gly Cys Tyr Cys Asp Ser Leu Glu Asn Leu Ala  150

  451  GAT GGT TAT AAG CAC ATG TCC TTG GCC CTG AAT AGG ACT GGC AGA AGC ATT GTG TAC TCC TGT GAG TGG CCT CTT TAT ATG TGG CCC TTT  540
  151  Asp Gly Tyr Lys His Met Ser Leu Ala Leu Asn Arg Thr Gly Arg Ser Ile Val Tyr Ser Cys Glu Trp Pro Leu Tyr Met Trp Pro Phe  180
       [CHO]

  541  CAA AAG CCC AAT TAT ACA GAA ATC CGA CAG TAC TGC AAT CAC TGG CGA AAT TTT GCT GAC ATT GAT GAT TCC TGG AAA AGT ATA AAG AGT  630
  181  Gln Lys Pro Asn Tyr Thr Glu Ile Arg Gln Tyr Cys Asn His Trp Arg Asn Phe Ala Asp Ile Asp Asp Ser Trp Lys Ser Ile Lys Ser  210
       [CHO; Asn X; T-49]

  631  ATC TTG GAC TGG ACA TCT TTT AAC CAG GAG AGA ATT GTT GAT GTT GCT GGA CCA GGG GGT TGG AAT GAC CCA GAT ATG TTA GTG ATT GGC  720
  211  Ile Leu Asp Trp Thr Ser Phe Asn Gln Glu Arg Ile Val Asp Val Ala Gly Pro Gly Gly Trp Asn Asp Pro Asp Met Leu Val Ile Gly  240

  721  AAC TTT GGC CTC AGC TGG AAT CAG CAA GTA ACT CAG ATG GCC CTC TGG GCT ATC ATG GCT GCT CCT TTA TTC ATG TCT AAT GAC CTC CGA  810
  241  Asn Phe Gly Leu Ser Trp Asn Gln Gln Val Thr Gln Met Ala Leu Trp Ala Ile Met Ala Ala Pro Leu Phe Met Ser Asn Asp Leu Arg  270
       [Ala; CB-1]

  811  CAC ATC AGC CCT CAA GCC AAA GCT CTC CTT CAG GAT AAG GAC GTA ATT GCC ATC AAT CAG GAC CCC TTG GGC AAG CAA GGG TAC CAG CTT  900
  271  His Ile Ser Pro Gln Ala Lys Ala Leu Leu Gln Asp Lys Asp Val Ile Ala Ile Asn Gln Asp Pro Leu Gly Lys Gln Gly Tyr Gln Leu  300
       [X X Arg Glu; T-53B]

  901  AGA CAG GGA GAC AAC TTT GAA GTG TGG GAA CGA CCT CTC TCA GGC TTA GCC TGG GCT GTA GCT ATG ATA AAC CGG CAG GAG ATT GGT GGA  990
  301  Arg Gln Gly Asp Asn Phe Glu Val Trp Glu Arg Pro Leu Ser Gly Leu Ala Trp Ala Val Ala Met Ile Asn Arg Gln Glu Ile Gly Gly  330
       [Leu Gly Ser Lys X; T-SI]

  991  CCT CGC TCT TAT ACC ATC GCA GTT GCT TCC CTG GGT AAA GGA GTG GCC TGT AAT CCT GCC TGC TTC ATC ACA CAG CTC CTC CCT GTG AAA  1080
  331  Pro Arg Ser Tyr Thr Ile Ala Val Ala Ser Leu Gly Lys Gly Val Ala Cys Asn Pro Ala Cys Phe Ile Thr Gln Leu Leu Pro Val Lys  360

 1081  AGG AAG CTA GGG TTC TAT GAA TGG ACT TCA AGG TTA AGA AGT CAC ATA AAT CCC ACA GGC ACT GTT TTG CTT CAG CTA GAA AAT ACA ATG  1170
  361  Arg Lys Leu Gly Phe Tyr Glu Trp Thr Ser Arg Leu Arg Ser His Ile Asn Pro Thr Gly Thr Val Leu Leu Gln Leu Glu Asn Thr Met  390
       [T-51; CHO; T-53A]

 1171  CAG ATG TCA TTA AAA GAC TTA CTT TAA AAAAAAAAAAAA  1209
  391  Gln Met Ser Leu Lys Asp Leu Leu Ter               398

FIG. 2. Nucleotide and predicted amino acid sequences of the λAG18 insert encoding the human mature α-Gal A subunit. Amino acid 1 is assigned to the amino-terminal residue. Nucleotides -17 to -1 encode five amino acids of the propeptide. Bold underlines indicate confirmed amino acid sequence obtained by microsequencing of tryptic (T) and cyanogen bromide (CB) peptides. Differences between microsequenced and predicted amino acids are shown; X denotes unidentified amino acids. Note that peptide T-53B corrects two errors in the sequence of CB-1. CHO indicates potential sites of N-glycosylation. Overlines indicate the poly(A) signals AATACA and ATTAAA and the CAGCT site implicated in U4 small nuclear RNA binding (16).
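The CHO marks in Fig. 2 follow the Asn-Xaa-Ser/Thr rule, which can be checked mechanically. The sketch below is an illustration only, not the procedure used in this study: it assumes the 398-residue mature sequence transcribed from Fig. 2 into one-letter code, and the names MATURE and find_sequons are introduced here for the example. Any residue is allowed at the Xaa position, as in the rule stated in the text.

```python
import re

# Mature alpha-Gal A, residues 1-398, transcribed from Fig. 2 (one-letter code).
# The transcription is an assumption of this sketch; 30 residues per row.
MATURE = (
    "LDNGLARTPTMGWLHWERFMCNLDCQEEPD"
    "SCISEKLFMEMAELMVSEGWKDAGYEYLCI"
    "DDCWMAPQRDSEGRLQADPQRFPHGIRQLA"
    "NYVHSKGLKLGIYADVGNKTCAGFPGSFGY"
    "YDIDAQTFADWGVDLLKFDGCYCDSLENLA"
    "DGYKHMSLALNRTGRSIVYSCEWPLYMWPF"
    "QKPNYTEIRQYCNHWRNFADIDDSWKSIKS"
    "ILDWTSFNQERIVDVAGPGGWNDPDMLVIG"
    "NFGLSWNQQVTQMALWAIMAAPLFMSNDLR"
    "HISPQAKALLQDKDVIAINQDPLGKQGYQL"
    "RQGDNFEVWERPLSGLAWAVAMINRQEIGG"
    "PRSYTIAVASLGKGVACNPACFITQLLPVK"
    "RKLGFYEWTSRLRSHINPTGTVLLQLENTM"
    "QMSLKDLL"
)


def find_sequons(protein):
    """Return 1-based positions of Asn residues in Asn-Xaa-Ser/Thr triplets."""
    # The lookahead consumes only the Asn, so overlapping candidates are kept.
    return [m.start() + 1 for m in re.finditer(r"N(?=.[ST])", protein)]


sites = find_sequons(MATURE)
print(len(MATURE), sites)  # 398 [108, 161, 184, 377]
```

Many later sequon definitions exclude proline at the Xaa position; under that stricter rule Asn-377 (Asn-Pro-Thr) would not be reported.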

Predicted Amino Acid Sequence of the Mature Form of Human α-Gal A. The 1194-nt sequence encoded 398 amino acids corresponding to a Mr of 45,356 for the unglycosylated mature protein. Nucleotide positions 1-3 were assigned to the amino-terminal leucine residue of the microsequenced mature enzyme protein. The nucleotide sequence was colinear with 86 nonoverlapping amino acid residues determined by microsequencing of five tryptic peptides, one cyanogen bromide peptide, and the amino-terminal sequence of homogeneous human α-Gal A (Fig. 2). Minor differences were observed between the amino acid sequences determined by microsequencing and those predicted from the cDNA sequence, as indicated in Fig. 2. The predicted amino acid composition (Table 1) and unglycosylated molecular weight were similar to those determined previously for the purified protein (8). The predicted amino acid sequence had four possible glycosylation sites (Asn-Xaa-Ser/Thr) for asparagine-linked oligosaccharides at asparagine residues 108, 161, 184, and 377.

Computer-assisted analysis by the algorithm of Kyte and Doolittle (18) with a span setting of 6 revealed hydrophilic regions of at least 20 amino acids dispersed along the entire sequence, occurring at residues 66-86, 179-208, 289-313, and 359-379. In contrast, hydrophobic regions of 10 amino acids or more were primarily located in the carboxyl-terminal region of the mature polypeptide, occurring at residues 37-47, 252-266, 314-323, and 335-358.
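A hydropathy profile of this kind is a moving average of per-residue hydropathy values. The following is a minimal sketch under stated assumptions: it uses the published Kyte-Doolittle index values and a plain unweighted window of 6 residues, and it is applied only to residues 1-30 as transcribed from Fig. 2. The function name and the handling of window ends are illustrative choices and may differ from the program actually used for the analysis.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, span=6):
    """Mean hydropathy of every window of `span` consecutive residues."""
    scores = [KD[aa] for aa in protein]
    return [sum(scores[i:i + span]) / span
            for i in range(len(scores) - span + 1)]


# Residues 1-30 of the mature enzyme, transcribed from Fig. 2 (assumption).
N_TERMINAL = "LDNGLARTPTMGWLHWERFMCNLDCQEEPD"

profile = hydropathy_profile(N_TERMINAL)
for start, value in enumerate(profile, start=1):
    # Positive means hydrophobic, negative means hydrophilic (as in Fig. 4).
    print(f"{start:3d}-{start + 5:<3d} {value:+.2f}")
```

Runs of consecutive negative windows correspond to the hydrophilic regions listed above, and runs of positive windows to the hydrophobic ones.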

Local secondary structure was predicted by the Chou and Fasman algorithm (19). Regions of α-helical structure of 10 or more contiguous amino acids were located at residues 16-31, 34-50, 252-264, 276-285, and 385-398 (Fig. 4). Regions of β sheets involving ≈20-40 amino acids occurred at residues 88-106, 220-239, and 322-359. The four possible N-glycosylation sites (Fig. 4) were all located in β turns within hydrophilic regions of the enzyme (20), consistent with their probable surface localization. There were no repeated sequences or inverted repeats within the α-Gal A cDNA.

Computer searches revealed little, if any, amino acid homology with that predicted for α-Gal A. The highest degree of nucleotide sequence homology was with human factor VIII (21); when two gaps totaling 23 residues were introduced, nt 5489-5595 had 47% homology with α-Gal A nt 495-623. Possible N-glycosylation sites were located within 6 bases of the 5' ends of both aligned sequences. Comparison of available data for lysosomal enzymes, including human cathepsin D (22), α-fucosidase (23), β-glucosidase (24), glucuronidase (25), β-hexosaminidase A α subunit (26) and β subunit (27), as well as rat cathepsin B (28), revealed little nucleotide or amino acid sequence similarity, with the possible exception of α-fucosidase, of which nt 31-178 were 52% homologous to α-Gal A nt 34-168, when four gaps totaling 18 bases were introduced.

[Figure 3: panels A and B, lanes 1-3, with size markers at 4.82 kb and 1.87 kb.]

FIG. 3. RNA transfer hybridization of ³²P-labeled α-Gal A cDNA with HeLa poly(A)+ RNA. (A) Agarose gel (1.0%) stained with ethidium bromide. (B) RNA transfer hybridization blots. Lanes 1, 10 μg of total poly(A)+ RNA from HeLa polysomes; lanes 2, 10 μg of poly(A)+ RNA, α-Gal A mRNA-depleted by polysome immunoabsorption; lanes 3, 5 μg of 18S and 28S rRNA isolated from HeLa polysomes. Prehybridization was overnight at 42°C in 50% formamide, 0.75 M NaCl/75 mM sodium citrate, 5× concentrated Denhardt's solution (0.02% bovine serum albumin/0.02% Ficoll/0.02% polyvinylpyrrolidone), 50 mM sodium phosphate (pH 7.6), 500 μg of sheared, denatured salmon sperm DNA per ml, and 0.2% NaDodSO4. Hybridization with 2 ng of nick-translated pAG18 EcoRI insert per ml (2 × 10⁸ cpm/μg) was overnight at 42°C in the above buffer containing Denhardt's solution and 100 μg of salmon sperm DNA per ml. The final washing stringency was 15 mM NaCl/1.5 mM sodium citrate at 65°C.

Table 1. Amino acid composition of human α-Gal A

Amino   α-Gal A    α-Gal A    Amino   α-Gal A    α-Gal A
acid    protein*   cDNA†      acid    protein*   cDNA†
Asx     48         49         Ile     18         21
Thr     16         14         Leu     41         41
Ser     25         23         Tyr     17         15
Glx     41         40         Phe     16         15
Pro     21         19         His     9          7
Gly     32         31         Lys     17         17
Ala     29         28         Arg     19         19
Val     18         16         Cys     10         12
Met     9          15         Trp     16         16

Amino acid composition is based on subunit Mr of 45,346.
*Average integer number of residues per subunit for two independent preparations of human lung α-Gal A (8).
†Amino acid composition calculated from the cDNA sequence of pAG18.
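The cDNA column of Table 1 is a residue count over the deduced sequence, with Asn and Asp pooled as Asx and Gln and Glu pooled as Glx because acid hydrolysis of the protein does not distinguish them. A minimal sketch of that count follows, assuming the mature sequence transcribed from Fig. 2; the names MATURE, THREE_LETTER, and composition are introduced here for illustration.

```python
from collections import Counter

# Mature alpha-Gal A, residues 1-398, transcribed from Fig. 2 (assumption).
MATURE = (
    "LDNGLARTPTMGWLHWERFMCNLDCQEEPD"
    "SCISEKLFMEMAELMVSEGWKDAGYEYLCI"
    "DDCWMAPQRDSEGRLQADPQRFPHGIRQLA"
    "NYVHSKGLKLGIYADVGNKTCAGFPGSFGY"
    "YDIDAQTFADWGVDLLKFDGCYCDSLENLA"
    "DGYKHMSLALNRTGRSIVYSCEWPLYMWPF"
    "QKPNYTEIRQYCNHWRNFADIDDSWKSIKS"
    "ILDWTSFNQERIVDVAGPGGWNDPDMLVIG"
    "NFGLSWNQQVTQMALWAIMAAPLFMSNDLR"
    "HISPQAKALLQDKDVIAINQDPLGKQGYQL"
    "RQGDNFEVWERPLSGLAWAVAMINRQEIGG"
    "PRSYTIAVASLGKGVACNPACFITQLLPVK"
    "RKLGFYEWTSRLRSHINPTGTVLLQLENTM"
    "QMSLKDLL"
)

# Residues reported individually in Table 1 (N, D, Q, E are pooled below).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "C": "Cys", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser",
    "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def composition(protein):
    """Residue counts in the layout of Table 1, with Asx and Glx pooled."""
    counts = Counter(protein)
    table = {name: counts[aa] for aa, name in THREE_LETTER.items()}
    table["Asx"] = counts["N"] + counts["D"]
    table["Glx"] = counts["Q"] + counts["E"]
    return table


comp = composition(MATURE)
for name in sorted(comp):
    print(f"{name} {comp[name]}")
```

The counts can be read against the cDNA column of Table 1; the protein column comes from hydrolysis of the purified enzyme and is not reproduced by this calculation.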

DISCUSSION

Nucleotide sequencing of the λAG18 cDNA insert for human α-Gal A revealed an open reading frame of 1209 nt encoding the entire mature form of the lysosomal glycosidase. The demonstration of colinearity between the nucleotide sequence and 86 nonoverlapping amino acid residues determined by microsequencing identified the amino-terminal codon for the mature enzyme, was consistent with the reading frame throughout, and confirmed the authenticity of this clone.

Poly(A)+ RNA hybridization experiments indicated that the α-Gal A message of 1.45 kb was depleted by antibody absorption of the nascent polypeptide chains from polysomes (Fig. 3). Previous immunologic studies of human α-Gal A biosynthesis have shown that a Mr 55,000-58,000 glycosylated propeptide undergoes proteolytic processing to the Mr 50,000 mature lysosomal form (6). Since the 1226-base-pair λAG18 cDNA insert encodes 403 amino acids (398 of the mature subunit and the last 5 of the propeptide), the additional 225 nt in the 1.45-kb α-Gal A mRNA encodes the prepeptide (signal peptide) and remaining propeptide amino acids as well as sequences for a 5' untranslated region and poly(A) tail.

The α-Gal A cDNA contains two consensus sequences for cleavage at the poly(A) site, ATTAAA and AATACA, which are 11 and 28 nt upstream from the stop codon, respectively. The former is present in about 12% of vertebrate messages, whereas the latter occurs in only 2% (29). Further studies will be required to determine which sequence(s) function in poly(A) addition.
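The stated distances can be reproduced from the 3' end of the sequence in Fig. 2. The sketch below is illustrative and rests on two assumptions: the 69-nt segment is transcribed from Fig. 2 (nt 1141-1209), and distance is counted from the last base of a motif to the first base of the stop codon, which is the convention that reproduces the published values. The names TAIL, first_stop, and distance_to_stop are introduced here for the example.

```python
# 3' end of the λAG18 insert (nt 1141-1209), transcribed from Fig. 2.
TAIL_START = 1141  # cDNA coordinate of TAIL[0], which begins a codon
TAIL = (
    "ACTGTTTTGCTTCAGCTAGAAAATACAATG"  # nt 1141-1170
    "CAGATGTCATTAAAAGACTTACTT"        # nt 1171-1194
    "TAA"                             # stop codon, nt 1195-1197
    "AAAAAAAAAAAA"                    # 12-nt poly(A) tail
)
STOPS = {"TAA", "TAG", "TGA"}


def first_stop(seq):
    """0-based index of the first in-frame stop codon, or -1 if none."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i
    return -1


def distance_to_stop(seq, motif):
    """Bases from the last base of `motif` to the first base of the stop codon."""
    start = seq.find(motif)
    if start < 0:
        raise ValueError(f"{motif} not found")
    return first_stop(seq) - (start + len(motif) - 1)


stop = first_stop(TAIL)
print("stop codon at nt", TAIL_START + stop)
for motif in ("AATACA", "ATTAAA", "CAGCT"):
    print(motif, distance_to_stop(TAIL, motif))
# Everything after the stop codon is poly(A): no 3' untranslated region.
print(set(TAIL[stop + 3:]) == {"A"})
```

With this convention the segment gives 28 nt for AATACA, 11 nt for ATTAAA, and 38 nt for CAGCT, matching the values reported in Results.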

Analysis of the predicted amino acid sequence revealed four possible N-glycosylation sites. This observation was consistent with previous studies of purified α-Gal A from human plasma and spleen (4) and immunoprecipitated enzyme from Chang liver cells (6), which indicated the presence of tri- and tetraantennary complex and high mannose-type oligosaccharides.

[Figure 4: hydropathy index (+3 to -3) plotted against residue number (1-400), annotated with α, β, and CHO marks.]

FIG. 4. Hydropathy profile and secondary structure of the mature α-Gal A amino acid sequence predicted from λAG18 cDNA. Positive and negative numbers correspond to hydrophobic and hydrophilic regions, respectively. α = α helix; β = β sheet; CHO indicates possible N-glycosylation sites.

A remarkable feature of the α-Gal A mRNA was the absence of a nucleotide segment that separates the UAA stop codon and the poly(A) tail. Previously, the lack of 3' untranslated nucleotides had been observed only in mitochondrial genes (30). A search of the data bases identified 256 genes in which the locations of the stop codon and poly(A) addition site are known. None of these had contiguous or overlapping bases of the stop codon and poly(A) segment. In the α-Gal A mRNA, the adenosine residues of the UAA codon could be coded by the template or added posttranscriptionally by mRNA polyadenylylation.

Evidence supporting the absence of a 3' untranslated sequence in α-Gal A mRNA includes the following. (i) The cDNA sequence was independently determined four times from both strands (Fig. 1). (ii) Tryptic peptide T-53A, which is located 28 residues from the carboxyl terminus (Fig. 2), confirmed the translation frame in this region. Shifts to alternate reading frames following peptide T-53A would significantly alter the close correspondence observed between the predicted and observed amino acid composition (Table 1). (iii) The clone contained a consensus poly(A) signal (ATTAAA) and a poly(A) tail 12 nt from this signal. A consensus (16) pentameric small nuclear riboprotein binding site (CAGCT) also was present and the CA consensus poly(A) site (31) was observed. (iv) Finally, a second α-Gal A cDNA clone (λAGL4) from an independently constructed library was found to contain the identical 3' sequence, including the absence of a 3' untranslated region between the termination codon and a 120-nt poly(A) tail. At present, it is not possible to eliminate the remote possibility that some region of the α-Gal A cDNA sequence would predispose to identical cloning errors in λAG18 and λAGL4, which might occur either during reverse transcription or during replication in Escherichia coli.

Although only one message was observed in RNA transfer blots of HeLa cytoplasmic poly(A)+ RNA (Fig. 3), it is possible that additional transcripts are expressed in other cell types. The presence of two messages for the α subunit of β-hexosaminidase A observed in fibroblast but not placental mRNA supports this concept (26). Additional cDNA and genomic clones must be analyzed to investigate the possible alternative mRNA processing for α-Gal A.

Searches of amino acid and nucleotide sequence data bases identified only a few sequences of limited homology to α-Gal A. Of the sequences available for mammalian lysosomal enzyme cDNAs (22-28), only α-L-fucosidase had any significant homology. Thus, identification of unique structural or functional "lysosomal domains" must await additional sequence and tertiary structure data for lysosomal enzymes.

The availability of the sequenced cDNA encoding human α-Gal A will facilitate detailed analyses of structure, organization, and expression of this X chromosomal gene. Such information should further delineate the nature of lysosomal enzyme biosynthesis and the molecular recognition events involved in lysosomal trafficking (32). Since the α-Gal A gene undergoes dosage compensation by random X-inactivation, studies with genomic clones may provide insight into the role of mechanisms, such as methylation (7) and chromatin exonuclease hypersensitivity (33), in controlling the expression of X-linked housekeeping genes such as α-Gal A.

Finally, the sequenced α-Gal A should enable characterization of the molecular lesions that cause the deficiency of α-Gal A activity in Fabry disease (1).

We thank Noemi Fauer and Thomas Fitzmaurice for expert technical assistance, Sibel Bessim for preparation and immunoabsorption of human polysomes, Drs. Martha Bond, Steve Kent, Leroy Hood, and Kenneth Williams for microsequencing the α-Gal A peptides, Dr. Paul Szabo for expert advice in the initial phases of this work, and Linda Lugo for clerical assistance. This work was supported in part by Grant 1-578 from the March of Dimes Birth Defects Foundation, Grant NP-453 from the American Cancer Society, and Grant AM 34045 from the National Institutes of Health. H.S.B. is the recipient of a National Institutes of Health Predoctoral Fellowship in medical genetics (T32 HD07105).

1. Desnick, R. J. & Sweeley, C. C. (1983) in The Metabolic Basis of Inherited Diseases, eds. Stanbury, J. S., Wyngaarden, J. S., Fredrickson, D. S., Goldstein, J. L. & Brown, M. S. (McGraw-Hill, New York), pp. 906-944.
2. Brady, R. O., Gal, A. E., Bradley, R. M. & Martensson, E. (1967) J. Biol. Chem. 242, 1021-1026.
3. Dean, K. J. & Sweeley, C. C. (1979) J. Biol. Chem. 254, 9994-10000.
4. Bishop, D. F. & Desnick, R. J. (1981) J. Biol. Chem. 256, 1307-1316.
5. Fox, M. F., DuToit, D. L., Warnich, L. & Retief, A. E. (1984) Cytogenet. Cell Genet. 38, 45-49.
6. LeDonne, N. C., Jr., Fairley, J. L. & Sweeley, C. C. (1983) Arch. Biochem. Biophys. 224, 186-195.
7. Mohandas, T., Sparkes, R. S., Bishop, D. F., Desnick, R. J. & Shapiro, L. J. (1984) Am. J. Hum. Genet. 36, 916-925.
8. Calhoun, D. H., Bishop, D. F., Bernstein, H. S., Quinn, M., Hantzopoulos, P. & Desnick, R. J. (1985) Proc. Natl. Acad. Sci. USA 82, 7364-7368.
9. Messing, J. (1983) Methods Enzymol. 101, 20-78.
10. Dale, R. M. K., McClure, B. A. & Houchins, J. P. (1985) Plasmid 13, 31-40.
11. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
12. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.
13. Shapiro, S. Z. & Young, J. R. (1981) J. Biol. Chem. 256, 1495-1498.
14. Korman, A. J., Knudsen, P. J., Kaufman, J. F. & Strominger, J. L. (1982) Proc. Natl. Acad. Sci. USA 79, 1844-1848.
15. Chen-Kiang, S., Wolgemuth, D. J., Hsu, M.-T. & Darnell, J. E., Jr. (1982) Cell 28, 575-584.
16. Berget, S. M. (1984) Nature (London) 309, 179-182.
17. Wool, I. G. (1979) Annu. Rev. Biochem. 48, 719-754.
18. Kyte, J. & Doolittle, R. F. (1982) J. Mol. Biol. 157, 105-132.
19. Chou, P. Y. & Fasman, G. D. (1978) Annu. Rev. Biochem. 47, 251-276.
20. Aubert, J.-P., Biserte, G. & Loucheux-Lefebvre, M.-H. (1976) Arch. Biochem. Biophys. 175, 410-418.
21. Wood, W. I., Capon, D. J., Simonsen, C. C., Eaton, D. L., Gitschier, J., Keyt, B., Seeburg, P. H., Smith, D. H., Hollingshead, P., Wion, K. L., Delwart, E., Tuddenham, E. G. D., Vehar, G. A. & Lawn, R. M. (1984) Nature (London) 312, 330-337.
22. Faust, P. L., Kornfeld, S. & Chirgwin, J. M. (1985) Proc. Natl. Acad. Sci. USA 82, 4910-4914.
23. Fukushima, H., deWet, J. R. & O'Brien, J. S. (1985) Proc. Natl. Acad. Sci. USA 82, 1262-1265.
24. Sorge, J., West, C., Westwood, B. & Beutler, E. (1985) Proc. Natl. Acad. Sci. USA 82, 7289-7293.
25. Guise, K. S., Korneluk, R. G., Waye, J., Lamhonwah, A.-M., Quan, F., Palmer, R., Ganschow, R. E., Sly, W. S. & Gravel, R. A. (1985) Gene 34, 105-110.
26. Myerowitz, R., Piekarz, R., Neufeld, E. F., Shows, T. B. & Suzuki, K. (1985) Proc. Natl. Acad. Sci. USA 82, 7830-7834.
27. O'Dowd, B. F., Quan, F., Willard, H. F., Lamhonwah, A.-M., Korneluk, R. G., Lowden, J. A., Gravel, R. A. & Mahuran, D. J. (1985) Proc. Natl. Acad. Sci. USA 82, 1184-1188.
28. Segundo, B. S., Chan, S. J. & Steiner, D. F. (1985) Proc. Natl. Acad. Sci. USA 82, 2320-2324.
29. Wickens, M. & Stephenson, P. (1984) Science 226, 1045-1051.
30. Anderson, S., Bankier, A. T., Barrell, B. G., deBruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J. H., Staden, R. & Young, I. G. (1981) Nature (London) 290, 457-465.
31. Birnstiel, M. L., Busslinger, M. & Strub, K. (1985) Cell 41, 349-359.
32. Sly, W. S. & Fischer, H. D. (1982) J. Cell Biochem. 18, 67-85.
33. Wolf, S. F. & Migeon, B. R. (1985) Nature (London) 314, 467-469.

