
Proc. Natl. Acad. Sci. USA
Vol. 80, pp. 472-476, January 1983
Genetics

Isolation and DNA sequence of a full-length cDNA clone for human X chromosome-encoded phosphoglycerate kinase

(cDNA-cloning/synthetic oligonucleotides/dosage compensation/multigene families/nucleotide-binding proteins)

ALAN M. MICHELSON*, ALEXANDER F. MARKHAM†, AND STUART H. ORKIN*‡

*Division of Hematology-Oncology, Children's Hospital Medical Center, and the Sidney Farber Cancer Institute, Department of Pediatrics and Committee on Cell and Developmental Biology, Harvard Medical School, Boston, Massachusetts 02115; and †Imperial Chemical Industries Pharmaceuticals Division, Mereside, Alderley Park, Macclesfield, Cheshire SK10 4TG, United Kingdom

Communicated by David Botstein, October 6, 1982

ABSTRACT Phosphoglycerate kinase (PGK), a major enzyme in glycolysis, is encoded by the X chromosome in mammals. We have initiated molecular analysis of the PGK structural gene by isolating a full-length cDNA clone from a human fetal liver cDNA library. Synthetic oligonucleotide mixtures encoding two different portions of PGK were used as direct in situ hybridization probes for the cDNA library. Several classes of clones were obtained based on their hybridization at different stringencies to one or both of the PGK oligonucleotide mixtures. One clone, designated pHPGK-7e, which hybridized at high stringency to each of the synthetic probes, encoded the complete PGK protein sequence as well as 82 base pairs of 5' and 437 base pairs of 3' untranslated regions. Southern blot analysis of human genomic DNAs revealed a complex pattern of hybridizing fragments, two of which were non-X in origin. These results suggest that the human genome contains a small family of dispersed PGK or PGK-like genes.

Phosphoglycerate kinase (PGK; ATP:3-phosphoglycerate 1-phosphotransferase, EC 2.7.2.3), a major enzyme in glycolysis, catalyzes the reversible conversion of 1,3-diphosphoglycerate to 3-phosphoglycerate, generating one molecule of ATP (1). In mammals, this enzyme is encoded by a single active locus on the long arm of the X chromosome (2-6). This locus is subject to dosage compensation by X chromosome inactivation (7).

The structures of PGK proteins from several species have been analyzed in detail. Complete amino acid sequences of horse muscle and human erythrocyte enzymes reveal extensive interspecies homology (8, 9). X-ray crystallography of horse PGK demonstrated two separate protein domains that may be of functional significance to its mechanism of catalysis (8). Because the yeast enzyme has a similar primary and tertiary structure (8, 10), PGK appears to be highly conserved during the course of evolution. Specific human PGK variants, some of which lead to enzyme deficiency and hemolytic anemia, have also been described (11-14). In addition to the X-linked protein, a testis-specific isozyme, whose structural relationship to the X chromosome-encoded enzyme is largely unknown, is found in a number of mammalian species (15-18). In the mouse, this isozyme has been mapped to the major histocompatibility complex on chromosome 17 (19), whereas in the human its chromosomal location has not been determined.

Given this wealth of biochemical and genetic information, we have initiated molecular analysis of the PGK locus. A probe for PGK sequences would facilitate studies of X-chromosome inactivation, gene evolution, and the tissue-specific expression of homologous nonallelic loci. As a first step we have isolated a full-length cDNA for the X-linked PGK structural gene.

MATERIALS AND METHODS

Construction of a Human Fetal Liver cDNA Library. Liver from a human fetus of 20-22 weeks gestation was homogenized in the presence of guanidine hydrochloride, and poly(A)+ RNA was isolated by two passages through oligo(dT)-cellulose. cDNA (2.2 μg) was synthesized from 10 μg of liver mRNA in the presence of placental RNase inhibitor (RNasin, Biotec, Madison, WI), 1,000 units/ml. Alkaline sucrose gradient centrifugation led to isolation of 1.2 μg of cDNA >400 nucleotides (nt) long. Second-strand synthesis was accomplished in the presence of the Klenow fragment of Escherichia coli DNA polymerase I (Boehringer Mannheim) at 180 units/ml and a DNA concentration of 2.4 μg/ml. The resulting double-stranded cDNA was treated with S1 nuclease (Sigma), 20 units/ml at 37°C for 30 min, and a size fraction of duplex DNA >400 base pairs (bp) in length was obtained by sedimentation through a neutral sucrose gradient. Homopolymer tracts of dC were added to the 3' ends of this cDNA by using a recently developed protocol (unpublished). The dC-tailed cDNA was then hybridized to Pst I-cleaved and dG-tailed plasmid vector pKT218 (20), and tetracycline-resistant colonies were selected after transformation of E. coli strain MC1061 (21). Approximately 141,000 independent recombinant clones were obtained from the original plates, pooled, and stored at -70°C as a glycerol stock without further amplification.

Synthesis of Oligodeoxyribonucleotides. Two mixtures of oligonucleotides encoding portions of the human PGK protein sequence (Fig. 1) were synthesized by using a solid-phase phosphotriester method (22, 23). For amino acids specified by more than one codon, all possible nucleotides were inserted at the ambiguous positions.

Colony Screening with Oligonucleotides. A total of 50,000 colonies was screened on nitrocellulose filters (Millipore HAHY) by using 32P-labeled oligonucleotides. Filters were prehybridized and hybridized as described by Woods et al. (23), except that the incubation temperature was 40°C. Filters were then washed with several changes of 6× standard saline citrate (NaCl/Cit; 1× is 0.15 M NaCl/0.015 M sodium citrate) containing 0.05% sodium pyrophosphate at the indicated temperatures (see Results).

Isolation and Characterization of Plasmid DNA. Plasmid DNA was extracted from selected clones by using an alkaline lysis procedure (24) and purified by CsCl/ethidium bromide density gradient centrifugation. A detailed restriction map of pHPGK-7e was constructed by standard procedures. DNA sequence analysis was performed by the method of Maxam and Gilbert (25).

PGK-1 OLIGONUCLEOTIDE MIXTURE

PROTEIN: Glu(343) Trp(344) Glu(345) Ala(346) Phe(347) Ala(348)
mRNA: 5' GA(A/G) UGG GA(A/G) GCN UU(U/C) GCN 3'
OLIGO MIXTURE: 3' CT(T/C) ACC CT(T/C) CGN' AA(A/G) CG 5'

PGK-2 OLIGONUCLEOTIDE MIXTURE

PROTEIN: Cys(378) Cys(379) Ala(380) Lys(381) Trp(382) Asn(383)
mRNA: 5' UG(U/C) UG(U/C) GCN AA(A/G) UGG AA(U/C) 3'
OLIGO MIXTURE: 3' C(A/G) AC(A/G) CGN' TT(T/C) ACC TT 5'

FIG. 1. PGK oligonucleotide sequences. Two regions of the PGK protein sequence (9) were chosen for the synthesis of oligonucleotides having minimal codon ambiguity. The PGK-1 mixture consisted of 32 different 17-mers corresponding to amino acids 343-348; the PGK-2 pool had 32 different 16-mers whose sequences were predicted from amino acids 378-383. N, all four possible nucleotides; N', the complementary bases.

Abbreviations: PGK, phosphoglycerate kinase; nt, nucleotide(s); bp, base pair(s); kb, kilobase pair(s); NaCl/Cit, standard saline citrate (0.15 M sodium chloride/0.015 M sodium citrate, pH 7.0).

‡ To whom correspondence should be addressed at: Children's Hospital Medical Center, 300 Longwood Ave., Boston, MA 02115.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.

Blot Hybridization Analysis of Genomic DNA. Restriction enzyme digests of genomic DNAs were subjected to electrophoresis through 0.8% agarose gels, transferred to nitrocellulose filters by the method of Southern (26), and hybridized with nick-translated Pst I insert from clone pHPGK-7e.

RESULTS

Screening of Fetal Liver cDNA Library for PGK Sequences. Two mixtures of oligonucleotides were prepared as hybridization probes to identify cDNA clones containing PGK sequences. The first, designated PGK-1 (Fig. 1), was 17 nt long and encoded amino acids 343-348 of the human PGK protein.§ The second, PGK-2, was 16 nt long and corresponded to the region of amino acids 378-383. Due to codon ambiguity, each oligonucleotide mixture contained 32 different DNA sequences, one of which should have perfect homology with the PGK gene.
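The count of 32 sequences per mixture follows from the degeneracy of the genetic code over the nucleotides each probe spans. The sketch below is illustrative only: it enumerates the coding-strand sequences (the synthesized probes are their complements, so the counts are identical), and it assumes that PGK-1 omits the third position of the Ala-348 codon and that PGK-2 omits the first position of the Cys-378 codon and the third position of the Asn-383 codon, which is what the 17-nt and 16-nt lengths imply. The function name `sense_sequences` and its parameters are hypothetical.

```python
from itertools import product

# Standard genetic code (DNA form) for the residues in the two probe regions.
CODONS = {
    "Glu": ["GAA", "GAG"],
    "Trp": ["TGG"],
    "Ala": ["GCA", "GCC", "GCG", "GCT"],
    "Phe": ["TTT", "TTC"],
    "Cys": ["TGT", "TGC"],
    "Lys": ["AAA", "AAG"],
    "Asn": ["AAT", "AAC"],
}

def sense_sequences(peptide, skip_5prime=0, skip_3prime=0):
    """Return every distinct coding sequence for `peptide`,
    trimmed by the given number of nucleotides at each end."""
    seqs = set()
    for combo in product(*(CODONS[aa] for aa in peptide)):
        full = "".join(combo)
        seqs.add(full[skip_5prime:len(full) - skip_3prime])
    return seqs

# PGK-1: residues 343-348, assumed to stop one nucleotide short of the 3' end.
pgk1 = sense_sequences(["Glu", "Trp", "Glu", "Ala", "Phe", "Ala"], skip_3prime=1)

# PGK-2: residues 378-383, assumed trimmed by one nucleotide at each end.
pgk2 = sense_sequences(["Cys", "Cys", "Ala", "Lys", "Trp", "Asn"],
                       skip_5prime=1, skip_3prime=1)

print(len(pgk1), {len(s) for s in pgk1})
print(len(pgk2), {len(s) for s in pgk2})
```

Trimming a codon at an ambiguous position is what keeps each pool at 32 members: a full 18-nt PGK-1 probe would need 128 sequences under the same assumptions.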

Labeled oligonucleotides were used directly as in situ hybridization probes against replica filters containing lysed colonies of the human fetal liver cDNA library. The 50,000 colonies screened with a combination of PGK-1 and PGK-2 oligonucleotides yielded about 60 recombinants that exhibited hybridization after washing in 6× NaCl/Cit at 45°C. Of these, 19 were positive on rescreening with both probes, and 14 were colony-purified for further analysis (Fig. 2). All clones formed stable hybrids with a combination of PGK-1 and PGK-2 mixtures when washed in 6× NaCl/Cit at 48°C. Upon hybridization separately to either PGK-1 or PGK-2 followed by more stringent washing, several different hybridization patterns were observed. Although all 14 clones hybridized to PGK-1 upon washing at 48°C, some (clones 1c, 2aI, 2aII, 6b, 14b, 16b, and 18d) were negative with PGK-2 under similar conditions (Fig. 2 c and d). Only 10 of the original 14 clones formed stable hybrids with PGK-1 after a 56°C wash (Fig. 2e), whereas 6 of 7 that hybridized with PGK-2 at 48°C maintained stable hybridization at 54°C (Fig. 2f). These 6 were included among the 10 clones exhibiting high-stringency hybridization with PGK-1 (Fig. 2e). The additional four clones displayed differential hybridization to PGK-1 under conditions that should discriminate against single nucleotide mismatches (27). A single clone (10a) formed stable hybrids with PGK-1 at 56°C and with PGK-2 at 52°C (data not shown) but began to melt at 54°C with PGK-2. These findings indicate that clones with different degrees of homology to PGK-1 and PGK-2 were represented in the fetal liver cDNA.

[Figure 2. Panel a: grid key for clones 1c, 2aI, 2aII, 2bI, 2bII, 6b, 6d, 7e, 10a, 14a, 14b, 15a, 16b, and 18d. Panels b-f: autoradiographs labeled b, PGK-1+2/48°; c, PGK-1/48°; d, PGK-2/48°; e, PGK-1/56°; f, PGK-2/54°.]

FIG. 2. In situ hybridization of selected cDNA clones with the PGK oligonucleotides. Fourteen human fetal liver cDNA clones were chosen for further analysis based on their initial hybridization to a combination of PGK-1 and PGK-2 oligonucleotides (see Results). Replica filters were prepared with the colonies arranged in the grid pattern in a. Each filter was hybridized with one or both oligonucleotide mixtures and subsequently washed in 6× NaCl/Cit containing 0.05% sodium pyrophosphate at the indicated temperatures (b-f).

Identification of a PGK cDNA Clone. Plasmid DNA was isolated from four clones (2bII, 7e, 14a, and 15a) that displayed the most stringent hybridization with both oligonucleotide mixtures (Fig. 2). Digestion with Pst I revealed single cloned inserts 1.7, 1.9, 1.4, and 0.5 kilobase pairs (kb) in length in clones 2bII, 7e, 14a, and 15a, respectively (data not shown). The minimal coding region predicted for a full-length PGK cDNA clone is 1.25 kb. The largest clone, 7e, was subjected to detailed restriction mapping with a battery of enzymes that either failed to cleave the vector or cut it only once. HindIII, Xma I, Pvu II, Xba I, Sal I, Sac I, and Ava I cleaved the cloned insert whereas EcoRI, BamHI, Kpn I, Xho I, Pst I, Bgl II, and BstEII did not (data not shown). Double digestion of insert DNA with Pst I and Xma I yielded fragments of 1.2 and 0.7 kb (see Fig. 3) from which initial DNA sequence analysis predicted the COOH terminus of the PGK protein. This clone has therefore been designated pHPGK-7e.

Fortuitously, the Xma I site in pHPGK-7e corresponded to the region of the cDNA encoding amino acids 348-350, which lies close to and between the segments to which the oligonucleotides were directed (Fig. 1). Of the 32 different possibilities in each of the oligonucleotide mixtures, the actual PGK sequences that were determined from the Xma I site are 5' G-C-A-A-A-A-G-C-T-T-C-C-C-A-T-T-C 3' and 5' T-T-C-C-A-T-T-T-G-G-C-A-C-A-G-C 3' for PGK-1 and PGK-2, respectively.
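Because the probes are complementary to the mRNA, the two sequences quoted above read as coding sequence only after reverse complementation. The following sketch, with hypothetical helper names, performs that conversion and translates the complete codons. The reading-frame offsets are assumptions inferred from Fig. 1: the PGK-1 coding strand is taken to start at the first nucleotide of the Glu-343 codon, and the PGK-2 coding strand is taken to start at the second nucleotide of the Cys-378 codon, so that its first whole codon is Cys-379.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Only the codons needed for the two probe regions (standard genetic code).
CODON_TO_AA = {
    "GAA": "Glu", "GAG": "Glu", "TGG": "Trp",
    "GCA": "Ala", "GCC": "Ala", "GCG": "Ala", "GCT": "Ala",
    "TTT": "Phe", "TTC": "Phe", "TGT": "Cys", "TGC": "Cys",
    "AAA": "Lys", "AAG": "Lys", "AAT": "Asn", "AAC": "Asn",
}

def translate_complete_codons(sense, offset):
    """Translate whole codons of `sense`, starting `offset` nt from its 5' end."""
    return [CODON_TO_AA[sense[i:i + 3]] for i in range(offset, len(sense) - 2, 3)]

# Probe sequences as reported in the text, written 5' to 3'.
pgk1_probe = "GCAAAAGCTTCCCATTC"
pgk2_probe = "TTCCATTTGGCACAGC"

pgk1_sense = reverse_complement(pgk1_probe)
pgk2_sense = reverse_complement(pgk2_probe)

print(pgk1_sense, translate_complete_codons(pgk1_sense, offset=0))
print(pgk2_sense, translate_complete_codons(pgk2_sense, offset=2))
```

Under these assumptions the PGK-1 coding strand reads Glu-Trp-Glu-Ala-Phe followed by the first two nucleotides of an Ala codon, and the PGK-2 coding strand reads Cys-Ala-Lys-Trp flanked by partial Cys and Asn codons, in agreement with residues 343-348 and 378-383.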

Complete DNA Sequence of pHPGK-7e. A detailed restriction map of the pHPGK-7e Pst I insert was constructed (Fig. 3 A and B). Its DNA sequence, derived according to the strategy depicted in Fig. 3C, is presented in Fig. 4. The entire coding region of the mRNA and a 3' untranslated region of 437 bp are included in the clone. In addition, 82 bp of 5' untranslated region are present. Although the actual length of the 5' untranslated region is not yet known, it is likely that our clone is nearly full-length because the 5' untranslated regions of eukaryotic mRNAs are generally less than 100 bp. In contrast, the 3' untranslated region of PGK is unusually long, although even longer 3' ends have been reported (28, 29). A single putative polyadenylylation signal, A-A-T-A-A-A, is situated 17 bp from the poly(A) addition site. In contrast, five polyadenylylation signals are found in the lengthy 3' untranslated region of porcine preproenkephalin B mRNA (29).

The amino acid sequence predicted from our cDNA agrees with the published human PGK protein sequence (9) with the following exceptions. The reported peptide sequence included an extra lysine residue at position 39. In this regard, our predicted amino acid sequence agrees with that derived for the horse enzyme (8): both proteins have 416 amino acids and Arg-Ile-Lys-Ala rather than Arg-Lys-Ile-Lys-Ala at residues 38-41. In addition, the cDNA sequence assigns Asn to residues 52, 109, 275, and 336 (based on a total of 416 amino acids, Fig. 4) rather than Asp. Similarly, residue 299 is Gln, not Glu, and residue 385 is Glu, not Gln, as previously published (9). Only 11 differences (3% nonhomology) exist between the human and horse protein sequences.

[Figure 3. Panel A: diagram of the cDNA showing the 5' untranslated, coding, and 3' untranslated regions against a scale of 1 to 1767 bp, with Pst I sites at both ends. Panel B: restriction map with sites for Ava II, BstNI, Dde I, Fnu4HI, HindIII, Hinf I, Msp I, Pvu II, Sac I, Sal I, Sau3AI, Sau96I, Taq I, Xba I, and Xma I. Panel C: sequencing strategy.]

FIG. 3. Restriction map and sequence determination strategy for the cDNA insert of pHPGK-7e. (A) Diagram of the regions encoded by the cloned cDNA. The scale at the top is in base pairs and begins at the first nucleotide of the 5' untranslated region. (B) Restriction map of PGK cDNA. Xma I-Pst I fragments (5'-labeled at the Xma I site) specific for the 5' and 3' ends of the cDNA were mapped by partial digestion with the indicated restriction endonucleases. (C) The sequence of the entire cDNA insert was determined according to the strategy illustrated here. The Pst I and Sac I sites were labeled at their 3' ends; all of the other sites were 5'-labeled.

§ Our numbering of amino acids corresponds to that predicted from the cDNA sequence (Fig. 4) rather than to the published PGK protein sequence (9), as discussed below.

PGK-Like Sequences in Human and Rodent Genomic DNAs. Southern blot hybridization (26) was used to examine the representation of PGK sequences in genomic DNAs. Human samples containing different copy numbers of the X chromosome were analyzed in parallel after EcoRI digestion (Fig. 5). A total of eight DNA fragments hybridized with the pHPGK-7e insert at high stringency (lanes 1-3), corresponding to

[Figure 4: nucleotide sequence of the pHPGK-7e cDNA insert with the predicted amino acid sequence written above the coding region. The predicted amino acid sequence, numbered from the residue following the initiator methionine (Ini), is given below; Ter marks the termination codon.]

Ini
1 Ser Leu Ser Asn Lys Leu Thr Leu Asp Lys Leu Asp Val Lys Gly Lys Arg Val Val
20 Met Arg Val Asp Phe Asn Val Pro Met Lys Asn Asn Gln Ile Thr Asn Asn Gln Arg Ile
40 Lys Ala Ala Val Pro Ser Ile Lys Phe Cys Leu Asp Asn Gly Ala Lys Ser Val Val Leu
60 Met Ser His Leu Gly Arg Pro Asp Gly Val Pro Met Pro Asp Lys Tyr Ser Leu Glu Pro
80 Val Ala Val Glu Leu Lys Ser Leu Leu Gly Lys Asp Val Leu Phe Leu Lys Asp Cys Val
100 Gly Pro Glu Val Glu Lys Ala Cys Ala Asn Pro Ala Ala Gly Ser Val Ile Leu Leu Glu
120 Asn Leu Arg Phe His Val Glu Glu Glu Gly Lys Gly Lys Asp Ala Ser Gly Asn Lys Val
140 Lys Ala Glu Pro Ala Lys Ile Glu Ala Phe Arg Ala Ser Leu Ser Lys Leu Gly Asp Val
160 Tyr Val Asn Asp Ala Phe Gly Thr Ala His Arg Ala His Ser Ser Met Val Gly Val Asn
180 Leu Pro Gln Lys Ala Gly Gly Phe Leu Met Lys Lys Glu Leu Asn Tyr Phe Ala Lys Ala
200 Leu Glu Ser Pro Glu Arg Pro Phe Leu Ala Ile Leu Gly Gly Ala Lys Val Ala Asp Lys
220 Ile Gln Leu Ile Asn Asn Met Leu Asp Lys Val Asn Glu Met Ile Ile Gly Gly Gly Met
240 Ala Phe Thr Phe Leu Lys Val Leu Asn Asn Met Glu Ile Gly Thr Ser Leu Phe Asp Glu
260 Glu Gly Ala Lys Ile Val Lys Asp Leu Met Ser Lys Ala Glu Lys Asn Gly Val Lys Ile
280 Thr Leu Pro Val Asp Phe Val Thr Ala Asp Lys Phe Asp Glu Asn Ala Lys Thr Gly Gln
300 Ala Thr Val Ala Ser Gly Ile Pro Ala Gly Trp Met Gly Leu Asp Cys Gly Pro Glu Ser
320 Ser Lys Lys Tyr Ala Glu Ala Val Thr Arg Ala Lys Gln Ile Val Trp Asn Gly Pro Val
340 Gly Val Phe Glu Trp Glu Ala Phe Ala Arg Gly Thr Lys Ala Leu Met Asp Glu Val Val
360 Lys Ala Thr Ser Arg Gly Cys Ile Thr Ile Ile Gly Gly Gly Asp Thr Ala Thr Cys Cys
380 Ala Lys Trp Asn Thr Glu Asp Lys Val Ser His Val Ser Thr Gly Gly Gly Ala Ser Leu
400 Glu Leu Leu Glu Gly Lys Val Leu Pro Gly Val Asp Ala Leu Ser Asn Ile Ter

FIG. 4. DNA sequence of the pHPGK-7e insert. The entire Pst I insert of clone pHPGK-7e, including the 5' and 3' tails and the poly(A) stretch, corresponds to a length of 1,871 bp. The sequence was determined by the method of Maxam and Gilbert (25) according to the strategy depicted in Fig. 3C. A presumptive polyadenylylation signal in the 3' untranslated region is underlined.


[Figure 5: autoradiograph of the genomic blot. Lanes contain EcoRI digests of human DNAs with differing numbers of X chromosomes and of HeLa, hamster, and mouse DNAs; fragment sizes in kb are marked at the left.]

FIG. 5. Genomic representation of PGK sequences. EcoRI digests of DNAs containing the indicated numbers of X chromosomes, as well as HeLa, hamster, and mouse DNAs, were subjected to blot hybridization by the method of Southern (26) with the Pst I insert of pHPGK-7e as probe. Final washing of the filter was performed at 66°C in 0.1× NaCl/Cit. The sizes (in kb) of the hybridizing fragments in human DNA are indicated at the left.

lengths of 16.0, 11.4, 9.6, 7.8, 6.3, 4.0, 3.55, and 2.75 kb. Six of these fragments exhibited dosage of the hybridization signal with the number of X chromosomes, whereas two (the 9.6- and 7.8-kb bands) had the same intensities in 4X and 1X DNAs (lanes 1-3). The apparent absence of the 16-kb band in XY DNA (lane 3) is probably due to a combination of reduced efficiency of transfer from the gel in this size range, the presence of only a small proportion of sequences on this fragment that are complementary to the probe, and the haploid representation of the X chromosome in male genomic DNA. Although we anticipated finding X chromosome-specific hybridization of the PGK cDNA, the presence of non-X-derived sequences capable of hybridization at high stringency (0.1× NaCl/Cit at 66°C) was unexpected. Both hamster and mouse DNAs also hybridized at high stringency with the heterologous PGK probe (lanes 5 and 6). In HeLa cell DNA, the non-X 9.6-kb fragment was replaced by one of 8.3 kb (lane 4), due to either a DNA polymorphism or an abnormal rearrangement of sequences in this tumor cell line.
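The fragment sizes listed above can be tallied to see how much genomic DNA the X-linked bands span. The short sketch below is only an arithmetic check with hypothetical variable names; it sets aside the two bands whose intensity did not vary with X chromosome number and sums the remaining six, which gives the total of about 44 kb cited in the Discussion.

```python
# EcoRI fragment sizes (kb) that hybridized with the pHPGK-7e insert.
fragments_kb = [16.0, 11.4, 9.6, 7.8, 6.3, 4.0, 3.55, 2.75]

# Bands whose intensity did not vary with the number of X chromosomes.
non_x_kb = {9.6, 7.8}

x_linked_kb = [size for size in fragments_kb if size not in non_x_kb]
x_linked_total = sum(x_linked_kb)
non_x_total = sum(non_x_kb)

print(f"X-linked fragments: {len(x_linked_kb)}, total {x_linked_total:.2f} kb")
print(f"Non-X fragments: {len(non_x_kb)}, total {non_x_total:.2f} kb")
```

The X-linked total far exceeds the 1.25-kb minimal coding region, which is why the pattern is compatible with either a single gene containing intervening sequences or more than one X-linked copy.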

DISCUSSION

Using synthetic oligonucleotides as probes, we have isolated a full-length cDNA clone for human PGK. The appropriate choice of probe sequences for this low-abundance product was directed by the reported amino acid sequence (9). To ensure identification of PGK cDNA clones, we used two mixtures of oligonucleotides directed to different regions of the PGK amino acid sequence. This approach simplified the identification of an authentic PGK clone and minimized the chance that protein polymorphism or sequence errors would result in the inability to detect such a clone in the cDNA library. In addition, the use of two oligonucleotide mixtures led to the isolation of several PGK-like cDNAs that exhibit different thermal stabilities with one or both probes. These do not merely contain partial cDNA transcripts because preliminary sequence analysis of one clone (2aI) that hybridized to PGK-1 at 56°C but which failed to hybridize with PGK-2 at any temperature demonstrates an insert that is not entirely homologous to pHPGK-7e (unpublished data). It is also significant that many of the PGK-like clones, including 2aI, hybridized with the oligonucleotides under conditions that do not tolerate single mismatches (27).

Such clones may represent loci for partially homologous kinases or other nucleotide-binding proteins (30), transcripts of related pseudogenes, or previously unidentified PGK isozymes expressed in fetal liver and analogous to the known testis-specific enzyme (15-18). The first possibility is particularly attractive because both PGK-1 and PGK-2 oligonucleotides were directed to the region encoding the ATP binding domain of the enzyme, with PGK-1 lying adjacent to and including specific amino acid contact sites (8).

The availability of a full-length cDNA clone will facilitate detailed analysis of the structure, evolution, and expression of the PGK gene in various species. Our initial examination of the genomic representation of PGK sequences has revealed unexpected complexity (Fig. 5). Six EcoRI fragments encompassing 44 kb are of X-chromosome origin and hybridize at high stringency with the PGK cDNA. Whether this pattern reflects a single large gene interrupted by several intervening sequences or multiple genes having a simpler structure is unknown. Only a single functional X-linked gene is predicted from the available genetic data (2-6). However, preliminary blot analysis with 5'- and 3'-specific PGK probes suggests the presence of more than one X-linked 5' and 3' end (unpublished data). The apparent complexity of the genomic hybridization may result from additional PGK-like pseudogenes or processed genes (31). Alternatively, there may be homologous functional genes, perhaps encoding sequences represented in the non-PGK cDNA clones isolated with the PGK oligonucleotides. These may include loci specifying related nucleotide-binding proteins (30).

The presence of two EcoRI fragments of non-X-chromosome origin that hybridize at high stringency with PGK cDNA was also unexpected because most unique structural gene sequences that have been examined map to a single chromosome (32). A likely but unproven possibility is that at least some of these sequences encode the human testis-specific PGK isozyme (18). The chromosomal assignment of these fragments is of particular interest in light of the linkage of the testis-specific isozyme to the major histocompatibility complex on mouse chromosome 17 (19). Using pHPGK-7e as a heterologous probe, we have recently identified PGK sequences on both mouse chromosomes X and 17 (unpublished data). PGK sequences are present, therefore, as a small, dispersed multigene family in the genomes of at least two widely divergent species. An apparently similar situation has been reported for mouse α-globin (33) and human argininosuccinate synthetase (34, 35) genes. In the latter case, genomic fragments map to several chromosomes, including X and Y, despite the absence of protein heterogeneity (34). The existence of autosomal sequences homologous to X-linked loci must be reconciled with the proposed evolutionary stability of the X chromosome (36). Duplication and transposition of such sequences may have occurred.

The evolutionary conservation of PGK has been inferred from interspecies similarities in protein structure and enzymatic function (2, 8-20). The high-stringency cross-hybridization of human PGK cDNA with rodent DNAs provides independent evidence for this proposal. Comparison of PGK sequences from a variety of species, including yeast (37), may be informative in this regard. When the structure of the authentic, expressed PGK gene is determined, the placement of any intervening sequences with respect to the defined protein domains (8, 10) should provide data relevant to the suggested evolutionary role of intervening sequences (38).

Most genes on the X chromosome are subject to dosage compensation (7), including the loci for PGK, glucose-6-phosphate dehydrogenase (G6PD), and hypoxanthine phosphoribosyltransferase (HPRT). In addition to our PGK cDNA, clones for the G6PD (39) and HPRT (40) loci have recently been isolated. The availability of DNA probes for specific genes of known structure whose expression can be assayed easily in various cell types will permit refined molecular analysis of the mechanisms responsible for X chromosome inactivation. This approach may have distinct advantages over the use of random probes corresponding to undefined regions of the X chromosome (41, 42). For example, DNA methylation has been implicated in the initiation and maintenance of X chromosome inactivation (43). Support for this hypothesis derives from the ability of 5-azacytidine, an inhibitor of DNA methylation, to reactivate the expression of genes on a previously repressed X chromosome (44), an effect which occurs at the DNA level (45). It will now be possible to map methylation sites in and around the PGK, G6PD, and HPRT genes on active, inactive, and reactivated X chromosomes in an attempt to define the regulatory signals that govern their expression in each of these states.

We thank Dr. David Kurnit for providing us with fetal liver samples and Dr. Gail Bruns for the human and rodent DNAs. We are also grateful to Dr. Derek Woods for his helpful suggestions on the construction and screening of the cDNA library. A.M.M. was supported by fellowships from the Insurance Medical Scientist Scholarship Fund (through the generosity of the North American Reassurance Company) and from the O'Brien Foundation. S.H.O. is the recipient of a Research Career Development Award from the National Heart, Lung, and Blood Institute.

1. Scopes, R. K. (1973) in The Enzymes, ed. Boyer, P. D. (Academic, New York), Vol. 8, pp. 335-351.

2. Valentine, W. N., Hsieh, H. S., Paglia, D. E., Anderson, H. M., Baughan, M. A., Jaffé, E. R. & Garson, O. M. (1969) N. Engl. J. Med. 280, 528-534.

3. Chen, S. H., Malcolm, L. A., Yoshida, A. & Giblett, E. R. (1971) Am. J. Hum. Genet. 23, 87-91.

4. Khan, P. M., Westerveld, A., Grzechik, K. H., Deys, B. G., Garson, O. M. & Siniscalco, M. (1971) Am. J. Hum. Genet. 23, 614-623.

5. Nielsen, J. T. & Chapman, V. M. (1977) Genetics 87, 319-325.

6. Shows, T. B. & Brown, J. A. (1975) Proc. Natl. Acad. Sci. USA 72, 2125-2129.

7. Lyon, M. F. (1961) Nature (London) 190, 372-373.

8. Banks, R. D., Blake, C. C. F., Evans, P. R., Haser, R., Rice, D. W., Hardy, G. W., Merrett, M. & Philipps, A. W. (1979) Nature (London) 279, 773-777.

9. Huang, I. Y., Welch, C. D. & Yoshida, A. (1980) J. Biol. Chem. 255, 6412-6420.

10. Bryant, T. N., Watson, J. C. & Wendell, P. L. (1974) Nature (London) 247, 14-17.

11. Yoshida, A., Watanabe, S., Chen, S. H., Giblett, E. R. & Malcolm, L. A. (1972) J. Biol. Chem. 247, 446-449.

12. Fujii, H., Krietsch, W. K. G. & Yoshida, A. (1980) J. Biol. Chem. 255, 6421-6423.

13. Fujii, H. & Yoshida, A. (1980) Proc. Natl. Acad. Sci. USA 77, 5461-5465.

14. Fujii, H., Chen, S. H., Akatsuka, J., Miwa, S. & Yoshida, A. (1981) Proc. Natl. Acad. Sci. USA 78, 2587-2590.

15. Vandeberg, J. L., Cooper, D. W. & Close, P. J. (1973) Nature (London) New Biol. 243, 48-50.

16. Pegoraro, B. & Lee, C. Y. (1978) Biochim. Biophys. Acta 522, 423-433.

17. Pegoraro, B., Ansari, A. A., Lee, C. Y. & Erickson, R. P. (1978) FEBS Lett. 95, 371-374.

18. Chen, S. H., Donahue, R. P. & Scott, C. R. (1976) Fertil. Steril. 27, 699-701.

19. Eicher, E. M., Cherry, M. & Flaherty, L. (1978) Mol. Gen. Genet. 158, 225-228.

20. Talmadge, K. & Gilbert, W. (1980) Gene 12, 235-241.

21. Casadaban, M. J. & Cohen, S. N. (1980) J. Mol. Biol. 138, 179-207.

22. Markham, A. F., Edge, M. D., Atkinson, T. C., Greene, A. R., Heathcliffe, G. R., Newton, C. R. & Scanlon, D. (1980) Nucleic Acids Res. 8, 5193-5205.

23. Woods, D. E., Markham, A. F., Ricker, A. T., Goldberger, G. & Colten, H. R. (1982) Proc. Natl. Acad. Sci. USA 79, 5661-5665.

24. Birnboim, H. C. & Doly, J. (1979) Nucleic Acids Res. 7, 1513-1523.

25. Maxam, A. M. & Gilbert, W. (1980) Methods Enzymol. 65, 499-560.

26. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.

27. Wallace, R. B., Shaffer, J., Murphy, R. F., Bonner, J., Hirose, T. & Itakura, K. (1979) Nucleic Acids Res. 6, 3543-3556.

28. Setzer, D. R., McGrogan, M., Nunberg, J. H. & Schimke, R. T. (1980) Cell 22, 361-370.

29. Kakidani, H., Furutani, Y., Takahashi, H., Noda, M., Morimoto, Y., Hirose, T., Asai, M., Inayama, S., Nakanishi, S. & Numa, S. (1982) Nature (London) 298, 245-249.

30. Buehner, M., Ford, G. C., Moras, D., Olsen, K. W. & Rossmann, M. G. (1973) Proc. Natl. Acad. Sci. USA 70, 3052-3054.

31. Hollis, G. F., Hieter, P. A., McBride, O. W., Swan, D. & Leder, P. (1982) Nature (London) 296, 321-325.

32. Ruddle, F. H. (1981) Nature (London) 294, 115-120.

33. Leder, A., Swan, D., Ruddle, F., D'Eustachio, P. & Leder, P. (1981) Nature (London) 293, 196-200.

34. Su, T. S., Bock, H. G. O., O'Brien, W. E. & Beaudet, A. L. (1981) J. Biol. Chem. 256, 11826-11831.

35. Daiger, S. P., Wildin, R. S. & Su, T. S. (1982) Nature (London) 298, 682-684.

36. Ohno, S. (1973) Nature (London) 244, 259-262.

37. Hitzeman, R. A., Clarke, L. & Carbon, J. (1980) J. Biol. Chem. 255, 12073-12080.

38. Gilbert, W. (1978) Nature (London) 271, 501.

39. Persico, M. G., Toniolo, D., Nobile, C., D'Urso, M. & Luzzatto, L. (1981) Nature (London) 294, 778-780.

40. Brennand, J., Chinault, A. C., Konecki, D. S., Melton, D. W. & Caskey, C. T. (1982) Proc. Natl. Acad. Sci. USA 79, 1950-1954.

41. Wolf, S. F., Mareni, C. E. & Migeon, B. R. (1980) Cell 21, 95-102.

42. Kunkel, L. M., Tantravahi, U., Eisenhard, M. & Latt, S. A. (1982) Nucleic Acids Res. 10, 1557-1578.

43. Riggs, A. D. (1975) Cytogenet. Cell Genet. 14, 9-25.

44. Mohandas, T., Sparkes, R. S. & Shapiro, L. J. (1981) Science 211, 393-396.

45. Venolia, L., Gartler, S. M., Wassman, E. R., Yen, P., Mohandas, T. & Shapiro, L. J. (1982) Proc. Natl. Acad. Sci. USA 79, 2352-2354.
