7
,- - \' GENOMICS 17, 456-462 (1993) Isochores and CpG Islands in YAC Contigs in Human Xq26.1-qter G,USEPPE PillA, * RANDALL D. LITTLE, * BRAHIM A'ISSANI, t GIORGIO BERNARDI, t AND DAVID SCHLESSINGER*,1 •Department of Molecular Microbiology, Washington University School of Medicine St. Louis Missouri 63110' and tLaboratoire de Genetique Moleculaire, Institut Jacques Monod, 2, Place Jus;ieu, 75005 Paris, France ' Received January 28, 1993; revised May 8. 1993 GC levels were assessed at 37 loci across 30 Mb of Xq26.1-qter, a region physically mapped in overlap- ping yeast artificial chromosome clones. In 8 Mb of R band Xq26, GC is relatively high (up to 44%) in the proximal·j Mb and relatively low (40-·11 %) in the dis- tal4 Mb. Consistently low GC values (38-41 %) nrc ob- served in G band Xq27. In contrast, further toward the telomere in Xq28, the GC level rises progressively to reach 52% at 2 to 4 Mb from the end of the chromosome; this region is delimited by low GC loci. Across these regions of Xq, the content of rare-cutter restriction ell: zyme sites containing CpG, including "CpG islands" in the most completely mapped Xq26-27.1 region, is correlated with GC level. Isochore mnpping cnn thus provide one index of putative gene cont.ent across mapped regions. @,:J 19U:1 Acnc.Jf'lItic Pre"". Inc. INTRODucnON The human genome, like t.hose of warm-blooded ver- tebrates in general, is a mosaic of large DNA regions (>:100 kb on average), the isochores, which are remark- ably homogeneous in composit ion and belong to a small number of families charact.erized by difrerent GC levels (Bernardi ct at., ID85; Bernardi, I(89). In human DNA, two GC-poor isochore families, L1 and L2, and three GC-rich isochore families, HI, H2, and H3, span a range of GC from 30 to 60% (Bernardi, 1989). The Ll and L2 families together represent more than 60% of the ge- nome, whereas the HI, H2, and H3 families correspond to 22, 9 and 5%, respectively. The distribut.ion of genes in isochore families is strik- ingly nonuniform (Bernardi et at., 1£)85), gene concen- tration in H3 beilW at least 8 times higher than that in HI and H2 and at least 16 times higher than that in Ll and 1..2 (Mouchiroud et at., 1991). Because the GC con- centration gradient across isochore families parallels the concentration of genes, isochore maps arc especially sig- I To whom correl;pondence should be addressed at Wal;hinbT1.on Uni- versity School or Medicine, Department or MhlecuJar Microbiology, 660 South Euclid Avenue, Box 82:.10. Sl. Louis, MO 63110. Telephone: (314)362.2744. Fax: (314)362-3203. nificant; they have a functional as well as structural rele- vance and their correlation with cytogenetic bands is of particular in teres!.. Early studies of metaphase chromosomes indicated that bands staining with Giemsa under standard condi- tions (G bands) contain GC-poor DNA whereas reverse (R) bands contain GC-rich DNA (Bernardi, 1989). There is, however, no overall correlation with isochores, at least for R bands, as shown by the fact that the ratio of Giemsa to reverse bands is 1:1, whereas that of GC-poor to GC-rich isochores is 2:1 (Bernardi, 1989; Gardiner et at., 1990). A clarification of t.he relation between isochores and chromosomal bands or subbands as well liS a gene den- sity map can be ohtained by constructing "composi- tional" or isochore maps. To find out where isochores belonging to diCferent families lie along human chromo- somes, single-copy sequences that arc already localized on a physical map (at least to chromosomal subregions) can be hybridized to human DNA fractionat.ed according to GG cont.ent and scored to construct an isochore map (Bernardi, 1989). Compositional mapping of the long arm of human chromosome 21 (Gardiner et at., 1990) has provided in- formation about GC levels for 53 loci t.hat had been re- gionally localized. Pract.ically, only GC-poor isochores were detected with probes from G bands, and the most GC-rich isochores were found in the telomeric R band; some low GC regions present along wit.h high GC zones were observed in R bands, and could be due to "thin" G bands (which can be observed at high resolution (Yunis, 1981) or could instead be intrinsic to R bands. Moreover, recent work has shown that the most GC-rich isochore family (H3) i.n general corresponds (DeSario ct at., 1991; Saccone et at., 1992) t.o the telomeric ('1') bands (Dutril- laux, 1973; Ambros and Sumner, 1987). Instead of using probes with only regional localiza- tion, a potentially more informative isochore map can be obtained by examining probes with known location within a yeast artificial chromosome (YAC; Burke et at., 1987) contig. Here we have used probes from contigs spanning 30 Mb in Xq26.1-qter (Schlessinger et aL., 1991), a region corresponding to several cytogenetic bands. The result is a detailed correlation of isochores 0M8·754:1/9:1 $5.00 C:"pyrighl «;1 J!J!J3 hy Acm-lemic PrcM, JIIC. All righls or rcproduction in any rorrn rc.crved. 456

Isochores and CpG Islands in YAC Contigs in Human … · and tLaboratoire de Genetique Moleculaire, Institut Jacques Monod, 2, Place Jus;ieu, 75005 Paris, France ' Received January

Embed Size (px)

Citation preview

,-

-\'

GENOMICS 17, 456-462 (1993)

Isochores and CpG Islands in YAC Contigs in Human Xq26.1-qter

G,USEPPE PillA, * RANDALL D. LITTLE, * BRAHIM A'ISSANI, t GIORGIO BERNARDI, t AND DAVID SCHLESSINGER*,1

•Department of Molecular Microbiology, Washington University School of Medicine St. Louis Missouri 63110'and tLaboratoire de Genetique Moleculaire, Institut Jacques Monod, 2, Place Jus;ieu, 75005 Paris, France '

Received January 28, 1993; revised May 8. 1993

GC levels were assessed at 37 loci across 30 Mb ofXq26.1-qter, a region physically mapped in overlap­ping yeast artificial chromosome clones. In 8 Mb of Rband Xq26, GC is relatively high (up to 44%) in theproximal·j Mb and relatively low (40-·11 %) in the dis­tal4 Mb. Consistently low GC values (38-41 %) nrc ob­served in G band Xq27. In contrast, further toward thetelomere in Xq28, the GC level rises progressively toreach 52% at 2 to 4 Mb from the end of the chromosome;this region is delimited by low GC loci. Across theseregions of Xq, the content of rare-cutter restriction ell:zyme sites containing CpG, including "CpG islands" inthe most completely mapped Xq26-27.1 region, iscorrelated with GC level. Isochore mnpping cnn thusprovide one index of putative gene cont.ent acrossmapped regions. @,:J 19U:1 Acnc.Jf'lItic Pre"". Inc.

INTRODucnON

The human genome, like t.hose of warm-blooded ver­tebrates in general, is a mosaic of large DNA regions(>:100 kb on average), the isochores, which are remark­ably homogeneous in composit ion and belong to a smallnumber of families charact.erized by difrerent GC levels(Bernardi ct at., ID85; Bernardi, I(89). In human DNA,two GC-poor isochore families, L1 and L2, and threeGC-rich isochore families, HI, H2, and H3, span a rangeof GC from 30 to 60% (Bernardi, 1989). The Ll and L2families together represent more than 60% of the ge­nome, whereas the HI, H2, and H3 families correspondto 22, 9 and 5%, respectively.

The distribut.ion of genes in isochore families is strik­ingly nonuniform (Bernardi et at., 1£)85), gene concen­tration in H3 beilW at least 8 times higher than that inHI and H2 and at least 16 times higher than that in Lland 1..2 (Mouchiroud et at., 1991). Because the GC con­centration gradient across isochore families parallels theconcentration of genes, isochore maps arc especially sig-

I To whom correl;pondence should be addressed at Wal;hinbT1.on Uni­versity School or Medicine, Department or MhlecuJar Microbiology,660 South Euclid Avenue, Box 82:.10. Sl. Louis, MO 63110. Telephone:(314)362.2744. Fax: (314)362-3203.

nificant; they have a functional as well as structural rele­vance and their correlation with cytogenetic bands is ofparticular in teres!..

Early studies of metaphase chromosomes indicatedthat bands staining with Giemsa under standard condi­tions (G bands) contain GC-poor DNA whereas reverse(R) bands contain GC-rich DNA (Bernardi, 1989).There is, however, no overall correlation with isochores,at least for R bands, as shown by the fact that the ratio ofGiemsa to reverse bands is 1:1, whereas that of GC-poorto GC-rich isochores is 2:1 (Bernardi, 1989; Gardiner etat., 1990).

A clarification of t.he relation between isochores andchromosomal bands or subbands as well liS a gene den­sity map can be ohtained by constructing "composi­tional" or isochore maps. To find out where isochoresbelonging to diCferent families lie along human chromo­somes, single-copy sequences that arc already localizedon a physical map (at least to chromosomal subregions)can be hybridized to human DNA fractionat.ed accordingto GG cont.ent and scored to construct an isochore map(Bernardi, 1989).

Compositional mapping of the long arm of humanchromosome 21 (Gardiner et at., 1990) has provided in­formation about GC levels for 53 loci t.hat had been re­gionally localized. Pract.ically, only GC-poor isochoreswere detected with probes from G bands, and the mostGC-rich isochores were found in the telomeric R band;some low GC regions present along wit.h high GC zoneswere observed in R bands, and could be due to "thin" Gbands (which can be observed at high resolution (Yunis,1981) or could instead be intrinsic to R bands. Moreover,recent work has shown that the most GC-rich isochorefamily (H3) i.n general corresponds (DeSario ct at., 1991;Saccone et at., 1992) t.o the telomeric ('1') bands (Dutril­laux, 1973; Ambros and Sumner, 1987).

Instead of using probes with only regional localiza­tion, a potentially more informative isochore map can beobtained by examining probes with known locationwithin a yeast artificial chromosome (YAC; Burke et at.,1987) contig. Here we have used probes from contigsspanning 30 Mb in Xq26.1-qter (Schlessinger et aL.,1991), a region corresponding to several cytogeneticbands. The result is a detailed correlation of isochores

0M8·754:1/9:1 $5.00C:"pyrighl «;1 J!J!J3 hy Acm-lemic PrcM, JIIC.All righls or rcproduction in any rorrn rc.crved.

456

ISOCHOHES AND GC CONTENT IN HUMAN Xq26.1-qter

2 3 4 5 6 7and bands that is further related to the distribution ofrare-cutter restriction enzyme sites and CpG islands.

MATERIALS AND METHODS

"

3.0Kb- ..0" _

I457 I

I,8 9 10 11 12

IiWf.•-3.0Kb

;

Clone.~ and prob('.~. Except for two YACs selected from the CGMlibrary of clones from total human DNA (Brownstein et 01., 1989; seelegend to Fig. 4), YACs were from the X3000.11 hybrid cell, whichcontains Xq24-qter DNA as its only content of human DNA (Nuss­baum el aI., 1986). Probes for all the loci tested for GC content arelisted in Davies f!t 01. (1991) and the Genome Database (GDB) andwere kindly supplied by the laboratories of origin or ATCC. Someadditional clones from the ends of YACs in Xq2G were also tested forGC content (Fig. 2); they are reported in Little et 01. (1992) and in theGenome Database. Five additional end-clones of YACs, pol 77L,p477R, p228R, poll4R, p539R, and p512L, from contigs in Xq27 arereported in detail elsewhere (Zucchi e/ 01., in preparation). The probe"AUI''' in Fil{. 2 is the eDNA for the actin binding protein reportl'd byGorlin et 01. (l990). The order of the probes in YAC-based maps acrossthe region Xq2G-q28 has been puhlished (Schlessinger et aI., 1991).and that in Xq26 is further delniled (Litt.le el al., 1992).

lJl'tf!rmil/atiol/ of C;C wl/lel/l in ;,.oc!wrl's col//a;I/;l/g prohes. Totalhuman DNA and fractions of dirfl'C('nl GC content (described by Gar·dinl'r et al.• 1990; for ot her I'xamples of DNA fractionation see Aissaniund Bernardi. 1!)91; DeSario el al., 1991) were digested with EcoHI,electrophoret<ed, and transferred to II Hyhond N+ membrane (A mer­5ham). Each prolJe was then Inheled and tested for hybridization tothe arrayed DNA frndions (Gardiner ct al., 1990).

Rarc·cutter re.•lriclion mapping of YACs, Hare-cutter restrictionmapping was carried out using the techniques of total and partial en­zyme digestion, electrophoresis, transfer uf DNA fragments to Nytennnylon filters, and detection of fragments with left (L) or right (R) YACvector arms to infer the location of successive sites by indirect end-laohel mapping, as in Burke f!l ai, (1987). Electrophoresis was corried outin 8 CHEF mapper (Bio-Had).

RESULTS

Di..;lribulion of GC conlenl and isochores across Xq26­<Iler. To construct a compositional map for Xq26-qter,cloned DNAs mapped at various points in the contigprofile (Schlessinger el al., 1991; Litlle el ai., 1992; Da­vies el ai., 1991) were used as probes in Southern blothybridization against DNA fractions separated by pre­parative centrifugation in Cs2SO./llAMD density gra­dients (Bernardi el al., 1985; Bernardi, 1989). This ap­proach establishes the GC level of 100-200 kb surround­ing a probed landmark .•

A sample analYliis for"'a Fncl.or VII I gene probe isshown in Fig. I. and GC level is plotted for 37 probesalon~ Xq in Fig. 2. In 'lssessing the rcsults, OIlC mustaccollnt for the fact. that. for some probes the GC value issharply dclimited to one compnrtmcnt of percent.ngeGC. whercas for others, the percentage GC is spread overa range of values. The reason for this is that'probes mayfall either in a region of very constant GC environmentor in a region with some local variation. As a result, re­gions of higher or lower GC may be close enough to aprobe in genomic DNA to be variably included in therandom fragments that are separated by density. The

FIG. 1. Isochorc mapping of a Factor VIII gene probe. The probe(114.12) was labeled and tested for hybridization to total human DNAand DNA subfractions of different GC content, as described underMaterials and Methods and in Gardiner et 01. (1990). Lanes 1 and 12,results with total human DNA. Lanes 2-11, results with isochorefractions obtained by preparative ultracentrifugation in Cs2SO./BAMD at a ligand/nucleotide ,molar ratio R, = 0.14; the modal GCcontent of the major component in successive fractions was 38. 40, 41,43,44, 4G, ~8, 51, 52, and 54% (as in Gardiner et 01.,1990, and Fig. 2).

fractionations of DNA by density are, however, alwaysquite sharp, and probes at 38-40% and others at 41 %, forexample, are distinctly nonoverlapping in their values.

The localization of probes in cytogenetic bands isbased on standard assignments (Davies el al., 1991). Theborders of bands are approximate, estimated from theprobe assignments, from in situ hybridization experi­ments with YACs (Montanaro et aI., 1991), and fromobservations of X chromosomes containing transloca­tions [or lacking the terminal portion of the chromo­some in cells derived from individuals affected by theFragile X syndrome (Warren el al., 1990)]. The mainresults in Fig. 2 can be summarized as follows:

(1) Extensive portions of DNA within cytogeneticbands have ,relatively constant GC levels.

(2) Xq27, a G band expected to have lower GC con­tent, indeed has an overall GC content lower than that ofthe R band Xq2G. However, there are only weak changesin GC values near the borders ofbands defined by cytoge­netic staining methods (for example, distal to DXS403),and no changes that might correlate with the subbandsXq26.2 and Xq27.2. Also, comparable GC values wereobserved across considerable regions of adjacent cytoge­netic bands. For example, although a rather wide regioncentered in the putative region of Xq26.3 is in the dis­tinct.ly higher 41-44% GC range, at least the more tela­meric portion of Xq26 is in the lower 40-41 % GC range,similar to many isochores in Xq27.

(3) The GC level in Xq28 increases toward the telo­mere. The steep rise reaches a peak value well auove 50%GC about :3 Mb from the telomere. The gradient is mostmarl<ed disl.al 1.0 the two local.ions of thc probe forDXS52 and is comparable to that shown earlier (Gar­diner el ai., 19nO) in band q22.3 of chromosome 21. Theaccentuated most GC-rich peak around the color visionlocus (CV in Fig. 2) in the subtelomeric region of Xq28does not, however, continue to increase monotonicallyto the telomere, but is flanked by discontinuities in GClevel. In the most distal 1.6 Mb, a much lower GC con­tent is seen in the region around Factor VIII. No uniquesequences are available in probes closer to the telomere,

'"~

,;~u',''''i ''.

.. '

"

.,·i

iii!,'II;"

i!:;

:i,I,

. /

PILlA ET AL.

Xq26.1 Xq26.2 Xq26.3 Xq27.1 Xq27.2 Xq27.3 Xq28 lei

,~~I •

.1

L..----DL...-__-=-L_---~oI

I

I

II

I

II

I

I

II

N;;NIn" In aVHf) en Q. 0-XX)( > co CI000 u'" CJII I I I I. .

c ~

n nII) II)

" "o 0

II)

9

1•"..0:..

n~.,'".."""w"00II

~.,nII)

"o

~"' ..~ .,-0: "w,...<><% 00I II

54 -

52 -

50 -

48 -

G+C% 46 -

44 -

42 -

40 -

38 -

30 20Mb

10 o

FIG. 2. Plnc('mcnt of:l7 pro),('s alon/: a schcmntic of (:yl.ll/:cnctic lmnds Xq2G.l-qLcr (!'cc tcxt). Values for probes nnd loci indicated ahovethe har nrc plotted ncc<Jrdin/: 10 tlll·ir (;C contcnl (verLical scnle), with the lenl:th of II filled unr indicatinl: the rllnl:e of GC vnlues ohserved inexperim('nts like t hose of 1"il:. 1. Distllnces arc shown frol1l the telomere (uottom scale). The positions of additional reference loci ore shownhelow thl' har.

but I.hat. zone contains at least short sequences with aGC content. as high as 85% (Brown cl al., 1990).

Rare-culler mapping and CpG islands in Xq26. J­q27. J. The analysis has been extended to higher resolu­I,ion for the region spanning Xq26.1-q27 (Fig. 3 and 4) inoverlapping YACs (Little el at., 1992). Rare-cutter re­st rict ion enzyme mLlpping of YACs was first used to ver­ify physical distances and to est.imate the number ofsites for some enzymes Ihat cont.ain CpG in their recog­nition sit.es and t.hereby include potential locations ofCpG islands at gene loci (with islands defined accordingt.o Bird, 1987; Gardiner-Garden and Frommer, 1987).Such analyses are unamhil..,'uous in YACs because t.heabsence of methylation in yeast reveals enzyme sitesthat may be block~d by methylat.ion in uncloned humanDNA. For t.his ph'ase of t.he analysis, three rare-cutterenzymes that contain CpG dinucleotides were used, buttwo are not particularly associated with the CpG islandst.hat define many potential gene loci (Bird, 1987). Theywould provide an index of sites that might be expected toincrease in number with increasing GC level, but wouldnot show any bias toward sites of CpO islands. The otherenzyme, NotI, would give some estimate of the relativelocations of CpG islands, since essentially every NotIsite is located near an active gene (Bird, 1987).

The region ofXq26.1-q27.1, from the centromeric endof yWXD457 to the telomeric end of yWXDG36, con-

tained 64 sites for Noll, MluI, or NruI (Fig. 3). The pat­t.ern of cleavage sites was self-consistent in all the YACsof the published contig (Little cl al.• 1992), and the rare­cut.ter map with those and other enzymes yielded thesame estimate for the overall distance that had earlierbeen estimated from t.he probe content of the set of over­lapping YACs: about 7.5-8 Mb, or 0.25% of the genome,is included in the region.

As anticipated for a correlation of GC content withrare-cutter sites containing CpG dinucleot.ides, the cen­tromere proximal 4 Mb, from p299R to just beyondp476L in Fig. 2, which is relatively enriched for GC con­tent, is also enriched for rare-cutter sites for the threeenzymes (Fig. 3): 11 of 15 Noll; 23 of 25 MluI; and 18 of24 NruI sites. The comparable enrichment of NotI sitesindicated that CpG islands might show a similar correla­tion.

The mapping was ext.ended to give a further assess­ment of the internal consistency of YAC structure andto estimate the number and location of potential CpGislands. For this purpose, four additional enzymes thatarc very often associated with CpG islands were chosen;t.hree of them contain only G and C in their recognitionsites (Bird, 1987; Bickmore and Bird, 1992). Figure 4includes data with the additional enzymes SociI, BssHII,SfiI, and EagI across two sample regions, 2.1 Mb aroundthe HPRT gene and 1.2 Mb around Factor IX. In bothcases, YACs were derived from two different sources of

ISOCHORES AND GC CONTENT IN HUMAN Xq26.1-qter 459

o Mb 2 :> " T

a: ..J W ..J.. ... ." " N C')

a: a: .. ..J f- «-I<II<CII..J:o,...lIlr ~ -'(["J .J ..J II ..J.J "a: a: • = " a: ;; 0 0 ~ ~;ti~~~t: ..J..... ... ..U) ~~~

Cl 0 .." - Cl ;;, Cl N ;;, ClIII a: ~ ~~~~ ~~ lI)jj)~ ~~ ,,- III III C').... .. N .. ... ...... Ill ..... .. • W ~GN"lIltlllON" C')

"N " N o. r-tM",.,mnn)()()( Mwt )(0lC)1fI .. C') C') "" ".. " . " . " 0( " ci::;1ci::~~~tg ~:. Cl.o. 0 d X •••••• 000000. o .QQ . • • ". 0" " 0 d 0 d 0 d 0 d, ! , '))11 ) ))( , " , ,. !

I I I , I , I , , If I If tC' , I

N .... II .. .. .. II N .. ..Nr IIMI_IIIIM "Nr"MM MNr IIlCNMIlMNNrNMM H 101 H HM H \I NrM NrMN NrHII Nr Nr 101 NrNrM Nr, If! (( I , , tCl I I ),rerYc tC. " ), I If"- ! I , I (

[ ] [NrNlH H H

)(

]FIG. 3. Rare-cutter restriction maps or the Xq26.1-q27.1 region. The locations or genes and probes ror loci are as in Fig. 2. Sites ror the

rare-cuUer enzymes Mlul (M), Nrul (Nr), and NOLI (N) were inrerred rrom pulsed-field gel mapping orYAC DNAs (Burke et 01.,1987). Bracketsindicate the rej;iom; analyzed rurther ror Fig. 4.

DNA (see Fig. 4 legend and Little el at., 1992); and wher­ever the c10lles overlap, the same rest.riction sites arcobserved at. t.he same distances wit.hin experimentalerror.

Based on these data and the criterion that sites for at.least. t.hree oft.he seven enzymes coincide wit.hin the reso-

lution of the pulsed-field gels, 20 CpG islands were as­signed to the two subregions (these remain potentialCpG islands, since there is no evidence about the level ofJll9t.hylat.ion or expression of corresponding genes incells; see Bickmore and Bird, ID92). The results are inagreement wit.h the expectations (Bird, 1987) that Noll

A ! U j j ! ! !Hr101

M Nr NSe 8 M N 101 8

5 E Se 88 8 Se 5e8 8 8 S SeM Nr 8 S. Nr 58 SS S E 1018 8 S S5e Se 5e 5e 5

Nr SS 101 SHr E M S.M5E SeME 8 Hr EM Nr MEE EE SSeS 5 E EHr E ESe5cME 5 E 8 E II E E 5 BEi, II ,.e I If I de , , ! , :w: )If I ! ! I ! I! III I I I , I I I If

a457R tf'RT

457 563

299 325 382

458 837 290

100 kb931 893

398

B CpG

SeNr 8 5eE E E

Se8 M 8 5 E SNr S 8 85 N,S 58 8NrN S8I I , , d II ...,' I)' II

p446R F9 p311L

..123446 .~

930

100 kb 205

311

! ! CpG

8SeE BN S

B5 N' 8II

Ip342R

342

FIG. 4. CpG island~ rrom rare-cutler sit.e coincidenc~ in YACs spanning two subregions or Xq26.1-q27.1. YACs arc named by yWXDaccession numhers. yWXD931 (containing the HPRT gene; subcont.ig A) and yWXD930 (containing the F9 gene; subcontig 13) were rrom theCGM library; all other YACs were derived rrom X3000.11 cells (sec Materials and Methods). Locations orCpG islands (indicated by arrows onthe levellnheled epG) were scored by the ncar coincidence or rare-cutter sites ror nt least three orthe seven enzymes used, usually among Notl(N), /J.•.•HII (13), J~ngl (I~), Sncll (Sc), ano Sfil (S); Nrul (Nr) and Mlul (M) arc also shown as in Fig. 3. Three or the CpG islands, marked byasterisks, were previously described (on eil her side or F9, by Nb'Uyen ct aI., 1987, and proximal to HPHT. by wolr ano Migeon, 1985). yWXD893is a chimeric YAC, indicated by the wavy line at its cocloned end (Little et 01., 1992). Filled hlack bars indicat.e the extent of the F9 gene and theMlul-Mlul rral:menl thal cont.ains the HPRT gene (Huxley et 01.,1991).

'/460

5.7 Mb-

4.6 Mb-

3.5 Mb - ..','.

2

"1

PILlA ET AL.

of DNA that had been digested, fractionated by pulsed­field gel electrophoresis, and probed with DNA for thelocus DXS102 (Fig. 4, lane 1). The same probe detected a3.5-Mb MluI fragment (Fig. 4, lane 2) and was also seenin an expected 400-kb NruI fragment (data not shown).Further studies would be required to determine whichsites are selectively met.hylated.

FIG.:;. l'ulsl'lI-lil·ill/:el t'lecl.((.phorctic analysis of the rare-cutlerfragments of human BeLa cell DNA thut contuin the prohe scquencefor the DXSIO:llocus {Davies el 01.,1991 l. DNA was di/:ested to COnT­

plet ion wil h each of se\'eral enzymes and electrophoresed in parallelwith Sc"i<(j.~al·charomyCl!sp<lnlb.. chromosomes as a size murker. Elec­t.rophorl'sis wus c;Hricd oul in TAE bUffer und 0_8% al:urose at. 14°Con a CHEF mapper (Bio·It;1(1) wilh Ihe lIH1nuf;'c1urer's rel:imen t.os,'parall' DNA of ·100 '" liOIlO kh alld markers illt'1ucJin/: S. poml'" chro­mosomes lIlId the I;H/:esl (I.G Mh) SIlt'cllIlrtJII1-"Ct"~ccr..uisiw' chromo­sOllie. DNA was Ihen transferred to u Nyt.ran nylon J11emLJmne undlested for hyhridization to a mdiolahelcd probe for DXS102. The.correspondinl: restriction fragmenls are the largesl found for t.his re­/:ion for Noll (Inne 1) llnd MluI (lane 2).

sites usually fall into CpG islands (in these regions, 4 of4), and that SacII, BssHII, and EagI are also indicatorsof choice for CpG islands (Bird, 1987). Overall, 17 is­lands 0/120 kb) were fOllnd in the GC-rich proximaland 3 (1/360 kb) in the GC-poorer distal portion. Thus,in agreement with A'issani and Bernardi 09!Jl), CpGislands correlated as well with GC content as did thetotal content of MluI, Nrul, and NotI sites.

The data from overlapping YACs can also be com­pared to results with digests of uncloned human DNA.According to Fig. 3, probe sequences should be found inrestriction fragments ofpredict.able size. The map has ingeneral been verified for the regions tested. For example,in the region around Factor IX, t.he enzyme sites indi­cated are in agreement with t hose detected for unclonedDNA from another .human donor (Nguyen el ai., 1987),

Some fragments were larger than expected, consistentwith the expected DNA methylation at specific sites invarious cultured cells (as shown, for example, forXq27.3-qter in the st.udy of Poust.ka et at., 1991, and forthe HPRT gene region by Wolf and Migeon, 1985). Thelargest NotI and MluI fragments detected from the re­gion are shown in Fig. 5, as seen in control DNA ex­tracted from BeLa cells. The largest NotI fragment ex­pected from the region, about 2.6 Mb from DXS99through Factor IX to the mcf2 gene, was included in aneven larger fragment of about -1 Mb seen in hybridization

"

1.6 Mb-

...

'~;.:~

DISCUSSION

Detection of high-GC regions in mapped portions of thegenome. In an analysis of the long arm of chromosome21 (Gardiner et al., 1990),63% of the telomeric band 22.3was accounted for in terms of composition, but muchlower fractions (about 10%) of other bands were ana­lyzed, and probes were only regionally ordered. In thisstudy, in contrast, only 10 to 20% of the Xq26-qter re­gion was accounted for compositionally, but the order ofthe landmarks tested and the distances between themwere more precise. The present rC5ults with probes froma contig-based map of distal Xq agree with the results forchromosome 21 in three important respects:

(1) G bands contained GC-poor isochores.(2) R bands were heterogeneous in GC level, ranging

from portions as low as G-bands to more moderatelevels.. (3) DNA of the highest average GC content was local­

izcd near the telomere. This general result is not surpris­ing, since the most GC-rich DNA has been shown tocorrespond to T bands (DeSario et at., 1991; Saccone etat., 1992). The observed most GC-rich region in Xq28 isremarkable, however, in the sense that it docs not corre­spond to a T band (Dutrillaux, 1973; Ambros andSumner, 1987) nor is it labeled with H3 DNA as a probeby in situ hybridizat.ion (Saccone et ai., l!J92). This islikely due to the averaging of hybridizing signals over0.5% 05 Mb) stretches of DNA (Saccone et at., 1992).Thus, the small proportion of very GC-rich DNA inXq28 is apparently revealed with greater sensitivity bythe direct isochore analysis, The current analysis hasemployed probes only at intervals of about a millionbasepairs, and each probe defines the GC level of about200 kb of surrounding DNA. Thus, the analysis could berefined much further, to the level of neighboring iso­chores, by studying many additional probes (as in theanalyses of the CFTR gene region by Krane et al., 1991,and of the DMD region by Bettecken et al., 1992). In anycase, the results indicate that such T bands may prove t.obe even more widespread than has been suggested bycytogenetic analysis (Dutrillaux, 1973; Ambros andSumner, 1987).

Incidence of CpG islands and gencs across Xq26-qtcr:Some implications. No assay of gene content has beencarried out, but as discussed below, the results are in fullaccord with the relative distribution of CpG islands andgenes expected from isochore analyses (Bernardi et at.,1985; Bernardi, 1!J89; MOtlchiroud et al., 1!J91; Ai'ssani

ISOCHORES AND GC CONTENT IN HUMAN Xq2G.I-4ter 461

lind Bernardi, 1991) and available pulsed-field gel elec­rrophoretic analyses.

The results indicate a correlation among the contentsIlr C;C, of rare-cutter sites containing CpG dinucleotides,"nel ofCpG islands. The distal port.ion ofXq26, which is'low in GC (40.5%), contained fewer rare-cutter sit.es init:' -t Mb compared to the more proximal 4 Mb, which ishigher in GC (43%). For example, summing the resultsfor Notl, MiuI, and NruI, a total of 43 of 57 sites, orabout threefold more, were in the higher GC half of therl·gion. In the test subregions analyzed further with ad­ditional rare-cutt.er enzymes, the more GC-rich zoneagain contained a density of CpG islands about threefoldgreater (1/120 kb compared to 1/360 kb).

Published and ongoing work shows that Xq28 exhibits.. trend in GC level comparable to that in CpG islandcontent.. Proximal portions around IDS (Palmieri et at.,1992) and DXS304 (Palmieri et ai., 1!J93), which containluwer GC, have a CpG island roughly every 250 kb,whereas in distal Xq, in the region of high GC (52%)between color vision and G6PD, t.here arc up to eightfoldmore CpG islands, with one every :30 t.o 50 kb (Palmieril'l at., 1992; unpublished dat.a). These results arc consis­tent wit.h the observations of Poustka et at. (1992) andDietrich et at. (1992) that rare-cutter sites cluster in aregion corresponding t.o the zone of highest GC as shownin Fig. 2. Also consist.ent with this inference is a study ofcosmids from Xq24-q28 containing CpG islands. In thatwork, 56 such cosmids came from Xq28 and only 6 fromXq24-q27 (Kaneko et ai., 1992), and Maestrini et ai.(1990) report. a correspondingly high yield of cloned CpGislands from Xq28.

An index of the levels of CpG islands based on the GCcontent of a region is no substitute for the more incisivedetermination of CpG islands by restriction mapping,which also localizes the genes. Also, as shown by thecomparisons of Figs. 2 and 3, the observed increase inCpG island content was much more than proportional toincreases in GC. However, a preliminary characteriza­tion of possible CpG island density across a region canbe achieved with a small set of hybridization blots liket.hose in Fig. I, whereas the rare-cutter analyses of Fig. 4require about 70 pulsed-field gel analyses. In addition,GC level is a particularly strong index for regions with\'l'r~' hi~h len'l", t1 f C'pG i",l<lnd~. like Ihe one obserwd in\~:~: ..;~ Xl.::.:'. 1:: ::: ..:: :-t=~;t~::· :4~f .=-:;z:::-:: GC :~ ...~!= ~Cl:';I?:.

:,pt1nd tt) H~; i:'Qcot1rt':,'-\\"olCh na\'('-iJe",n 8:'':;('':;':;(:0 to cor.­lain about ~ of hUJlllln genes in Ilbout ] 00 Mb of DNA(Mouchiroud et 01., 1991). Based on current estimates ofabout 50-100,000 human genes, 2 Mb of DNA surround­ing the color vision locus might then encode hundreds ofgenes-a concentration that could approach the genedensity now established for the nematode Cae/z.orhabdi­tis clcgans (Sulston et at., 1992).

The results arc in accord with the (indint:s of geneticlinkage of a large number of inherited diseases to Xq28and distallo DXS52 for most of those disease genes thathave been well·mapped (Davies et aI., 1991; Fig. 2).Those findings hllve bccn puz:dil1l:, for it seemcd improb-

able that genes likely to be modified in inherited diseasest.ates would somehow cluster near the telomere. In­st.ead, the clustering can be simply related to the higherGC content, with its accompanying higher gene content,in the region. Both the general gene content and thepotential medical interest make YACs from the high GCregion a priority choice for long-range sequencing, and Tbands on other chromosomes are presumptively attrac­tive material for early sequence analysis. We have begunsystematic genomic sequencing in the Xq28 region andhave already confirmed, in agreement with the presentresults, that the GC content in the 30 kb around theG6PD gene is 57% (Chen et at., 1991). The number ofother genes and the GC content of the neighboring re­gion have not yet been studied in detail, but two othergenes, 3.5 (Toniolo et ai., 1988) and 4 kb long (Alcalayand Toniolo, !!J88), have been reported thus far in thesame zone; consist.ent with expectation, they are 61­62% CC.

One may wonder whether regional levels of GC con­tent and gene-associated CpG islands are correlated bythe 'lcl.ion of a common mechanism durin~ evolution.Ais!;ani and Bernan.li (1 DU1) have considered some spec­ulative possibilities. It may be relevant that the veryhigh GC levels and gene concentrations at a number oftelomeres [or at intercalary T bands resulting from telo­mere fusion during evolution (Dutrillaux, 1973)] are par­alleled by high transcriptional and recombinational ac­tivity (Bernardi, 1989, 1992; Rynditch et ai., 1991). Telo­meres may localize at points along the nuclearmembrane (Cremer et at., 1982), and t.ranscription andprocessing ha~e been suggested to occur in ort:aneJle-likeent.it.ies ncar nuclear pores (Newport and Forbes, 1987).Thus, by a mechanism that is still poorly defined, cluo­mosomal activity that is generally promoted by localizedregions of high GC and CpG island content might beaugmented Ilear telomeres.

ACKNOWLEDGMENTS

We thank Eric Green and our laboratory colleagues for discussionsand comments on the manuscript. This study was supported by NIHGrant HG00201 and by grants from the Association Contre les Myopa­lhies IAFM) and lhe French Ministry of Research and Technology.B.A. was an ,\F.\'I Ff!/J,)W in J!I!I') ·!IJ.

Aissani, B., and Bernardi, G. (1991). CpG islands: Features and distri­bution in the genornes of vertebrates. Gene 106: 173-183; 185-195.

Alcalay, M.• and Toniolo, D. (1988). The mosaic genome of warm.blooded vertebrates. Nucleic Acids Res. 16: 9527-9543.

Ambros, P. F., and Sumner. A. T. (1987). Correlation of pachytenechrornomeres and metaphase bands of human chromosomes anddistinctive properties of telomeric regions. Cylogenet. Cell Genet.44: 223-228.

Bernardi, G., 010f550n, D., Filipski, J., Zerial, M., Salinas, J., Cuny,G., el al. (1985). Sciellce 228: 953-958.

Bernardi, G. (I!lS!)l. The isochore organization of the human genome.Allllu. Reu. Gel/rl. 2:1: 637-661.

II. i !I I :

II

'i. :

462 PILlA ET AL.

Bernardi, G. (1992). Genome organization and species formation invertebrates. J. Mol. Euol., in press.

Bettecken, T., Aissani. B., Muller, C. R.. and Bernardi, G. (992).Compositional mapping of the human dystrophin-encoding gene.Gene 122: 329-335.

Bickmore, W. A.• and Bird. A. P. (1992). Usr of restriction rndonucle­ases to detect and isolate genes from mammalian cells. MethodsEnzymol. 216: 224.

Bird, A. P. (1987). epG islands as gene markers in the vertebratenucleus. Trend.~ Genet. 3: 342-347.

Brown. W. R. A., MncKinnon. P. ,J .• Villasante. A.. Spurr. N., Buckle.J., and Dobson, M. J. (l9901. Structure and polymorphism of hu­man telomere-nssociated DNA. Cell 63: 119-132.

Brownstein, B. H.. Silverman. G, A., Little, R. D., Burke. D. T., Kors­meyer, S. J .• Schlessinger. D., and Olson, M. V. (1989). Isolation ofsingle-copy human genes from a library of yeast artificial chromo·some clones. Science 244: 1348-1351.

Burke. D. 1'.. Carle. G. F.. and Olson. M. V. (1987). Cloning of largesegments of exogenous DNA into yeast by means of art iricial chro­mosome vectors. Science 236: 806-812.

Chen, E. Y.. Chen/:. A.• Lee. A.• Kuon/:, W. J., Ilillier. L., Green, P.,Schlessinger, D.• Ciccodicola. A., lind D·Urso. M. (199J). Sequenceof human glucose·6·phosphate-dehydrogen:lse c10nNl in plasm idsand a yeast orlilicial chromosome. Genomics 10: 7!J2-800.

Cremer. T .• Cremer, C.. Baumann, 1-1., Luedtke. E.-K .. Sperlinl(. K.,Teuher. V.• et al. (982). Hahl's model of t.he intl'rphnse chroll1~­some arrangement t.ested in chinese hamster cells by prematurechromosome condensation and Inser-UV-Microbeam experiments.Hum. Genet. 60: 46-56.

Davies, K. E.• MnndeJ. J.·L.• Monaco. A. P., Nussbaum, R. L.• andWillard. H. F. (1991). Report on the genetic constitution of the Xchromosome: Human Genome Mapping 11. Cytogenet. Cell Genet.58: 853-966.

DeSario, A., Aissani. B., and Bernardi, G. (1991). Compositional prop­erties of telomeric regions from human chromosomes. FEBS Lett.295: 22-26.

Dietrich. A., I(orn. B., und Poustka, A. (1992). Maml1l. Genome 3:168-172.

Dutrillaux. B. (1973). Nouveau systeme de marquage chromosomique:Les bandes T. Chromo.~oma41: 395-402.

Gardiner-Garden, M.• and Frommer, M. (1987). CpG islnnds in verte­brnte genomes. J. Mol. Bioi. ] 96: 261-282.

Gardiner. K.• Aissani. B., and Bernardi, G. (1990). A compositionalmap of human chromosome 21. EMBO J. 9: 1853-1858.

Gorlin, J. B., Yamin. R., Egan, S.. Stewart, M., Stossel, T. P., I{wuiat­kowski, D. J., and Hartwil(, ,J. H. (1990). Human endothclinl actin­hinding protein (AfiP-280, nonJ1luscie filamin): A moleculur leafsprinl(. J. Cel/niol. ] 11: 1089-1105.

Huxley, C.. Hal(ino, Y.• Schlessinger, D., and Olson. M. V. (W91l. Thehuman HPRT gene on a yeast artificial chromosome is functionalwhen transferred to mouse cells by cell fusion. Genomic.~9: 742-750.

Kaneko, K.. Miyatake. T .. Migeon, B. R., Warren, S. T., and Tsuji, S.(1992). Isolation of cosmid clones containing CpG islands at Xq24­qter region. Cylocc'net. Cel/ GCllet. 58: 2069.

Krane, D. E:., Hartl, D. L., and Ochman, H. (1991). Rapid determina­tion of nucleotide content and its application to the study of genomestructure. Nucleic Acid.. /les. 19: 5181-5]85.

Lillie, R. D., Pilia, G...Johnson, S.. Zucchi, I., D'Urso, M., and Schles­sin/:er, D. (1992). Yeast artificial chromosomes spanning 8 Mb and10-15 centiMorgans of human cytogenetic bnnd Xq26. /'roc. Nall.Acad. Sci. USA 89: 177-181.

Maestrini, E., Rivella, S., Tribioli, C., Purtilo, D., Rocchi, M., Archi­diacono, N., et 01. (1990). Probes for CpG islands on the distal longarm of the human X chromosome are clustered in Xq24 and Xq28.Genomics 8: 664-670.

Montanaro, V., Casamassimi, A., D'Urso, M., Yoon, J.- Y., Freije, W.,Schlessinger, D., Muenke, M., Nussbaum, R. L., Saccone, S., Mau­geri, S., Santoro, A. M., Motta, S., and Della Valle, G. (1991)./n situhybridization to cytogenetic bands of yeast artificial chromosomescovering 50% of human Xq24-Xq28 DNA. Am. J. Hum. Gen. 48:183-194.

Mouchiroud, D., D'Onofrio, G., Aissani, B., Macaya, G., Gautier, C.,and Bernardi, G. (1991). The distribution of genes in the humangenome. Gene 100; 181-187.

Newport. J. W., and Forbes, D. J. (1987). The nucleus: Structure,function, and dynnmics. A,IIlU. /leu. Biochem. 56: 535-565.

Nguyen, C.• Pontarolli, P., Birnbaum, D., Chimini, G., Rey, J. A.,Mattei, J. F., el 01. (19871. Large scale physical mapping in the q27region of the human X chromosome: The coagulation factor IX geneand the mcf.2 transforming sequence are separated by at most 270kilobase pairs and are surrounded by several 'HTF islands'. EMBOJ. 6: 3285-3289.

Nussbaum, R. L., Airhart, S. D., and Ledbetter, D. H. (1986). A ro­dent-human hybrid containing Xq24-Xqter translocated to a ham­ster chromosome expresses the Xq27 folate-sensitive fragile site.Am. J. Med. GCIlCI. 23: 457-466.

Palmieri, r.., Cappa, V., Romano, G., D'Urso, M., .Johnson, S., Schles­singer, D., Morris, P., Hopwood, J., Di Natale, P., Galli. R., andl3allabio. A. (HJ92). The iduronate sulfatase gene: Isolation of a 1.2­Mb YAC con til( spanning the entire /(ene and identification of het­erogeneous deletions in patients with Hunter Syndrome. G('nomic.~

12: 52-57.

Palmieri, G., Romano, G., Casamassimi, A., D'Urso, M., Lillie, R. D.,Abidi, F. E., Schlessinger, D., Lagerstrom, M., Malmgren, 1-1"Steen-Bondeson, M. L., Pettersson, U., and Lnndegren, U. (1993).1.5·Mb YAC contig in Xq28 formatted with sequence-lagged sitesand including a region unstable in the clones. Genomics 16: 586­592.

Poustka, A., Dietrich, A., Lanl(estein, G., Toniolo, D.. Warren, S., andLehrach, H. (1991). Proc. Nat!. A cad. Sci. USA 88: 8302-8306.

Rynditch, A., Kadi, F., Geryk, J., Zoubak, S., Svoboda, J., and Ber­n~rdi, G. (1991). The isopycnic, compartmentalized integration ofRous sarcoma virus sequences. Gellc 106: 165-172.

Saccone, S., DeSario, A., Della Valle. G., and Bernardi, G. (1992). Thehighest gene concentrations in the human genome are in T-bandsofmetaphase chromosomes. Proc. Nat!. Acad. Sci. USA 89: 4913­4917.

Schlessinger, D., Lillie, R. D., Freije, D., Ahidi, F., Zucchi, I., Porta,G., Pilia, G., Nagaraja, R., Johnson, S. K., Yoon, .J. Y., Srivastava,A.,I(ere, .J., Palmieri, G.• Ciccodicola, A., Monlanaro, V.• Romano,G., Casmnassimi, A., and D'Urso, M. (1991). Yeast artificial chro­mosome-based genome mapping: Some lessons from Xq24-q28. Ge­Ilomics 1 ]: 783-7!J3.

Sulston, ,I., Du, Z.. Thomns, K., Wilson, H., Hillier, L., Staden, It, et01. (1992). The C. e1cgafl.~ genome sequencing project: A heginning.Nature 356: 37-41.

TQ.niolo, D.; Persico. M., and Alcalay, M. (1988). A "housekeeping"gene on the X chromosome encodes a protein similar to ubiquitin.l'roc. Not/. Acad. Sci. USA 85: 851-855.

Warren, S. T .. Knight. S. J .. Peters, .J. F., Stayton, C. L., Consalez.G. G., and Zhang. F. P. (1990). Isolation of the humon chromosomalband Xq28 within somatic cell hybrids by fragile X site breakage.Proc. Nat!. Acad. Sci. USA 87: 3856-3860.

Wolf, S. F., and Migeon, B. R. (1985). Clusters of CpG dinuclcotidesimplicated by nuclease hypersensitivity as control elements ofhousekeeping genes. Nature 3]4: 467-469. _

Yunis, J . •J. (1981). Mid-prophase human chromosomes. The attain­ment of 2000 bands. Ilum. Genet. 56: 293-298.

Zerial, M., Salinas, ,J., Filipski, J., and Bernardi, G. (1986). Gene dis­tribution and nucleotide sequence organization in the human ge­nome. Eur. J. Biochem. 160: 479-485.