
Analysis of a 1963-bp polymorphic region flanking the human insulin gene


Gene, 32 (1984) 475-479

Elsevier

GENE 1174

Short Communications

Analysis of a 1963-bp polymorphic region flanking the human insulin gene

(Recombinant DNA; gene library; nucleotide sequence; phage λ vectors; genomic repeats; restriction analysis)

David Owerbach * and Lissi Aagaard **

*Department of Biochemistry, University of Massachusetts Medical Center, 55 Lake Avenue North, Worcester, MA 01605 (U.S.A.) Tel. (617) 856-5570, and **Hagedorn Research Laboratory, Niels Steensensvej 6, DK2820, Gentofte (Denmark) Tel. (01) 68 08 60

(Received August 2nd, 1984) (Revision received and accepted September 21st, 1984)

SUMMARY

The nucleotide sequence of a long polymorphic region located 365 bp upstream from the human insulin gene is reported. The region is composed of 139 repeating sequences whose consensus structure is related to ACAGGGGTGTGGGG. Expansion in the number of repeating sequences appears to have taken place through duplication and triplication of 112-141-bp regions. However, ancestral polymorphic regions containing additions or deletions of 50 bp or more were not detected in two previous generations.

INTRODUCTION

A highly polymorphic region flanks the 5’ end of the insulin gene on chromosome 11 in man (Bell et al., 1980a; 1981; 1982; Rotwein et al., 1981; Owerbach and Nerup, 1982; Ullrich et al., 1982). In Caucasians, the polymorphism exists in two major forms: a short (up to 600 bp) and a longer (> 1600 bp) polymorphic region, present in approx. 75% and 25% of the insulin gene regions analysed, respectively (Bell et al., 1981; Owerbach and Nerup, 1982; Rotwein et al., 1983).

* To whom correspondence and reprint requests should be sent; on request, detailed experimental evidence for conclusions reached in this short presentation will be supplied.

Abbreviations: bp, base pairs; Denhardt’s solution, 0.02% each bovine serum albumin, Ficoll 400, polyvinylpyrrolidone 360; kb, 1000 bp; SDS, sodium dodecyl sulfate; SSC, 150 mM NaCl, 15 mM Na3·citrate, pH 7-8.

0378-1119/84/$03.00 © 1984 Elsevier Science Publishers

The sequence of two of the short polymorphic regions has been determined and is composed of simple tandemly repeating sequences whose structure is related to ACAGGGGTGTGGGG (Bell et al., 1982; Ullrich et al., 1982). The longer polymorphic regions are speculated to be composed of larger numbers of the tandemly repeated sequences (Bell et al., 1982); however, this hypothesis has never been directly tested by nucleotide sequence analysis. These longer polymorphic regions are of particular interest because of their positive association with atherosclerosis (Owerbach et al., 1982b; Mandrup-Poulsen et al., 1984) and hypertriglyceridaemia (Jowett et al., 1984) as well as negative association with type 1 (insulin-dependent) diabetes (Bell et al., 1984).

EXPERIMENTAL AND DISCUSSION

(a) Gene library

Lymphocytes from a 30-year-old Caucasian male were used for preparing a chromosomal library. DNA was digested to completion with the restriction endonuclease EcoRI, size-fractionated on a 10-40% sucrose gradient, and approx. 15-kb sequences were cloned into λ Charon 28 by established procedures (Maniatis et al., 1982). The chromosomal library was screened without prior amplification using a probe prepared from a HindII-BglI fragment of the human insulin genomic clone λHI- (Bell et al., 1980a), which was a gift from Drs. Graeme Bell and William Rutter of San Francisco.

(b) Sequence

A 4.2-kb fragment containing both polymorphic and insulin regions was isolated by BglI digestion of one isolate (λHI-3) and various regions were subcloned into M13 vectors of Messing et al. (1977). Nucleotide sequence analysis of overlapping DNA fragments was carried out according to the dideoxy method of Sanger et al. (1977).

In total, 3943 nucleotides were sequenced. Approx. 180 bp at the 5’ end and 70 bp at the 3’ end of the BglI fragment were not determined. However, included within the sequenced region are the entire polymorphic region and messenger RNA coding regions of the insulin gene. The cap and poly(A) addition sites are located at positions +1 and +1431, respectively. The polymorphic region is located upstream at positions -365 to -2327.

(c) Differences in sequence

The sequence of the λHI-3 insulin gene (sequence not shown) indicates that it conforms to the α-type sequence described by Ullrich et al. (1980). Nucleotide 1367 of the upper strand contains a C residue instead of the T residue that is found in the β-type sequence (Bell et al., 1980b; Ullrich et al., 1980). A PstI site is thus eliminated in the α-type sequence; within positions -365 to +1483 only three differences occur compared with the α sequence of Ullrich et al. (1980). These occur in IVS1 (216, A → T), IVS2 (1102, C → T) and the 3’ untranslated region (1380, C → A).

(d) Polymorphic regions

In contrast to the highly conserved sequence found between alleles in the region -365 to +1483, the region between -365 and -2327 shows extensive polymorphism. Fig. 1 shows the complete sequence of the polymorphic region from -365 to -2327. The sequence contains 1963 nucleotides comprising 139 repeating units whose structure is related to the consensus sequence ACAGGGGTGTGGGG. This sequence is present in 81 of the 139 repeats. In addition, eight other related sequences are present, ranging in frequency from 1 to 20 times (Table I, Fig. 1).
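The figures quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis; it assumes that the coordinates -365 and -2327 are both inclusive and that every repeat is either 14 or 15 bp long, as the sequences listed in Table I suggest. The variable and function names are hypothetical.

```python
# Positions are numbered relative to the cap site (+1); upstream positions are negative.
REGION_3PRIME_END = -365    # boundary nearest the insulin gene
REGION_5PRIME_END = -2327   # distal boundary
N_REPEATS = 139
CONSENSUS = "ACAGGGGTGTGGGG"


def inclusive_span(start: int, end: int) -> int:
    """Length of a stretch with both end positions counted (same-sign coordinates)."""
    return abs(end - start) + 1


region_bp = inclusive_span(REGION_5PRIME_END, REGION_3PRIME_END)
mean_repeat_bp = region_bp / N_REPEATS

# Assumption: repeats are 14 or 15 bp long, so the excess over 14 bp per repeat
# equals the number of 15-bp repeats.
extra_bp = region_bp - N_REPEATS * len(CONSENSUS)

print(region_bp)                  # 1963
print(round(mean_repeat_bp, 2))   # 14.12
print(extra_bp)                   # 17
```

Under these assumptions, the 1963-bp length corresponds to 122 repeats of 14 bp and 17 repeats of 15 bp, which agrees with the combined count of the two 15-bp repeat types visible in Fig. 1.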

Table I also shows the frequency of various sequences found in two short polymorphic regions isolated and sequenced from a chromosomal library (λHI-1 and λHI-2) by Bell et al. (1982). Most striking is the absence of sequence e in the 45 repeats of λHI-2; λHI-1 contains this sequence once compared to 10 times in λHI-3.

The 5’ and 3’ ends of the polymorphic regions show absolute conservation in sequence between λHI-1, 2, and 3. The first three repeats (cdi) at the 5’ end and last two repeats (cb) at the 3’ end are identical in all three polymorphic regions analyzed (Table I). The variable regions are therefore located internally within the polymorphic region.

In contrast to the organization of repeats in the internal regions of λHI-1 and λHI-2 (Bell et al., 1982), the organization of repeats in λHI-3 does not appear to be completely random. Blocks of 8-10 repeats appear to be duplicated or triplicated within the region. As seen in Fig. 1, the 112-bp sequence of repeats 13-20 is repeated at 26-33 and again at 37-44. The 127-bp sequence found at 63-71 is identical to the sequence at 79-87 except for a single nucleotide substitution in repeat 69 converting ATAGGGGTGTGGGG to ATAGGGGTGTGTGG. Interestingly, this nucleotide substitution is very rare, occurring only once in

1 2 3 4 5 ACAGGGGTCCTGGGG ACAGGGGTCCGGGG ACAGGGTCCTGGGG ACAGGGGTGTGAGG ACAGGGGTCCTGGGG

6 7 8 9 10 ACAGGGGTGTGGGG ACAGGGGTGTGAGG ACAGGGGTCCCGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG

11 12 13 14 15 ATAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG

16 17 18 19 20 ATAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG

21 22 23 24 25 ACAGGGGTCCGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCCGGGG

26 27 28 29 30 ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ATAGGGGTGTGGGG ACAGGGGTGTGGGG

31 32 33 34 35 ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG

36 37 38 39 40 ACAGGGGTCCCGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ATAGGGGTGTGGGG

41 42 43 44 45 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG

46 47 48 49 50 ACAGGGGTGTGGGG ACAGGGGTCCGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG

51 52 53 54 55 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG

56 57 58 59 60 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCGGGG

61 62 63 64 65 ACAGGGGTCTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCCGGGG

66 67 68 69 70 ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ATAGGGGTGTGTGG ACAGGGGTGTGGGG

71 72 73 74 75 ATAGGGGTGTGGGG ACAGGGGTCCCGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ATAGGGGTGTGGGG

76 77 78 79 80 ACAGGGGTCCCGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG

81 82 83 84 85 ACAGGGGTCCCGGGG ACAGGGGTGTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG ATAGGGGTGTGGGG

86 87 88 89 90 ACAGGGGTGTGGGG ATAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCTGGGG ACAGGGGTGTGGGG

91 92 93 94 95 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCCGGGG ACAGGGGTGTGGGG

96 97 98 99 100 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG

101 102 103 104 105 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCTGGGG ACAGGGGTCTGGGG ACAGGGGTGTGGGG

106 107 108 109 110 ACAGGGGTGTGGGG ACAGGGGTCCGGGG ACAGGGGTGTGGGG ACAGGGGTCCGGGG ACAGGGGTGTGGGG

111 112 113 114 115 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCTGGGG ACAGGGGTCTGGGG

116 117 118 119 120 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCCGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG

121 122 123 124 125 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCCGGGG ACAGGGGTGTGGGG

126 127 128 129 130 ACAGGGGTGTGGGG ACAGGGGTCCTGGGG ACAGGGGTCTGGGG ATAGGGGTGTGGGG ACAGGGGTCTGGGG

131 132 133 134 135 ACAGGGGTGTGGGG ACAGGGGTCTGGGG ATAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTGTGGGG

136 137 138 139 ACAGGGGTGTGGGG ACAGGGGTGTGGGG ACAGGGGTCCTGGGG ACAGGGGTCTGGGG

Fig. 1. Nucleotide sequence of the polymorphic region within λHI-3. These sequences represent positions -365 to -2327 upstream from the cap site of the human insulin gene. Numbers 1 through 139 represent the different repeating units. Only the 5’ → 3’ strand is shown. The relative number of each repeat type is shown in Table I and is compared between the long polymorphic region (λHI-3) and two short polymorphic regions (λHI-1 and λHI-2) isolated by Bell et al. (1982).
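The repeat composition of the long polymorphic region can be tallied directly from Fig. 1. The following is a minimal sketch, not the procedure used in the original work: the 139 repeats of Fig. 1 are transcribed as one letter each, using the lettering a-k of Table I, and the transcription itself (the hypothetical constant FIG1) is an assumption that should be checked against the figure.

```python
from collections import Counter

# Repeat types as lettered in Table I.
REPEATS = {
    "a": "ACAGGGGTGTGGGG",
    "b": "ACAGGGGTCTGGGG",
    "c": "ACAGGGGTCCTGGGG",
    "d": "ACAGGGGTCCGGGG",
    "e": "ATAGGGGTGTGGGG",
    "f": "ACAGGGGTCCCGGGG",
    "g": "ACAGGGGTCTGAGG",
    "h": "ACAGGGGTGTGGGC",
    "i": "ACAGGGTCCTGGGG",
    "j": "ACAGGGGTGTGAGG",
    "k": "ATAGGGGTGTGTGG",
}

# Fig. 1 transcribed as one letter per repeat, five repeats per group.
FIG1 = "".join([
    "cdijc", "ajfaa", "eaaba", "eaaba", "daaaf",   # repeats 1-25
    "abaea", "ababa", "fabae", "aabaa", "adaba",   # repeats 26-50
    "aabab", "aaaad", "baaaf", "abaka", "efaae",   # repeats 51-75
    "fabaa", "fabae", "aeaca", "aaafa", "aadaa",   # repeats 76-100
    "aacba", "adada", "aaacb", "aafaa", "aaafa",   # repeats 101-125
    "acbeb", "abeaa", "aacb",                      # repeats 126-139
])


def percent(n: int, total: int) -> str:
    """Frequency as a percentage with one decimal place."""
    return f"{100 * n / total:.1f}"


counts = Counter(FIG1)
total_bp = sum(len(REPEATS[t]) for t in FIG1)

for repeat_type, sequence in REPEATS.items():
    n = counts[repeat_type]
    print(repeat_type, sequence, n, percent(n, len(FIG1)))
print("repeats:", len(FIG1), "length (bp):", total_bp)
```

With this transcription the tally gives 81 copies of the consensus repeat a and 20 copies of b out of 139 repeats, and the summed repeat lengths come to 1963 bp.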


TABLE I

Composition of oligonucleotide repeats in three polymorphic regions

Sequence                 λHI-1 (a)      λHI-2 (a)      λHI-3
                         No.  (%)       No.  (%)       No.  (%)
a  ACAGGGGTGTGGGG        15  (44.1)     24  (53.3)     81  (58.3)
b  ACAGGGGTCTGGGG         5  (14.7)      6  (13.3)     20  (14.4)
c  ACAGGGGTCCTGGGG        7  (20.6)      7  (15.6)      7   (5.0)
d  ACAGGGGTCCGGGG         2   (5.9)      1   (2.2)      7   (5.0)
e  ATAGGGGTGTGGGG         1   (2.9)      0   (0)       10   (7.2)
f  ACAGGGGTCCCGGGG        1   (2.9)      5  (11.1)     10   (7.2)
g  ACAGGGGTCTGAGG         1   (2.9)      0   (0)        0   (0)
h  ACAGGGGTGTGGGC         1   (2.9)      0   (0)        0   (0)
i  ACAGGGTCCTGGGG         1   (2.9)      1   (2.2)      1   (0.7)
j  ACAGGGGTGTGAGG         0   (0)        1   (2.2)      2   (1.4)
k  ATAGGGGTGTGTGG         0   (0)        0   (0)        1   (0.7)
Total                    34   -         45   -        139   -

(a) Bell et al. (1982)

λHI-3 and not at all in λHI-1 or λHI-2 (Table I). Furthermore, the 141-bp sequence of repeats 97-106 is found again at 108-117. It seems highly probable that the 112- to 141-bp duplicated regions within λHI-3 arose by recombination. It seems unlikely that these regions are due to a cloning artifact, since the sizes of restriction fragments in λHI-3 agree closely with those detected in native DNA used to prepare the chromosomal library. Furthermore, it is unlikely that these duplicated regions arose by random point mutations within the consensus sequence ACAGGGGTGTGGGG, because certain of the repeating sequences, such as ATAGGGGTGTGGGG, are quite rare in other polymorphic regions examined.
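The duplicated and triplicated blocks described above can be checked against Fig. 1 in a few lines. This is a hedged sketch rather than the authors' procedure: FIG1 is a hypothetical one-letter-per-repeat transcription of Fig. 1 using the lettering of Table I, repeat lengths are taken from the sequences in Table I (types c and f are 15 bp, all others 14 bp), and the helper names are illustrative.

```python
# Fig. 1 transcribed as one letter per repeat (lettering a-k as in Table I).
FIG1 = "".join([
    "cdijc", "ajfaa", "eaaba", "eaaba", "daaaf",   # repeats 1-25
    "abaea", "ababa", "fabae", "aabaa", "adaba",   # repeats 26-50
    "aabab", "aaaad", "baaaf", "abaka", "efaae",   # repeats 51-75
    "fabaa", "fabae", "aeaca", "aaafa", "aadaa",   # repeats 76-100
    "aacba", "adada", "aaacb", "aafaa", "aaafa",   # repeats 101-125
    "acbeb", "abeaa", "aacb",                      # repeats 126-139
])

# Repeat lengths in bp: types c and f carry one extra nucleotide.
REPEAT_BP = {t: (15 if t in "cf" else 14) for t in "abcdefghijk"}


def block(first: int, last: int) -> str:
    """Repeat types for repeats first..last (1-based, inclusive)."""
    return FIG1[first - 1:last]


def block_bp(first: int, last: int) -> int:
    """Length in bp of repeats first..last."""
    return sum(REPEAT_BP[t] for t in block(first, last))


def mismatches(x: str, y: str) -> list:
    """0-based offsets at which two equal-length blocks differ."""
    return [i for i, (p, q) in enumerate(zip(x, y)) if p != q]


# Triplicated 112-bp block.
print(block(13, 20), block(26, 33), block(37, 44), block_bp(13, 20))
# Duplicated 127-bp block with one differing repeat.
print(block(63, 71), block(79, 87), mismatches(block(63, 71), block(79, 87)))
# Duplicated 141-bp block.
print(block(97, 106), block(108, 117), block_bp(97, 106))
```

On this transcription the three copies of the 112-bp block are identical, the two copies of the 141-bp block are identical, and the 127-bp blocks differ only at offset 6, which corresponds to repeat 69 (type k) against repeat 85 (type e).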

If the 112- to 141-bp duplicated regions resulted from a recent recombinational event, one would expect to detect differences in restriction fragment length in the long polymorphic region among the ancestors of the individual from whom the chromosomal library was prepared. We tested this hypothesis by Southern blot hybridization analysis on DNA prepared from this individual as well as a brother, both parents, and two grandparents. The long polymorphic region was present in the father and paternal grandfather but not in the other relatives tested. We were unable to detect differences in size between the long polymorphic regions and thus, if size differences exist, they are smaller than approx. 50 bp. These results, however, are consistent with previous studies in large families covering 3 or 4 generations in which new insertions or deletions of DNA in the polymorphic region were not detected (Owerbach et al., 1982a; 1983). The identification and characterization of families containing expanded or deleted polymorphic regions will be necessary to further elucidate the mechanism of the recombination process.

(e) Conclusions

We demonstrate that in λHI-3, the polymorphic region begins 365 bp before the putative start of transcription of the insulin gene and that it is composed of 139 repeating sequences related to the consensus sequence ACAGGGGTGTGGGG. In contrast to the 1848 bp downstream from the polymorphic region, which show strikingly high conservation of sequence between different insulin gene regions (3 or 4 bp mismatches out of 1848 bp), the three different polymorphic regions show extensive differences. These differences occur in the number of repeats, sequence of certain repeats, and organization of repeats. It is still unclear whether number, sequence or organization of repeats affects function of the insulin gene, which resides 365 bp downstream. Our results do show that sequences other than the ACAGGGGTGTGGGG-related sequences do not exist in this long polymorphic region and therefore it is unlikely that the region codes for a polypeptide molecule. Furthermore, the region between -168 and -258 bp upstream from the transcription start site, which contains essential control elements for efficient cell-specific expression (Walker et al., 1984), is not different in λHI-3 from that of λHI-1 or λHI-2. Thus, linkage-disequilibrium between sequences in this control region and specific polymorphic regions appears not to be the explanation for the disease association between the long polymorphic regions and atherosclerosis (Owerbach et al., 1982b; Mandrup-Poulsen et al., 1984) or hypertriglyceridaemia (Jowett et al., 1984). Alternatively, other control elements or other genes, in linkage-disequilibrium with certain alleles of the polymorphic region, could account for the disease associations.

ACKNOWLEDGEMENTS

We wish to thank Dr. Graeme Bell for providing nucleotide sequence data prior to publication and Drs. Åke Lernmark and William Kastern for helpful discussions. This work was supported by funds from Nordisk Insulin Laboratorium, Gentofte, Denmark and start-up funds to D.O. from the University of Massachusetts Medical Center, Worcester, MA. D.O. was working at the Hagedorn Research Laboratory, Gentofte, Denmark during much of this study.

REFERENCES

Bell, G.I., Horita, S. and Karam, J.: A polymorphic locus near the human insulin gene is associated with insulin-dependent diabetes mellitus. Diabetes 33 (1984) 176-183.

Bell, G.I., Karam, J.H. and Rutter, W.J.: A polymorphic DNA region adjacent to the 5’ end of the human insulin gene. Proc. Natl. Acad. Sci. USA 78 (1981) 5759-5763.

Bell, G.I., Pictet, R.L. and Rutter, W.J.: Analysis of the regions flanking the human insulin gene and sequence of an Alu family member. Nucl. Acids Res. 8 (1980a) 4091-4109.

Bell, G.I., Pictet, R.L., Rutter, W.J., Cordell, B., Tischer, E. and Goodman, H.M.: Sequence of the human insulin gene. Nature 284 (1980b) 26-32.

Bell, G.I., Selby, M.J. and Rutter, W.J.: The highly polymorphic region near the insulin gene is composed of simple tandemly repeating sequences. Nature 295 (1982) 31-35.

Jowett, N.I., Williams, C.G., Hitman, G.A. and Galton, D.J.: Diabetic hypertriglyceridaemia and a related 5’ flanking polymorphism of the human insulin gene. Br. Med. J. 288 (1984) 96-99.

Mandrup-Poulsen, T., Owerbach, D., Mortensen, S.A., Johansen, K., Meinertz, H., Sorensen, H. and Nerup, J.: DNA sequences flanking the insulin gene on chromosome 11 confer risk of atherosclerosis. Lancet 2 (1984) 250-252.

Maniatis, T., Fritsch, E.F. and Sambrook, J.: Molecular Cloning. A Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 1982, pp. 113, 138, 270-294.

Messing, J., Gronenborn, B., Müller-Hill, B. and Hofschneider, P.-H.: Filamentous coliphage M13 as a cloning vehicle: insertion of a HindII fragment of the lac regulatory region in M13 replicative form in vitro. Proc. Natl. Acad. Sci. USA 74 (1977) 3642-3646.

Owerbach, D., Billesbolle, P., Poulsen, S. and Nerup, J.: DNA insertion sequences near the insulin gene affect glucose regulation. Lancet 1 (1982a) 880-883.

Owerbach, D., Johansen, K., Billesbolle, P., Poulsen, S., Schroll, M. and Nerup, J.: Possible association between DNA sequences flanking the insulin gene and atherosclerosis. Lancet 2 (1982b) 1291-1293.

Owerbach, D. and Nerup, J.: Restriction fragment length polymorphism of the insulin gene in diabetes mellitus. Diabetes 31 (1982) 275-277.

Owerbach, D., Thomsen, B., Johansen, K., Lamm, L.U. and Nerup, J.: DNA insertion sequences near the insulin gene are not associated with maturity-onset diabetes of young people. Diabetologia 25 (1983) 18-20.

Rotwein, P., Chyn, R., Chirgwin, J., Cordell, B., Goodman, H.M. and Permutt, M.A.: Polymorphism in the 5’-flanking region of the human insulin gene and its possible relation to type 2 diabetes. Science 213 (1981) 1117-1120.

Rotwein, P., Chirgwin, J., Province, M., Knowler, W., Petitt, D., Cordell, B., Goodman, H. and Permutt, A.: Polymorphism in the 5’ flanking region of the human insulin gene: A genetic marker for non-insulin-dependent diabetes. N. Engl. J. Med. 308 (1983) 65-71.

Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.

Ullrich, A., Dull, T.J., Gray, A., Brosius, J. and Sures, I.: Genetic variation in the human insulin gene. Science 209 (1980) 612-615.

Ullrich, A., Dull, T.J., Gray, A., Philips, J.A. and Peter, S.: Variation in the sequence and modification state of the human insulin gene flanking regions. Nucleic Acids Res. 10 (1982) 2225-2240.

Walker, M.D., Edlund, T., Boulet, A.M. and Rutter, W.J.: Cell specific expression controlled by the 5’ flanking region of insulin and chymotrypsin genes. Nature 305 (1984) 557-561.

Communicated by A.-M. Skalka.