5
Proc. Natl. Acad. Sci. USA Vol. 92, pp. 11829-11833, December 1995 Genetics Characterization of repetitive DNA in the Mycoplasma genitalium genome: Possible role in the generation of antigenic variation (genetic variability/adhesins/minimal genome) SCOTr N. PETERSON*t, CAMELLA C. BAILEYt§, J0RGEN S. JENSENS, MARTIN B. BORRET, ELIZABETH S. KINGt, KENNETH F. Borr*t, AND CLYDE A. HUTCHISON LII*tII *Curriculum in Genetics and :Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, NC 27599; and IMycoplasma Laboratory, Neisseria Department, Statens Seruminstitut, DK-2300 Copenhagen, Denmark Contributed by Clyde A. Hutchison III, September 7, 1995 ABSTRACT We have characterized a family of repetitive DNA elements with homology to the MgPa cellular adhesion operon of Mycoplasma genitalium, a bacterium that has the smallest known genome of any free-living organism. One element, 2272 bp in length and flanked by DNA with no homology to MgPa, was completely sequenced. At least four others were partially sequenced. The complete element is a composite of six regions. Five of these regions show sequence similarity with nonadjacent segments of genes of the MgPa operon. The sixth region, located near the center of the element, is an A+T-rich sequence that has only been found in this repeat family. Open reading frames are present within the five individual regions showing sequence homology to MgPa and the adjacent open reading frame 3 (ORF3) gene. However, termination codons are found between adjacent regions of homology to the MgPa operon and in the A+T-rich sequence. Thus, these repetitive elements do not appear to be directly expressible protein coding sequences. The sequence of one region from five different repetitive elements was compared with the homologous region of the MgPa gene from the type strain G37 and four newly isolated M. genitalium strains. Recombination between repetitive elements of strain G37 and the MgPa operon can explain the majority of polymorphisms within our partial sequences of the MgPa genes of the new isolates. Therefore, we propose that the repetitive elements of M. genitalium provide a reservoir of sequence that contributes to antigenic variation in proteins of the MgPa cellular adhe- sion operon. Mycoplasmas are a class of wall-less bacteria with genomes ranging from 1700 kb to as little as 580 kb as in the case of Mycoplasma genitalium (1-4). It has been estimated that M. genitalium has the capacity to encode "400 proteins (5, 6). Its genome is arranged in a conservative manner, making heavy use of operons and having minimal spacer regions between coding sequences (6, 7). All indications suggest that mycoplas- mas are under selective pressure to reduce their genome size to that of a minimal system (8). In light of this, it was somewhat surprising that M. genitalium, as well as several other Myco- plasma species, possess repetitive DNAs in their genomes (1, 9-11). In M. genitalium it has been estimated that repetitive DNA represents up to 4% of the genome (6). Repetitive DNA is a common feature of many bacterial genomes. In certain cases, this DNA has the clearly established function of creating antigenically distinct cells among a pop- ulation (12-15). Such genetic variation may be important in evading host immune responses as well as optimizing the bacterial surface for colonization of a particular host. In other The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact. cases, repetitive DNA has been characterized at the molecular level, but its function remains unclear. Repetitive DNA in M. genitalium has sequence homology to the three-gene operon that encodes one of the major surface proteins, the adhesin molecule MgPa (16). MgPa is highly immunogenic and necessary for attachment of the organism to the host epithelium (17-19). Spontaneous mutants of M. genitalium that lack or are unable to correctly localize MgPa lose the ability to cause hemagglutination or cellular adhesion, a phenotype that has been associated with the ability of the organism to attach to host epithelial cells in vivo (17, 18). Dallo and Baseman (11) divided the MgPa gene into 10 restriction fragments (A-J; see Fig. 1A) and used these individual frag- ments to probe restriction digests of genomic M. genitalium DNA. They showed that certain regions of the MgPa gene were repeated, while other regions were present only once, in the operon itself (11). To investigate the role of repetitive DNA in the minimal genome of M. genitalium, we have analyzed repetitive DNA sequences and compared them with sequences from the func- tional MgPa operon in new clinical isolates. * * Our data support the idea that maintenance of these sequences by this minimal organism is required to provide a mechanism for antigenic variation. MATERIALS AND METHODS M. genitalium Strains. Type strain G37 was generously provided by P.-c. Hu (University of North Carolina, Chapel Hill). This strain was originally obtained from the American Type Culture Collection (no. 33530) in 1983 and was propa- gated in modified Hayf lick's medium (30) containing agamma horse serum, 10% yeast dialysate, and penicillin G (1000 units/ml). It is referred to here as G37-US. Four new strains of M. genitalium were isolated as described (20). The G37 strain used by one of us (J.S.J.) was obtained from David Taylor- Robinson in 1982 after its initial isolation but prior to deposit into the American Type Culture Collection. This strain was propagated in modified Friis FF medium (31); it is referred to here as G37-DK. Clones and Sequencing. Escherichia coli clones containing fragments of M. genitalium G37-US DNA in pUC118 were sequenced partially as part of a random sequencing project (6). Abbreviation: ORF, open reading frame. tPresent address: Burroughs Wellcome, Division of Cell Biology, Research Triangle Park, NC 27709. §Present address: Center for Vaccine Development, University of Maryland at Baltimore, Baltimore, MD 21201. "To whom reprint requests should be addressed. **The sequences reported in this paper have been deposited in the GenBank data base (accession nos. U34967-U34970 and X91071- X91075 as identified in Figs. 1 and 2). 11829 Downloaded by guest on May 30, 2021

Characterization of DNAin the Mycoplasma genome: Possible ...Proc. Natl. Acad. Sci. USA Vol. 92, pp. 11829-11833, December 1995 Genetics Characterizationofrepetitive DNAin theMycoplasma

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

  • Proc. Natl. Acad. Sci. USAVol. 92, pp. 11829-11833, December 1995Genetics

    Characterization of repetitive DNA in the Mycoplasmagenitalium genome: Possible role in the generation ofantigenic variation

    (genetic variability/adhesins/minimal genome)

    SCOTr N. PETERSON*t, CAMELLA C. BAILEYt§, J0RGEN S. JENSENS, MARTIN B. BORRET, ELIZABETH S. KINGt,KENNETH F. Borr*t, AND CLYDE A. HUTCHISON LII*tII*Curriculum in Genetics and :Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, NC 27599; andIMycoplasma Laboratory, Neisseria Department, Statens Seruminstitut, DK-2300 Copenhagen, Denmark

    Contributed by Clyde A. Hutchison III, September 7, 1995

    ABSTRACT We have characterized a family of repetitiveDNA elements with homology to the MgPa cellular adhesionoperon of Mycoplasma genitalium, a bacterium that has thesmallest known genome of any free-living organism. Oneelement, 2272 bp in length and flanked by DNA with nohomology to MgPa, was completely sequenced. At least fourothers were partially sequenced. The complete element is acomposite of six regions. Five of these regions show sequencesimilarity with nonadjacent segments of genes of the MgPaoperon. The sixth region, located near the center of theelement, is an A+T-rich sequence that has only been found inthis repeat family. Open reading frames are present within thefive individual regions showing sequence homology to MgPaand the adjacent open reading frame 3 (ORF3) gene. However,termination codons are found between adjacent regions ofhomology to the MgPa operon and in the A+T-rich sequence.Thus, these repetitive elements do not appear to be directlyexpressible protein coding sequences. The sequence of oneregion from five different repetitive elements was comparedwith the homologous region of the MgPa gene from the typestrain G37 and four newly isolated M. genitalium strains.Recombination between repetitive elements of strain G37 andthe MgPa operon can explain the majority of polymorphismswithin our partial sequences of the MgPa genes of the newisolates. Therefore, we propose that the repetitive elements ofM. genitalium provide a reservoir of sequence that contributesto antigenic variation in proteins of the MgPa cellular adhe-sion operon.

    Mycoplasmas are a class of wall-less bacteria with genomesranging from 1700 kb to as little as 580 kb as in the case ofMycoplasma genitalium (1-4). It has been estimated that M.genitalium has the capacity to encode "400 proteins (5, 6). Itsgenome is arranged in a conservative manner, making heavyuse of operons and having minimal spacer regions betweencoding sequences (6, 7). All indications suggest that mycoplas-mas are under selective pressure to reduce their genome sizeto that of a minimal system (8). In light of this, it was somewhatsurprising that M. genitalium, as well as several other Myco-plasma species, possess repetitive DNAs in their genomes (1,9-11). In M. genitalium it has been estimated that repetitiveDNA represents up to 4% of the genome (6).

    Repetitive DNA is a common feature of many bacterialgenomes. In certain cases, this DNA has the clearly establishedfunction of creating antigenically distinct cells among a pop-ulation (12-15). Such genetic variation may be important inevading host immune responses as well as optimizing thebacterial surface for colonization of a particular host. In other

    The publication costs of this article were defrayed in part by page chargepayment. This article must therefore be hereby marked "advertisement" inaccordance with 18 U.S.C. §1734 solely to indicate this fact.

    cases, repetitive DNA has been characterized at the molecularlevel, but its function remains unclear.

    Repetitive DNA in M. genitalium has sequence homology tothe three-gene operon that encodes one of the major surfaceproteins, the adhesin molecule MgPa (16). MgPa is highlyimmunogenic and necessary for attachment of the organism tothe host epithelium (17-19). Spontaneous mutants of M.genitalium that lack or are unable to correctly localize MgPalose the ability to cause hemagglutination or cellular adhesion,a phenotype that has been associated with the ability of theorganism to attach to host epithelial cells in vivo (17, 18). Dalloand Baseman (11) divided the MgPa gene into 10 restrictionfragments (A-J; see Fig. 1A) and used these individual frag-ments to probe restriction digests of genomic M. genitaliumDNA. They showed that certain regions of the MgPa gene wererepeated, while other regions were present only once, in theoperon itself (11).To investigate the role of repetitive DNA in the minimal

    genome of M. genitalium, we have analyzed repetitive DNAsequences and compared them with sequences from the func-tional MgPa operon in new clinical isolates. * * Our datasupport the idea that maintenance of these sequences by thisminimal organism is required to provide a mechanism forantigenic variation.

    MATERIALS AND METHODSM. genitalium Strains. Type strain G37 was generously

    provided by P.-c. Hu (University of North Carolina, ChapelHill). This strain was originally obtained from the AmericanType Culture Collection (no. 33530) in 1983 and was propa-gated in modified Hayflick's medium (30) containing agammahorse serum, 10% yeast dialysate, and penicillin G (1000units/ml). It is referred to here as G37-US. Four new strains ofM. genitalium were isolated as described (20). The G37 strainused by one of us (J.S.J.) was obtained from David Taylor-Robinson in 1982 after its initial isolation but prior to depositinto the American Type Culture Collection. This strain waspropagated in modified Friis FF medium (31); it is referred tohere as G37-DK.

    Clones and Sequencing. Escherichia coli clones containingfragments of M. genitalium G37-US DNA in pUC118 weresequenced partially as part of a random sequencing project (6).

    Abbreviation: ORF, open reading frame.tPresent address: Burroughs Wellcome, Division of Cell Biology,Research Triangle Park, NC 27709.§Present address: Center for Vaccine Development, University ofMaryland at Baltimore, Baltimore, MD 21201."To whom reprint requests should be addressed.**The sequences reported in this paper have been deposited in theGenBank data base (accession nos. U34967-U34970 and X91071-X91075 as identified in Figs. 1 and 2).

    11829

    Dow

    nloa

    ded

    by g

    uest

    on

    May

    30,

    202

    1

  • Proc. Natl. Acad. Sci. USA 92 (1995)

    In this study we have performed a more complete analysis ofsome of these clones and of some clones from the same projectthat were not previously sequenced. DNA inserts from clonesSAll, SC2, SE6, SF4, SG8, and XF2, each -2 kb, were usedto prepare sonication libraries. Sonicated DNA fragmentswere electrophoresed in a 1% low-melting-point agarose gel.DNA fragments of >300 nt were repaired with the Klenowfragment of DNA polymerase I by standard procedures.Repaired fragments were cloned into the HinclI site ofpUC118 and sequenced with the Universal primer. In a fewcases, gaps in sequence were closed by use of specific oligo-nucleotide primers. In such cases, sequence was generallydetermined for only one of the two strands.

    Sequence Analysis. Sequence data were analyzed with theGenetics Computer Group (Madison, WI) program package(GCG) version 7.0 (21). Sequences were read manually andentered by using the program SEQED. The Staden programs forshotgun sequencing (22) were used to assemble sequencesobtained from sonication libraries. The Genetics ComputerGroup program GAP was used for comparing individual se-quences. The programs PILEUP or LINEUP were used formultiple sequence alignments. Data base searches were per-formed on flanking sequences and the A+T-rich repetitiveDNA region by using the program FASTA (23).

    Analysis of MgPa Genes from Four New M. genitaliumIsolates. Crude boiled lysates of strains G37-DK, M2288,M2300, M2321, and M2341 were subjected to PCR primed bysequences from nonrepetitive parts of the MgPa operonflanking the B region (see Fig. 1): Mgpat-1010 (5'-AAATT-AGTGATGTTGTTAGTGATTGTGTG) and Mgpat-3072(5'-TAGGGGAGTGTTGGTTAGTTTGTTAGA). Individ-ual PCR products were digested with BstNI, and the 1008-bpfragment was recovered by electrophoresis through an agarosegel and was purified with Prep-A-Gene (Bio-Rad). The frag-ments were blunt-ended with phage T4 polymerase and clonedinto the EcoRV site of pBluescript. Cloned fragments weresequenced with the Applied Biosystems cycle-sequencing kitwith dye-terminators by using oligonucleotide sequencingprimers. Usually only one representative clone was sequencedin conserved regions. In regions where variability betweensequences was encountered, both strands of more than oneclone were sequenced. Sequences were read by an AppliedBiosystems model 373A automatic sequencer.

    RESULTS AND ANALYSISSequences of Repetitive Elements. Fifteen clones containing

    repetitive DNA homologous to the MgPa operon were iden-tified in a random sequencing analysis of the M. genitaliumgenome (6). These clones contained DNA fragments thatshowed 71-92% nucleotide sequence identity to regions of theMgPa operon. Additional sequence was obtained for several ofthese clones. Extended regions of exact sequence identity orshared non-MgPa flanking sequence allowed the sequencesfrom certain clones to be joined. The resulting nine contiguoussequences, each named for one of the clones that contributedto it, are diagrammed in Fig. 1.

    Structure of a Complete Repetitive Element. The sequenceSAl1 contains a complete repetitive element, defined as anMgPa-like sequence flanked on both sides by sequence with norelationship to MgPa. Fig. 1B depicts the structure of thiscomplete repetitive element sequence, 2272 bp long, withindividual regions of homology to the MgPa operon shaded tomatch the corresponding regions of the MgPa operon shownin Fig. 1A. The restriction fragments defined in the originalDallo and Baseman study (11) are also indicated. We havechosen to retain the Dallo and Baseman nomenclature andrefer to regions present in the elements by the names of therestriction fragments from which they come. For example,repetitive region B contains sequence with similarity to nt

    1635-2080 of the MgPa gene, which lies almost entirely withinthe B restriction fragment. Our results confirm the presence ofrepetitive sequence in restriction fragments B, E, F, and G andthe absence from fragments A, D, H, and I. Analysis of acomplete repetitive element within sequence SAl 1 allowed usto define three additional restriction fragments downstream ofthe fragments tested by Dallo and Baseman in ORF-3 of theMgPa operon. We refer to these fragments as K, L, and M, andthe regions found within repetitive DNA containing thesesequences as KL and LM (Fig. 1).The structure of the SA 1 repeat indicates that this element

    is a composite of noncontiguous regions from two genesencoded by the MgPa operon, MgPa and ORF-3. The order ofthese regions as they appear in SA 1 is B, EF, KL, G, and LM.We find an unusually A+T-rich sequence (20% G+C) be-tween the EF and KL region; this sequence shows no apparentsequence similarity to the MgPa operon. During randomsequencing of this genome, this region was never encounteredout of the context of the repetitive DNAs (6). The A+Trichness of this DNA is expected in sequence that is no longerfunctional; as such they are free to accumulate A and Tnucleotides because of the mutational pressure thought to beacting in M. genitalium (6, 8).

    Partial Element Sequences. Analysis of other partial repet-itive element sequences gave a minimum estimate of fivedistinct repetitive elements in the M. genitalium genome. Thesequences in Fig. 1 show five different nonrepetitive sequencesflanking B regions of repeats. These partial sequences allowedus to define some common structural features of repetitivesequences (Fig. 1 B and C). Several of the structural featuresseen in SAl 1 were present on all of the elements that wecharacterized. Five elements begin with a B region. We havefour examples of B followed by EF. An A+T-rich regionfollows EF in three cases. We encountered an exception to thestructure of the complete element (SAl1) in the clone HSA7.This does not contain an A+T-rich region downstream of EF.Four elements contain A+T-rich sequence followed by KL.The complete element sequence is our only example of theKL-G-LM arrangement. Finally, we have three examplesending with region LM.The exact endpoints of each region within the repeat vary

    somewhat (Fig. 1). The approximate nucleotide ranges corre-sponding to repeated DNA sequence are as follows: B (nt1651-2082), EF (nt 3359-3942), G (nt 4368-4553), KL (nt5527-6283), and LM (nt 6788-6938) (numbers refer to the MgPaoperon sequence found in GenBank accession no. M31431). Itshould be noted that, although a very small number of nucleotidesfrom restriction fragments C and J (42 and 31, respectively) arefound in repetitive elements, we have not included these frag-ments in the names given to repetitive sequence regions.Coding Potential of the Repeats. A pairwise alignment was

    performed on the repetitive DNA sequences. In each case, thewindow of comparison was defined as the boundary of a region(B-M). This comparison revealed several noteworthy featuresas illustrated by the alignment shown in Fig. 2.ORFs are maintained throughout individual regions that

    share a high level of amino acid sequence similarity with theMgPa or ORF-3 proteins. Deletions and insertions (withrespect to the published MgPa gene sequence) that occur in thesequence of the repetitive DNAs are almost always in multiplesof three nucleotides (Fig. 2). In general, we find stop codonsnear the beginning and end of the B, EF, KL, and LM regions.No methionine codons or other start codons are found nearthe 5' end of the ORFs contained by the various regions of theelements. Because these sequences lack obvious translationinitiation signals and in view of the high mutation rate thoughtto be acting in M. genitalium (8), it was unexpected to findORFs throughout almost all of the individual regions (the onlyexception is in region KL of sequence SG8). Even morestriking was the maintenance of high levels of sequence

    11830 Genetics: Peterson et al.

    Dow

    nloa

    ded

    by g

    uest

    on

    May

    30,

    202

    1

  • Proc. Natl. Acad. Sci. USA 92 (1995) 11831

    A 29kD - POR MgPa-------- PC 49,35

    ORF-3

    4 4 4 4835 1380 2040 2573

    A B C

    * r11/4-A .12I I4 rz4 4 4 4-1

    3174 3805 4285 4770 5195 5537 5960D E F G H J K

    6815 7080L M

    0 s t N cocc coc O 0FCO aT CD

    B EF AT

    LO CO LO CMU: oo )C o_2cu

    SC2 1l\\E\\tz//L

    coLO

    ?06

    _ILc') co cl) co r-CO (D 4n 00 CONmc LC) r:lNaCDto-s rc,o CD

    KL G LMCD

    CD

    A

    "sN C C\J CO C)LO co v Nl- co

    N X O 325 e

    SG8I1_

    IllilN NNOcl

    CD O Cf mI\\X EScmccWfflmfl~~~/~ESH1 0

    t,

    a) Cc

    N

    I°r- ~ ~ CO

    HSA7 .X I>1X157 C

    HB7N-m

    SD3a L,ZI-0ccCDOtiCD CD

    HG4vE

    FIG. 1. Structures of MgPa-related repetitive elements. The sequence of the MgPa operon (GenBank accession no. M31431) is compareddiagrammatically to nine homologous repetitive element sequences. Accession numbers of repetitive sequences are listed in parentheses followingthe names of the contributing genomic clones. Sequences based upon data from more than one clone are named for the clone listed first: SAil,SE6, and SF4 (U34967); SC2 and XF2 (U34968); SG8 and ESA4 (U34969); ESH10 and HSD3 (U34970); X5 (U01810); HSA7 and HE3 (U01766);HB7 (U02105); SD3A (U02157); HG4 (U02110). (A) Schematic representation of the three-gene MgPa operon. Divisions (A-J) represent therestriction fragments described by Dallo and Baseman (11). Positions of restriction sites used by Dallo and Baseman are indicated by vertical arrows.The coordinate of each site on the MgPa operon sequence is also shown. Sites defining three additional fragments, K, L, and M, are also indicated.Five shaded boxes indicate the regions that are found in repetitive DNA elements. These are named B, EF, KL, G, and LM. Positions of the PCRprimers used in the analysis of new isolates are indicated by horizontal arrows. ORF-3, open reading frame 3. (B) The structure of a completerepetitive element found in sequence SAil. Regions of sequence similarity between repetitive elements and the operon are indicated by identicalshading. Flanking sequences with no similarity to the MgPa operon are indicated by open boxes. Coordinates of the regions of homology withinthe MgPa sequence are shown above the junctions between regions. Letters below each box indicate regions of homology to the MgPa operon (B,EF, KL, G, and LM) and the A+T-rich region of the repeat (AT). (C) Schematic representation of eight partial repetitive element sequencesanalyzed in this study. Boxes are shaded as in B to indicate different regions of homology with the MgPa operon or are open to indicate nonrepetitiveflanking sequence. The partial elements are aligned to illustrate structural similarities to the SAl 1 sequence. Coordinates of the regions of homologywithin the MgPa sequence are shown as in B. The positions of single base insertions and deletions within sequence SG8 are indicated by (+ 1) and(-1) and the corresponding position in the homologous region of the MgPa operon sequence is given.

    identity to the corresponding region of the MgPa protein(amino acid sequence identities range from 66% to 98%).Since Mycoplasma promoter sequences are not well charac-terized, we cannot rule out the possibility that these elementsare transcribed from their own promoters or in a promiscuousfashion from neighboring promoters. However, it is an attrac-tive hypothesis that these sequences are not directly expressedas protein but are under indirect selective pressure to maintainORFs with sequence similarity to the MgPa protein.

    Second, sequence polymorphisms are shared between twoor more elements over varying lengths of sequence. Particu-larly noteworthy is the fact that polymorphisms involvingdeletions and insertions are shared between repetitive DNAs(Fig. 2). When we compared pairs of repetitive elements in aparticular region, the two elements most closely related to oneanother with respect to both sequence identity and lengthpolymorphisms changed over the length of that region. Third,some elements shared 100% nucleotide sequence identity overregions as long as 107 bp, and repetitive sequence SC2 isidentical to the MgPa operon sequence for 333 bp. Given theoverall relatedness of these sequences and assuming a randomdistribution of mutational events, the absence of even third

    position changes is remarkable. All of these features stronglyimply that recombination events occur between repetitiveDNAs and likewise between repetitive DNAs and the MgPacoding sequence.

    Analysis of the MgPa Operon B Region Sequence from FourNew Isolates of M. genitalium. To obtain additional evidenceconcerning the role of repetitive sequences in genetic variationof the MgPa operon, we made use of four new M. genitaliumstrains isolated from urethral specimens (by J.S.J.) afterprimary culture in Vero monkey kidney cells. A segment of theMgPa gene from these strains containing the B region wasamplified by PCR. Oligonucleotide primers were derived fromnonrepetitive sequences flanking the B region, so that thefunctional MgPa gene would be amplified and the repeatedsequences would not (see Fig. 1). Restriction enzyme analysesof PCR-amplified sequences from the four new isolates andother clinical specimens showed clear differences in theirMgPa genes (24). The cloned BstNI fragments generated bycleavage of the PCR product encompassing nt 1320-2327 fromthe MgPa operons of the four new strains were sequenced.Comparison of these sequences to one another showed largeregions of sequence identity flanking a variable region of -380

    BSA11 [

    C

    iFi.1

    X./E

    ////////Ei~~~~~~~1

    Genetics: Peterson et al.

    Dow

    nloa

    ded

    by g

    uest

    on

    May

    30,

    202

    1

  • Proc. Natl. Acad. Sci. USA 92 (1995)

    1960.C. .CACC.. TC ....... .......... .A .. AGC .. . ......... .................... .......... ..C....... G37-US.......... ........ ... .......... RepeatsA.. ACA.TC C.......J .. C.A.. ACA.TC C 1......... ........ :A.. AGC. .. .......... ..........A.. ACA.TC C ........... C. ...... MgPaA.. ACA.TC C ............ Operons

    ..... . . . . . . . . . . . ... . .. . . . . . . . . .CACTTTTAAA AAACGACTTT GCTAAAAAGCJ

    ...AG.A.A...... ACA.

    .......... .......ACA.

    *-... .. ... .

    .......... .......ACA.

    .G...G..A.-.---------

    .......... .......ACA.

    ... AG.A.A...... ACA.

    r- t" 7k -_-_______.u ... U..A~~~ ------

    - ... ...*

    TAAGAACAGT AGTGGG---G

    2007....... G. C..-.......... .. CT. .GA.T

    G..T. ..GAGT G37-US

    .......... ..TT..C.. Repeats

    ---A........ .C.

    .T........ ... TT..C..G..T. ..GA.T

    ---A G. .T... C.. MgPa.......... ... TGAGT Operons

    G. .T..GAGT

    AGGTGAAGTT AGAGGCAGAGJ

    FIG. 2. Sequence alignment of aportion of the B region of fiverepetitive element sequences fromG37-US (ESH10, SAil, X5, SG8,SC2) and MgPa genes from sixstrains. The G37-US and repetitiveelement sequences are the same asin Fig 1. The new MgPa gene se-quence GenBank accession num-bers follow the strain name in pa-rentheses: M2288 (X91071);M2300 (X91072); M2321(X91073); M2341 (X91074);G37-DK (X91075). Identities tobases present in G37-US are rep-resented by dots, and differenceswith respect to the sequence ofG37-US by the correspondingbases. Dashes represent the ab-sence of bases and are added foralignment of the sequences.

    nt spanning positions 1710-2080. This corresponds closely tothe B region found in the repetitive DNA elements. In thevariable region, more than one clone derived from each strainwas sequenced to exclude the possibility that reassortment ofsequences between the MgPa gene and the repeated sequenceswas occurring as an artifact of the PCR. Several differenceswithin the B-region DNA sequence were identified in theMgPa gene from the Danish stock of the laboratory strain(G37-DK) and the published MgPa sequence from stocksmaintained in Chapel Hill (G37-US). Our analyses of therepetitive elements of M. genitalium were performed on DNAisolated from the same stock from which the sequence of thepublished MgPa gene was determined (G37-US) (16).Almost the entire sequence of the MgPa gene B regions of

    five strains (G37-DK, M2288, M2300, M2321, and M2341)could be generated by recombination between small portionsof several of the repetitive element B regions sequenced fromG37-US (Fig. 3). The sequence differences found in the Bregion DNA sequence of the G37-DK and G37-US MgPagenes, spanning nt 1997-2010, can be explained by a singlerecombination event involving the B region of the repetitiveelement contained on clone X5. The sequence similaritybetween X5 and both G37 strains extends 100 nt 5' and 40 nt

    1649M23211

    1649

    M230 0 I.

    16492341::::::::.........M2341 ......... , . . . . . ..,,FF......,.... l

    1649M228 l----I

    1649G37.- D''''''''""""" K................................................................. 1'us5 1 -ursr:::::::::::::-:::::: ::::::::::::::::::::::::::::::::::::::::::::::::::-:-:-.-:-.-.- -:

    3' to the polymorphic region, making it impossible to identifythe exact endpoint of the putative sequence exchange.

    DISCUSSION

    Previous estimates from an analysis of a complete cosmidlibrary suggested evidence for at least seven repetitive ele-ments with sequence homology to the MgPa operon within theM. genitalium genome (3). Here we have characterized at leastfive examples of this DNA, including one complete repetitiveunit, which share several structural features. In M. genitaliumG37, repetitive elements appear to be -2200 nt long and to bea composite of noncontiguous regions of the MgPa adhesinoperon. In all cases examined, we saw an extremely A+T-richsequence (with no apparent homology to MgPa or any othersequences from M. genitalium) present between the EF and KLregions. This A+T-rich region averaged 266 nt in length andpossessed several translational stop codons in all frames.The genomic sequence of M. genitalium has been obtained

    since the work described in this paper was performed and willallow a detailed global analysis of the structure of this familyof repeated sequences (25).

    2089_ Z | ..~ .:

    2099_'

    2099 ESH10_ r;:;;1 SAl11

    7 1ul1t- X5 >2099 SC2 i

    I]G37-US

    M23212099 M2300

    M2341, ,,2.:::::::::: ...-::::::: M2288u1u

    FIG. 3. Schematic representation of the MgPa operon B regions from the four new M. genitalium strains and G37-US, showing sequencesimilarities between the sequences present in the operon and in the repetitive DNA sequences. Different shadings represent repetitive DNAsequences or strains ofM. genitalium, as shown in the key. Regions of 100% sequence identity between repetitive elements and MgPa genes ofvariousstrains are indicated by identical shading. Blocks of white represent regions of the particular MgPa gene, which were not identical to any of theelements we have sequenced or other MgPa B region sequence. In each case the repetitive element whose sequence showed the longest stretchof sequence identity with the operon was chosen. Block sizes are multiples of 10 nt.

    TC.---GACAA..CAAGGCG

    A._.CAA_CG

    A..CAAGGCGA..CAAGGCGA..CAAGGCGA..CAAGGCGA..CAAGGCG

    GAT------T

    ESH10SAil

    X5SG8SC2

    M2288M2341M2300M2321G37-DKG37-US

    ESH10SAi1

    X5SG8SC2

    M2288M2341M2300M2321G37-DKG37-US

    1917TTC .TC. G. GC... A.TGTC

    ... .A.TGTC

    ....A..GTCC... A.TGTC.A.A.TGTC

    ....A..GTC

    GAGTCAAAGT

    1961....G..A......G..A......G..A....G..A..

    .GT.T..AG.

    ....G..A..

    ....G..A..

    .GT.T..AG.**....***-

    ....G..A..CACTAAAGCA

    ---------- - - -_--_A_---___--___ 5|

    11832 Genetics: Peterson et al.

    Dow

    nloa

    ded

    by g

    uest

    on

    May

    30,

    202

    1

  • Proc. Natl. Acad. Sci. USA 92 (1995) 11833

    Our sequence data strongly suggests that five regions of theMgPa operon found in repetitive elements (B, EF, KL, G, andLM) recombine with themselves and the MgPa operon, bothin the laboratory and in vivo. These conclusions are based upon(i) the wide range of similarities exhibited between pairs ofrepetitive elements over the length of their entire sequence; (ii)the maintenance of ORFs in individual regions of the ele-ments; (iii) the occurrence of deletions and insertions inmultiples of three nucleotides; (iv) our inability to suggest apossible mode for the translation of these sequences, suggest-ing some indirect selective pressure for the maintenance ofORFs with similarity to the MgPa sequence; and (v) theconfirmation that four new isolates of M. genitalium and anindependently maintained stock of the laboratory strainG37-DK show polymorphisms within the B region of theirrespective MgPa genes. Recombination events involving therepetitive elements and the MgPa operon could have createdthese sequences, thus providing a potential mechanism forantigenic variation of the surface proteins encoded by theadhesin operon. This idea is further supported by the fact thatat least some of the non-repeated regions of MgPa genes arehighly or perfectly conserved among different strains.

    It is of additional interest that the MgPa protein was shownto be immunodominant in chimpanzees which had been in-fected with M. genitalium (19). Opitz and Jacobs (26) mappedepitopes of adherence inhibiting monoclonal antibodies tooverlapping eight-residue synthetic peptides designed from theMgPa protein sequence. Four of these epitopes mapped withinregions of the protein encoded by the B and EF regions,suggesting that these regions may be surface-exposed andimportant in adherence (26). The ability of M. genitalium tovary the.sequence of these regions could enable this organismto survive the host immune response.

    Neisseria gonorrhoeae exhibits genetic variation in its pilin,which functions as an adhesin, by a mechanism similar to theone we are proposing for the MgPa operon of M. genitalium(15). However, these systems differ in several ways: (i) theproteins encoded by the MgPa operon are not pilins; (ii) "silentcopies" of the pilin gene in N. gonorrhoeae are found in tandemarrays, whereas repeats are scattered in the M. genitaliumgenome; and (iii) silent copies in N. gonorrhoeae represent acontiguous segment of the pilin gene, whereas the M. geni-talium repeat is composed of several regions that are non-contiguous in the MgPa operon. The fact that only selectedregions of the adhesin operon are found in the repetitive DNAof M. genitalium rather than larger truncated gene duplicationsmay be a reflection of selective pressure on this organism tominimize its genome size.Our findings suggest that two genes of M. genitalium, MgPa

    and ORF-3, are subject to genetic variation by the mechanismproposed here. Although no direct evidence that the ORF-3product functions in cellular adhesion has been reported, itshould be noted that the Mycoplasma pneumoniae homolog,ORF-6 in the Pl adhesin operon; has been implicated in thecellular adhesion process (27). The adhesin operon of M.genitalium is closely related to the P1 adhesin operon of M.pneumoniae by several different criteria (28). M. pneumoniaealso contains repetitive DNA in the form of truncated copiesof the P1 adhesin operon, but no evidence for compositeelements similar to those described here has been reported.Recent evidence suggests that the ORF-6 gene of the P1adhesin operon may be undergoing gene conversion eventswith sequences present in one type of repetitive element (29).Since both species maintain DNA in the form of truncatedcopies of their adhesin operons, and these sequences seem toundergo recombination events with the functional operons, itappears that the ability to alter the appearance of adhesinproteins is necessary for survival.

    Our finding that M. genitalium repeated sequences appear tobe involved in genetic variation of the adhesin MgPa helps torationalize the presence of repeats in a minimal genome. Thisorganism is extremely fastidious as a direct consequence of itssmall genome size and in nature must obtain essential nutrientsfrom its mammalian host. It seems plausible that an efficientmechanism to optimize cellular adhesion and to evade the hostimmune response is therefore a necessary part of such agenome.

    Note Added in Proof. Comparison of our data to the completesequence ofM. genitalium indicates that the sequence SC2 was derivedfrom a clone that is a recombinant between the MgPa operon and theMgPa repeat at positions 167174-169797 in the genome (25). Genomiccoordinates of the sequences presented here are available on request(send electronic mail to [email protected]).

    C.C.B. and S.N.P. contributed equally to this manuscript. We thankJanne Cannon and Kevin Dybvig for helpful comments. This work wassupported by National Institutes of Health Grants A108998 andGM21313 to C.A.H. and A133161 to K.F.B.

    1. Colman, S. D., Hu, P.-C., Litaker, W. & Bott, K. F. (1990) Mol.Microbiol. 4, 683-687.

    2. Krawiec, S. & Riley, M. (1990) Microbiol. Rev. 54, 502-539.3. Lucier, T. S., Hu, P.-Q., Peterson, S. N., Song, X.-Y., Miller, L.,

    Heitzman, K., Bott, K. F., Hutchison, C. A., III, & Hu, P.-C.(1994) Gene 150, 27-34.

    4. Su, C. J. & Baseman, J. B. (1990) J. Bacteriol. 172, 4705-4707.5. Morowitz, H. J. (1984) Isr. J. Med. Sci. 20, 750-753.6. Peterson, S. N., Hu, P.-C., Bott, K. F. & Hutchison, C. A., III

    (1993) J. Bacteriol. 175, 7918-7930.7. Bailey, C. C. & Bott, K. F. (1994) J. Bacteriol. 176, 5814-5819.8. Maniloff, J. (1992) in Phylogeny of Mycoplasmas, eds. Maniloff,

    J., McElhaney, R. N., Finch, L. R. & Baseman, J. B. (Am. Soc.Microbiol., Washington, DC), pp. 549-559.

    9. Ferrell, R. V., Heidari, M. B., Wise, K. S. & McIntosh, M. A.(1989) Mol. Microbiol. 3, 957-967.

    10. Wenzel, R. & Herrmann, R. (1988) Nucleic Acids Res. 16,8337-8350.

    11. Dallo, S. F. & Baseman, J. B. (1991) Microb. Pathog. 10, 475-480.12. Haas, R. & Meyer, T. F. (1986) Cell 44, 107-115.13. Meier, J. T., Simon, M. I. & Barbour, A. G. (1985) Cell 41,

    403-409.14. Moxon, E. R., Rainey, P. B., Nowak, M. A. & Lenski, R. E.

    (1994) Curr. Biol. 4, 24-33.15. Robertson, B. D. & Meyer, T. F. (1992) Trends Genet. 8,422-427.16. Inamine, J. M., Loechel, S., Collier, A. M., Barile, R. M. & Hu,

    P.-c. (1989) Gene 82, 259-267.17. Collier, A. M., Carson, J. L., Hu, P.-C., Hu, S. S., Huang, C. H.

    & Barile, M. F. (1990) Zbl. Bakt. S20, 730-732.18. Mernaugh, G. R., Dallo, S. F., Holt, S. C. & Baseman, J. B.

    (1993) Clin. Infect. Dis. 17, S69-78.19. Tully, J. G., Taylor-Robinson, D., Rose, D. L., Furr, P. M.,

    Graham, C. E. & Barile, M. F. (1986) J. Infect. Dis. 153, 1046-1054.

    20. Jensen, J. S., Hansen, H. T. & Lind, K. (1994) Int. Organ.Mycoplasmol. Lett. 3, 143-144 (abstr.).

    21. Devereaux, J., Haberli, P. & Smithies, 0. (1984) Nucleic AcidsRes. 12, 387-395.

    22. Staden, R. (1982) Nucleic Acids Res. 10, 4731-4751.23. Pearson, W. R. & Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA

    85, 2444-2448.24. Jensen, J. S. (1994) Int. Organ. Mycoplasmol. Lett. 3, 429-430

    (abstr.).25. Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton,

    R. A., et al. (1995) Science 270, 397-403.26. Opitz, 0. & Jacobs, E. (1992) J. Gen. Microbiol. 138, 1785-1790.27. Layh-Schmitt, G., Hilbert, H. & Pirkl, E. (1995) J. Bacteriol. 177,

    843-846.28. Razin, S. & Jacobs, E. (1992) J. Gen. Microbiol. 138, 407-422.29. Ruland, K., Himmelreich, R. & Herrmann, R. (1994)J. Bacteriol.

    176, 5202-5209.30. Hayf lick, L. (1965) Tex. Rep. Biol. Med. 23, 285-303.31. Jensen, J. S., 0rsum, R., Dohn, B., Uldum, S., Worm, A.-M. &

    Lind, K. (1993) Genitourin. Med. 69, 265-269.

    Genetics: Peterson et al.

    Dow

    nloa

    ded

    by g

    uest

    on

    May

    30,

    202

    1