15
THE JOURNAL OF BIOLOGICAL CHEMISTRY Printed in U. S. A. Vol. 255, No. 24, Issue of December 25, pp. 11927-11941, 1980 Assembly of the Mitochondrial Membrane System STRUCTURE AND NUCLEOTIDE SEQUENCE OF THE GENE CODING FOR SUBUNIT 1 OF YEAST CYTOCHROME OXIDASE* (Received for publication, July 23, 1980) Susan G. Bonitz, Gloria Coruzzi, Barbara E. Thalenfeld, and Alexander Tzagoloff From the Department of Biological Sciences, Columbia University, New York, New York 10027 Giuseppe Macino From the Zstituto di Fisiologia Generale, Universita di Roma, Rome, Italy The oxid locus of yeast mitochondrial DNA has been sequenced in Saccharomyces cerevisiae D273-10B. The sequence was obtained from the mitochondrial ge- nomes of a series of cytoplasmic “petite” mutants se- lected for the retention of genetic markers in the onid locus. The oxi3 locus has been ascertained to code for Subunit 1 of cytochrome oxidase. The Subunit 1 gene is 9,979 nucleotides long, consisting of seven to eight ex- ons that account for only 16% of the gene sequence. The coding sequences have been identified on the basis of protein sequence homology with Subunit 1 of human cytochrome oxidase. The yeast Subunit 1 is 510 amino acid residues long and has a molecular weight of 56,000. In addition to the exon sequences, the Subunit I gene contains six to seven introns. The first four introns have long reading frames that are continuous with the exon coding sequences. These reading frames are po- tentially capable of coding for basic proteins with mo- lecular weights ranging from 30,000 to 80,000. The first two introns of the gene have a sequence homology of 508, while the reading frame of the fourth intron is 70% homologous with an intron of the apocytochrome b gene. At least five stable transcripts have been found by Northern blot hybridizations with single-stranded DNA probes containing either exon or intron sequences. A 1.9-kilobase transcript hybridizes only with probes from the exon regions of the gene. This RNA species has been tentatively identified as the fully processed messenger of Subunit 1. Other transcripts aredetected with intron probes. Three transcripts with sizes of 2.5, 2.4, and 0.85 kilobases appear to be stable excision products from the first,second, and fifth introns. In the companion paper, (1) we reported the properties of a seriesof p- clones’ of Saccharomyces cerevisiae selected for the retention of markers in the oxid locus. The mitochondrial DNA (mtDNA) segments of the clones spanned the wild type sequence from 42 to 58 map units. This region has been shown in otherstudiestocontainthe oxid locus that codes for Subunit 1 of cytochrome oxidase (2, 3). * This research was suported by Research Grant GM25250 and a National Research Service Award (to B. E. T.) from the National Institutes of Health, United States Public Health Service. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “aduer- tisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. The abbreviations used are: p- , cytoplasmic “petite” mutants with long deletions in mitochondrial DNA; SDS, sodium dodecyl sulfate; kb, kilobase. The mitochondrial genomes of the p- mutants have been used in thepresentstudytosequencethe oxid locus. A continuous nucleotide sequence of 10 kb has been obtained from approximately 43.5 to 58 map units. The sequence con- tains the entire gene of Subunit 1. The gene is, 9,979 nucleo- tides long and is composed of seven to eight exons and six to seven introns. In addition to the exon-coding regions, the Subunit 1 gene has four other long reading frames. These are located in the introns and are potentially capable of coding for proteins with molecular weights of up to 80,000. One of the 0x8 introns has extensive sequence homology with the first intron of the apocytochrome b gene (4). The two genes also exhibit sequence homologies in some of the exon-intron boundaries, suggesting a common mechanism of splicing of thetranscripts.Thetranscripts originating fromthe 0x23 region have been studied by Northern blot hybridization of total mitochondrial RNA to 5’ end-labeled DNA probes con- taining either exon or intron sequences. The results of these hybridizations indicate the messenger of Subunit 1 to be 1.9 kb long. The 1.9-kb transcript is the smallest and most abun- dant RNA detected by exon probes. MATERIALS AND METHODS fied from the p- clones DS6, DS6/A400, DS6/A401, DS6/A402, DS6/ Strains-The mtDNAs used for the sequence analysis were puri- A407, DS6/A422, and DS6/A462. The isolation of the clones and physical properties of their genomes have been reported in the accompanying paper (1). Purification of rnlDNA-Yeast was grown at 30°C to early station- ary phase in 2% glucose, 1% yeast extract, 1% peptone. The procedures for the preparation of yeast mitochondria and purification of mtDNA have been described (I). DNA Sequencing-Restriction fragments were labeled at the 5‘ ends with [yJ’P]ATP (Zoo0 to 3000 Ci/mmol, New England Nuclear Corp., Boston, MA) in a T4 polynucleotide kinase-catalyzed reaction (5). The labeled fragments were denatured in 90% formamide and strand-separated on 4%. 6%, or 8% polyacrylamide gels at 4°C (5). The labeled single strands were eluted from the gels and sequenced by the chemical derivatization method of Maxam and Gilbert (5). Preparation of Mitochondrial RNA-The wild type strain of S. cereuisiue D273-10B/AI (6) was grown to early stationary phase in 1 liter of a medium containing 28 galactose, 1% yeast extract, and 1% peptone. Approximately 10 g of wet weight cells were obtained from 1 liter of medium. The cells were washed in 1.2 M sorbitol and suspended in 30 ml of a solution containing 1.2 M sorbitol, 0.08 M KPO,, pH 7.5, 0.16 M 2-mercaptoethanol, 1 mM EDTA, and 1 mg/ml of Zymolase (Kirin Brewery Ltd., Japan). Virtually complete conver- sion of the cells to protoplasts occurred after 45 to 60 min of incubation at 32°C. The protoplasts were collected by centrifugation at 6,000 X g for 10 min and washed two times with 150 ml of 1.2 M sorbitol. The washed protoplasts were suspended in 75 ml of 0.7 M sorbitol, 10 mM Tris/acetate, pH 7.5, 1 mM EDTA, 0.1 mg/ml of bovine serum albumin, and 5 mM vanadium adenosine. The vanadium adenosine was used as a ribonuclease inhibitor (7). The suspension was homog- 11927

THE JOURNAL OF BIOLOGICAL CHEMISTRY 255, No. 25, … · THE JOURNAL OF BIOLOGICAL CHEMISTRY Printed in U. S. A. Vol. 255, No. 24, Issue of December 25, pp. 11927-11941, 1980 Assembly

  • Upload
    lehuong

  • View
    224

  • Download
    0

Embed Size (px)

Citation preview

THE JOURNAL OF BIOLOGICAL CHEMISTRY

Printed in U. S. A. Vol. 255, No. 24, Issue of December 25, pp. 11927-11941, 1980

Assembly of the Mitochondrial Membrane System STRUCTURE AND NUCLEOTIDE SEQUENCE OF THE GENE CODING FOR SUBUNIT 1 OF YEAST CYTOCHROME OXIDASE*

(Received for publication, July 23, 1980)

Susan G. Bonitz, Gloria Coruzzi, Barbara E. Thalenfeld, and Alexander Tzagoloff From the Department of Biological Sciences, Columbia University, New York, New York 10027

Giuseppe Macino From the Zstituto di Fisiologia Generale, Universita di Roma, Rome, Italy

The oxid locus of yeast mitochondrial DNA has been sequenced in Saccharomyces cerevisiae D273-10B. The sequence was obtained from the mitochondrial ge- nomes of a series of cytoplasmic “petite” mutants se- lected for the retention of genetic markers in the onid locus. The oxi3 locus has been ascertained to code for Subunit 1 of cytochrome oxidase. The Subunit 1 gene is 9,979 nucleotides long, consisting of seven to eight ex- ons that account for only 16% of the gene sequence. The coding sequences have been identified on the basis of protein sequence homology with Subunit 1 of human cytochrome oxidase. The yeast Subunit 1 is 510 amino acid residues long and has a molecular weight of 56,000. In addition to the exon sequences, the Subunit I gene contains six to seven introns. The first four introns have long reading frames that are continuous with the exon coding sequences. These reading frames are po- tentially capable of coding for basic proteins with mo- lecular weights ranging from 30,000 to 80,000. The first two introns of the gene have a sequence homology of 508, while the reading frame of the fourth intron is 70% homologous with an intron of the apocytochrome b gene.

At least five stable transcripts have been found by Northern blot hybridizations with single-stranded DNA probes containing either exon or intron sequences. A 1.9-kilobase transcript hybridizes only with probes from the exon regions of the gene. This RNA species has been tentatively identified as the fully processed messenger of Subunit 1. Other transcripts are detected with intron probes. Three transcripts with sizes of 2.5, 2.4, and 0.85 kilobases appear to be stable excision products from the first, second, and fifth introns.

In the companion paper, (1) we reported the properties of a series of p- clones’ of Saccharomyces cerevisiae selected for the retention of markers in the oxid locus. The mitochondrial DNA (mtDNA) segments of the clones spanned the wild type sequence from 42 to 58 map units. This region has been shown in other studies to contain the oxid locus that codes for Subunit 1 of cytochrome oxidase (2, 3) .

* This research was suported by Research Grant GM25250 and a National Research Service Award (to B. E. T.) from the National Institutes of Health, United States Public Health Service. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “aduer- tisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

’ The abbreviations used are: p - , cytoplasmic “petite” mutants with long deletions in mitochondrial DNA; SDS, sodium dodecyl sulfate; kb, kilobase.

The mitochondrial genomes of the p - mutants have been used in the present study to sequence the oxid locus. A continuous nucleotide sequence of 10 kb has been obtained from approximately 43.5 to 58 map units. The sequence con- tains the entire gene of Subunit 1. The gene is, 9,979 nucleo- tides long and is composed of seven to eight exons and six to seven introns. In addition to the exon-coding regions, the Subunit 1 gene has four other long reading frames. These are located in the introns and are potentially capable of coding for proteins with molecular weights of up to 80,000. One of the 0 x 8 introns has extensive sequence homology with the first intron of the apocytochrome b gene (4 ) . The two genes also exhibit sequence homologies in some of the exon-intron boundaries, suggesting a common mechanism of splicing of the transcripts. The transcripts originating from the 0x23 region have been studied by Northern blot hybridization of total mitochondrial RNA to 5’ end-labeled DNA probes con- taining either exon or intron sequences. The results of these hybridizations indicate the messenger of Subunit 1 to be 1.9 kb long. The 1.9-kb transcript is the smallest and most abun- dant RNA detected by exon probes.

MATERIALS AND METHODS

fied from the p- clones DS6, DS6/A400, DS6/A401, DS6/A402, DS6/ Strains-The mtDNAs used for the sequence analysis were puri-

A407, DS6/A422, and DS6/A462. The isolation of the clones and physical properties of their genomes have been reported in the accompanying paper (1).

Purification of rnlDNA-Yeast was grown a t 30°C to early station- ary phase in 2% glucose, 1% yeast extract, 1% peptone. The procedures for the preparation of yeast mitochondria and purification of mtDNA have been described (I).

DNA Sequencing-Restriction fragments were labeled at the 5‘ ends with [yJ’P]ATP (Zoo0 to 3000 Ci/mmol, New England Nuclear Corp., Boston, MA) in a T4 polynucleotide kinase-catalyzed reaction (5). The labeled fragments were denatured in 90% formamide and strand-separated on 4%. 6%, or 8% polyacrylamide gels at 4°C (5). The labeled single strands were eluted from the gels and sequenced by the chemical derivatization method of Maxam and Gilbert (5).

Preparation of Mitochondrial RNA-The wild type strain of S. cereuisiue D273-10B/AI (6) was grown to early stationary phase in 1 liter of a medium containing 2 8 galactose, 1% yeast extract, and 1% peptone. Approximately 10 g of wet weight cells were obtained from 1 liter of medium. The cells were washed in 1.2 M sorbitol and suspended in 30 ml of a solution containing 1.2 M sorbitol, 0.08 M KPO,, pH 7.5, 0.16 M 2-mercaptoethanol, 1 mM EDTA, and 1 mg/ml of Zymolase (Kirin Brewery Ltd., Japan). Virtually complete conver- sion of the cells to protoplasts occurred after 45 to 60 min of incubation at 32°C. The protoplasts were collected by centrifugation at 6,000 X g for 10 min and washed two times with 150 ml of 1.2 M sorbitol. The washed protoplasts were suspended in 75 ml of 0.7 M sorbitol, 10 mM Tris/acetate, pH 7.5, 1 mM EDTA, 0.1 mg/ml of bovine serum albumin, and 5 mM vanadium adenosine. The vanadium adenosine was used as a ribonuclease inhibitor (7). The suspension was homog-

11927

11928 Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase

enized for 30 s in a Waring Blendor and centrifuged two times at 1,iKX) X g. The supernatant was then centrifuged at .30,000 X g for 10 min to pellet the mitochondria. The mitochondria were washed three times with 30 mi of 0.5 M sorbitol, 0.05 M Tris/acetate, pH 7.5, 1 mM EDTA. The washed mitochondria were lysed in 2 ml of 2% sodium dodecyl sulfate and immediately mixed with an equal volume of water-saturated phenol. The two phases were separated by centrifu- gation at 5,000 X g for 10 min, and the water phase was dialyzed for 17 h against 2 Iiters of cold distilled water. Little degradation was observed under these conditions even after 1 week of storage.

Northern Blots-Yeast mitochondrial RNA was denatured for 5 min at 60°C in 10 mM methylmercuric hydroxide, and 1 to 2 pg were loaded on 0.6-cm-wide wells of a 1% or 1.2% agarose gel containing 10 mM methylmercuric hydroxide. Electrophoresis was carried out at 5 V/cm for 4 to 5 h. The gel was treated with 14.3 mM 2-mercaptoeth- anol, stained with 5 p g / d of ethidium bromide, and photographed. Prior to blotting, the gel was soaked in 7 mM iodoacetic acid followed by two changes of 200 mM sodium acetate buffer, pH 4.0.

The preparation of diazobenzyloxymethyl paper and the conditions for the transfer of RNA from the agarose gel to the paper were the same as described by Alwine et al. (8). Single-stranded DNA frag- ments labeled with 32P at the 5’ ends were hybridized to the diazo- benzyloxymethyl strips (9). The diazobenzyloxymethyl strips were exposed to Kodak XR-1 film with an intensifying screen for 1 to 3 days at -80OC.

RESULTS

Sequencing Strategy-The p- clone DS6 has a mitochon- drial genome with a unit length of 16.5 kb (3). Even though this clone was shown to contain the entire oxi3 gene, the large genomic size precluded its use for DNA sequencing. Most of the DNA sequences were therefore determined on the smaller genomes of the derivative clones which ranged from 2.3 to 6.1 kb in size, spanning the sequence from 42 to 58 map units. The regions of DNA (expressed in map units) sequenced in each clone are indicated in Table I. Since DS6/A462 had been shown to have a small internal deletion and inversion in its DNA, the sequence of the region retained in this clone was also obtained from a preparative BgZ 11-Ban HI fragment from DS6.

All the sequence data were obtained from single-stranded fragments labeled at the 5’ ends. By choosing appropriate combinations of restriction endonucleases, it was possible to generate a sufficiently discrete number of fragments for sub- sequent resolution of the labeled mixture on the strand sepa- ration gels. All the restriction sites except those for Alu I and Rsa I had been mapped in each genome, thus reducing the number of different labelings necessary to complete the se- quence.

Nucleotide Sequence of the oxi3 Locus-The restriction map of the 0 x 2 locus is shown in Fig. 1. The sites used for labeling and the directions and extents to which the labeled fragments were read on the sequencing gels are indicated by the arrows. All the restriction sites used for labeling were crossed except for the Alu I site at 51.8 units. We therefore cannot be sure that the sequence of a small Alu I fragment may not have been missed. Most of the sequences were confirmed either from the complementary strands or from identical sequences obtained from different genomic DNAs.

TABLE I Regions of mtDNA sequenced in various p - clones

Clone Region sequenced map units

44.1-47.6 47.1-51.7 50.6-53.2

50.8-51.9, 52.8-58.2 53.7-57.4

50.2-51.9, 52.1-53.7

DS6/A401 DS6/A400 DS6/A402 Bgl 11-Barn HI DS6/A422 DS6/A407 DS6/A462

43.5-45.4, 47.2-50.7

FIG. I. Physical map of the oxid locus. The restriction frag- ments used for DNA sequencing are indicated by the arrows. The extent to which the sequences were read is represented by the lengths of the mows. The map units are shown on the inner circle. The following symbols are used for the restriction sites: A, HinfI; A, Hpa 11; 0, Hue III; *, Taq I; 0, Mbo I; 0, Mbo 11; 0, A h I; Q, Puu 11; 0, HincII; e, HindIII; 8, Eco RI; J,, Eco RII; 0, Hha I; Q, Rsa I; 1. Hph I; 8, Bgl II; 0, Barn HI; ., Hpa 1.

As indicated in the previous paper, the restriction maps of the different p- mtDNA segments were consistent with each other except for the DS6/A462 clone. This was confirmed by the DNA sequences which were identical in clones with inclusive or overlapping regions.

A continuous sequence from 43.5 to 58 units is presented in Fig. 2. The sequence includes the region of DS6 where all the known oxi3 markers have been mapped (1). The sequence starts with an adenine + thymine-rich region. Although not shown, the adenine + thymine-rich sequence proceeds for at least 1 kb further upstream to the deletion end point of the DS6/A401 genome at approximately 42 map units. The ATG initiator at nucleotide +1 starts an open reading frame that continues for 2503 nucleotides before reaching a TAA termi- nation codon. The amino acid sequence encoded in this read- ing frame is shown above the DNA sequence in Fig. 2 . The assignment of codons is based on the universal code except for the termination codon UGA, which has been translated as tryptophan (10-12), and the CUN codons of leucine that have been shown to code for threonine in yeast mitochondria (13, 14). Another long reading frame occurs some 100 nucleotides downstream from the terminator of the fist coding sequence. This reading frame is also very long, comprising 2399 nucleo- tides. There are four additional regions of variable lengths potentially capable of coding for proteins. These are seen between nucleotides 6 0 7 5 and “3211, +6697 and +8148, +8169 and +8642, and +9422 and +9979. The last reading frame ending with an ochre terminator at nucleotide +9979 is followed by a long adenine + thymine-rich sequence only part of which is shown in Fig. 2. The six reading frames account for almost the entire sequence of the oxi3 locus. There are, however, two moderately long regions with fre- quent stop codons in all three registers of the sequence (+6212 to +6696 and +8643 to +9421).

Identification of the Subunit 1 Exons-Subunit 1 of yeast cytochrome oxidase migrates with an apparent molecular weight of 40,000 when electrophoresed on SDS-polyacryl- amide gels (15, 16). A co-linear gene coding for a protein of

Sequence of Gene Coding for Subunit 1 o f Cytochrome Oxidase 11929

-126 5"TTATTATCCTTTAAGATATAACAATA

-100 ATTATTTAAATTAAATTAAATTAAATTTAATTAATTTTTTTTTTTAATGA ATATAATAATAATAATATTATTAAAATTAATATATAAAAAAAAAGTAAAA

+1 ATG U A A AGA TGA TTA TAT TCA ACA AAT GCA AAA GAT ATT GCA GTA TTA TAT TTT ATG T T U ATT TTT AGT fMET VAL GLN ARG TRP LEU TYR SER THR ASN ALA LYS ASP ILE ALA VAL LEU TYR PHE MET LEU A L A I L E PHE SER

Rsa I A1 u I

GLY MET ALA GLY THR ALA MET SER LEU I L E I L E ARG LEU GLU LEU ALA ALA PRO GLY SER GLN TYR LEU HIS GLY +76 GGT ATG GCA GGA ACA GCA ATG TCT TTA ATC ATT AGA TTA GAA T T m GCA C C T T TCA CAA TAT TTA CAT GGT

A1 uI E c o R I I

+151 AAT TCA CAA TTA TTT AAT GGT E C T CTC AGT GCG TAT ATT TCG TTG ATG CGT CTA GCA TTA GTA TTA T U ASN SER GLN LEU PHE ASN gly ala pro thr ser ala tyr ile ser leu met arg thr ala leu Val leu trp i le

HhaI H i n f I

+226 ATC AAT AGA TAC TTA AAA CAT ATG ACT AAC TCA GTA GGG GCT AAC T T T ACG GGG ACA ATA GCA TGT CAT AAA ACA i l e asn arg tyr leu lys his met t h r asn ser Val g l y ala asn phe thr gly thr ile ala cys his lys thr

t301 CCT ATG A T T AGT GTA GGT GGA GTT AAG TGT TAC ATG GTT AGG TTA ACG AAC AAC TTA CAA GTC TTT ATC AGG ATT pro met i le ser Val gly gly Val lys cys tyr met Val a r g leu thr asn asn leu g l n val phe i l e arg i l e

HincII HpaI

thr i le ser ser tyr his leu asp i l e Val lys gln Val trp leu phe tyr Val glu Val i l e arg leu trp phe +376 ACA ATT TCC TCT TAT CAT TTG GAT ATA GTA AAA CAA GTT TGA TTA TTT TAC GTT GAG GTA ATC AGA TTA r G

HinfI

+451 ATT GTT TTA GAT AGC ACA GGC AGT GTG AAA AAG ATG AAG GAC CTA AAT AAC ACA AAA GGA AAT ACG A M AGT GAG i l e Val leu asp ser thr gly ser Val lys lys met lys asp thr asn asn thr lys gly asn thr lys ser glu

+526 G G A ACT GAA AGA GGA AAC TCT GGA GTT GAC AGA GGT ATA GTA F C G AAT ACT CAA ATA AAA ATG AGA TTT gly ser thr glu arg gly asn ser gly Val asp arg gly i l e Val Val pro asn thr g l n i le lys met arg phe

Mbo I HincII RsaI

leu asn gln Val a rg tyr tyr ser val asn asn asn leu lys i le gly lys asp thr asn i l e g l u leu ser lys +601 TTA AAT CAA GTT AGA TAC TAT TCA GTA AAT AAT AAT TTA AAA ATA GGG AAG GAT ACC AAT ATT GAG TTA TCA AAA

+676 GAT ACA AGT ACT TCG GAC TTG TTA GAA TTT GAG AAA TTA GTA ATA GAT AAT ATA AAT GAG GAA AAT ATA AAT AAT asp thr ser thr ser asp leu leu g l u phe glu lys leu Val i l e asp asn i l e asn glu glu asn i l e asn asn

Rsar

t 751 AAT TTA TTA AGT ATT ATA M A AAC GTA GAT ATA TTA ATA I T A GCA TAT AAT AGA ATT AAG AGT AAA C E T AAT arn leu leu ser i l e i l e lys asn Val asp i l e leu i l e leu ala tyr asn arg i l e lys ser lys p ro g l y asn

i l e thr pro gly thr thr leu glu thr leu asp gly i l e asn i l e i l e tyr leu asn lys leu ser asn g l u leu

EcoRII

+a26 ATA ACT CCA GGT ACA ACA TTA GAA ACA TTA GAT GGT ATA RAT ATA ATA TAT TTA AAT AAA TTA TCA AAT GAA TTA EcoRII RsaI

+go1 GGA ACA GGT AAA TTC AAA TTT AAA CCC ATG AGA ATA GTT AAT ATT CCT AAA CCT M A GGT GGT ATA AGA CCT TTA g l y thr gly lys phe lys phe lys pro met arg i l e Val asn i l e pro lys pro lys gly gly i l e a r g pro leu

+976 AGT GTA GGT AAT CCA AGA GAT AAA ATT U A A GAA GTT ATA AGA ATA ATT TTA GAT ACA ATT TTT GAT AAA AAG ser Val gly asn pro arg asp lys i l e Val g l n glu Val i l e arg i l e i l e leu asp thr i le phe asp lys lys

RsaI

+lo51 ATA TCA ACA CAT TCA CAT GGT T T T AGA AAG AAT ATA AGT TGT CAA ACA GCA ATT TGA GAA GTT AGA AAT ATA TTT i le ser thr his ser his gly phe arg lys asn ile ser cys gln thr ala ile trp glu Val arg asn i le phe

+1126 GGT GGA AGT AAT TGA TTT ATT GAA GTA GAC TTA AAA AAA TGT TTT GAT ACA ATT TCT CAT GAT TTA ATT ATT AAA gly gly ser asn trp phe i l e glu Val asp leu lys lys cys phe asp thr i le ser h i s asp leu i le i le lys

g l u leu lys arg tyr i le ser asp lys g l y phe i l e asp leu Val tyr lys leu leu arg ala g l y tyr i le asp +1201 GAA TTA AAA AGA TAT ATT TCA GAT AAA GGT TTT ATT GAT I T A GTA TAT AAA I T A TTA AGA GCT GGT TAT ATT GAT

AluI

+1276 GAG AAA GGA ACT TAT CAT AAA CCT ATA TTA GGT TTA CCT CAA G S A TTA ATT AGT CCT ATC TTA TGT AAT ATT g l u lys g l y thr tyr his lys pro i l e leu gly leu pro gln gly ser leu i le ser pro i le leu cys asn i l e

M b o I

val i le thr leu v a l asp asn trp leu glu asp tyr i le asn leu tyr asn lys gly lys val lys lys g l n his +1351 GTA ATA ACA TTG GTA GAT AAT TGA TTA M T TAT ATT AAT TTA TAT AAT AAA GGT AAA GTT AAA AAA CAA CAT

MboII

+1426 CCT ACA TAT AAA A M TTA TCA AGA ATA ATT GCA AAA GCT AAA ATA TTT TCG ACA AGA TTA AAA TTA CAT AAA GAA pro thr tyr lys lys leu ser arg i l e i l e a l a lys a l a lys ile phe ser thr arg leu lys leu his l y s g l u

TaqI

+1501 AG*T A A A S C A CTA TTT ATT TAT AAT GAT CCT AAT TTC AAG AGA ATA AAA TAC GTT AGA TAT GCA GAT GAT arg a l a lys g l y pro thr phe i le tyr asn asp pro asn phe lys arg i l e lys tyr Val arg tyr ala asp asp

A l u I HaeIII __ Mbo I

FIG. 2. Nucleotide sequence of the oxi3 gene. The sequence of the nontranscribed strand is shown. The exon reading frames have been translated into the protein sequences shown in capital letters. The amino acid sequences of the intron reading frames are indicated in lower case. All the restriction sites are marked. The sequence starts at approximately 43.5 and ends a t 58.2 map units.

11930 Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase

i l e l e u i l e g l y Val l eu g ly s e r l y s a sn a sp cys l y s i l e i l e l y s a rg a sp l eu a sn a sn phe leu asn se r +1576 ATT TTA ATT GGG GTA TTA GGT TCA AAA PAT GAT TGT AAA ATA ATC AAA AGA GAT TTA AAC AAT TTT TTA AAT TCA

+I651 TTA GGT TTA ACT ATA AAT *A AAA ACT TTA ATT ACT TGT GCA ACT GAA CTA CCA GCA AGA TTT TTA GGT TAT l eu g ly l eu thr i l e a sn g lu g lu l y s thr leu i l e thr cys a la thr g lu thr pro a la a rg phe l e u g l y t y r

MboI I

a s n i l e s e r i l e thr pro l eu lys a rg i l e p ro thr Val thr lys thr i l e a r g g l y l y s thr i l e a r g s e r a r g +1726 AAT ATT TCA ATT ACA CCT TTA AAA AGA ATA CCT ACA GTT ACT AAA CTA ATT AGA GGT AAA CTT ATT AGA AGT AGA

+la01 AAT ACA ACT AGA CCT ATT ATT AAT GCA CCA ATT AGA GAT ATT ATC AAT AAA TTA GCT ACT AAT GGA TAT TGT AAG asn thr thr a r g p r o i l e i l e a s n a l a pro i l e arg asp i l e i l e asn lys l eu a la thr asn g ly ty r cy5 lys

io" h i s a s n l y s a s n g l y a r g i l e g l y Val pro thr arg Val g l y a r g t r p thr tyr g lu g lu p ro a rg thr i l e i l e

+I876 CAT AAT AAA AAT GGT AGA ATA GGA GTG CCT ACA AGA GTA GGT AGA TGA CTA TAT -A CCT AGA ACA ATT ATT MboI I

+1951 AAT AAT TAT AAA GCG TTA GGT AGA GGT ATC TTA AAT TAT TAT AAA TTA GCT ACT AAT TAT AAA AGA TTA AGA GAA asn a sn t y r l y s a l a l eu g ly a rg g ly i l e l eu a sn t y r t y r l y s l eu a l a t h r a s n t y r l y s a r g l e u a r g g l u

AluI a r g i l e t y r t y r Val l e u t y r t y r s e r c y s Val leu thr l eu a l a ser ly s t y r a rg l eu l y s thr i l e s e r l y s

+2026 A- TAT TAC GTA TTA TAT TAT TCA TGT GTA TTA ACT TTA GCT AGT AAA TAT AGA TTA AAA ACA ATA AGT AAA Hinf I AluI

thr i l e l y s l y s phe g l y t y r a s n l e u a s n i l e i l e g l u asn a s p l y s leu i l e a l a a s n phe pro arg asn thr +2101 ACT ATT AAA AAA TTT GGT TAT AAT TTA AAT ATT ATT GAA AAT GAT AAA TTA ATT GCC AAT TTT CCA AGA AAT ACT

+2176 TTT GAT AAT ATC AAA AAA ATT GAA AAT CAT GGT ATA TTT ATA TAT ATA TCA G A B AAA GTA ACT e C T TTT phe a s p a s n i l e l y s l y s i l e g l u a s n h i s g l y i l e phe i l e t y r i l e s e r g l u a l a l y s Val thr asp pro phe

A1 uI M b o I

+2251 GAA TAT ATC GAT TCA ATT AAA TAT ATA TTA CCT ACA GCT AAA GCT AAT TTT AAT AAA CCT TGT AGT ATT TGT AAT g l u t y r i l e a s p s e r i l e l y s t y r i l e leu pro thr a l a l y s a l a a s n phe asn lys p ro cys se r i l e cys asn

T a q I HinfI AluI A l u I

+2326 TCA ACT ATT GAT GTA GAA ATA CAT CAT GTT AAA CAA TTA CAT AGA GGT ATA TTA AAA GCA CTT AAA GAT TAT ATT ser thr i l e a s p Val g lu i l e h i s h i s Val l y s g l n l e u h i s a r g g l y i l e leu l y s a l a thr l y s a s p t y r i l e

+2401 CTA GGT AGA ATA ATT ACC ATA AAC AGA AAA CAA ATT CCA TTA TGT AAA CAA TGT CAT ATT AAA ACA CAT AAA AAT thr g l y a r g i l e i l e t h r i l e a sn a rg l y s g ln i l e p ro leu c y s l y s g l n c y s h i s i l e l y s thr h is lys asn

+2476 AAA TTT AAA AAT ATA GGA U T ATA TAA AATCTATTA TTAATGATACTCAATATGGAAGCCGTATGATGGGAAACTATCAC-G l y s phe ly s a sn i l e g ly p ro g ly i l e och

EcoRI I RsaI

phe a l a i l e s e r phe LEU VAL VAL GLY HIS ALA VAL L E U MET I L E tZ565 GTTTGGGAAAGGCTCTTTAACACGTGGCAACATAGGTTAA TTT GCT ATT TCA TTT TTA GTA GTT GGT CAT GCT GTA TTA ATG ATT

+ X 5 0 TTC TGT E C G TTT CGC TTA ATT TAT CAC TGT ATT GAA GTG TTA ATT GAT AAA CAT ATC TCT GTT TAT TCA ATT PHE C Y S a l a pro phe a r g l e u i l e t y r h i s c y s i l e g l u v a l l e u i l e a s p l y s his i l e ser Val t y r ser i l e

Hha I

asn glu asn phe thr Val ser phe t r p phe t r p l e u l e u Val Val thr t y r i l e Val phe a rg tYr Val asn his +2725 AAT GAA AAC TTT ACC GTA TCA TTT TGG TTC TGA I T A TTA GTA GTA ACA TAC ATA GTA TTT AGA TAC GTA AAC CAT

+2800 ATG GCT TAC CCA GTT G G E AAC TCA ACG GGG ACA ATA GCA TGC CAT AAA A X T GGA GTA AAA CAG CCA GCG met a l a t y r pro val g ly a la asn ser thr g ly thr i l e a l a c y s h i s l y s ser a l a g l y Val lys g ln p ro a la

HaeI I I HhaI Hha 1

gln gly lys asn cys pro met a l a a r g l e u thr asn ser cys lys g lu cys leu g ly phe ser leu thr pro ser t2875 CAA GGT AAG AAC TGT CCG ATG GCT AGG TTA ACG AAT TCC TGT AAA GAA TGT TTA GGG TTC TCA TTA ACT CCT TCC

HpaI HincII EcoRI h i s l e u g l y i l e Val i l e h i s a l a t y r Val leu g lu g lu g lu Val h i s g l u l e u thr lys a sn g lu s e r l eu a l a

+2g50 CAC TTG GGG ATT GTG ATT CAT GCT TAT GTA TTG GAA GAA GAG GTA CAC GAG TTA ACC AAA AAT S A T T S HinfI M b m b O I I RsaI HincII HpaI HinfI AluI

+3025 TTA AGT AAA AGT TGA CAT l e u s e r l y s ser t r p h i s

HincII

gly asn pro gly asp asn +3100 GGA AAC S G GAT AAC

EcoRI I

+3175 TCT AAA TTA AAT GCA AGG s e r l y s l e u asn a l a a r g

leu glu gly cys thr ser ser asn gly lys leu arg asn thr g ly l eu ser glu arg TTG GAG GGC T S G AGT TCA AAT GGA AAA TTA AGA AAT ACG GGA TTG TCC GAA AGG

RsaI gly Val phe i l e Val p ro l y s phe a sn l eu a sn l y s a l a a rg t y r phe ser thr leu GGA GTC TTC ATA =CC AAA TTT AAT TTA AAT AAA GCG AGA TAC TTT A E T TTA

HinfI M b o I I RsaI Rsal

l y s g l u a s p s e r l e u a l a t y r l e u thr l y s i l e asn thr thr asp phe ser glu leu AAG GAA GAC AGT TTA GCG TAT TTA ACA AAG ATT AAT ACT ACG GAT TTT TCC GAG TTA

MboiI FIG. 2-Continued

Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase 11931

asn lys leu ile glu asn asn his asn lys thr glu ttlr i l e asn thr arg i l e l eu lys l eu met ser asp i l e M T AAA TTA ATA GAA AAT M T CAT M T AAA CTT G M ACC ATT AAT ACT AGA ATT TTA A M TTA ATG TCA GAT ATT +3250

+3325

+3400

+3475

+3550

+3625

+ 3 700

+3775

+3850

+3925

+4000

+4075

+4150

+4225

+4300

+4375

'4450

+4525

+4600

+4675

+4750

+482 5

arg met leu l eu i l e a la ty r asn lys i l e lys se r lys lys g ly asn i le se r lys g ly se r asn asn i l e thr AGA ATG TTA TTA ATT GCT TAT M T AAA ATT AAA AGT AAG M A GGT AAT ATA TCT A M GGT TCT AAT AAT ATT ACC

leu asp gly i le asn i le ser tyr leu asn lys leu ser lys a s p i l e asn thr asn met phe lys phe s e r pro TTA GAT GGG ATT AAT ATT TCA TAT TTA M T A M TTA TCT M A GAT ATT AAC ACT AAT ATG TTT AAA TTT TCT CCG

HpaII

Val arg arg Val g lu i l e pro lys thr ser gly gly phe a r g pro leu ser Val gly asn pro arg glu lys i le

- MboI I GTT A- GTT GAA ATT CCT M A ACA TCT GGA GGA TTT AGA CCT TTA AGT GTT GGA AAT CCT AGA G M M A ATT

Val g l n g l u se r met a r g i l e i l e l e u g l u i l e i l e t y r asn asn ser phe ser tyr tyr ser his gly phe arg GTA CAA G M AGT ATG AGA ATA ATA TTA GAA ATT ATC TAT M T AAT AGT TTC TCT TAT TAT TCT CAT GGA TTT AGA RsaI

CCT M C TTA TCT TGT TTA ACA GCT ATT ATT C M TGT AAA M T TAT ATG CAA TAC TGT M T TGA TTT ATT M A GTA pro asn leu ser cys leu thr a la i l e i l e gln cys lys asn tyr met gln tyr cys asn trp phe i l e lys Val

m" asp leu asn lys cys phe asp t h r i l e pro his asn met leu i l e asn Val leu asn g l u arg i le lys asp lys GAT TTA M T M A TGC TTT GAT ACA ATT CCA CAT M T ATG TTA ATT AAT GTA TTA AAT GAG AGA ATC A M GAT AAA

m

GGT TTC ATA GAC TTA TTA TAT M A TTA TTA AGA GCT GGA TAT GTT GAT AAA M T AAT AAT TAT CAT AAT ACA ACT gly phe i l e asp leu leu t y r lys leu leu arg a la gly tyr Val asp lys asn asn asn t y r his asn t h r thr

- AluI

TTA GGA ATT CCT CAA GGT AGT GTT GTC AGT CCT ATT TTA TGT AAT ATT TTT TTA GAT M A TTA GAT A M TAT TTA leu gly i l e pro gln gly s e r val Val se r pro i l e leu cys asn i l e phe leu asp lys leu asp lys tyr leu

EcoRI glu asn lys phe glu asn glu phe asn thr gly asn met ser asn arg gly arg asn pro i le tyr asn ser leu GAA AAT AAA TTT GAG AAT GAA TTC AAT ACT GGA AAT ATG TCT AAT AGA GGT AGA AAT CCA ATT TAT M T AGT TTA

EcoRI ser ser l y s i l e t y r a r g cys lys leu leu ser glu lys leu lys leu i le arg leu arg asp his tyr gln arg TCA TCT AAA ATT TAT AGA TGT A M TTA TTA TCT GAA AAA TTA AAA TTG ATT AGA TTA AGA GAC CAT TAC CAA AGA

asn met gly ser asp lys ser phe lys a rg a la ty r phe Val arg tyr a la a s p a sp i l e i l e i l e g ly Val met M T ATG GGA TCC GAT A M AGT TTT AAA A G A X TAT TTT GTT AGA TAT GCT GAT GAT ATT ATC ATT GGT GTA ATG Fm- A1 uI

GGT TCT CAT AAT GAT TGT M A M T ATT TTA AAC GAT ATT AAT AAC TTC TTA M A GAA AAT TTA GGT ATG TCA ATT gly ser his asn asp cys lys asn i l e l e u asn asp i l e asn asn phe leu lys glu asn leu gly met s e r i l e

M T ATA GAT AAA TCC GTT ATT A M CAT TCT AAA GAA GGA GTT AGT TTT TTA GGG TAT GAT GTA AAA GTT ACA CCT asn i l e asp lys ser val i l e lys his se r lys glu gly Val s e r phe leu ply tyr asp Val lys val thr pro

trp glu lys a r g pro tyr arg met i le lys lys gly asp asn phe i l e a rg Val arg his his t h r ser leu Val TGA GAA AAA AGA CCT TAT AGA ATG ATT M A AAA *T AAT TTT ATT AGG GTT AGA CAT CAT ACT AGT TTA GTT

HphI

GTT AAT GCC CCT ATT AGA AGT ATT GTA ATA M A TTA AAT AAA CAT GGC TAT TGT TCT CAT GGT ATT TTA GGA AAA Val asn a la pro i l e a r g s e r i l e Val i l e lys leu asn lys his gly tyr cys s e r his gly i l e leu gly lys

pro arg gly Val gly arg leu ile his glu glu m t lys thr i l e l e u met his tyr leu ala Val gly a r g gly CCC AGA GGG GTT G- TTA ATT CAT S . 4 ATG AAA ACC ATT TTA ATG CAT TAC T T E GTT GGT AGA GGT

MboI I MboI I AluI

ATT ATA AAC TAT TAT AGA TTA GCT ACC AAT TTT ACC ACA TTA AGA GGT AGA ATT ACA TAC ATT TTA TTT TAT TCA i l e i l e asn tyr tyr a rg l eu a la thr asn phe t h r thr leu a r g gly a r g i l e thr tyr i l e l eu phe tyr se r

AluI

cys cys leu thr leu ala ser lys phe lys leu asn thr Val lys lys Val i le leu lys phe qly lys Val leu TGT TGT TTA ACA TTA GCA AGT AAA TTT AAA TTA AAT ACT GTT AAG AAA GTT ATT TTA AAA TTC GGT A M GTA TTA

Val asp pro h is se r lys Val se r phe ser i l e asp asp phe l y s i l e a r g h i s lys i l e asn i l e thr asp ser GTT m C T CAT TCA AAA GTT AGT TTT AGT ATT GAT GAT TTT M A ATT AGA CAT AAA ATA AAT ATA ACT G A T T

MboI HinfI

asn tyr thr pro asp glu i l e leu a s p a r g tyr lys tyr met leu pro arg ser leu ser leu phe se r gly i l e AAT TAT ACA CCT GAT GAA ATT TTA GAT AGA TAT AAA TAT ATG TTA CCT AGA TCT TTA TCA TTA TTT AGT GGT ATT

Eg"-

cys gln i l e cys gly ser lys his asp leu glu Val his his Val arg t h r leu asn asn ala ala asn l y s i l e TGT CAA ATT TGT GGT TCT AAA CAT GAT TTA G A A S C A T CAC GTA AGA ACA TTA AAT AAT GCT GCC M T AAA ATT

RsaI

FIG. 2"Continued

11932 Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase

l y s a sp a sp t y r l eu l eu g ly a rg met i l e l y s i l e asn a rg lys g ln i l e thr i l e cys ly s thr cys his phe +4900 A M GAT GAT TAT TTA TTA GGT AGA ATG ATT AAG ATA AAT AGA AAA CAA ATT ACT ATC TGT A M ACA TGT CAT T i l

+4975 AAA GTT CAT C M GGT AAA TAT AAT GGT U T TTA TAA TAATTATTATACTCCTTCGGGGTCGCCGCGGGGGCGUCGACTA l y s val h i s g ln g ly l y s t y r a sn g ly p ro g ly l eu och H U

EcoRII HaeI I I

+5060 TTAAATATGCGTTAA ATG GAG AGC CGT ATG ATA TGA AAG TAT CAC U G G TTC GGA GAG GGC TCT TTT ATA TGA met g lu s e r a rg met i l e trp l y s t y r h i s Val arg phe g ly g lu g ly se r phe i l e t r p

RsaI

met leu l eu h i s se r asp a rg phe a l a thr thr THR LEU VAL MET PRO ALA LEU I L E GLY GLY PHE GLY ASN gln +5135 ATG TTA TTA CAT TCA GAT AGG TTT GCT ACT CTA CTC TTA GTA ATG CCT GCT TTA ATT GGA GGT T l l GGT AAC CAA

+5210 AAA AGA TAT GAA AGT AAT AAT AAT AAT M T CAA GTA ATA GAA AAT AAA GAA TAT AAT TTA AAA TTA AAT TAT GAT l y s a r g t y r g l u ser asn asn asn asn asn gln Val i l e g l u asn lys g lu ty r asn l eu lys l eu asn ty r asp

t5285 AAG TTG GGA CCT TAT T T E GGA TTA ATT GAA GGT GAT GGA TCT ATT CTA GTT CAA M T TCA -A ATA AAA l y s l e u g l y p r o t y r l e u a l a g l y l e u i l e g l u g l y a s p g l y s e r i l e thr Val g l n a s n s e r s e r s e r i l e l y s

A1 u I "

HphI M b o I M b o I I

+5360 AAA TCT AAA TAT AGA CCG TTA ATT GTT GTA GTA TTT AAA TTA G A A GAT TTA GAA T T E AAT TAT TTA TGT AAT l y s ser l y s t y r a r g p r o l e u i l e Val va7 val phe l y s l e u g l u asp leu g l u l e u a l a a s n t y r leu cys asn

MboII AluI

+5435 TTA ACT M A TGT GGA AAA GTG TAT AAA AAA AT1 AAT CGT AAT TAT GTA TTA TGA CCT A T 1 CAT GAT TTA AAA GGT leu thr lys cys gly lys Val t y r l y s l y s i l e asn arg asn tyr Val l e u t r p pro i l e his asp leu lys gly

+551O GTA TAT ACA TTA TTA AAT ATT AT1 BAT GGA TAT ATG AGA ACA CCT AAA TAT GGA GCA TTT GTT AGA GGT GCT GAA Val t y r thr leu leu asn i l e i l e a s n g l y t y r met arg thr p r o l y s t y r g l y a l a phe Val a rg q ly a la g lu

45585 TTT ATA AAT AAT TAT ATT RAT TCA ACC AAC CAA CTA CAT AAT M A TTA AAA AAT ATA GAT M T ATT AAA ATT M A phe i l e asn asn t y r i l e a s n ser thr asn g l n thr h i s a s n l y s l e u l y s a s n i l e a s p a s n i l e l y s i l e l y s

+5660 CCA TTA GAT ACA TCA GAT A T 1 GGT TCA AAC GCT TGA T T E ATC TTG ACA GAT GCA GAT GGT M T TTT TCT ATT pro leu asp thr ser a s p i l e g l y ser asn a la trp l e u a l a i l e leu thr asp ala asp gly asn phe s e r i l e

A1 uI

asn leu i l e a s n g l y l y s a s n a r g s e r s e r a r g thr met pro t y r t y r cys l eu g lu l eu a rg g ln asn ty r g ln +5735 AAT TTA ATA AAT GGT A M AAT CGT TCT AGT AGG ACA ATG CCT TAT TAT TGT TTA G A A TTA AGA C M AAT TAT CAA

+581O AAA ATT TCT AAT AAT AAT M T A T 1 AAT TTT TCT TAT TTT TAT ATT ATG TCT GCA ATT GCA CTA TAT TTT AAT GTT l y s i l e s e r a sn a sn a sn a sn i l e a sn phe ser t y r phe t y r i l e met s e r a l a i l e a l a thr t y r phe asn Val

+5885 M T TTA TAT AGT AGA GAA CGT AAT TTA AAT TTA TTA GTA TCT CTT AAT AAT ACG TAT AAA CTA TAT TAT AGT TAT asn l e u t y r ser arg g lu a rg asn leu asn leu l eu Val ser thr asn asn t h r t y r l y s thr t y r t y r ser t y r

+5960 M A GTA ATA GTG GCT AAT CTA TAT AAA AAT ATT AAA GTA ATA GAA TAC TTT AAT M A TAT TCT TTA TTA TCA TCT l y s val i l e val a l a asn thr t y r l y s asn i l e l y s Val i l e glu t y r phe asn l y s t y r s e r leu leu ser ser

t6035 M A CAC TTA GAT TTT TTA GAT T U T A M T T A GTT ATT TTA ATT RAT AAT GAG GGT CAR AGT ATA AAT CTT AAT l y s h i s l e u asp phe l eu asp t r p s e r l y s leu Val i l e leu i l e asn asn g lu g ly g ln s e r i l e asn thr asn

g ly s e r t r p g lu l eu gly i l e asn leu a rg l y s asp t y r asn l y s thr arg thr thr phe thr trp s e r h i s leu +6110 GGT AGT TGA GAA TTA GGT ATA AAT TTA CGT AAA GAT TAT AAT AAA ACT AGA ACT ACG TTT ACT T U T CAT TTA

M b o I

MboI

+6275 AGTGTAAAAGGTGTAACGAGATTATTAATAAGATGCCGTAATATATTGTA AAATATATTATTATTACAACACTATATGCGGGAAAACCCTAAAGTCATAA

+6475 ATAATTTCCCCACCCCCATGCGAAGCATGGGGGGGGGTATAAGTATGGAC AATCCGCAGGAAACCAAATAATAATTAATATCCTGAACAAAGTAAGTGAA

Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase 11933

leu lys asp tyr asn lys met asn TYR LEU LEU PRO LEU I L E I L E GLY ALA THR ASP 6 6 7 5 T W T A T A G T C C A M T A T A A TTG A M GAT TAT AAT AAA ATG M C TAT TTA TTA CCA TTA ATA ATT G G U ACA GAT

M b o I I A l u I

THR ALA PHE PRO ARG I L E ASN ASN I L E ALA PtIE TRP VAL LCU PRO MET GLY LEU VAL CYS LEU VAL THR SER THR +6754 ACA GCA TTT CCA AGA ATT AAT AAC ATT GCT TTT TGA GTA TTA CCT ATG GGG TTA GTA TGT TTA GTT ACA TCA ACT

LEU VAL GLU SER GLY ALA GLY THR GLY TRP THR VAL TYR PRO PRO LEU SER SER I L E GLN ALA HIS SER GLY PRO +6829 TTA GTA -A GGT GCT GGT ACA GGG TGA ACT GTC TAT CCA CCA TTA TCA TCT ATT CAG GCA CAT TCA GGA CCT

HinfI R s a I HphI

SER VAL ASP LEU ALA I L E PHE ALA LEU HIS LEU THR SER I L E SER SER LEU LEU GLY ALA I L E ASN PHE I L E VAL +6904 AGT GTA GAT I T A GCA ATT T I T GCA TTA CAT TTA ACA TCA ATT TCA TCA TTA TTA GGT GCT ATT AAT TTC ATT GTA

THR THR LEU ASN MET ARG THR ASN GLY MET THR MET H I S LYS LEU PRO LEU PHE VAL TRP SER I L E PHE I L E THR +6979 ACA ACA TTA M T ATG AGA ACA AAT GGT ATG ACA ATG CAT M A TTA CCA TTA TTT GTA TGA TCA ATT TTC ATT ACA

FlboI ALA PHE LEU LEU LEU LEU SER LEU PRO VAL LEU SER ALA GLY I L E THR MET LEU LEU LEU ASP ARG ASN PHE ASN

+7054 GCG TTC TTA TTA TTA TTA TCA TTA CCT GTA TTA TCT GCT GGT ATT ACA ATG TTA TTA TTA GAT AGA AAC TTC M T

THR SER PHE PHE GLU VAL ALA GLY GLY GLY ASP PRO I L E LEU TYR GLU HIS LEU PHE TRP PHE PllE GLY gln thr +7129 ACT TCA TTC TTT GAA GTA GCA GGA GGT S C CCA ATC TTA TAC GAG CAT TTA TTT T G TTT GGT CAA ACA

HphI HinfI

t 7 2 0 4 GTG GCC CTT ATT ATT ATA I T A ATA ATA TAT AAT GAT ATG CAT TTT TCT AAA TGC TGG AAA TTA TTA AAA AAA TGA Val ala thr i le i le i le leu i le i le tyr asn a s p met his phe ser lys cys trp lys leu leu lys lys trp - H a e I I I

i l e thr asn i le i le ser thr leu phe lys ala leu phe Val lys ile phe i le ser tyr asn asn gln gln asp +7279 ATT ACA AAT ATT ATA AGT CTA TTA TTT AAA GCC TTA TTT GTA A M ATA TTC ATA TCT TAT AAT AAT CAG CAG GAT

+7354 MG ATA ATA AAT M T CTT ATA TTA M A AAA GAT AAT ATT A M AGA TCC TCA GAG ACT ACA AGA A M ATA TTA AAT l y s i l e i l e asn asn thr i le leu lys lys asp asn ile lys arg ser ser glu thr thr arg lys ile leu a s n

- M b o I

asn ser i l e asn lys lys phe asn gln trp leu ala gly leu i l e asp gly asp gly tyr phe gly i le Val ser + 7 4 2 9 AAT TCA ATA A4T A M M A TTT M T C M TGA TTA GCT GGA TTA ATT GAT GGT GAT GGA TAT TTT GGT ATT GTA AGT

Til" Hphr

lys 1 ~ s t y r Val ser leu glu i le thr Val ala leu glu asp glu ile ala leu lys glu i l e gln asn lYs Phe +7504 AAG AAA TAT GTA TCA TTA GAA ATT CTA GTA GCA TTA =T GAA A T E TTA AAA GAA ATT CAA M T AAA TTT

M b o I I A1 u1

gly gly ser i le lys leu a r g ser gly Val lys ala ile arg tyr arg leu thr asn lys thr gly i le i le lys +7579 GGT GGT TCT ATT A M TTA A G A GGT GTA A M GCT ATT AGA TAT AGA TTA CTT AAT AAA ACT GGT ATA ATT AAA

leu i l e asn ala Val asn gly asn i le arg asn thr lys a r g leu Val g l n phe asn lys Val cys i le leu leu t 7 6 5 4 TTA ATT AAT GCA GTT AAT GGT AAT ATT AGA AAT ACT M A AGA TTA E A A TTT AAT A M GTT TGT ATT TTA TTA

Mbo I AluI

R s a I

+7729 GGT ATT GAT TTT ATT TAT CCA ATT AAA TTA ACT AAA GAT AAT AGT TGA TTT GTT GGA TTT TTT GAT GCT GAT Ga gly i le asp phe i le tyr pro i l e lys leu thr lys asp asn ser trp phe Val gly phe phe asp a l a asp g1Y

t 7 8 0 4 E A ATT AAT TAT TCA TTT AAA AAT AAT CAT CCT CAA TTA ACA AAA ACT GTA ACT AAT AAA TAT TTA CAA GAT th r i l e asn tyr ser phe lys asn asn his pro gln leu thr lys thr val thr asn lys tyr leu gln asp Val

Rsa I RsaI

+7879 CAA GAA TAT AAA AAT ATT I T A GGT GGT AAT ATT TAT TTT GAT AAA TCA CAA AAT GGT TAT TAT AAA TGA TCC ATT g l n glu tyr lys asn i l e leu gly gly asn i le ty r phe a s p lys ser gln asn gly t y r tyr 1ys t r p ser i le - MboI

+7954 CAA TCA AAA GAT ATA GTA TTA AAT TTT ATT AAT GAT TAT ATT AAA ATA AAT CCA TCA AGA ACA CTA AAA ATA AAT gln ser lys asp i l e val leu asn phe i l e asn asp tyr i le lys i le asn p r o ser a r g thr thr lys i l e asn

+a029 A M T5A TAT TTA AGT M A GM TTT TAT AAT TTA AAA GAA TTA AAA GCT TAT M T AAA TCT TCT GAT TCA ATA C M lys leu tyr leu ser lys glu phe tyr asn leu lys glu leu lys ala tyr asn lys ser ser asp ser ile g l n

H i n d 1 1 1 F7Gii-m

+8104 TAT AAA GCA TGA TTA M T U T GAA AAT AAA TGA AAA AAT M A T M ATTATTTMTAAAGATATAG TCC MA TTA TAT ATA tyr lys ala trp leu asn phe glu asn lys trp lys asn lys och ser lys leu tyr ile

+E184 TAT AAT ATA TAT ATA TAT AAC AAG CAC CCT GAA GTA TAT ATT TTA ATT ATT =A TTT GGT ATT ATT TCA CAT t y r asn i le ty r i l e t y r asn 1yS HIS PRO GLU VAL TYR I L E LEU I L E I L E PRO GLY PHE GLY I L E I L E SER H I S

EcoRI I

VAL VAL SER THR TYR SER LYS LYS PRO VAL PHE GLY GLU I L E SER MET VAL TYR ALA MET ALA SER I L E GLY LEU +E259 GTA GTA TCA ACA TAT TCT A M M A CCT GTA TTT =A ATT TCA ATG GTA TAT GCT ATG GCT TCA ATT GGA TTA

Hph I FIG. 2"Continued

11934 Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase

LEU GLY PHE LEU VAL TRP SER H I S H I S MET TYR I L E VAL GLY LEU ASP ALA ASP THR ARG ALA TYR PHE THR SER +a334 TTA G- TTA GTA T G A CAT CAT ATG TAT ATT GTA GGA TTA GAT GCA GAT ACT AGA GCA TAT TTC CTA TCC

H i n f I Mbo I

t8409 GCA CTG ATG ATT ATT GCA ATT CCA ACA GGA ATT AAA ATC TTT TCT TGA TTA GCC C T S TAC GGT GGT TCA ATT ALA THR MET I L E I L E ALA I L E PRO THR GLY I L C LYS I L E PHE SER TRP LEU ALA THR I L E TYR GLY GLY SER I L E

M b o I

ARG LEU ALA THR PRO MET LEU TYR ALA I L E ALA PHE LEU PHE LEU PHE THR MET GLY GLY LEU THR GLY VAL ALA +a484 AGA TTA GCA CTA CCT ATG TTA TAT GCA ATT GCA TTC TTA TTC TTA TTC ACA ATG GGT GGT TTA ACT GGT GTT GCC

+8559 TTA GCT AAC GCC TCA TTA GAT GTG GCA TTC CAC GAT ACT TAC TAC GTG GTG GGA CAT TTT CGA GCG GTC TGA AAG LEU ALA ASN ALA SER LEU ASP VAL ALA PHE H I S ASP THR TYR TYR VAL VAL GLY H I S PHE a r g a l a V a l t r p l y s

A" Taqr

t8634 TTA TCA TAA ATAATATTTACCATATAATAATGGATAAATTATATTT TTATCAATATAAGTCTAATTACAAGTGTATTAAAATGGTAACATAAATAT leu s e r o c h

t8730 GCTAAAGTAATGACAAAAGTATCCATATTCTTGACAGTTATTTTATAT TATAAAAAAAAGATGAAGGAACTTTGACTETAATATGCTCAACGAAA A1 uI Mbo I

t8830 GTSMATGTTATAAAATTACTTACACCACTAATTGAAAACCTGTCT GATATTCAATTATTATTTATTATTATATAATTATATAATAATAAATAAAA H i n f I

t8930 TGGTTGATGTTATGTATTGGAAATCAGCATACGATAAATCATATAACCAT TAGTAATATAATTTGAGAGCTAAGTTAGATATTTACGTATTTATGATAAA - A1 uI

+go30 ACAGAATAAACCCTATAAATTATTATTATTAATAATAAAAAATAATAATA ATACCAATATATATATTATTTAATTTATTATTATTATATTAATAAAATTT

+ 9 1 3 0 AATATATATTATAAATAATTATTGGATTAAGAAATATAATATTTTATAGA AATTTTCTTTATATTTAGAGGGTAAAAGATTGTATAMAAKAATGCCA A1 uI

+9230 TATTGTAATGATATGGATAAGAATTATTATTCTAAAGATGAAAATCTGCT AACTTATACTATAGGTGATATGCCTATCTTTATTTATATATATATTATTA rn

+9428 TTG CGT GAG CCG TAT GCG ATG AAA GTC GCA C G G GTT CTT A E G GGA AAA CTT GTA AAG GTC TAC CTA TCG leu a r g g l u p r o t y r a l a wt l y s v a l a l a arg t h r v a l t h r t h r g l y g l y l y s t h r v a l l y s v a l t y r t h r s e r

Rsa I H p a I I

+9503 GGA TAC TAT GTA TTA TCA ATG GGT GCT ATT TTC TCT TTA TTT GCA GGA TAC TAC TAT TGA AGT CCT CAA ATT TTA 9 1 y TYR TYR VAL LEU SER MET GLY ALA I L E PHE SER LEU PHE ALA GLY TYR TYR TYR TRP SER PRO GLN I L E LEU

+9653 CCA ATG CAT TTC TTA GGT ATT AAT GGT ATG CCT AGA AGA ATT CCT GAT TAT CCT GAT GCT TTC GCA GGA TGA AAT PRO MET H I S PHE LEU GLY I L E ASN GLY MET PRO ARG ARG I L E PRO ASP TYR PRO ASP ALA PHE ALA GLY TRP ASN

E c o R I

+9803 GTT AAT GGA TTA AAT AAT AAA GTT AAT AAT AAA TCA GTT ATT TAT GCT AAA GCA CCC GAT TTT GTA GAA TCT AAT VAL ASN g l y leu a s n a s n l y s v a l a s n ASN LSY SER VAL I L E TYR ALA LYS ALA PRO a s p p h e v a l g l u s e r a s n

i5"

+9953 TTT AAT ACA CCA GCT GTA CAA TCT TAA GTTTTAAAATTTA ATTATTTACTTAATAATTAAAAAAAAAAAGTAAATATTATATCTAAAACT PHE ASN THR PRO ALA VAL GLN SER OCH

P v u I I R s a I FIG. 2"Continued

Sequence of Gene Coding for Subunit I of Cytochrome Oxidase 11935

FIG. 3. Amino acid sequence of Subunit 1 of yeast cytochrome oxi- dase. The sequence is derived from the nucleotide sequence of the gene. The eight exons are indicated by the arrous above the one-letter amino acid se- quence. The underlined sequences cor- respond to those regions of the protein where the homology with the human protein was 50% or greater.

1 * ~ V Q R W L Y S T N A K D I A V L Y F M L A I F S G M A G T A M S L I I R L E L A A P G S Q Y L H G N S Q

Exon A1

54 -b f Exon A2 __* +Exon A3 ___* 4 ...............................................

F[il N _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . r1 V V G H A V L M I F L V M P A L I G G F G N Y L L P L I I G A T D T A F P R I N N I A F W V L P M

I n7 G L V C L V T S T L V E S G A G T G W T V Y P P L S S I Q A H S G P S V D L A I F A L H L T S l S S L L G ___"""_____"""""""""""""""""""""""""""""""""""""""""""""-

160 Exon A4

A I N F l V T T L N M R T N G M T M H K L P L F V ~ S I F I T A F L L L L S L P V L S A G I T M L L L D R "_"__""__"""""""""""""""""""""""""""""""""""""""""""""" 21 3

L A IT

N F N T S F F E V A G G G D P I L Y E H L F W F F G H P E V Y I L I I P G F G I I S H V V S T Y S K K P V "_"_""""""""""""""""""""""""""""""""""""""""""""""-""-- 266

F G E I S M V Y A M A S I G L L G F L V W S H H M Y I Y G L D A O T R A Y F T S A T M I I A I P T G I K ~ __"__""____"""""""""""""""""""""""""""""""""""""""""""""~ 31 9

F S W L A T I Y G G S I R L A T P M L Y A I A F L F L F T M G G L T G V A L A N A S L D V A F H D T Y Y V

Exon A5

_____"_""___"___""""""""""""""""""""""""""""""""""""""""""- 372 __* 4

V G H F Y Y V L S M G A I F S L F A G Y Y Y W S P Q I L G L N V N E K L A Q I ~ F W L I F I G A N V I F F "__"_______""""""""""""""""""""""""""""""""""""""""""-"---" 425 Exon Ab b+

P M H F L G I N G M P R R l P D Y P D A F A G W N Y V A S I G S F I A T L S L F L F I Y I L Y D ~ L V N N ....................................

478 " E x o n A 7 + 4 " E x o n AB-

K S V I Y A K A P S S S I E P L L T S P P A V H S F N T P A V O S

this size should not exceed, 1,100 nucleotides, assuming the average molecular weight of an amino acid to be 110. The 9,979-nucleotide-long sequence of the oxi3 locus is nine times longer than the expected length of the gene, thus confrrming earlier evidence about the split nature of the gene (17-20).

Since there is no information about the amino acid sequence of the yeast Subunit 1, we had to rely on the sequence homology with the mammalian protein in order to identify the exon regions of the gene. The amino acid sequence of the human cytochrome oxidase Subunit 1 has recently been de- duced from its gene sequence.' Using the primary structure of the human Subunit 1 provided to us by these authors, a homologous amino acid sequence has been found in the oxi3 locus. The alignment of the two proteins is predicated on the presence in the yeast gene of six to seven introns. The uncer- tainty about the seventh intron arises from a somewhat low homology of the two proteins at the COOH-terminal end.

The first exon (Al) starts with the ATG initiator a t nucleo- tide +1 and ends at nucleotide +la or +171. This exon codes for the first 55 to 56 amino acids from the NH2-terminal end. The homology with the human sequence averages 45%. The second exon (A2) is a short sequence (+2620 to +2655) coding for the next 10 to 12 amino acids and exhibiting 50% homology with the human protein. The thud exon (A3) also codes for a short segment of the protein (+5168 to +5206). The 12 to 13 amino acids encoded in A3 are 60% homologous with the human Subunit 1. Exons A4 and A5 (+6718 to +7197 and +8203 to +8618) code for a total of 296 amino acids. This is the most conserved region, being almost 70% homologous with human Subunit 1. The sixth exon (A6) starts at nucleotide +9506. It is difficult to ascertain where this exon ends. There are 43 identities in the 83 amino acids encoded between nucleotides +9506 and +9754. We have somewhat arbitrarily indicated this exon to end at nucleotide +9781 since this gives the best fit with the human sequence. The last 18 amino acids of exon A6 have only two identities. The substitutions in the other 16 residues, however, involve acidic for acidic or neutral for neutral amino acids.

The last exon (AS) starts at nucleotide +9879 and ends with the COOH-terminal serine at nucleotide +9976. The human and yeast protein sequences in this region are only 25%

B. Barrel1 and F. Sanger, personal communication.

homologous. The gap between what we have defined as the end of exon A6 and the begining of the last exon is a 69- nucleotide-long sequence that can code for an additional 23 amino acids. Since there are only 10 amino acid residues in the comparable region of the human protein, it is possible that there is an additional small exon and two short introns in this sequence. We have not been able to find any significant homology in this region even by shifting the registers in the yeast sequence. Alternatively, the yeast Subunit 1 could have an insertion of 13 residues. It should be noted that the 69- nucleotide sequence contains three lysine (AAA) and seven asparagine (AAU) codons. Since these codons have been found to occur at. high frequency in the intron coding frames of the apocytochrome b gene (4) and of the Subunit 1 gene reported here, we tend to favor the former possibility, namely that there is still a seventh short exon in the sequence. We have tentatively designated the seventh exon to extend from nucleotides +9830 to +9859. The two amino acid identities in the sequence include a proline which is usually found to be conserved. Until the exon-intron boundaries are precisely defined at the level of the messenger sequence, it is still not clear whether there are 7 or 8 exons.

The primary structure of the yeast subunit 1 encoded in the oxi3 locus based on the best fit with the human subunit is shown in Fig. 3. The reported sequence assumes that there are no major insertions or deIetions between the two proteins. Although this cannot be ruled out at present, it is significant that the sequence homology of 68% in the fourth and fifth exons extends for 296 amino acids with only a single amino acid deletion in the yeast protein. This fact tends to support the idea that the two proteins are highly conserved both in terms of size and primary structure.

Based on our best approximation of the Subunit 1 sequence, the protein consists of 510 residues. The amino acid compo- sition of the protein is presented in Table 11. The composition derived from the gene sequence agrees reasonably well with two independently reported amino acid compositions of the bona fide yeast Subunit 1 (21, 22). The molecular weight estimated from the amino acid content is 55,933. This value is approximately 30% higher than measured by SDS-gel electro- phoresis (15,16). Similar discrepancies in size have been found for Subunit 3 of cytochrome oxidase (23) and cytochrome b (4). Although these differences are most likely due to under-

11936 Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase

TABLE I1 Amino acid composition of Subunit I ofyeast cytochrome oxidase

DNA sequence

Resi-

Amino acid analysis

Amino acid dues M 9 M 7" M $"

Alanine 47 9.2 8.4 7.8 Arginine 9 1.8 2.9 2.3 Aspartic acid and asparagine 31 6.1 8.7 7.8 Cysteine 1 0.2 1.4 Glycine 42 8.2 7.9 12.0 Glutamic acid and glutamine 17 3.3 6.6 5.8 Histidine 14 2.7 3.2 2.1 Isoleucine 49 9.6 7.2 7.2 Leucine 66 12.9 12.1 11.5 Lysine 8 1.6 3.6 4.3 Methionine 20 3.9 2.3 3.2 Phenylalanine 40 7.8 6.6 6.3 Proline 26 5.1 5.1 4.3 Serine 39 7.6 6.8 7.2 Threonine 31 6.1 6.5 5.1 Tryptophan 10 2.0 2.1 Tyrosine 26 5.1 4.8 3.5 Valine 34 6.7 7.3 6.3

" Values taken from Tzagoloff et al. (22). Values taken from Poyton and Schatz (21).

A I A2 A 3 A 4 A 5 1 6 - 8

NH2 ' I I I I 1 COOH

npar H P O 1 HhO I

utncu H ! " < U B p l u e L -~~ + nlndrn pruu

UhO 1 EcoRI B o m H I EccRI

o o c o n o O O D O n I 2 3 4 6 7 8 9 1 0 1 1 1 2 1 4 I 5

5 I 3 3 0

FIG. 4. DNA probes used for the Nothern blot hybridiza- tions. The eight exons and seven introns are depicted by the solid and open bars on the upper part of the figure. The restriction sites previously mapped on wild type mtDNA have been aligned with the gene. The probes used for the hybridizations are shown by the open bars in the louserpart of the figure. Each probe has been numbered.

estimation of the true size of this class of hydrophobic proteins on SDS gels, it cannot be excluded that the mature proteins are smaller than indicated by their gene sequences due to post-translational processing.

Transcripts from the oxi3 Region of mtDNA-The restric- tion map of the oxi3 locus, together with the physical locations of the Subunit 1 coding sequences, has made it possible to examine the stable RNA species transcribed from this region. We were particularly interested in seeing whether DNA probes with exon or intron sequences could be used to identify the messenger of Subunit 1.

To study the oxi3 transcripts, the mitochondrial genomes of the p- clones used for the sequence analysis were digested with different restriction endonucleases and the fragments labeled at the 5' termini with ['"'PIATP. The labeled mixtures were separated into the single strands on polyacrylamide gels, eluted, and used as hybridization probes against Northern blots of total mitochondrial RNA. Prior to use, the DNA probes were sequenced for identification of strand (sense or nonsense) and origin on the physical map. Since preliminary experiments indicated that the oxid transcripts were copied from the sense strand, only single-stranded probes comple- mentary to this strand were used for the hybridizations.

The relation of the probes to the exon and intron regions of the gene is shown in Fig. 4. Four of the probes contained sequences confined to either exons A l , A4, A5, or As. Because of the distribution of restriction sites, it was not possible to

isolate probes containing pure exons A2 or A3. A number of probes were also obtained from each of the introns except those present in the COOH-terminal end of the gene where there are still uncertainties about the exon-intron boundaries.

The results of the hybridizations are summarized in Table 111, and some representative autoradiographs of Northern blots are shown in Fig. 5. The most abundant transcript detected with each of the pure exon probes has a size corre- sponding to 1.9 kb. Although the exon probes also hybridize to larger transcripts, these are present in considerably lower concentrations. The higher molecular weight transcripts have not been studied in detail but could represent steady state levels of partially processed messenger. The fact that the 1.9- kb transcript is the smallest RNA species detected exclusively by the exon probes leads us to tentatively identify it as the subunit 1 messenger.

TABLE 111 Transcripts detected with exon and intron probes

For the designation of the Drobes. see Fia. 4.

Probe Origin Size of transcript

0.85 1.9 2.4 + 2.5

1 2 3 4 5 6 7 8 9

10 11 12 13 14 15

Exon AI Intron A1-A2 Intron A1-A2 Intron Al-A2 Intron A2-A3 Intron A2-A3 Intron A2-A3 Intron A2-A3 Intron A3-A4 Exon A4 Intron A4-A5 Exon A5, intron A4-A5 Exon A5 Intron A5-A6 Exon A8

" This probe hybridized to a prominent 3.2-kb transcript. In this experiment, the RNA was separated under conditions

which would have caused transcripts smaller than 1 kb to migrate out of the gel.

(D a

0.9 :;:I -

Probe 14 I 13 IO 9 12 3 FIG. 5. Autoradiographs of Northern blots hybridized to

oxi3 probes. The ethidium bromide-stained gels of total mitochon- drial RNA (negatiuephotographs) have been aligned with the auto- radiographs. The sizes of the two mitochondrial rRNAs (3.2 and 1.6 kb) and of a 0.9-kb transcript are marked on the left-hand margins. The exon and intron probes are numbered as in Fig. 4.

Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase 11937

TABLE IV Nucleotide sequences retained in the p - clones

The nucleotides have been numbered according to the convention described in the legend to Fig. 2.

Clone Genetic markers Nucleotide sequence

DS6/A401 M8-227, M15-98, M12-193, “1060 to +5061 M3-9, M15-190

DS6/A422 M10-63, M5-121, M11-224, +5025 to -+lo300 M5-85, M15-233, M16-41

DS6/A400 M15-190, M15-98, M12-193 +291 to +2843 DS6/A402 M12-193, M3-9 +2137 to +5530 DS6/A462 M5-121, M10-63 +4538 to -+5768

+5982 to +7189 DS6/A407 M11-224, M5-85, M15-233, +7052 to +9632

M16-41

‘ 2 9 1 + 7 R 4 3

us!~fl\~l!~!l ~ ~ ~ ~ ~ ~ 4 ~ ~ b a c a a r a ~ ~ n i c ( r C n r - - - h a ~ r ~ a a ~ c c b ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ) ~ ~

*ill? + 5 5 3 n I

b W A W A h l l l A A A T A T T ~ l l ( G A A A - - - - - - ~ ~ ~ ~ ~ - - - - - A i a r ~ ~ ~ ~ i ) n r l A A l

* 7 0 5 2 * 9 6 3 2 I

U W A 4 I l 7 A T G A T C ~ ~ T ~ T T C A T T A ~ C A G C - - - - - - - - - - - - ~ ~ T ~ ~ ~ ~ ~ ~ T C A ~ T G ) ~ ~ ~ ~ ~ A A ~

in DS6/A400, DS6/A402, and DS6/A407. The repeated sequences FIG. 6. Deletion end points of the mtDNA segments retained

are underlined. The sequences retained are enclosed in parentheses. The nucleotides are numbered according to the convention described in the legend to Fig. 2.

In contrast to the above results, the intron probes hybrid- ized to at least four different transcripts. Probes originating from the first or second introns showed pronounced hybridi- zation to two similar transcripts estimated to be 2.4 and 2.5 kb long. These transcripts are probably identical with the 19 S and 19.2 S RNAs recently described by Grivell et al. (24). The two transcripts are not always resolved on agarose gels due to their closeness in size. The ability of probes from either the first or second introns to detect both the 2.4- and 2.5-kb transcripts can be explained by cross-hybridization to homol- ogous sequences in the two introns (see below for discussion of intron homologies). A third transcript that appears to consist of a pure intron sequence is detected by a probe from the fifth intron. This transcript is approximately 0.85 kb long and probably corresponds to the 11 S circular RNA reported by Arnberg et al. (25). One additional abundant higher mo- lecular weight transcript (3.2 kb) was detected with a probe from the third intron. Since this transcript also hybridizes to different exons but not to the first two intron probes, it is likely to be a processing intermediate in which the early introns have been spliced out.

Deletion End Points and Localization of oxi3 Mutations- In the previous paper (l), oxi3 mutations were mapped to physically defined spans of the oxi3 locus. Some of the mu- tations can now be assigned to either intron or exon regions of the Subunit 1 gene. The complete nucleotide sequences of most of the mtDNA segments used in this study have allowed their deletion end points to be precisely defined. These data are presented in Table IV. Based on the marker retention of the clones and the nucleotide sequences of their mtDNAs, three mutations have been mapped to the intron regions. The two markers, M15-190 and M15-98, are present in DS6/A400 but absent in DS6/A402. Both mutations must, therefore, be located between nucleotides +262 and +2136 (i.e. the sequence present exclusively in DS6/A400) which includes most of the first intron of the gene. A similar analysis indicates that the mutation M3-9 falls in the second intron. Although the other mutations studied can also be assigned within physically defined limits, these contain both intron and exon sequences.

The only exception is the mutation in M11-224 which has been localized in a short sequence inside the fifth exon.

The mechanism by which spontaneous or ethidium bro- mide-induced deletions in mtDNA occur is not known pres- ently. Gaillard et al. (26) have recently reported that deletions in several spontaneous p- clones are initiated near repeated sequences on the wild type mtDNA. The repeated sequence, however, was present only once in the mtDNA segment of the clones examined by these authors. Although this is not gen- erally true in ethidium bromide-induced p- clones, it is inter- esting that, in at least three of the clones used in our studies, the deletions occurred near repeated sequences. This is shown in Fig. 6 for the clones DS6/A400, DS6/A402, and DS6/A407. In each case, the mtDNA segment contains a 10- to 17- nucleotide sequence at one end whose duplicate forms the start of the deletion at the opposite end but is not included in the retained segment.

Reading Frames of the oxi3 Introns-The overall structure of the Subunit 1 gene is depicted in Fig. 7. As indicated earlier, most of the exon sequences are part of longer reading frames that can code for proteins of 30 to 80 kb. Such reading frames are associated with the first, second, third, and fourth exons. The longest reading frame starts at nucleotide +169 following

AI A2

43 44 45 46 47 48

“3-9- MII-224

C M l 0 - 6 3 , M 5 - I 2 l + f-

A3 A4

” - 49 50 51 52 53 54

“16-41, M5-85. Ml5-233 +

1- A5 A6 A7 A 8

55 56 57 58 FIG. 7. Organization of the oxi3 gene. The physical map has

been linearized and is shown from 43 to 58 map units. The symbols used for the restriction sites are the same as described in the legend to Fig. 1. The exon and intron reading frames are indicated by the solid and open bars above the restriction map. The locations of the ox&’ mutations are shown by the arrows.

TABLE V Guanine + cytosine contents of exon and intron reading frames

Nucleotides Guanine + cytosine

%

Exon A1 +1 to +165 28.5 Exon A2 +2623 to +2652 33.3 Exon A3 +5171 to +5206 36.1 Exon A4 +6722 to +7197 30.2 Exon A5 +8208 to +8618 33.6 Exon A6 +9506 to +9754 30.9 Exon A7 +9830 to +9859 30.1 Exon A8 +9905 to +9976 36.1 First intron +172 to +2502 24.7 Second intron +2656 to +5010 26.3 Third intron +5207 to +6208 20.1 Fourth intron +7198 to +8145 18.6

11938 Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase

TABLE VI

- Codon frequencies in the Subunit 1 gene

A$? Codon Total" unit Sub- Intron e:? Codon Total" :i:i Intron

1 2 3 4 1 2 3 4

- Codon frequencies in the Subunit 1 gene

7 . 11 Amino _.=.- m.l.lY Sub- Intron

1 1 2 3 4

Ala GCA 32 21 10 3 4 3 G 1 0 8 6 1 2 U 47 19 10 13 6 5

Lys AAA 23 8 74 67 30 41

Met AUG 45 20 7 17 3 1 G 1 1 3 4 0 0 C 2 3 1 3 0 2

AGA G

CGA U C G

Asn AAU C

28 9 41 37 7 8

0 0 0 0 0 0 C 38 22 3 8 0 1 0 0 2 4 1 0

Phe U U U 34 17 23 26 10 16

C 1 1 1 2 0 0 0 0 0 1 0 0 U 26 13 20 14 4 I 0 0 0 0 0 0

Pro CCA 24 11 7 5 1 2 0 0 1 0 4 0

G 0 0 1 3 1 0

4 5 9 11 2 0 Ser UCA 46 25 17 13 6 9 49 13 58 61 48 35

Asp GAU 31 10 29 29 12 15 C 1 1 4 3 0 0

Cys UGU 11 1 13 18 3 1 C 0 0 0 0 0 0

Gln CAA 22 7 11 9 6 10 G 3 1 0 1 0 2

Glu GAA 31 6 22 24 12 8 G 2 1 6 6 1 1

Gly GGA 15 13 12 16 6 3 U 64 24 28 21 9 12 C 0 0 2 2 0 0 G 3 3 4 8 0 0

U 23 11 4 19 12 5 C 0 1 1 6 0 2 G 0 0 3 0 0 0

AGU 12 3 14 20 6 4 C 0 0 I I 0 0

Thr ACA 34 17 25 14 6 6 U 27 7 19 11 4 6 C 1 0 2 7 1 0 G 0 0 8 5 2 0

CUA 12 2 7 0 5 3 U 2 0 2 1 2 3 C 0 0 1 0 0 0 G 0 2 0 0 0 0

Trp UGA 26 10 7 4 5 6 G 0 0 0 1 0 1

His CAU 36 11 16 24 3 2 C 0 3 0 4 1 0

Tyr UAU 56 17 30 29 27 17 C 6 7 6 8 1 0

Ile AUU 108 41 50 56 19 24 Val GUA 54 22 22 18 8 9 C 21 5 11 5 1 0 U 38 5 14 21 5 3 A 6 1 42 13 9 17 C 1 2 1 2 0 0

G 3 3 2 2 2 1 Leu UUA 164 60 63 73 34 28

G 2 0 4 5 2 0 - " These codons have been tabulated from the gene sequences of the ATPase Subunits 6 and 9 (27,29), cytochrome oxidase Subunits 2 and

3 (23, 28), and apocytochrome b (4).

the last codon of exon A1 and ends 2434 nucleotides further downstream with an ochre terminator. Excluding the exon sequence, this frame codes for an additional 784 amino acids. An almost equally long reading frame is present in the second intron. Although the other two intron sequences are shorter, they nonetheless exceed the length of fortuitous reading frames that might be expected in a random DNA sequence. This is emphasized by the occurrence of frequent ochre and amber termination codons when the intron sequences are read in the other two registers.

The intron reading frames share some features which dis- tinguish them from the exon sequences. Because of a lower guanine + cytosine content (Table V), they tend to have a large number of codons with combinations of adenine and/or uridine. The protein sequences encoded in the introns are consequently unusually rich in lysine, asparagine and tyrosine, i.e. amino acids whose codons have either adenine and/or uridine in the first two positions of the letter code. The intron sequences also generate more polar proteins with a higher proportion of charged to neutral amino acids (Table VI).

In addition to the somewhat unusual amino acid composi- tions, the intron reading frames utilize codons that are either absent or occur only infrequently in mitochondrial genes.

There are six arginine codons of the CGN series in the first 3 introns of the Subunit 1 gene. None of these codons have been found in established yeast mitochondrial genes (4, 23, 27-29). The codons tabulated in Table VI also point to some interest- ing differences in the distribution of codons within single amino acid families. For example, there are nearly the same number of AUA and AUU isoleucine codons in the intron reading frames. In contrast, mitochondrial genes, including the Subunit 1 gene of cytochrome oxidase, use almost exclu- sively the AUU codon. A similar situation holds true for the two codons of phenylalanine. Whereas equal use is made of UUU and UUC in all the known mitochondrial genes, the introns exhibit a strong bias for the UUU codon. All the features described here for the intron reading frames of the Subunit 1 gene have also been noted in a long reading frame in the first intron of the apocytochrome b gene (4).

Several remarkable facts have emerged from a comparison of the Subunit 1 introns with one another and with the intron of the apocytochrome b gene. A computer search for identical sequences has revealed extensive homology between the first two introns of the Subunit 1 gene. Equally significant is the homology of the two protein sequences encoded by the reading frames (Fig. 8). The protein sequences average 50% identical

Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase 11939

D N A s e q u e n c e

7 Amtno actd sequence

I

t . I I

In t ron A I - A 2 Int ron A I - A 2

FIG. 8. Nucleotide and amino acid sequence homologies be- tween the first and second introns of the oxid gene. The nucleo- tide and amino acid residues in the two introns have been numbered on the coordinates. The program of Staden (30) was used to search for nucleotide and amino acid homologies. The program was set to detect a minimum of nine out of 10 consecutive identities in the nucleotide sequence and three out of four identities in the amino acid sequences.

TABLE VI1 Nucleotide substitutions in the codons of the homologous intron

reading frames No. of changes in dif- ferent positions of the

codons

First :%- Third

Total Codons substi- changed tutions

First and second 678 343 introns of the Subunit 1 gene

apocytochrome b and fourth in- tron of the Sub- unit 1 genes

First intron of 231 75

D N A Seauence

oat 3. In t ron A 4 - A 5

433 224 204 275

102 47 42 71

N m m -

0 .- c

n 0 u

0 200

o x 1 3 l n i r o n A 4 - A 5

FIG. 9. Nucleotide and amino acid sequence homologies be- tween the fourth intron of the oxi3 and the first intron of the apocytochrome b (cob) genes. The homologies were detected with the Staden program (30) as described in the legend to Fig. 8.

EXON A 1

FIG. 10. Nucleotide sequence ho- mologies at the exon-intron bound- aries. The exon and intron sequences are shown in large and small capital letters, respectively. Bl and B2 refer to the first two exons of the apocytochrome b gene (4).

amino acids over almost the entire lengths of the two introns. The results of Table VI1 summarize the divergence of the two sequences in terms of changes in each of these positions of the codons. These data indicate a small but significant preference for nucleotide substitutions in the third position.

An even more striking homology is evident between the fwst intron of the apocytochrome b and the fourth intron of the Subunit 1 genes (Fig. 9). In this case, the apocytochrome b intron is approximately 300 nucleotides longer. The two reading frames, however, can be aligned over 700 nucleotides with an excellent match in the amino acid sequences. The protein sequences are 70% homologous with a strong bias for base substitutions in the third positions of the codons (Table VII). We have also compared the sequences at the intron- exon boundaries, both within the Subunit 1 and with the apocytochrome b gene. The only significant homology near the splice points is found between the fiist two introns of the Subunit 1 and the first and fourth introns of the Subunit 1 and apocytochrome b genes. These homologies are shown in Fig. 10. All the other boundaries appear to have unique sequences.

DISCUSSION

The oxi3 locus is a complex region of yeast mitochondrial DNA located between the oligomycin and paromomycin re- sistance loci (31). During the past several years, a substantial body of evidence has been accumulated indicating that this locus represents a single mosaic gene coding for Subunit 1 of cytochrome oxidase. This conclusion is supported by the following observations. 1) Physical mapping of 0x23 mutations places a lower limit of 10 kb on the size of the gene (3). Since the apparent molecular weight of Subunit 1 is 40,000 (15, 16), the locus is at least nine times longer than tf. ? predicted length of a co-linear gene. 2) Mutations in 0x23 form a single genetic complementation group suggestive of a single gene (32). 3) The 0x23 region gives rise to different sized transcripts, some of which have been proposed to be stable intron excision products (24, 33, 34).

The DNA sequence of the oxi3 locus reported here fully supports the earlier interpretations about the organization of the Subunit 1 gene. Our sequence data indicate the gene to be 10 kb long, spanning the region of wild type mtDNA from 43.7 to 58 map units. Based on its orientation relative to the wild type map, the NH2-terminal end of the gene lies proximal to the paromomycin marker (Fig. 11). The gene is therefore transcribed from the same strand of DNA as the other two subunits of cytochrome oxidase (23, 28) and other recently sequenced mitochondrial genes (4, 27, 29). The Subunit 1 coding sequences are distributed among 7 to 8 different exons with lengths ranging from 36 to 480 nucleotides. Together, the exons dictate a protein whose sequence averages 50% homol- ogy with Subunit 1 of human cytochrome oxidase. Even though the exact junctions at the exon-intron

INTRON

EXON B1 INTRON TTA GGT CAA AAT ATG GCC TTA TTA ----- TTTTATGAAGATATAGTCCA"TATATTTATTATAAAAG t l I l l I l l I I 1 i l l I I I l l I1 l ~ l l l l l l l l l l l i l l 1 1 I l l ! I l l TTT GGT CAA ACA GTG GCC CTT ATT ----- TTTAATAAAGATATAGTCCA"TATATATATATAACAAG

EXON A 4 INTRON

boundaries

EXON A2

TTT TTA GTA I I l l I l l

CTC TTA GTA

EXON A 3

CAT CCT GAT EXON B 2

I I I l l / I CAC CCT GAA

EXON A5

11940 Sequence of Gene Coding for Subunit 1 of Cytochrome Oxidase

0 x 1 3 FIG. 11. Physical locations of yeast mitochondrial genes. The

genes encoded by the various genetic loci are as follows: oxil, Subunit 2 of cytochrome oxidase; oxi2, Subunit 3 of cytochrome oxidase; 0x23, Subunit 1 of cytochrome oxidase; cob. apocytochrome b; olil, Subunit 9 of the ATPase; oligomycin (oliZ), Subunit 6 of the ATPase; chlor- amphenicol (cup), 21 S rRNA, paromomycin bar), 14 S rRNA. The map units are shown inside the circle. Transcription is clockwise.

cannot be specified at present, the homology of the yeast and human proteins is sufficiently high to provide a reasonably accurate idea of the overall length of the coding sequence. According to our estimates, the Subunit 1 exons are comprised of 1,533 nucleotides and represent only 16% of the total length of the gene. The yeast Subunit 1 consists of 510 amino acid residues with a molecular weight of 56,000. This value is 30% higher than measured on SDS-polyacrylamide gels (15, 16). Similar discrepancies in molecular weights have been found for Subunit 3 of cytochrome oxidase (23) and cytochrome b (4). We believe that the differences are most likely due to anomalous binding of SDS by these extremely hydrophobic proteins.

The identification of four of the exon sequences is supported by Northern blot hybridizaton studies. Single-stranded probes from exons Al, A4, A5, and A8 hybridize to a 1.9-kb transcript which we tentatively identify to be the fully processed mes- senger. The migration of this RNA on agarose gels corre- sponds to that of the 18 S transcript described by Van Ommen et al. (33). Several other abundant transcripts are detected when the Nothern blots are challenged with intron probes. Two stable transcripts hybridize to probes from the fist or second introns. The sizes of these RNAs (2.4 and 2.5 kb) suggest that they may be excision products of the first two introns. A smaller transcript of 0.85 kb is also detected specif- ically with a probe from the fifth intron.

The apocytochrome b and Subunit 1 genes of yeast mito- chondria are now well documented examples of mosaic genes. All the current evidence suggests that the messengers for both genes arise from processing events involving a stepwise re- moval of the intron sequences from large primary transcripts. The yeast mitochondrial genes offer a unique opportunity for studying the details of RNA splicing mechanisms for several reasons. First is the availability of a substantial number of mutants in which DNA sequence modifications, particularly in the intron regions, prevent nonm.1 :-recessing of the RNA precursors (33). In addition, t,he nucleotide sequences of the genes are of significant advantage in evaluating possible splic- ing mechanisms.

The recent finding of intron reading frames in mitochondrial genes has prompted considerable speculation concerning the function of such sequences (35, 36). We have previously re- ported a long reading frame in the first intron of the apocy- tochrome b gene (4). The sequence of the oxid locus has revealed the presence of four other reading frames in the

Subunit 1 gene. Similarly, Dujon (37) has recently described a reading frame in the single intron of the 21 S rRNA gene. All of the currently known intron reading frames have certain properties that distinguish them from the identified genes of yeast mtDNA. They have a consistently lower guanine + cytosine content. The encoded protein sequences are charac- terized by unusual amino acid compositions, being very rich in lysine and asparagine. Their codon usage is somewhat different from that found in the coding sequences of the exons.

In addition to the above general features, we have found considerable sequence homology among some of the introns of the Subunit 1 and apocytochrome b genes. The protein sequences encoded in the first two introns of the oxi3 gene exhibit 50% homology. An even higher sequence homology is seen between the fist and fourth introns of the apocyto- chrome b and Subunit 1 genes, respectively. These homologies extend to the nucleotide sequences adjacent to the exon-intron boundaries as well. The similarities in the DNA sequences imply a common mechanism of splicing, at least for some of the intron regions of the two genes. In this context, it may be significant that some mutations in apocytochrome b introns have been shown to prevent the synthesis of cytochrome oxidase Subunit 1 (38-40). Some preliminary studies suggest a functional link in the excision of the fust intron of apocy- tochrome b and the fourth intron of Subunit 1. An examina- tion of the transcripts present in a mutant carrying a lesion in the fist apocytochrome b intron has shown that the process- ing of the Subunit 1 messenger is arrested at the level of the fourth intron."'

Several laboratories have proposed that the intron reading frames code for a specialized set of proteins required for RNA splicing (35, 36). Such a function is strengthened by the observation that certain intron mutations in apocytochrome b lead to an accumulation of precursor transcripts which are translated to high molecular weight proteins antigenically related to cytochrome b (40). These new proteins have been postulated to contain both exon- and intron-specified amino acid sequences. Although there are a number of attractive features in processing models requiring the synthesis of intron- encoded splicing enzymes (35, 36), not all of the evidence supports this interpretation. Morimoto et al. (34) have re- ported proper splicing of the 21 S rRNA in p- mutants defective in mitochondrial protein synthesis and therefore incapable of translating mitochondrial messengers. Along the same line, we have found that mitochondrial tRNA mutants are capable of processing the apocytochrome b ,messenger, albeit at a less efficient rate than wild type cells.4 It should also be noted that reading frames coding for proteins with a high lysine and asparagine content are not restricted to the intron regions of mitochondrial genes. For example, in S. cereuisiae D273-10B, there is a 1500-nucleotide-long uninter- rupted reading frame between the Subunit 2 and 3 genes of cytochrome oxidase." A somewhat shorter reading frame of 500 nucleotides has also been found between the genes of Subunit 6 of the ATPase and apocytochrome b (29). Whether these sequences, as well as those of the introns, are expressed as translation products remains obscure at present.

Acknowledgments-We wish to express our appreciat.ion to Dr. Bart Barrel1 and Dr. Frederick Sanger for providing us with the sequence of the human Subunit 1 of cytochrome oxidase prior to publication.

' S. Bonitz, unpublished studies. I S . G. Bonitz, G. Coruzzi, B. E. Thalenfeld, A. Tzagoloff, and G .

Macino, unpublished studies. G . Coruzzi, unpublished studies.

Sequence of Gene Coding for SI

REFERENCES

1. Bonitz, S. G., Coruzzi, G., Thalenfeld, B. E., Tzagoloff, A., and Macino, G. (1980) J. Biol. Chem. 255, 11922-11926

2. Grivell, L. A,, and Moorman, A. F. M. (1977) in Genetics and Biogenesis of Mitochondria (Bandlow, W., Schweyen, R. J., Wolf, K., and Kaudewitz, F., eds) pp. 371-384, Walter de Gruy- ter, Berlin, West Germany

3. Morimoto, R., Lewin, A,, and Rabinowitz, M. (1979) Mol. Gen. Genet. 170, 1-9

4. Nobrega, F. G., and Tzagoloff, A. (1980) J. Biol. Chem. 255,9828- 9837

5. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U.

6. Tzagoloff, A,, Akai, A., and Foury, F. (1976) FEBS Lett. 65,391-

7. Burger, S. L., and Birkenmeier, C. S. (1979) Biochemistry 18,

8. Alwine, J . C., Kemp, D. J., and Stark, G. R. (1977) Proc. Natl.

9. Wahl, G. M., Stern, M., and Stark, G. R. (1979) Proc. Natl. Acad.

10. Macino, G., Coruzzi, G., Nobrega, F. G., Li, M., and Tzagoloff, A.

11. Fox, T. D. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 6534-

12. Barrell, B. G., Bankier, A. T., and Drouin, J. (1979) Nature 282,

13. Li, M., and Tzagoloff, A. (1979) Cell 18,47-53 14. Bonitz, S. G., Berlani, R., Coruzzi, G., Li, M., Macino. G., Nobrega,

F. G., Nobrega, M. P., Thalenfeld, B., and Tzagoloff, A. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 3167-3170

15. Mason, T. L., Poyton, R. O., Wharton, D. C., and Schatz, G. (1973) J. Biol. Chem. 248, 1346-1354

16. Rubin, M. S., and Tzagoloff, A. (1973) J. Biol. Chem. 248, 4269- 4274

17. Slonimski, P. P., Claisse, M. L., Foucher, M., Jacq, C., Kochko, A,, Lamoroux, A., Pajot, P., Perrodin, G., Spyridakis, A., and Wambier-Kluppel, M. L. (1978) in Biochemistry and Genetics of Yeast (Bacila, M., Horecker, B., Stoppani, A. 0. M., eds) pp. 391-401, Academic Press, New York

18. Tzagoloff, A., Foury, F., and Macino, G. (1978) in Biochemistry a n d Genetics of Yeast (Bacila, M., Horecker, B., Stoppani, A. 0. M., eds) pp. 477-488, Academic Press, New York

S. A. 74,560-564

396

5143-5149

Acad. Sci. U. S. A. 74, 5350-5354

Sci. U. S. A. 76,3683-3687

(1979) Proc. Natl. Acad. Sci. U. S. A. 76,3784-3785

6538

189-194

19. Borst, P., and Grivell, L. A. (1978) Cell 15, 705-723 20. Grivell, L. A., Arnberg, A. C., Boer, P. H., Borst, P., Bos, J . L.,

Van Bruggen, E. F. J., Groot, G. S. P., Hecht, N. B., Hensgens,

vbunit 1 of Cytochrome Oxidase 11941

L. A. M., Van Ommen, G. J . B., and Tabak, H. F. (1979) in ICN-UCLA Symposia on Molecular and Cellular Biology: Extrachromosomal DNA (Cummings, D. J., Borst, P., Dawid, I. B., Weissman, S. M., and Fox, C. F., eds) Vol. XV, pp. 305- 324, Academic Press, New York

21. Poyton, R. O., and Schatz, G. (1975) J . Biol. Chem. 250, 752-761 22. Tzagoloff, A,, Akai, A,, and Rubin, M. S. (1974) in Biogenesis of

Mitochondria (Kroon, A. M., and Saccone, C., eds) pp. 405-

23. Thalenfeld, B. E., and Tzagoloff, A. (1980) J. Biol. Chem. 255, 421, Academic Press, New York

24. Grivell, L. A,, Arnberg, A. C., Hensgens, L. A. M., Roosendaal, E., Van Ommen, G. J . B., and Van Bruggen, E. F. J. (1980) in The Organization and Expression of the Mitochondrial Genome (Kroon, A. M., and Saccone, C., eds) pp. 37-49, North-Holland Publishing Co., Amsterdam

25. Arnberg, A. C., Van Ommen, G. J. B., Grivell, L. A., Van Bruggen, E. F. J., and Borst, P. (1980) Cell 19, 313-319

26. Gaillard, C., Strauss, F., and Bernardi, G. (1980) Nature 283,218- 220

27. Hensgens, L. A. M., Grivell, L. A., Borst, P., and Bos, J . L. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 1663-1667

28. Coruzzi, G., and Tzagoloff, A. (1979) J. Biol. Chern. 254, 9324- 9330

29. Macino, G., and Tzagoloff, A. (1980) Cell 20, 507-517 30. Staden, R. (1977) Nucleic Acids Res. 4, 4037-4051 31. Slonimski, P. P., and Tzagoloff, A. (1976) Eur. J. Biochem. 61,

32. Foury, F., and Tzagoloff, A. (1978) J. Biol. Chem. 253,3792-3797 33. Van Ommen, G. J. B., Groot, G. S. P., and Grivell, L. A. (1979)

Cell 18, 511-523 34. Morimoto, R., Locker, J., Synenki, R. M., and Rabinowitz, M.

(1979) J. Biol. Chem. 254, 12461-12470 35. Slonimski, P. P. (1980) C. R. Seances Acad. Sci. (Paris) 290, 1-

4 36. Church, G. M., and Gilbert, W. (1980) in Mobilization and

Reassembly of Genetic Information (Joseph, D. R., Shultz, J., Scott, W. A,, and Werner, R., eds) Academic Press, New York, in press

6173-6180

27-41

37. Dujon, B. (1980) Cell 20, 185-197 38. Alexander, N. J., Vincent, R. D., Perlman, P. S., Miller, D. H..

Hanson, D. K., and Mahler, H. R. (1979) J. Biol. Chem. 254,

39. Church, G. M., Slonimski, P. P., and Gilbert, W. (1979) Cell 18,

40. Claisse, M., Slonimski, P. P., Johnson, J., and Mahler, H. R.

2471-2479

1209-1215

(1980) Mol. Gen. Genet. 177, 375-387