

Gene, 70 (1988) 1-12

Elsevier

GEN 02561

Organization and nucleotide sequence of genes at both junctions between the two inverted repeats and the large single-copy region in the rice chloroplast genome

(Ribosomal protein; photosystem II; histidine tRNA gene; chromosome walking; recombinant DNA)

Eunpyo Moon and Ray Wu

Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY (U.S.A.)

Received 3 March 1988

Revised 9 May 1988

Accepted 13 May 1988

Received by publisher 6 June 1988

We describe the isolation and organization, at the nucleotide sequence level, of genes located at the two junctions of the large single-copy region (LSCR) and the two inverted repeats (IRA and IRB) in the rice chloroplast genome. This is the first example where the two junctions are precisely located in a monocot. In rice, a ribosomal protein gene cluster, rpl23-rpl2-rps19, which codes for the ribosomal proteins L23 (rpl23), L2 (rpl2) and S19 (rps19), lies at the ends of the two IRs near the LSCR. The inverted repeats end 45 bp from the translation stop codon of rps19. The gene for the 32-kDa photosystem II protein, psbA, is located at the extremity of the LSCR near IRA, and transcribed towards IRA. The translation stop codon of psbA is 68 bp from the right-hand junction (JLA). Thus, JLA is located within the intergenic sequence of the two genes, rps19 and psbA. Around the left-hand junction (JLB), there is a typical ribosomal protein gene cluster, rpl23-rpl2-rps19-rpl22 (rpl22 for the ribosomal protein L22). The translation start codon of rpl22 is located in the LSCR 25 bp from JLB. Therefore, JLB is located within the intergenic sequence between rps19 and rpl22.

Correspondence to: Dr. Ray Wu, Section of Biochemistry, Molecular and Cell Biology, Cornell University, Ithaca, NY 14853 (U.S.A.) Tel. (607)255-5710.

Abbreviations: atpB and atpE, genes for β and ε subunits of chloroplast ATPase; bp, base pair(s); HBA, fragment covering the right junction; HBB, fragment covering the left junction; IR, inverted repeat; JLA, the right-hand junction; JLB, the left-hand junction; kb, kilobase(s) or 1000 bp; LSCR, large single-copy region; nt, nucleotide(s); psbA, gene for 32-kDa protein of photosystem II; rbcL, gene for ribulose bisphosphate carboxylase large subunit; rpl2, rpl14, rpl16, rpl22 and rpl23, genes for ribosomal protein large subunit 2, 14, 16, 22 and 23; rps3, rps8 and rps19, genes for ribosomal small subunit 3, 8 and 19; SDS, sodium dodecyl sulfate; SSC, 0.15 M NaCl, 0.015 M sodium citrate; SSCR, small single-copy region; trnH1, gene for histidine tRNA; trnI1, gene for isoleucine tRNA.

0378-1119/88/$03.30 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)

INTRODUCTION

The chloroplast genomes of land plants are single circular DNA molecules that range in size from 120 to 217 kb. One of the most outstanding features of the chloroplast genome is a large IR of 10-76 kb, part of which codes for ribosomal RNA genes (for reviews, see Palmer, 1985; Weil, 1987; Whitfeld and Bottomley, 1983). This IR is present in the chloroplast genomes of angiosperms (Whitfeld and Bottomley, 1983; Palmer, 1985), gymnosperms (Palmer and Stein, 1986), ferns (Palmer and Stein, 1982; Stein et al., 1986) and liverworts (Ohyama et al., 1986). Exceptions include the pea, broad bean and chick pea of the family Leguminosae, in which one of the IRs is lost (Koller and Delius, 1980; Palmer and Thompson, 1981; Chu and Tewari, 1982; Palmer et al., 1987).

Although the IR can vary from 10 to 76 kb among angiosperms, it is striking that in most taxa it is a rather constant 22-26 kb in size. Even more striking is the observation that the junction between the inverted repeat and the LSCR is located in a more or less fixed position around the rps19 gene in spinach, Nicotiana debneyi, N. tabacum, and Spirodela (Posno et al., 1985; Zurawski et al., 1984). It has been speculated by Palmer (1985) that this reflects some measure of selection operating to constrain the boundaries of the IR. It is clear, however, that the boundaries can occasionally shift by intermediate amounts compared to the entire deletion of the repeat in certain legumes or tripling in size as in geraniums (Palmer et al., 1987). The 217-kb geranium chloroplast genome possesses a greatly enlarged IR of 76 kb, the result of spreading the IR into both SSCR and LSCR, producing duplicated genes which are single copy in all other angiosperms. In coriander, the IR has shrunk to no more than half the normal size so that rpl2, which is normally located within the terminus of the repeat, is a single-copy gene over 10 kb away from the end of the repeat (Palmer, 1985).

In this paper, for the first time for monocots, we report the exact locations of the two junctions between the LSCR and the IRs in the rice chloroplast genome. We also report the organization and the exact sequence of the genes around the junctions.

MATERIALS AND METHODS

(a) Isolation of total DNA from rice and other plant species

Rice (Oryza sativa L., var. Labelle) seeds were germinated as described previously (Kao et al., 1984). The rapid minipreparation method for isolating DNA from rice seedlings was described in a previous paper (Moon et al., 1987), and was used to prepare total DNA from various plant taxa.

(b) Construction of size-selected libraries

Total rice genomic DNA (10 µg) was digested separately with several restriction enzymes, fractionated on a 0.9% agarose gel, and transferred to a Nytran filter. The locations and sizes of specific restriction fragments were determined by hybridization analysis using probes, 32P labeled by nick translation, specific for individual chloroplast genes. Total rice genomic DNA (50 µg) was digested with a restriction enzyme that gave only one band, of 2-6 kb in size, upon the above-mentioned Southern blot analysis, and then fractionated on a 0.9% low-melting-point agarose gel. The area containing the desired fragment was isolated, and the DNA was eluted and ligated to the vector pUC13 that had been treated with calf-intestine alkaline phosphatase. The recombined plasmids were used to transform Escherichia coli JM101 cells. Ampicillin-resistant white colonies were selected and transferred to nitrocellulose filters. This size-selected amplified library gave one positive colony per 100 colonies upon colony hybridization using a chloroplast DNA probe, 32P labeled by nick translation.

(c) Chromosome walking using several size-selected libraries

To isolate the chloroplast DNA fragments covering the junctions between the IRs and the LSCR, the chromosome walking method was employed. Instead of using a library constructed with DNA digested partially with a specific restriction enzyme, we used several size-selected libraries constructed separately with several restriction enzymes, as described in section b above. We previously reported a rice mitochondrial DNA fragment (pMt-0) which contains a rearranged chloroplast gene cluster, rpl2-rbcL-atpB-atpE (Moon et al., 1987, 1988). As the first step in chromosome walking, the rice rpl2 homologue in the transferred copy, Mt-0, was used as a probe to isolate the part of the IRs containing rpl2 from a size-selected library. The end of this fragment was used as the next probe to isolate the overlapping fragment containing the adjacent DNA region from a size-selected library constructed after digestion with different restriction enzymes. The same approach was applied repeatedly until the two IR-LSCR junctions were fully covered.
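The logic of the walk can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each cloned fragment is reduced to a hypothetical (start, end) coordinate pair, "hybridization with the end of a fragment" is modeled as simple interval overlap, and the function name `walk` and all coordinates are invented for the example rather than taken from the experiments.

```python
def walk(fragments, seed, target):
    """Greedy chromosome walk over (start, end) intervals.

    Starting from `seed`, repeatedly pick a fragment that overlaps the end
    of the current fragment nearest the target and extends furthest toward
    it. Returns the list of fragments used, or None if a gap stops the walk.
    """
    path = [seed]
    current = seed
    while not (current[0] <= target <= current[1]):
        if target > current[1]:
            # Fragments that overlap the right end and reach beyond it.
            candidates = [f for f in fragments if f[0] < current[1] < f[1]]
            if not candidates:
                return None
            current = max(candidates, key=lambda f: f[1])
        else:
            # Fragments that overlap the left end and reach beyond it.
            candidates = [f for f in fragments if f[0] < current[0] < f[1]]
            if not candidates:
                return None
            current = min(candidates, key=lambda f: f[0])
        path.append(current)
    return path


# Hypothetical clone coordinates (bp); not measured values.
clones = [(0, 2800), (2000, 4400), (4000, 7000), (6500, 9000)]
print(walk(clones, seed=(0, 2800), target=8000))
```

Each step strictly extends the covered region toward the target, so the loop ends either when the target is covered or when no overlapping clone is available, which corresponds to needing a library made with a different enzyme.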

RESULTS AND DISCUSSION

(a) Identification and isolation of the rice chloroplast DNA fragments containing rpl2

The rpl2 gene in the chloroplast genome is located at the ends of the two IRs near the LSCR in most of the higher plant taxa. To study the location and organization of the genes at the two junctions between the IRs and the LSCR of the rice chloroplast genome, we first isolated the rice rpl2 gene on the IRs using the rpl2 homolog from the rearranged transferred copy, Mt-0. As the first step, the ca. 600-bp XbaI fragment of the rpl2 homologue from the rearranged cluster in pMt-0 was hybridized to rice genomic blots after individual digestion by BamHI, BglII, EcoRI, HindIII, MI, ML, SstI and XbaI. As shown in Fig. 1A, there were 2-6 kb bands in the BamHI, BglII, EcoRI and HindIII lanes. As described in MATERIALS AND METHODS, section b, a HindIII size-selected library of 2.8 kb was constructed, and the transformants were screened using the ca. 600-bp XbaI fragment. Seven positive colonies were obtained out of 500 transformants. These were all identical. The restriction map of the 2.8-kb HindIII fragment is shown as a part of the region depicted by heavy black bars in the composite map in Fig. 2. The remaining part of the rice rpl2 gene was isolated by hybridization using the rpl2 homolog.

Fig. 1. Southern blot analysis of rice genomic DNA. Total rice DNA, each sample (10 µg) digested separately with the indicated restriction enzymes, was fractionated on a 0.9% agarose gel in separate lanes, and then transferred to a Nytran filter. Panel (A): The filter was hybridized to the nick-translated rpl2 homologue from the transferred copy. Panel (B): The same filter used for (A) was reprobed with the nick-translated rice psbA after the rpl2 probe was removed by boiling in 0.01% SDS, 0.01 × SSC for 20 min, twice.

Fig. 2. The composite restriction maps and gene locations of the two fragments, HBA and HBB, covering the junctions of the rice chloroplast genome. HBA: fragment covering the right junction (JLA). HBB: fragment covering the left junction (JLB). Restriction maps were determined by gel electrophoretic analysis of several single or double restriction enzyme digests of several overlapping clones isolated by chromosome walking. Southern transfers of these digests were hybridized with nick-translated rice rpl2 homologue and rice psbA to locate the corresponding genes. All the other genes were identified using sequencing data and the Microgenie DNA analysis program. Inverted repeats are shown by thick lines. The arrows indicate the length and the transcription direction of the corresponding genes. Abbreviations: B, BglII; Ba, BamHI; C, ClaI; H, HindIII; X, XbaI.

(b) Isolation of the two junctions between the inverted repeats and the large single-copy region by chromosome walking

Since the HindIII fragment did not cover the junctions, we continued chromosome walking with an EcoRI size-selected library of 2.4 kb (Fig. 1A). The restriction map of this EcoRI fragment is shown as a part of the composite map in Fig. 2. We continued chromosome walking using BglII, BamHI and XbaI size-selected libraries until the junction region was covered. The final composite restriction maps of the two HindIII-BamHI fragments for the two junctions were constructed as shown in Fig. 2. To determine which fragment represents which junction, we used the rice psbA (Wu et al., 1987) as a marker for JLA, since in most higher plants psbA is located in the LSCR just outside JLA. The same filter used for Fig. 1A was probed with the rice psbA and the result is shown in Fig. 1B. Of the two BamHI fragments hybridized in Fig. 1A, only the shorter one hybridized to psbA. This result indicates that the shorter BamHI fragment contains the IRA and JLA. We now designate the two HindIII-BamHI fragments in Figs. 2 and 3 as HBA and HBB, respectively. The exact location of the rice psbA relative to other genes on HBA was determined by further restriction mapping and sequencing as shown in Figs. 2 and 4.

(c) A ribosomal protein gene cluster, rpl23-rpl2-rps19, lies at the ends of the two inverted repeats

Extensive DNA sequence analysis of the two fragments, HBA and HBB, revealed that rpl23-rpl2-rps19, which is a part of the complete ribosomal gene cluster, rpl23-rpl2-rps19-rpl22-rps3-rpl16-rpl14-rps8 (Tanaka et al., 1986; Shinozaki et al., 1986), is located at the very end of the two IRs (Fig. 2). The sequences are shown in Fig. 4. The two IRs end 45 bp from the translation stop codon of rps19. As a result, the first three genes of the cluster, rpl23, rpl2 and rps19, are duplicated in each circular chloroplast genome, while the genes in the rest of the cluster occur only once in the LSCR near the left junction. The gene for isoleucine tRNA, trnI1, is located on the same strand as rpl23 and starts 248 bp from the translation start codon of rpl23 (Figs. 2 and 4).

Fig. 3. Locations of the two cloned fragments, HBA and HBB, on the rice chloroplast genome. The outermost circle represents the rice chloroplast genome. Two inverted repeats, IRA and IRB, and the gene for the large subunit of ribulose bisphosphate carboxylase (rbcL) are indicated by filling-in. JLA and JLB are the junctions between the IRA and IRB and the LSCR (LSC in figure). JSA and JSB are the junctions between the IRA and IRB and the SSCR (SSC in figure). HBA and HBB represent the HindIII-BamHI fragments which span JLA and JLB, respectively.

(d) Gene psbA is located at the extremity of the large single-copy region near the right junction

As mentioned earlier, psbA, the gene for the photosystem II 32-kDa protein, is mapped at the very end of the LSCR near the right junction by Southern blot analysis of restriction digests of the clone (Figs. 1B and 2). Further sequencing data revealed that the translation stop codon, TAA, is in the LSCR 68 bp from JLA. The 42-bp sequence which can form the stable stem and loop structure functioning as the psbA transcription terminator (Zurawski et al., 1982) spans the last 6 bp of the IR and the first 36 bp of the LSCR (positions 2631-2672 in Fig. 4). Therefore, the right junction between the IR and the LSCR lies within the transcription terminator of psbA in rice.
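The coordinate arithmetic behind this statement can be checked with a short Python sketch. It assumes 1-based inclusive positions as in Fig. 4 and takes position 2636 as the last nucleotide of the IR; that value is not stated explicitly in the text but follows from the terminator starting at 2631 with 6 bp inside the IR. The function name is hypothetical.

```python
def split_at_junction(start, end, last_ir_pos):
    """Split the 1-based inclusive interval start..end at an IR/LSCR junction.

    `last_ir_pos` is the last position that still belongs to the inverted
    repeat. Returns (bp inside the IR, bp inside the LSCR).
    """
    if start > end:
        raise ValueError("start must not exceed end")
    total = end - start + 1
    in_ir = max(0, min(end, last_ir_pos) - start + 1)
    return in_ir, total - in_ir


# Terminator positions quoted in the text; junction position is inferred.
print(split_at_junction(2631, 2672, last_ir_pos=2636))  # (6, 36)
```

The result reproduces the 6 bp and 36 bp quoted above, and the two parts sum to the 42-bp terminator length.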

(e) The sequences around the junctions are highly divergent in different taxa

As mentioned earlier, the IR is usually 22-26 kb in size in most angiosperms, and the junction between the IR and the LSCR is located in a more or less fixed position, within the region covering rps19. Detailed sequence comparison of the junction, however, revealed that the sequences around the junctions are very divergent in all of the taxa examined: rice, maize, spinach, Nicotiana debneyi, Nicotiana tabacum, and liverwort. The divergence is due mainly to the expansion or shrinkage of the IRs. The transposition of trnH1 to different locations is another factor generating divergence, which will be discussed later in detail.

To examine the divergence more closely, we compared the sequences near the junction of two pairs of plants, rice versus maize and N. debneyi versus N. tabacum (Fig. 5). The overall sequence homology of the region covering the junction between the IR and the LSCR is over 95% in both pairs. The junction is located about 40 bp downstream from the translation stop codon of rps19 in the first pair of plants, rice and maize (Fig. 5A); however, it is located around the 5' end of rps19' in the second pair, N. debneyi and N. tabacum, where the 3' portion of rps19 is located on the LSCR; thus, the two copies of rps19 are not identical, and the one at JLA is designated rps19' (Fig. 5B). Comparison of the two sets of sequences revealed that there are deletions around the junction boundaries. This implies, regardless of the location of the junction, that deletions around this region seem to have occurred readily. Therefore, the deletions are another major factor for the sequence divergence at the junction region.
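A percentage such as the "over 95%" quoted above can be computed from a pairwise alignment along the following lines. This is a minimal sketch, not the procedure used in the paper: it assumes the two sequences are already aligned to equal length with "-" marking deletions, and it assumes gapped columns are excluded from the count, which is only one of several common conventions. The sequences are toy strings, not data from Fig. 5.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical positions between two pre-aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 8 ungapped columns, 7 of them identical.
print(percent_identity("ATGC-ATTA", "ATGCGATAA"))  # 87.5
```

Because deletions are frequent around the junctions, the choice of how to treat gapped columns can change the reported figure noticeably, so the convention should be stated alongside any such number.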

(f) Intervening sequence in rpl2

Some chloroplast genes contain introns, a characteristic of eukaryotic genes. In tobacco, 15 identified and putative genes (six tRNA genes and nine protein genes) have been shown to contain introns (Shinozaki et al., 1986), while in Marchantia these numbers are six and twelve for tRNA genes and protein genes, respectively (Ohyama et al., 1986). Contrary to this, tRNA genes have no introns in Euglena chloroplasts, whereas multiple introns are found in some protein genes that have no intron in higher plants, such as rbcL which has nine introns, or psbA which has four introns (Koller and Delius, 1984; Karabin et al., 1984; Crouse et al., 1985).

There is an intron in rpl2 from rice (667 bp), N. debneyi (666 bp) and Marchantia (445 bp), but no intron in rpl2 from spinach (Zurawski and Bottomley, 1984; Ohyama et al., 1986). The rice rpl2 intron shows 95% sequence homology with that of N. debneyi, and 60% sequence homology to that of Marchantia. Using the rice rpl2 intron-specific probe, we examined the existence of the intron in rpl2 from various plant taxa (three monocots and nine dicots). Except for spinach, all twelve taxa examined have the intron in rpl2 (Fig. 6B). This suggests that the intron was lost from spinach after it diverged from other taxa.

Fig. 4. Nucleotide sequence of the genes located around the junctions of the rice chloroplast genome. Translation initiation and stop codons are indicated by bold letters. The tRNA genes are underlined with thin lines. Only the complementary strand of trnH1 is shown. Restriction sites are shown above the sequence with underlines. Triangles indicate the first and the last bases of the rpl2 intron. Amino acid sequences are indicated by one letter under the corresponding codons. The sequences up to 2613 are common to IRA and IRB. The junctions are indicated by slashes between the IRs and LSCR. The sequences for the psbA transcription terminator near the junction JLA are underlined. The 151-bp repeat sequence (nt positions 264-323 and 459-549) is underlined with bold lines.

Fig. 5. Comparison of junction sequences in two pairs of closely related plants. Panel (A): rice vs. maize. Panel (B): N. debneyi vs. N. tabacum. The DNA sequences are aligned to maximize homology. Only those nucleotides which differ from the top sequences are shown for the bottom ones. The translation start and stop codons for rpl2, rps19 and rpl22 are shown in bold letters. Dashes indicate deletions. The junctions are indicated by triangles, and the IRs on the left side of the junctions by boxes. The sequence of the LSCR is on the right side of the junction. Underlined sequences are the complementary strand of trnH1.

(g) The rpl23 sequence is upstream of the rpl2 gene in the ribosomal protein gene cluster

The protein-coding region of the rpl23 gene was assigned on the basis of amino-acid homology between the predicted product of the coding region and the E. coli L23 ribosomal protein (Tanaka et al., 1986). The assignment is further supported by the adjacent and colinear clustering of the ribosomal genes, rpl23-rpl2-rps19-rpl22-rps3-rpl16-rpl14-rps8, in both the chloroplast genome and the E. coli genome. Our sequencing data revealed that the rice rpl23 sequence potentially encodes the L23 protein and is located upstream of rpl2 (Fig. 4), which is the typical order of the ribosomal protein gene clustering. The rice rpl23 gene shows 89.4% sequence homology with that of tobacco.

Zurawski and Clegg (1987) showed that the chloroplast rpl23 region, in all but two genomes examined, has undergone extensive addition/deletion changes that are characteristic of non-protein-coding regions. They further speculate that this result may indicate either a loss of the requirement for the L23 protein in chloroplast ribosomes, or a translocation of this gene to the nuclear genome of at least some plants.

Relative to this speculation, we have made an interesting observation in our previous paper (Moon et al., 1988). The 151-bp repeat sequence which we found to be repeated upstream of the rice rpl2 and downstream of the rice rbcL is now identified as a part of rpl23 (nt positions 264-323 and 459-549 in Fig. 4) which is located upstream of the rice rpl2. Using the 151-bp sequence as a probe, we examined the extra locations of this sequence in 13 different taxa and found that all have extra copies elsewhere on either the chloroplast or nuclear genome (Moon et al., 1988). Therefore, it is possible that any one of these duplicated copies is functional in species where the rpl23 sequence of the typical ribosomal protein gene cluster has undergone extensive mutations, rendering it non-functional.

Fig. 6. Identification of the rpl2 intron in various plant taxa. Southern blot analysis was performed as for Fig. 1, using HindIII-digested total genomic DNA from various plants. Panel (A): The filter was hybridized to the nick-translated exon-specific probe, XbaI-ClaI (227 bp, Figs. 3 and 4). Panel (B): The same filter was reprobed with the nick-translated intron-specific probe, XbaI-XbaI (390 bp, Figs. 3 and 4).

(h) The gene for histidine tRNA, trnH1, is located in the intergenic region between rpl2 and rps19 on the IRs of the rice chloroplast genome

Schwarz et al. (1981) have studied the 2.2-kb maize chloroplast DNA fragment that contains the trnH1 gene and a gene for a 1.6-kb RNA. Nucleotide sequences of this region can be aligned to the intergenic region between the rice rpl2 and rps19. This alignment revealed that the trnH1 gene of rice is identical to that of maize, and that a 1.6-kb RNA is the transcription product from the rps19 gene. Since the trnH1 gene is located in the intergenic region between rpl2 and rps19 on the two IRs of rice and maize (Fig. 2), the trnH1 gene occurs twice in the chloroplast genomes of these two taxa, while it occurs only once in those of all other taxa so far examined. The transcript studies by Schwarz et al. (1981) revealed that the transcription units of the two genes, trnH1 and rps19, overlap and are divergently transcribed from complementary DNA strands. It was further suggested that transcription of one possibly interferes with that of the other. We speculate that the duplication of the two genes by being on the two inverted repeats may compensate for the possible transcription interference.

Fig. 7. Location of trnH1 relative to the junction on the chloroplast genome of various plant taxa. The dashed vertical lines indicate the position of the right junction, JLA. The areas on the left of the dashed lines represent IRA, and those on the right LSCR. Genes transcribed toward the LSCR are indicated by solid bars, those transcribed toward the inverted repeat by open bars. On the liverwort chloroplast genome, trnH1 and psbA are located 28.4 kb away from the JLA. By introducing a gap in the map, the two genes are drawn on the same scale.

Fig. 7 summarizes the locations of trnH1 relative to JLA in different taxa. It is noteworthy that while in most taxa one strand is relatively fixed for the ribosomal protein genes rpl2-rps19, the trnH1 gene is transposed along the complementary strand covering these genes. There are also extreme cases like liverwort, where trnH1 is transposed 28.4 kb away from JLA. Although the relative location of trnH1 in various taxa is very different, the sequence is extremely conserved; there are two nucleotide differences between trnH1 from rice and spinach, with the rice sequence being identical to those found in tobacco, maize and soybean. Interestingly, however, the flanking sequences are very divergent in all species.
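Because trnH1 lies on the strand complementary to the ribosomal protein genes, comparing it with sequences reported on the other strand requires taking the reverse complement. A minimal Python sketch follows; the function name is hypothetical, only the four unambiguous bases are handled (an assumption that excludes IUPAC ambiguity codes), and the example strings are toy sequences rather than data from this study.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    unexpected = set(seq) - set("ACGTacgt")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGC"))    # GCAT
print(reverse_complement("AAGCTT"))  # AAGCTT (a palindromic site reads the same)
```

Applying the function twice returns the original string, which is a quick sanity check when converting between strands.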

(i) The intergenic region between rpl2 and rps19 of rice is highly divergent from those of other taxa

The distance between rpl2 and rps19 is about 70-80 bp and more or less invariable in most of the taxa examined so far: three dicots (spinach, N. debneyi and N. tabacum) and liverwort. The sequences of this region are quite conserved, although there are a few deletions of stretches of sequences. In rice, however, this region is very divergent as compared to those in other taxa. The major difference is the longer distance of 261 bp with trnH1 within this region. Interestingly, the remainder of the sequence flanking trnH1 does not show any homology to the flanking sequences of trnH1 of other taxa. We speculate that the sequence divergence of the region is due not only to the transposition of trnH1 into this region, but also to other events such as inversion, addition, or deletion of sequences.

(j) Unusual initiation codons are used in rice rpl2 and rps19: ACG for rpl2 and GUG for rps19

After the triplet genetic code was elucidated in 1966, it was thought to be universal. Recently, however, deviations from the universal code have been reported (for reviews, see Fox, 1987; Kozak, 1983; Wallace, 1982). Variations in codon assignment are mainly found in mitochondrial genetic systems of many organisms, nucleocytoplasmic systems of some protozoa, and genetic systems of some prokaryotes. In addition to mitochondria, some free-living prokaryotes and eukaryotes use variations of the genetic code such that codons normally specifying translation termination specify amino acids (Fox, 1987).

The variations are also found in the initiation codon. In the classical ribosome binding experiments


TABLE I

Start codons used for genes rpl2 and rps19 in E. coli and various plant taxa

rpl2    rps19
AUG     AUG
ACG     GUG
AUG     GUG
AUG     GUG
AUG     GUG
AUG     AUG

that were carried out in the 1960s with trinucleotides as templates, AUG, GUG and UUG were found to stabilize the binding of fMet-tRNA to E. coli ribosomes (Clark and Marcker, 1966). Nucleotide sequence analyses of bacterial and phage mRNAs subsequently confirmed that prokaryotic ribosomes are not limited to using AUG as the initiation codon. There are 12 known genes that initiate with GUG, three with UUG, and one with AUU (Kozak, 1983). Some of the genes in the ribosomal protein gene cluster of the rice chloroplast genome use different initiation codons (Table I). In contrast to the usual AUG initiation codon in rps19 of E. coli, all the plant species so far examined, except liverworts, use GUG. Rice is unique in that it has ACG as the initiation codon in rpl2. By using defined polynucleotides as templates in a phasing assay, Thach et al. (1966) identified ACG as a start codon. Our observation is the first case of an ACG start codon found in a functional gene. However, more work needs to be carried out to confirm this unusual start codon.
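As an illustration of how the initiation codons discussed above might be tabulated from coding sequences, the following minimal Python sketch reads the first codon of a DNA coding sequence and labels it. The names `REPORTED_START_CODONS`, `start_codon` and `classify_start` are hypothetical, as are the example sequences; the codon set simply collects the codons named in the text (AUG, GUG, UUG, AUU, ACG) and is not meant as a complete list.

```python
# Initiation codons named in the text above (illustrative, not exhaustive).
REPORTED_START_CODONS = {"AUG", "GUG", "UUG", "AUU", "ACG"}


def start_codon(coding_dna: str) -> str:
    """Return the first codon of a coding sequence, written as RNA."""
    codon = coding_dna[:3].upper().replace("T", "U")
    if len(codon) != 3:
        raise ValueError("coding sequence is shorter than one codon")
    return codon


def classify_start(coding_dna: str) -> str:
    """Label the first codon as canonical, non-canonical, or unreported."""
    codon = start_codon(coding_dna)
    if codon == "AUG":
        return "canonical"
    if codon in REPORTED_START_CODONS:
        return "non-canonical"
    return "unreported"


# Hypothetical 5' ends of coding sequences (placeholders, not real gene data).
examples = {"gene_1": "ACGGCTATA", "gene_2": "GTGACACGT", "gene_3": "ATGAAACGT"}
for name, seq in examples.items():
    print(name, start_codon(seq), classify_start(seq))
```

A label produced this way only reflects the annotated reading frame; as noted above, an unusual start codon still needs experimental confirmation.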

(k) Conclusions

The sequences of the two junctions between the LSCR and the IRs in the chloroplast genomes of four dicots (soybean, spinach, N. debneyi and N. tabacum), spirodela and liverwort (Ohyama et al., 1986; Posno et al., 1985; Shinozaki et al., 1986; Spielmann and Stutz, 1983; Sugita et al., 1984) have been reported. In the maize chloroplast genome, an open reading frame that can now be identified as part of rps19 was reported to be contained in an EcoRI fragment derived entirely from within the IR region (Schwarz et al., 1981). However, the location of the two ends of the IRs relative to the LSCR of maize has not been established.

In the rice chloroplast genome, JLB is located within the intergenic region between rps19 and rpl22, while JLA is located within the intergenic region between rps19 and psbA. We have also established the exact locations and the complete sequences of the ribosomal protein gene cluster rpl23-rpl2-rps19, two tRNA genes (including trnHI), and psbA around the junctions in the rice chloroplast genome. These genes are commonly found in the region covering the junction between the IRs and the LSCR in the taxa studied thus far. Sequence comparison of the junction revealed that the sequences around this region are divergent in all taxa examined, mainly because of the expansion or shrinkage of the IRs and the different location of trnHI. By comparing the junction sequences of two pairs of closely related plants, we showed that there have been frequent additions or deletions of sequences around the junctions, which also caused sequence divergence in this region. Therefore, there are some mechanisms generating flexibility in the junction sequence, probably at the time of fixing the junction, although there are mechanisms of selection operating to constrain the junction near rps19 in most species, as suggested by Palmer (1985).
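One simple way to picture how a junction can be placed at nucleotide resolution is to compare the sequences spanning the two IR ends: both are identical inside the repeat and diverge where the LSCR begins. The Python sketch below illustrates that idea only; it is not the procedure used in this work. It assumes both sequences are oriented from inside the IR outward (one of them usually has to be reverse-complemented first), and all names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_offset(ir_outward_a: str, ir_outward_b: str) -> int:
    """Length of the shared prefix of two sequences that both start inside
    the inverted repeat and read outward into the single-copy region.
    The first mismatch marks the apparent end of the repeat."""
    offset = 0
    for a, b in zip(ir_outward_a.upper(), ir_outward_b.upper()):
        if a != b:
            break
        offset += 1
    return offset


# Hypothetical data: an 8-bp repeat end followed by two different flanks.
repeat_end = "ATGCCGTA"
across_junction_a = repeat_end + "GGGTTT"
# The second copy is assumed to be read on the opposite strand,
# so it is reverse-complemented back before comparison.
across_junction_b_raw = reverse_complement(repeat_end + "CATCAT")
across_junction_b = reverse_complement(across_junction_b_raw)

print(junction_offset(across_junction_a, across_junction_b))
```

Because the first single-copy bases on either side can match by chance, an offset found this way may overestimate the repeat by a base or two, and it does not tolerate sequencing errors inside the repeat.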

The different relative locations of trnHI around the junction in different species are especially interesting considering the extremely high nucleotide sequence homology of their coding regions but the lack of any homology of their flanking sequences. As suggested by Spielmann and Stutz (1983), the different locations of the trnHI gene may reflect that during evolution not only the single-copy region, but also the rather conservative IR region, underwent DNA rearrangement.

We thank E. Kemmerer for careful reading of this manuscript and V. Shaff for typing. This work was supported by research grant No. RF84066, Allocation 3, from the Rockefeller Foundation, and by grant No. GM29179 from the NIH, U.S. Public Health Service.


REFERENCES

Chu, N.M. and Tewari, K.K.: Arrangement of the ribosomal RNA genes in chloroplast DNA of Leguminosae. Mol. Gen. Genet. 186 (1982) 23-32.

Clark, B.F.C. and Marcker, K.A.: The role of N-formyl-methionyl-sRNA in protein biosynthesis. J. Mol. Biol. 17 (1966) 394-406.

Crouse, E.J., Schmitt, J.M. and Bohnert, H.J.: Chloroplasts and cyanobacterial genomes, genes and RNAs: a compilation. Plant Mol. Biol. Rep. 3 (1985) 43-89.

Fox, T.D.: Natural variation in the genetic code. Annu. Rev. Genet. 21 (1987) 67-91.

Kao, T.-H., Moon, E. and Wu, R.: Cytochrome oxidase subunit II gene of rice has an insertion sequence within the intron. Nucleic Acids Res. 12 (1984) 7305-7315.

Karabin, G.D., Farley, M. and Hallick, R.B.: Chloroplast gene for Mr 32000 polypeptide of photosystem II in Euglena gracilis is interrupted by four introns with conserved boundary sequences. Nucleic Acids Res. 12 (1984) 5801-5812.

Koller, B. and Delius, H.: Vicia faba chloroplast DNA has only one set of ribosomal RNA genes as shown by partial denaturation mapping and R-loop analysis. Mol. Gen. Genet. 178 (1980) 261-269.

Koller, B. and Delius, H.: Intervening sequences in chloroplast genomes. Cell 36 (1984) 613-622.

Koller, B., Delius, H. and Helling, R.B.: Structure and rearrangement of rRNA genes in chloroplast DNA in two strains of Euglena gracilis. Plant Mol. Biol. 3 (1984) 127-136.

Kozak, M.: Comparison of initiation of protein synthesis in prokaryotes, eukaryotes, and organelles. Microbiol. Rev. 47 (1983) 1-45.

Moon, E., Kao, T.-H. and Wu, R.: Rice chloroplast DNA molecules are heterogeneous as revealed by DNA sequences of a cluster of genes. Nucleic Acids Res. 15 (1987) 611-630.

Moon, E., Kao, T.-H. and Wu, R.: Rice mitochondrial genome contains a rearranged chloroplast gene cluster. Mol. Gen. Genet. (1988), in press.

Ohyama, K., Fukuzawa, H., Kohchi, T., Shirai, H., Sano, T., Sano, S., Umesono, K., Shiki, Y., Takeuchi, M., Chang, Z., Aota, S., Inokuchi, H. and Ozeki, H.: Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322 (1986) 572-574.

Palmer, J.D.: Comparative organization of chloroplast genomes. Annu. Rev. Genet. 19 (1985) 325-354.

Palmer, J.D. and Stein, D.B.: Chloroplast DNA from the fern Osmunda cinnamomea: physical organization, gene localization and comparison to angiosperm chloroplast DNA. Curr. Genet. 5 (1982) 165-170.

Palmer, J.D. and Stein, D.B.: Conservation of chloroplast genome structure among vascular plants. Curr. Genet. 10 (1986) 823-833.

Palmer, J.D. and Thompson, W.F.: Rearrangements in the chloroplast genomes of mung bean and pea. Proc. Natl. Acad. Sci. USA 78 (1981) 5533-5537.

Palmer, J.D., Nugent, J.M. and Herbon, L.A.: Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families. Proc. Natl. Acad. Sci. USA 84 (1987a) 169-173.

Palmer, J.D., Osorio, B., Aldrich, J. and Thompson, W.F.: Chloroplast DNA evolution among legumes: loss of a large inverted repeat occurred prior to other sequence rearrangements. Curr. Genet. 11 (1987b) 275-286.

Posno, M., Torenvliet, D.J., Lustig, H., Van Noort, M. and Groot, G.S.P.: Localization of three chloroplast ribosomal protein genes at the left junction of the large single-copy region and the inverted repeat of Spirodela oligorhiza chloroplast DNA. Curr. Genet. 9 (1985) 211-219.

Schwarz, Z., Jolly, S.O., Steinmetz, A.A. and Bogorad, L.: Overlapping divergent genes in the maize chloroplast chromosome and in vitro transcription of the gene for tRNAHis. Proc. Natl. Acad. Sci. USA 78 (1981) 3423-3427.

Shinozaki, K., Ohme, M., Tanaka, M., Wakasugi, T., Hayashida, N., Matsubayashi, T., Zaita, N., Chunwongse, J., Obokata, J., Yamaguchi-Shinozaki, K., Ohto, K., Torazawa, K., Meng, B.Y., Sugita, M., Deno, H., Kamogashira, T., Yamada, K., Kusuda, J., Takiwa, F., Kato, A., Tohdoh, N., Shimada, H. and Sugiura, M.: The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5 (1986) 2043-2049.

Spielmann, A. and Stutz, E.: Nucleotide sequence of soybean chloroplast DNA regions which contain the psbA and trnH genes and cover the ends of the large single-copy region and one end of the inverted repeats. Nucleic Acids Res. 11 (1983) 7157-7167.

Stein, D.B., Palmer, J.D. and Thompson, W.F.: Structural evolution and flip-flop recombination of chloroplast DNA in the fern genus Osmunda. Curr. Genet. 10 (1986) 835-841.

Sugita, M., Kato, A., Shimada, H. and Sugiura, M.: Sequence analysis of the junctions between a large inverted repeat and single-copy regions in tobacco chloroplast DNA. Mol. Gen. Genet. 194 (1984) 200-205.

Tanaka, M., Wakasugi, T., Sugita, M., Shinozaki, K. and Sugiura, M.: Genes for the eight ribosomal proteins are clustered on the chloroplast genome of tobacco (Nicotiana tabacum) similarly to the S10 and spc operons of Escherichia coli. Proc. Natl. Acad. Sci. USA 83 (1986) 6030-6034.

Thach, R.E., Sundararajan, T.A., Dewey, K., Brown, J.C. and Doty, P.: Translation of synthetic messenger RNA. Cold Spring Harbor Symp. Quant. Biol. 31 (1966) 85-97.

Wallace, D.C.: Structure and evolution of organelle genomes. Microbiol. Rev. 46 (1982) 208-240.

Weil, J.H.: Organization and expression of the chloroplast genome. Plant Science 49 (1987) 149-157.

Whitfeld, P.R. and Bottomley, W.: Organization and structure of chloroplast genes. Annu. Rev. Plant Physiol. 34 (1983) 279-310.

Wu, N.H., Cote, J.-C. and Wu, R.: Structure of the chloroplast psbA gene encoding the QB protein from Oryza sativa L. Dev. Genet. 8 (1987) 339-350.

Zurawski, G. and Clegg, M.T.: Evolution of higher-plant chloroplast DNA-encoded genes: implications for structure-function and phylogenetic studies. Annu. Rev. Plant Physiol. 38 (1987) 391-418.

Zurawski, G. and Zurawski, S.M.: Structure of the Escherichia coli S10 ribosomal protein operon. Nucleic Acids Res. 13 (1985) 4521-4526.

Zurawski, G., Bohnert, H.J., Whitfeld, P.R. and Bottomley, W.: Nucleotide sequence of the gene for the 32 000-Mr thylakoid membrane protein from Spinacia oleracea and Nicotiana debneyi predicts a totally conserved primary translation product of Mr 38950. Proc. Natl. Acad. Sci. USA 79 (1982) 1699-1703.

Zurawski, G., Bottomley, W. and Whitfeld, P.R.: Junctions of the large single-copy region and the inverted repeats in Spinacia oleracea and Nicotiana debneyi chloroplast DNA: sequence of the genes for tRNAHis and the ribosomal proteins S19 and L2. Nucleic Acids Res. 12 (1984) 6547-6558.

Communicated by T.D. McKnight.