
Nucleotide sequence of the genes encoding the L3, L4 and L23 equivalent ribosomal proteins from the archaebacterium Halobacterium halobium


Biochimica et Biophysica Acta, 1216 (1993) 335-338 335 © 1993 Elsevier Science Publishers B.V. All rights reserved 0167-4781/93/$06.00

BBAEXP 90574 Short Sequence-Paper


Yoshiki Yuki, Rikako Kanechika and Takuzi Itoh *

Department of Bioresource Development, Hiroshima Prefectural University, Shobara-shi, Hiroshima 727 (Japan)

(Received 23 June 1993)

Key words: Ribosomal protein; Nucleotide sequence; Archaebacterium; Evolution; (H. halobium)

A λEMBL clone containing a gene cluster coding for the ribosomal proteins L3, L4, L23 and the 5' region of L2 was identified in a genomic library of the halophilic archaebacterium Halobacterium halobium using a heterologous hybridization probe from the related organism Halobacterium marismortui. The clone also contains two conserved open reading frames found in H. marismortui, whose function is still unknown. Its gene organization is very similar to that of the 'S10 operon' of H. marismortui. The deduced amino acid sequences of these ribosomal proteins (HhaL3, HhaL4, HhaL23 and the 5' region of HhaL2) show high similarity (64-71%) to those of the archaebacterium H. marismortui and a lesser degree of similarity to their eukaryotic (31-42%) and eubacterial (17-33%) counterparts.

[Figure 1: restriction map of the cloned region; genes in the order orf1, orf2, HhaL3, HhaL4, HhaL23, HhaL2.]

Fig. 1. Restriction endonuclease map for H. halobium DNA showing the organization of the ribosomal proteins. The locations of orf, ribosomal protein HhaL3, HhaL4, HhaL23 and HhaL2 genes are shown by thin lines. Restriction enzyme cleavage sites are indicated by the following abbreviations: B, BamHI; Na, NaeI; Nr, NruI; K, KpnI; S, SalI; Sa, SacI; Sm, SmaI.

Until the latter half of the 1970s, the biological kingdom was divided into two parts, eukaryotes and prokaryotes. However, based on a study of the structure of ribosomal RNA, one of the cellular substances essential to organisms, Woese et al. in 1977 distinguished a group markedly different, evolutionarily and phylogenetically, from other prokaryotes, the archaebacteria, and termed the other prokaryotes eubacteria [1]. Structural studies of RNA polymerase, elongation factors and ribosomal proteins, in addition to analyses of ribosomal RNA, indicate that archaebacteria form a distinct group that has apparently diverged far from the others in evolution, and that most of their gene products are much more similar to those of eukaryotes [2-6].

* Corresponding author. Fax: + 81 82 4740191. The nucleotide sequence data reported in this paper have been submitted to the DDBJ, EMBL and GenBank Nucleotide Sequence Databases under the accession number D14879.

In this study, a series of structural analyses was carried out on groups of ribosomal protein genes in halobacteria, archaebacteria that show primordial properties of eukaryotes with a simple gene constitution, to help in the understanding of biological phylogeny, especially the complicated gene structures and modes of gene expression in eukaryotes.

Using as a probe a 1063 bp PvuII fragment from the Halobacterium marismortui gene cluster corresponding to the Escherichia coli S10 operon, kindly provided by Dr. E. Arndt [4], one positive plaque (termed λ-EMBL·HB2.S10) was obtained from the λ-EMBL3 genomic library of Halobacterium halobium S9 by the plaque hybridization method. The whole sequence


[Figure 2: sequence listing with nucleotide positions numbered 1-2980; gene starts labelled orf1, orf2, HhaL3, HhaL4, HhaL23 and HhaL2.]

Fig. 2. Nucleotide sequence of a 2980 bp genomic DNA region of H. halobium and deduced amino acid sequence. The putative Shine-Dalgarno sequences are shown by a continuous underline. Termination codons are shown by asterisks.


of the approx. 3 kb SmaI fragment in the λ-EMBL·HB2.S10 phage was determined by subcloning appropriate restriction fragments, which were sequenced using the dideoxy chain-termination method (Figs. 1 and 2) [7,8]. In the whole nucleotide sequence of 2980 bp, six open reading frames (orf), namely orf1 (1-148), orf2 (145-357), orf3 (361-1353), orf4 (1357-2103), orf5 (2100-2354) and orf6 (2357-2980), can be deduced from the positions of the termination and initiation codons. orf1 and orf6 are considered to encode only part of an amino acid sequence, the C-terminal and N-terminal parts of the respective proteins, as orf1 has no recognizable initiation codon and orf6 no recognizable termination codon. The initiation codon of orf2, TTG, differs from the normal ATG initiation codon, but it can be assigned from the position of the termination codon of the upstream orf1 and the reading frame of the downstream orf2. Approximately 10 bp upstream of orf3 and orf5 (positions 351-356 and 2085-2091), sequences complementary to the terminal region of 16S rRNA are present. Three-base intervals are found between orf2 and orf3 and between orf3 and orf4, whereas there is no interval between orf1 and orf2 or between orf4 and orf5, where the initiation and termination codons overlap. Such an arrangement of contiguous genes and the presence of the Shine-Dalgarno (complementary sequence of

[Figure 3: alignment blocks for L3 (HhaL3, HmaL3, YeaL3, EcoL3), L4 (HhaL4, HmaL4, YeaL4, EcoL4), L23 (HhaL23, HmaL23, YeaL23, EcoL23) and L2 (HhaL2, HmaL2, RatL8, EcoL2).]

Fig. 3. Alignment of amino acid sequences of HhaL3, HhaL4, HhaL23 and HhaL2 from H. halobium with the homologous proteins HmaL3, HmaL4, HmaL23 and HmaL2 from H. marismortui [4], YeaL3, YeaL4 and YeaL23 from yeast [9-11], RatL8 from rat [12] and EcoL3, EcoL4, EcoL23 and EcoL2 from E. coli [13]. The proteins are aligned for maximal similarity. Amino acid residues identical with those from the H. halobium DNA fragment are shown by asterisks.


TABLE I

Percentage sequence similarities of the ribosomal proteins from H. halobium to those from other organisms

Similarities were calculated from alignment as shown in Fig. 3.

Ribosomal   Archaebacterium   Eukaryote      Eubacterium
protein     H. marismortui    yeast or rat   E. coli

HhaL3       70                31             21
HhaL4       76                40             17
HhaL23      64                35             33
HhaL2       71                42             33
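
The paper does not state the formula used to turn the alignment into the percentages of Table I. As a minimal sketch, assuming similarity means identical residues divided by the number of alignment columns in which neither sequence has a gap, the calculation could look as follows in Python. The function name `percent_identity`, the gap characters and the two toy fragments are illustrative assumptions and are not taken from the paper.

```python
def percent_identity(ref: str, other: str, gap_chars: str = "-.") -> float:
    """Percentage of identical residues between two aligned sequences.

    Assumption: columns where either sequence has a gap are ignored, so the
    denominator is the number of gap-free columns. Other conventions exist.
    """
    if len(ref) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(ref, other):
        if a in gap_chars or b in gap_chars:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only.
toy_ref = "MKVTLRD-GA"
toy_other = "MKITLKDLGA"
print(round(percent_identity(toy_ref, toy_other), 1))
```

Choosing a different denominator (for example the full length of the H. halobium protein) would shift the values, so figures obtained this way are comparable with Table I only if the same convention is used.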

16S rRNA-terminal) sequence indicate genetic characteristics of eubacteria. Further, such a gene region is considered to constitute a transcription unit (operon).
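
The spacing between adjacent open reading frames can be checked directly from the coordinates reported for the 2980 bp sequence. The Python sketch below is illustrative only: the function name `junctions` is hypothetical, and the coordinates are assumed to be 1-based and inclusive, which the paper does not state explicitly.

```python
# ORF coordinates as listed in the text (assumed 1-based, inclusive).
ORFS = [
    ("orf1", 1, 148),
    ("orf2", 145, 357),
    ("orf3", 361, 1353),
    ("orf4", 1357, 2103),
    ("orf5", 2100, 2354),
    ("orf6", 2357, 2980),
]


def junctions(orfs):
    """Return (upstream, downstream, gap) for each adjacent ORF pair.

    gap > 0: number of bases between the two ORFs.
    gap < 0: number of bases shared by the two ORFs (overlap).
    """
    result = []
    for (up, _, up_end), (down, down_start, _) in zip(orfs, orfs[1:]):
        result.append((up, down, down_start - up_end - 1))
    return result


for up, down, gap in junctions(ORFS):
    kind = "overlap" if gap < 0 else "spacer"
    print(f"{up} -> {down}: {kind} of {abs(gap)} bp")
```

Under these assumptions the orf2/orf3 and orf3/orf4 junctions each come out as a 3 bp spacer, and the orf1/orf2 and orf4/orf5 junctions as overlaps, in line with the description of 3-base intervals and overlapping initiation and termination codons.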

The six orfs deduced from the nucleotide sequence were compared with the protein and nucleotide sequences recorded in the EMBL and NBRF data banks. orf1 and orf2 showed no homology to any protein of known function at present, but were highly homologous to the orf1 and orf2 found in the furthest upstream region of the S10 operon of the related H. marismortui [4]. On the other hand, orf3, orf4, orf5 and orf6 all showed similarity to ribosomal proteins, decreasing in the order H. marismortui (archaebacterium), eukaryotes, eubacteria (Fig. 3 and Table I) [4,8-13].

According to the nomenclature for E. coli ribosomal proteins, the cloned region has, on the basis of the above results, the gene structure orf1-orf2-HhaL3-HhaL4-HhaL23-HhaL2, showing the same constitution and sequential order as the E. coli S10 operon (S10-L3-L4-L23-L2) except for the S10 gene. Further, although the constitution and sequential order of the genes are similar to those of eubacteria, the amino acid sequences of the translation products show higher homology to those of eukaryotes than to those of eubacteria (Table I). The same result has already been reported for the str operon and the L11/L10 operon, which include other ribosomal protein genes [2,5,6]. It is unlikely that the constitution and sequential order of genes in the ribosomal protein operons became the same in the eubacterial and archaebacterial groups by convergence in the course of evolution; it is more appropriate to consider that they have been maintained from the progenotes, which are positioned between the eubacterial and archaebacterial origins in the phylogenetic tree. On this view, the S10 protein gene positioned in the most upstream part of the E. coli operon, known to reflect the difference in gene constitution between archaebacteria and eubacteria, is not found at the upstream end of the corresponding S10 operon of halobacteria; instead, orf1 and orf2, of as yet unknown function, are found there. It will be very interesting to see how these genes are related to the E. coli S10 gene.

We are greatly indebted to Dr. E. Arndt for kindly providing the PvuII DNA fragment from the H. marismortui S10 operon and Mr. C. Davenport for critically reading the manuscript.

References

1 Woese, C.R. and Fox, G.E. (1977) Proc. Natl. Acad. Sci. USA 74, 5088-5090.

2 Leffers, H., Gropp, F., Lottspeich, F., Zillig, W. and Garrett, R.A. (1989) J. Mol. Biol. 206, 1-17.

3 Arndt, E., Scholzen, T., Krömer, W., Hatakeyama, T. and Kimura, M. (1991) Biochimie 73, 657-668.

4 Arndt, E., Krömer, W. and Hatakeyama, T. (1990) J. Biol. Chem. 265, 3034-3039.

5 Itoh, T. (1988) Eur. J. Biochem. 176, 297-303.

6 Itoh, T. (1989) Eur. J. Biochem. 186, 213-219.

7 Itoh, T., Kumazaki, T., Sugiyama, M. and Otaka, E. (1988) Biochim. Biophys. Acta 671, 16-24.

8 Tabor, S. and Richardson, C.C. (1987) Proc. Natl. Acad. Sci. USA 84, 4767-4771.

9 Presutti, C., Lucioli, A. and Bozzoni, I. (1988) J. Biol. Chem. 263, 6188-6192.

10 Chan, Y.L. and Wool, I.G. (1992) Biochem. Biophys. Res. Commun. 185, 539-547.

11 Schultz, L.D. and Friesen, J.D. (1983) J. Bacteriol. 155, 8-14.

12 Leer, R.J., Van Raamsdonk-Duin, M.M.C., Hagendoorn, M.J.M., Mager, W.H. and Planta, R.J. (1984) Nucleic Acids Res. 12, 6685-6700.

13 Zurawski, G. and Zurawski, S.M. (1985) Nucleic Acids Res. 13, 4521-4526.