
Selection of Arabidopsis genes encoding secreted and plasma membrane proteins


Plant Molecular Biology 41: 415–423, 1999. © 1999 Kluwer Academic Publishers. Printed in the Netherlands.


Jae Hwan Goo¹, Ae Ran Park², Woo Jin Park¹ and Ohkmae K. Park²,*

¹ Department of Life Science, Kwangju Institute of Science and Technology, Kwangju; ² Kumho Life and Environmental Science Laboratory, Kwangju 500-712, Korea (* author for correspondence)

Received 19 February 1999; accepted in revised form 27 August 1999

Key words: Arabidopsis thaliana, cloning, plasma membrane proteins, secreted proteins, signal sequence

Abstract

Secreted and plasma membrane proteins play crucial roles in a variety of physiological and developmental processes of multicellular organisms. Systematic cloning of the genes encoding these proteins is therefore of general interest. An effective method of trapping signal sequences was first described by Tashiro et al. (1993), and a similar yet more efficient method was reported by Klein et al. (1996) and Jacobs et al. (1997). In this study, we carried out the latter yeast-based signal sequence trap to clone genes from Arabidopsis thaliana encoding secreted and plasma membrane proteins. Of 144 sequenced cDNA clones, 18% are identical to previously cloned Arabidopsis thaliana genes, 12% are homologous to genes identified from various organisms, and 46% are novel. All of the isolated genes identical or homologous to previously reported genes encode either secreted or plasma membrane proteins, and the remaining novel genes appear to contain functional signal sequences based on computer-aided sequence analysis. The full-length cDNA clones of one homologous gene and another novel gene were isolated and sequenced. The deduced amino acid sequences suggest that the former encodes a secreted protein, and the latter encodes a type 1 membrane protein. These results indicate that the signal sequence trap method is effective and useful for the isolation of plant genes encoding secreted and plasma membrane proteins.

Introduction

Intercellular signaling is essential for development and differentiation of multicellular organisms. This process occurs frequently in the extracellular space, and is mostly mediated by secreted and plasma membrane-bound proteins, such as growth factors, morphogens, hormones, and their receptors. Most of these molecules contain a short stretch of amino acids, known as a signal sequence, in their N-termini (Blobel and Dobberstein, 1975; von Heijne, 1985; Gierasch, 1989). The presence of a signal sequence is crucial for a translated protein to be targeted to the secretory pathway. Although highly degenerate, the signal sequences are largely interchangeable among different secreted and plasma membrane proteins and even among diverse organisms (Gilmore, 1993; Johnson, 1993; Walter and Johnson, 1994; Rapoport et al., 1996). Based on these observations, innovative methods for systematic cloning of genes encoding secreted and plasma membrane proteins have been invented (Tashiro et al., 1993, 1996).

The nucleotide sequence data reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession numbers AF104328 (cell wall-plasma membrane linker protein homologue, atCWLP) and AF104329 (putative membrane protein, PMP).

A mammalian cell culture-based method was first reported by Tashiro et al. (1993). In this system, termed the signal sequence trap, COS-7 cells were transformed with plasmids of an expression library, where 5′ portion-enriched cDNA fragments were inserted upstream of Tac (α chain of the human interleukin-2 receptor) lacking its endogenous translation initiator methionine and signal sequence. When inserts with initiator methionine and signal sequence were cloned in-frame with the correct orientation, they directed the expression of the Tac fusion proteins at the cell surface. By selecting cells expressing the Tac epitope on their cell surface, a number of mammalian cDNA clones of putative growth factors, receptors, or adhesion molecules were identified (Tashiro et al., 1993; Nakamura et al., 1995; Hamada et al., 1996; Shirozu et al., 1996; Imai et al., 1996; Yoshie et al., 1997; Furutani et al., 1998; Kimura et al., 1998). A similar effort was made to clone cDNAs encoding secreted or membrane-associated plant proteins (Kristoffersen et al., 1996). Plant cDNAs were expressed as fusion proteins containing an ER-signal peptide at the N-terminus, so that secreted proteins with signal sequence or membrane proteins with membrane-spanning domains become trapped at the cell surface. This approach allowed the isolation of plant cDNAs encoding secreted or membrane-associated proteins.

Recently, a yeast-based signal sequence trap method was reported (Klein et al., 1996; Jacobs et al., 1997). This method utilizes a yeast enzyme, invertase, that catalyzes breakdown of sucrose into glucose and fructose in the periplasmic space. Deletion of the suc2 gene, which encodes the invertase protein, or a mutation in the invertase signal sequence results in yeast cells defective in their ability to grow on sucrose media. The vector used in this system directed the expression and secretion of the invertase into the periplasmic space when cDNA fragments with initiator methionine and signal sequence were inserted in-frame next to an invertase gene that lacks its own initiator methionine and signal sequence. Numerous mammalian cDNAs encoding secreted and plasma membrane proteins were isolated in a high-throughput manner using this technique.

In the work reported here, we used the yeast-based signal sequence trap method to isolate Arabidopsis thaliana genes encoding secreted and plasma membrane proteins. Our results indicate that this method can be successfully used for plants as well as mammals.

Materials and methods

Vector construction

The signal sequence trap vector, pSMASH, was constructed using pGAD424 (Clontech) as a backbone plasmid. pGAD424 was digested with HindIII and ligated with Kex linker (5′-AGCTGAATTCAGTCAGCGGCCGCTTGGATAAAAGGTACCCATACGATGTTCCAGATTATGCTCATATGGCAGTTGTCGAC-3′; 5′-AGCTGTCGACAACTGCCATATGAGCATAATCTGGAACATCGTATGGGTACCTTTTATCCAAGCGGCCGCTGACTGAATTC-3′), resulting in pGAD-Kex. The Kex linker contains four restriction sites (EcoRI, NotI, NdeI, and SalI) and a Kex2 cleavage site (Lys-Arg). A suc2 cDNA fragment (ca. 1.5 kb) encoding the cytosolic form of invertase lacking the initiator methionine and the signal sequence was amplified by polymerase chain reaction (PCR) and subcloned into the NdeI-SalI sites of pGAD-Kex, forming pSMASH. The PCR primers used were 5′-TGCACATATGACAAACGAAACTAGCGATA-3′ and 5′-GTTTTAGTCGACCTATTTTACTTCCCTTACTTGG-3′.
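The features claimed for the Kex linker can be checked directly from the two oligonucleotide sequences given above. The following is a minimal sketch in Python, not part of the original methods: the helper names are hypothetical, the recognition sequences are the standard ones for each enzyme, and the assumption is that the two oligonucleotides anneal over everything except their 5′ AGCT overhangs (which would be compatible with HindIII-cut ends).

```python
# Kex linker oligonucleotides as printed in the text (5'->3').
TOP = ("AGCTGAATTCAGTCAGCGGCCGCTTGGATAAAAGGTACCCATACGATGTTC"
       "CAGATTATGCTCATATGGCAGTTGTCGAC")
BOTTOM = ("AGCTGTCGACAACTGCCATATGAGCATAATCTGGAACATCGTATGGGTACC"
          "TTTTATCCAAGCGGCCGCTGACTGAATTC")

# Standard recognition sequences of the four enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "NotI": "GCGGCCGC", "NdeI": "CATATG", "SalI": "GTCGAC"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_positions(seq):
    """Map each enzyme name to the 0-based index of its first site (-1 if absent)."""
    return {name: seq.find(site) for name, site in SITES.items()}


positions = site_positions(TOP)

# Assumption: the strands pair everywhere except the 4-nt 5' AGCT overhangs.
duplex_ok = BOTTOM[4:] == reverse_complement(TOP[4:])

# AAA AGG encodes Lys-Arg in the standard genetic code (the Kex2 cleavage site).
has_lys_arg_codons = "AAAAGG" in TOP

print(positions, duplex_ok, has_lys_arg_codons)
```

On the top strand the four sites appear in the order EcoRI, NotI, NdeI, SalI, with the Lys-Arg codons between NotI and NdeI, which is consistent with cDNA inserts being cloned at EcoRI-NotI upstream of the Kex2 site and the invertase fragment at NdeI-SalI.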

Library construction

Library construction followed the method of Nakamura et al. (1995) with minor modifications. Poly(A) RNA was isolated from Arabidopsis seedlings (whole plant) with the Oligotex mRNA purification kit (Qiagen). First-strand cDNA was synthesized using random primer (SuperScript Choice system, Gibco-BRL). After alkaline degradation of mRNA, a (dC)n tail was added to the 3′ end of cDNA with terminal deoxynucleotidyl transferase (Gibco-BRL). The second strand was synthesized by priming with primer 5′-GCGGCCGCGAATTCTGACTAACTGAC(dG)17-3′, which contains the EcoRI site. After sonication and blunting, double-stranded cDNA was ligated with NotI linkers (5′-CCGCGCGGCCGCGATATCAAGCTTGTAC-3′ and 5′-GAGGTACAAGCTTGATATCGCGGCCGCGCGG-3′). DNA fragments around 200–600 bp were isolated by 1.5% agarose gel electrophoresis and then amplified by polymerase chain reaction (PCR) using two primers (5′-GCCGCGAATTCTGACTAACTGC-3′ and 5′-GAGGTACAAGCTTGATATCGCGGCCGCGCGG-3′) under the following conditions: 94 °C for 1 min, 48 °C for 1.5 min and 72 °C for 2 min for 35 cycles. Amplified DNA fragments were digested, and then ligated into the EcoRI-NotI site of the pSMASH vector.

Yeast transformation and selection

The library pSMASH DNAs were introduced into DBYα2445 (Matα, suc2Δ-9, lys2-801, ura3-52, ade2-101) by lithium acetate transformation, and the yeast cells were directly spread on sucrose plates (1% yeast extract, 2% peptone, 2% sucrose, 2% agar). Healthily growing colonies appeared after 4–5 days of incubation at 30 °C. The colonies were re-streaked on sucrose plates, and plasmid DNAs were isolated from the individual colonies. The plasmid DNAs were then introduced into Escherichia coli strain DH10B by electroporation, and then purified and sequenced on both strands using the forward and reverse primers with the sequences corresponding to the ADH1 promoter of pGAD424 (5′-CTCGTTCCCTTTCTTCCTTGTTTC-3′) and the suc2 gene (5′-GGACCAAAGGTCTATCGCTAGTTTC-3′), respectively.

Isolation of full-length cDNA clones

cDNA library screening was performed as described elsewhere (Sambrook et al., 1989). An Arabidopsis cDNA library constructed in pYESTrp2 (Invitrogen) was used. Briefly, 1 × 10⁶ colonies were probed with inserts of clone 140 (264 bp) and clone 142 (307 bp) that were labeled by random priming (Boehringer Mannheim). Six independent clones for clone 140, and eight independent clones for clone 142 were finally obtained. Clones with the longest inserts were sequenced.

DNA sequence analysis

The cDNA clones were sequenced with an automatic sequencer (ABI Prism 377, Perkin Elmer). The DNA sequences were analyzed with MacVector (Oxford Molecular Group PLC), and homology-searched with BLAST in the GenBank database. Motif searches were done in the Prosite database with the on-line program BCM search launcher (http://dot.imgen.bcm.tmc.edu). The presence of a signal peptide and transmembrane domain was predicted by means of the on-line programs SignalP and TMHMM in the Prediction server (http://genome.cbs.dtu.dk).

Northern blot analysis

Total RNA from Arabidopsis tissues was isolated as described by Logemann et al. (1987). Two volumes of guanidine buffer (8 M guanidine hydrochloride, 20 mM MES (4-morpholineethanesulfonic acid), 20 mM EDTA, 50 mM 2-mercaptoethanol at pH 7.0) were added to the tissue homogenate. The homogenate was stored at 0 °C for several hours, and centrifuged. Phenol/chloroform/isoamyl alcohol was added to remove proteins. The RNA-containing aqueous phase was collected by centrifugation, and precipitated with ethanol. The RNA pellet was dissolved in diethyl pyrocarbonate (DEPC)-treated water.

Total RNA (20 µg) was fractionated on agarose gels and transferred onto nitrocellulose with fixation by UV crosslinking. After prehybridization, the blot was hybridized with ³²P-labeled probe, and processed at 62 °C for 16 h. The blot was then washed with 2× SSC, 0.1% SDS for 30 min at 65 °C and 1× SSC, 0.1% SDS for 30 min at 65 °C.

Results

Signal sequence trap of Arabidopsis thaliana genes

The plasmid vector used in this study, pSMASH, is similar to vectors described previously (Jacobs et al., 1997). Briefly, it contains an EcoRI-NotI subcloning site followed by a Kex2 protease recognition site (Lys-Arg) and the invertase lacking its own initiator methionine and signal sequence. pSMASH cannot support growth of suc2 yeast on sucrose because the invertase encoded by this vector is not secreted due to the lack of a signal sequence. When a proper signal sequence is inserted into the subcloning site in frame with the invertase, the invertase is then transported to the endoplasmic reticulum (ER), and may finally be secreted into the periplasmic space. Secretion of the invertase, indicating the presence of a signal sequence in the inserted fragment, can be easily detected by the growth of the transformed suc2 yeast cells on sucrose media.
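The in-frame requirement of the selection can be illustrated with a short sketch. This is a deliberately simplified model under stated assumptions, not the authors' analysis: the function name `reads_through` is hypothetical, the reporter is assumed to begin immediately after the last nucleotide of the insert (the real vector has linker sequence in between, whose frame is not given here), and only the reading frame is modeled, not whether the encoded peptide actually functions as a signal sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reads_through(insert):
    """Return True if some ATG in the insert opens a frame that runs,
    without a stop codon, exactly to the 3' end of the insert.

    Simplifying assumption: a reporter fused directly after the insert
    would then be translated in frame with that ATG.
    """
    seq = insert.upper()
    start = seq.find("ATG")
    while start != -1:
        tail = seq[start:]
        if len(tail) % 3 == 0:
            codons = [tail[i:i + 3] for i in range(0, len(tail), 3)]
            if not STOP_CODONS.intersection(codons):
                return True
        start = seq.find("ATG", start + 1)
    return False


# Toy inserts (hypothetical sequences, for illustration only).
print(reads_through("GGATGAAACTGCTG"))  # ATG frame reaches the junction
print(reads_through("ATGAAATAGCTG"))    # stop codon before the junction
print(reads_through("ATGAAACTGC"))      # junction falls out of frame
```

In this model only the first toy insert would allow translation to continue into the reporter; the other two fail for the two reasons named in the text, a premature stop and a frame shift at the junction.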

To test our system, the 5′ portions of cDNAs of well-characterized secreted and plasma membrane proteins (plant AmarandinS and AmarandinII, and Drosophila Frizzled and Wingless) were inserted into the EcoRI-NotI site in frame with the invertase. Transformation of the resulting plasmids into the suc2 yeast strain, DBYα2445 (see Materials and methods), which carries a deletion in the endogenous invertase gene, yielded colonies after 4–5 days of incubation on sucrose medium. On the contrary, insertion of the 5′ portions of cDNAs of known cytoplasmic proteins (Arabidopsis PhytochromeA, and Drosophila Shaggy and Armadillo) did not affect the viability of DBYα2445 on sucrose media (data not shown). These results indicated that our system worked properly. Interestingly, most of the rapidly growing yeast colonies also enabled their immediate neighboring cells to grow slowly, thus making satellites around them. This may have occurred because glucose produced by the invertase-secreting cells diffused into the media and supported the growth of neighboring suc2 yeast cells. Accordingly, rapidly growing colonies with satellites were selected as authentic invertase-secreting transformants.

Table 1. Summary of signal sequence trap.

| Gene classes | Isolated cDNA clones | Unique cDNA clones |
|---|---|---|
| Identical | 26 (18%) | 16 |
| Homologous | 17 (12%) | 9 |
| Novel | 66 (46%) | 40 |
| Miscellaneous (a) | 35 (24%) | ND |
| Total | 144 | ND |

(a) Most of these clones (32 isolates) belong to rRNAs except three clones inserted in reverse orientation. ND, not determined.

A cDNA library was then constructed by using Arabidopsis seedling mRNA. First-strand cDNA was synthesized with random primer, and (dC)n-tailed to enrich for the 5′ portion of cDNAs. The second strand was synthesized with a primer containing (dG)n. It was essential to subclone small (200–600 bp) 5′ fragments of the genes in the right orientation with the invertase gene (see Materials and methods for details). The resulting plasmids were transformed into DBYα2445. The yeast cells were then directly plated on the sucrose medium for 4–5 days at 30 °C. In independent experiments, about 0.1% of the total cDNA was selected for containing signal sequences.

We further analyzed 144 colonies. The results are summarized in Table 1. Of the cDNAs 18% were identical to previously known Arabidopsis genes, and 12% showed significant homology to other genes identified from various organisms, whereas 46% represented novel genes. To our surprise, a large number of the inserts were those of ribosomal RNA. The clones categorized as ‘Miscellaneous’ in Table 1 represent rRNAs and a few other cDNAs that are inserted in reverse orientation. These artifacts may be explained by the fact that a short stretch of hydrophobic amino acids could serve as a signal sequence in yeast (Kaiser et al., 1987).
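The percentages quoted for the 144 analyzed colonies follow directly from the clone counts in Table 1. A short Python check (variable names are illustrative, and rounding to the nearest whole percent is an assumption about how the table was prepared) reproduces them:

```python
# Isolated cDNA clones per gene class, as listed in Table 1.
counts = {"Identical": 26, "Homologous": 17, "Novel": 66, "Miscellaneous": 35}

total = sum(counts.values())

# Percentage of all analyzed clones, rounded to the nearest whole percent.
percentages = {name: round(100 * n / total) for name, n in counts.items()}

# Unique cDNAs among the three informative classes (Table 1, last column).
unique_clones = 16 + 9 + 40

print(total, percentages, unique_clones)
```

The counts sum to the 144 colonies analyzed, and the 65 unique cDNAs correspond to the entries listed in Tables 2, 3 and 4.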

Sixteen unique cDNAs encoding known proteins are listed in Table 2. Except one clone for a receptor protein kinase-like protein, all the proteins encoded by the cDNAs are known to be secreted proteins. It is not yet clear whether our system is biased for secreted proteins or our library is somehow enriched for the genes encoding secreted proteins. No cytoplasmic proteins were selected. Nine unique cDNAs encoding homologous proteins are listed in Table 3. Once again, the proteins encoded by the cDNAs seemed to be either secreted proteins or cell membrane proteins. These results indicate successful cloning of Arabidopsis genes encoding secreted and plasma membrane proteins.

Forty novel genes are listed in Table 4. The results obtained from the identical or homologous clones (Tables 2 and 3) strongly suggest that the selected novel genes would encode secreted or plasma membrane proteins. In all cases, a stretch of hydrophobic amino acids was found in their N-termini, consistent with the presence of a signal sequence. To further confirm this, the putative translation products were analyzed by using SignalP, an on-line computer program designed for searching signal sequences and their cleavage sites (Nielsen et al., 1997). Based on the analysis of numerous known signal sequences, a mean S score greater than 0.48 suggests that the analyzed protein contains a genuine signal sequence. As shown in Table 4, all the clones contain a stretch of 14 to 40 amino acids with mean S scores greater than 0.48. The presence of a putative signal sequence combined with the ability of these clones to restore invertase secretion indicates that the novel cDNAs encode secreted or plasma membrane proteins.
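The threshold test described above is a simple comparison of each clone's mean S score with 0.48. As a minimal sketch (the dictionary below holds only a subset of Table 4, chosen to include the lowest-scoring clones and the highest one; the variable names are illustrative), the check could be written as:

```python
# Mean S score cutoff quoted in the text for a genuine signal sequence.
THRESHOLD = 0.48

# Subset of Table 4: clone label -> mean S score.
mean_s_scores = {
    "102": 0.492,
    "46": 0.519,
    "49, 108": 0.608,
    "14": 0.610,
    "156": 0.614,
    "160": 0.617,
    "6, 113": 0.994,
}

# Clones whose score exceeds the cutoff.
passing = {clone for clone, score in mean_s_scores.items() if score > THRESHOLD}

# Clone closest to the cutoff.
lowest = min(mean_s_scores, key=mean_s_scores.get)

print(sorted(passing), lowest, mean_s_scores[lowest])
```

Even the lowest-scoring entry in this subset, clone 102, lies above the cutoff, although only narrowly.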

Sequence analysis of two full-length cDNA clones

For further evaluation of this method and characterization of potentially interesting clones, we obtained full-length cDNA clones of clone 140 and clone 142 by cDNA library screening and determined their complete nucleotide sequences. The conceptual translation of clone 140 suggests that its gene product is a cell wall-plasma membrane linker protein (CWLP) (Figure 1). The protein shows 85% sequence identity with CWLP of Brassica napus (bCWLP), and 62% sequence identity with a previously reported cell wall protein homologue of Arabidopsis (atCWP-h). Interestingly, atCWP-h contains a completely divergent stretch of 162 amino acids at its C-terminus. Based on the analysis with a transmembrane domain prediction program, TMHMM, we suggest that this newly isolated Arabidopsis CWLP (atCWLP) is a secreted protein (data not shown). The protein is very rich in proline (over 30% of total amino acid residues) like bCWLP and atCWP-h, and is possibly involved in a cold response (Goodwin et al., 1996).
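Percent identity figures such as those quoted above are computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over aligned (non-gap) positions; the convention actually used for the reported values is not stated in the text, and the two short sequences are invented toy strings, not atCWLP or bCWLP.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no aligned residue pairs to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Hypothetical proline-rich toy alignment ('-' marks a gap).
toy_a = "PPTPKPPVVKP"
toy_b = "PPTPRPP-VKP"
identity = percent_identity(toy_a, toy_b)
print(identity)
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so values from different programs are not always directly comparable.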

The deduced gene product of clone 142 is novel and does not contain any functional motifs (Figure 2a).


Table 2. cDNA clones with amino acid sequence identical to known Arabidopsis proteins.

| Clones | GenBank accession number | Protein name | M/S/C (a) |
|---|---|---|---|
| 5, 109 | AL021684 | receptor protein kinase-like protein | M |
| 8, 77, 85, 92, 107, 124, 158 | M80567 | non-specific lipid transfer protein | S |
| 17 | X91259 | lectin-like protein | S |
| 18, 96 | X98189 | peroxidase Atpla | S |
| 22 | U01880 | pre-hevein-like protein | S |
| 23 | U75188 | germin-like protein | S |
| 33, 38, 72 | D63508 | endoxyloglucan transferase | S |
| 34 | X98777 | peroxidase Atp16a | S |
| 35 | S71225 | xyloglucan endotransglycosylase-related protein Xtr-6 | S |
| 55 | AC000375 | Brassica aspartic protease-like | S |
| 63 | AF057357 | lipid transfer protein 2 precursor | S |
| 79 | U11766 | GAST1 protein homologue | S |
| 103 | D16454 | endoxyloglucan transferase | S |
| 117 | AC002335 | trypsin inhibitor isologue | S |
| 133 | U72153 | β-glucosidase | S |
| 135 | D13042 | thiol protease Rd19A | S |

(a) M, cell membrane; S, secreted; C, cytoplasmic.

Table 3. cDNA clones with amino acid sequence homologous to known proteins.

| Clones | GenBank accession number | Protein name (organism) | Blastp score/E value | M/S/C (a) |
|---|---|---|---|---|
| 1, 11, 13 | X91819 | respiratory nitrate reductase (Bacillus subtilis) | 72/6e-13 | M |
| 54 | U39289 | myrosinase-associated protein (Brassica napus) | 73/3e-13 | S |
| 15, 126, 140, 146, 161 | X94976 | cell wall-plasma membrane linker protein (Brassica napus) | 198/3e-36 | M/S (b) |
| 19 | Q42589 | non-specific lipid transfer protein 1 (LTP1) (Arabidopsis thaliana) | 109/8e-24 | S |
| 47 | AL021749 | copper-binding protein-like (Arabidopsis thaliana) | 43.8/2e-04 | S |
| 52, 61, 70 | P39176 | protein ERFK/SRFK precursor (Escherichia coli) | 55/7e-08 | S |
| 59 | P37965 | glycerophosphoryl diester phosphodiesterase (Bacillus subtilis) | 53.1/7e-07 | S |
| 91 | AJ005895 | protein translocase (Homo sapiens) | 45/6e-05 | M |
| 145 | U34334 | non-specific lipid transferase-like protein (Phaseolus vulgaris) | 60.5/3e-09 | M/S (b) |

(a) M, cell membrane; S, secreted; C, cytoplasmic. (b) Not determined yet whether it is a secreted or a membrane protein.

The program TMHMM suggests the presence of a putative transmembrane domain near the C-terminus of the protein (data not shown). Thus, this protein is most likely a type 1 integral membrane protein. This gene is expressed ubiquitously in all the tested vegetative tissues, as shown by northern analysis (Figure 2b). We therefore suggest that this protein plays a general role in many cell types, but its specific physiological functions remain to be studied. This protein is referred to as PMP (putative membrane protein) and is currently being characterized in this laboratory.
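A hydropathy profile of the kind shown for PMP in Figure 2a can be approximated with a sliding-window average. The sketch below uses the Kyte-Doolittle scale with a 9-residue window; both choices are assumptions made for illustration, since the scale and window behind the published plot are not stated. The input is the N-terminal sequence listed for clones 137 and 142 in Table 4.

```python
# Kyte-Doolittle hydropathy values per amino acid (assumed scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[residue] for residue in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


# N-terminus of clone 142 (PMP) from Table 4: signal peptide plus five residues.
pmp_n_terminus = "MKAFYVFVVALLLTLNYRGEA" + "SGSVF"

profile = window_hydropathy(pmp_n_terminus)
peak = max(profile)
peak_start = profile.index(peak)

print(round(peak, 2), peak_start)
```

The strongly positive peak falls inside the predicted signal peptide, over the residues VFVVALLLT, which matches the hydrophobic core expected of a signal sequence.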


Table 4. N-terminal amino acid sequence of novel cDNA clones.

| Clones | Signal peptide sequence (a) | Signal peptide (b) | S score (c) |
|---|---|---|---|
| 4, 104, 114, 132, 136, 143 | MAILKSHFFLLFPLHLLHFHTVSFA QTLFV.. | 1–25 | 0.882 |
| 6, 113 | MTSSTFSSMIFLLVLLFSLHMGEA LGAQT.. | 1–24 | 0.994 |
| 7, 37, 71 | MAASMKFLCILGLILLIGTVVDG AGECG.. | 1–23 | 0.936 |
| 10, 128 | MFIIYLFIFLSSAIIDS DGVAM.. | 1–17 | 0.846 |
| 12, 43, 69, 78, 112 | MQALIFLGFLGTSCLA QAPAP.. | 1–16 | 0.934 |
| 14 | MFSMFGFFVQAIVTGK GPIEN.. | 1–16 | 0.610 |
| 25 | MTKLFFFFFFSFLYTITTLTFPPLTTSA ATSCR.. | 1–28 | 0.855 |
| 36 | MDRCIYGCSVITIFFSFFFLLNASA LESGH.. | 1–25 | 0.807 |
| 41, 152 | MHSSCLLYLTVLVVFIVSFAGG ERFKE.. | 1–22 | 0.923 |
| 46 | MLVLVKVLRLSKTPAFRVQIASLIGLLIRHSTS IEDDL.. | 1–33 | 0.519 |
| 49, 108 | MSPRIVNEARSDLILCFFFLSLPSFSSLPSFQ.. | 1–27 | 0.608 |
| 53, 90, 127 | MDSSKLSSLSLCLFLICIIYLLPQHSLA CGSCN.. | 1–27 | 0.854 |
| 57, 123 | MLSLKLFLVTLFLPLQTLFIAS QTLLP.. | 1–22 | 0.949 |
| 60 | MAKMQLSIFIAVVALIVCSASA KTASP.. | 1–22 | 0.923 |
| 64 | MARCSNNLVGILNFLVSLLSIPILAG GIWLS.. | 1–26 | 0.835 |
| 65 | MSKFTGFSSLAISYFLLVSTIVAA TDVHY.. | 1–24 | 0.844 |
| 66, 67 | MARSFAIAVICIVLIAGVTGQA PTSPP.. | 1–20 | 0.930 |
| 80, 122 | MASPNWPSLLMVVLALYPMAAYTSA QYSPT.. | 1–25 | 0.851 |
| 83 | MKAFSAAVALSSILLSAPMPAVA DISGL.. | 1–23 | 0.816 |
| 86, 118, 155 | MDSSKLSSLSLCLFLICIIYLPQHSLA CGSPR.. | 1–27 | 0.847 |
| 88 | MRILSYGIVILSLLVFSFIEFSVHA RPVAL.. | 1–25 | 0.876 |
| 89 | MNREMTSSFLLLTFAICKLIIAVG LNVGP.. | 1–24 | 0.873 |
| 93, 121 | MASSSTSISLLLFVSFLLLLVNSRA ENAWP.. | 1–25 | 0.930 |
| 98 | MISLRMKGLGHCLVYVVVFSVIAAIVTA YDSPS.. | 1–28 | 0.858 |
| 99 | MASSSPSLLILAVACFVSLISPAIS QQASK.. | 1–25 | 0.878 |
| 100 | MTTFSTSFLFLLLVFCLIDPLAA DDLQH.. | 1–23 | 0.944 |
| 101 | MRHLSSPPWPLLLLLLLSSFTSG ESSLS.. | 1–23 | 0.943 |
| 102 | MFWKTKRDSISLDFSKMILQSQKLWTMFLILAIWSPISYSLHFDL.. | 1–40 | 0.492 |
| 106 | MASSSSSLLILAVACFVSLISPAIS QQACK.. | 1–25 | 0.892 |
| 119 | MGSKIVQVFLMLALFATSALA QAPAP.. | 1–21 | 0.948 |
| 125, 148 | MSSLRLRLCLLLLLPITIS CVTVT.. | 1–19 | 0.977 |
| 130 | MGSRVLASFFVFLIFTVITLPPTIQA CTPCT.. | 1–26 | 0.926 |
| 137, 142 | MKAFYVFVVALLLTLNYRGEA SGSVF.. | 1–21 | 0.881 |
| 147 | MTSLQLAELFVSSIVHLLYGFYIFSSAVA GDISQ.. | 1–29 | 0.722 |
| 153 | MSSSISPLLTTVIFVSSLLFLTISKA ATIPN.. | 1–26 | 0.878 |
| 154 | MMFNKISLITALLFFLLGTNVFA HSHLE.. | 1–23 | 0.937 |
| 156 | MDQTLYRKCLVTLSMMAMIGTSMA TYAGT.. | 1–24 | 0.614 |
| 157, 164 | MGSGMIRTLVILAIALYMIGS DNVHV.. | 1–21 | 0.860 |
| 160 | MIINLATLLANIVT NPFNN.. | 1–14 | 0.617 |
| 162 | MASSSIALFLALNLLFFTTISA CGSCT.. | 1–22 | 0.961 |

(a) Predicted signal peptides are shown underlined and in bold. (b) Amino acid numbers of signal peptide. (c) Signal peptide score.


Figure 1. Amino acid sequence comparison of atCWLP (clone 140) with two other cell wall-plasma membrane linker proteins from B. napus and A. thaliana (accession numbers X94976 and Z97338, respectively). Identical residues are shaded and boxed; conservative substitutions are boxed. atCWP-h, A. thaliana cell wall protein homologue; bCWLP, B. napus cell wall-plasma membrane linker protein.

Discussion

The yeast-based signal sequence trap method described in this report and others has been used for the selection of mammalian genes encoding secreted and plasma membrane proteins. In this report, we have shown that this method is equally efficient for the identification of Arabidopsis secreted and plasma membrane proteins. We were able to obtain a number of Arabidopsis genes of potential interest from this pioneer-scale screening. As summarized in Table 1, the results show that the majority of the isolated genes are novel genes or homologues of known genes, and that only a small fraction of secreted and plasma membrane proteins have previously been identified from Arabidopsis. The selected clones encode secreted or plasma membrane proteins of diverse functions (Tables 2 and 3) as well as novel proteins of diverse sequences (Table 4). These together suggest that this method can be utilized as an efficient tool for the study of plant genes.

One technical issue is that we obtained ca. 0.1% positive clones growing on the sucrose medium among the total transformants. This percentage is much lower than the expected ratio (ca. 10%) of secreted and plasma membrane proteins among the total cellular proteins. Similar results were obtained for the mammalian genes (Klein et al., 1996; Jacobs et al., 1997). We consider several factors for this low probability: first, the cDNAs were either partially synthesized with truncated 5′ ends or cloned out-of-frame with suc2; second, the suc2 fusion proteins were not efficiently translated or were degraded rapidly by the yeast proteases; third, the proteins targeted to the plant plasma membrane were not targeted efficiently to the yeast plasma membrane. The third factor may also account for the very low number of plasma membrane proteins found in the selection.
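The size of the gap between the expected and observed frequencies can be made explicit with a little arithmetic. The sketch below takes the two figures from the text (ca. 10% expected, ca. 0.1% observed) and adds one labeled assumption that is not stated in the text: that a randomly generated fragment junction falls in the correct reading frame one time in three. The variable names are illustrative.

```python
# Figures quoted in the text.
expected_fraction = 0.10    # ca. 10% of proteins are secreted or plasma membrane
observed_fraction = 0.001   # ca. 0.1% of transformants grew on sucrose

# Assumption (not from the text): one junction in three is in frame with suc2.
in_frame_probability = 1 / 3

# Overall shortfall between expectation and observation.
shortfall = expected_fraction / observed_fraction

# Expected yield if reading frame were the only loss, and what remains unexplained.
after_frame = expected_fraction * in_frame_probability
residual = after_frame / observed_fraction

print(round(shortfall), round(after_frame, 4), round(residual, 1))
```

Under this assumption the frame requirement alone would reduce the expected yield only to about 3%, leaving a roughly 33-fold loss to be explained by the other factors listed, such as truncated 5′ ends, poor translation or stability of the fusions, and inefficient targeting in yeast.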

With the rapid progress of genome projects, numerous genes have been identified without knowledge about their function. This has created the need for new concepts and approaches to understand biological processes in the context of knowledge of the whole genome structure, termed ‘functional genomics’ (Hieter and Boguski, 1997). For example, the recent completion of the genome sequence of Saccharomyces cerevisiae has allowed the systematic construction of mutant yeast strains, each of which will be deleted for one of the 6000 predicted genes (Smith et al., 1995; Oliver, 1996; Shoemaker et al., 1996). In this regard, the signal sequence trap method described in this report may also provide a powerful tool for analyzing genes on a large scale.

Recently, an effort has been made to isolate Drosophila genes in a high-throughput manner by the Berkeley Drosophila genome project group (Kopczynski et al., 1998). In their approach, rough ER-bound mRNA was first prepared to enrich mRNAs encoding secreted and membrane proteins, and then the derived cDNAs were normalized to increase the prevalence of rare cDNAs in the library. In conjunction with in situ hybridization, they were able to detect a number of interesting Drosophila genes encoding secreted and plasma membrane proteins. It was found, however, that the library also contained a large fraction of genes encoding cytosolic and nuclear proteins.

Figure 2. Analysis of PMP (clone 142). a. The deduced amino acid sequence and hydropathy plot of PMP. The putative signal sequence at the N-terminus and transmembrane domain at the C-terminus are underlined. b. Northern blot analysis of the expression pattern of PMP in different tissues. The RNA gel blot was probed with radiolabeled cDNA of PMP. The ethidium bromide-stained gel shows the relative RNA amount used in the experiment. Fl, flower; Le, leaf; Ro, root; St, stem; Si, silique.

A signal sequence trap method based on transient expression in COS cells has previously been developed to clone plant cDNAs encoding secreted or membrane proteins (Kristoffersen et al., 1996). The expression vector contained the sequence corresponding to the ER signal peptide that directs the expressed proteins to the ER as a default pathway. Trapping occurred at the cell surface when the cDNAs encoded secreted or membrane proteins. Whereas this method facilitated the isolation of full-length cDNA clones, its limitation was that the presence of the additional N-terminal signal sequence leads to the expression of proteins in reverse orientation at the cell membrane, which might cause misfolding or instability of the proteins. Another approach to study plant plasma membrane proteins on a large scale was recently described, where the plasma membrane fraction was purified and analyzed by two-dimensional gel electrophoresis (Santoni et al., 1998). Microsequencing results of the spots in the plasma membrane-enriched fractions supported the validity of the approach.

The data in this study show that the yeast-based signal sequence trap method is highly efficient in specifically isolating genes encoding secreted and plasma membrane proteins. Except for the contaminating rRNAs, all the isolated genes encode either known secreted or plasma membrane proteins, or proteins similar to known secreted or plasma membrane proteins, or novel proteins containing hydrophobic stretches in their N-termini that are most likely signal sequences. In conclusion, we suggest that large-scale application of this method, combined with systematic northern analysis and/or in situ hybridization, would provide essential data for deciphering the genome of Arabidopsis thaliana.

Acknowledgements

We are grateful to Dr Pill-Soon Song for critical reading of the manuscript. We thank Dr David Botstein for the suc2 cDNA and yeast strain DBYα2445, Dr Soo Young Kim for the cDNA library, Dr Chang Ho Chung for cDNAs of AmarandinS and AmarandinII, and Dr Chung Mo Park for cDNA of PhytochromeA. We also thank Dr Eric Johnson for assistance in the preparation of the manuscript. During this work, W.J.P. was supported by a grant from KOSEF and a grant (STAR) from the Ministry of Science and Technology, Korea.

References

Blobel, G. and Dobberstein, B. 1975. Transfer of proteins across membranes. I. Presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. J. Cell Biol. 67: 835–851.

Furutani, M., Arii, S., Mizumoto, M., Kato, M. and Imamura, M. 1998. Identification of a neutrophil gelatinase-associated lipocalin mRNA in human pancreatic cancers using a modified signal sequence trap method. Cancer Lett. 122: 209–214.

Gierasch, L.M. 1989. Signal sequences. Biochemistry 28: 923–930.

Gilmore, R. 1993. Protein translocation across the endoplasmic reticulum: a tunnel with toll booths at entry and exit. Cell 75: 589–592.

Goodwin, W., Pallas, J.A. and Jenkins, G.I. 1996. Transcripts of a gene encoding a putative cell wall-plasma membrane linker protein are specifically cold-induced in Brassica napus. Plant Mol. Biol. 31: 771–781.

Hamada, T., Tashiro, K., Tada, H., Inazawa, J., Shirozu, M., Shibahara, K., Nakamura, T., Martina, N., Nakano, T. and Honjo, T. 1996. Isolation and characterization of a novel secretory protein, stromal cell-derived factor-2 (SDF-2) using the signal sequence trap method. Gene 176: 211–214.

Hieter, P. and Boguski, M. 1997. Functional genomics: it’s all how you read it. Science 278: 601–602.

Imai, T., Yoshida, T., Masataka, B., Nishimura, M., Kakizaki, M. and Yoshie, O. 1996. Molecular cloning of a novel T cell-directed CC chemokine expressed in thymus by signal sequence trap using Epstein-Barr virus vector. J. Biol. Chem. 271: 21514–21521.

Jacobs, K.A., Collins-Racie, L.A., Colbert, M., Duckett, M., Golden-Fleet, M., Kelleher, K., Kriz, R., LaVallie, E.R., Merberg, D., Spaulding, V., Stover, J., Williamson, M.J. and McCoy, J.M. 1997. A genetic selection for isolating cDNAs encoding secreted proteins. Gene 198: 289–296.

Johnson, A.E. 1993. Protein translocation across the ER membrane: a fluorescent light at the end of the tunnel. Trends Biochem. Sci. 18: 456–458.

Kaiser, C.A., Preuss, D., Grisafi, P. and Botstein, D. 1987. Many random sequences functionally replace the secretion signal sequence of yeast invertase. Science 235: 312–317.

Kimura, N., Toyoshima, T., Kojima, T. and Shimane, M. 1998. Entactin-2: a new member of basement membrane protein with high homology to entactin/nidogen. Exp. Cell Res. 241: 36–45.

Klein, R.D., Gu, Q., Goddard, A. and Rosenthal, A. 1996. Selection for genes encoding secreted proteins and receptors. Proc. Natl. Acad. Sci. USA 93: 7108–7113.

Kopczynski, C.C., Noordermeer, J.N., Serano, T.L., Chen, W.Y., Pendleton, J.D., Lewis, S., Goodman, C.S. and Rubin, G.M. 1998. A high throughput screen to identify secreted and transmembrane proteins involved in Drosophila embryogenesis. Proc. Natl. Acad. Sci. USA 95: 9973–9978.

Kristoffersen, P., Teichmann, T., Stracke, R. and Palme, K. 1996. Signal sequence trap to clone cDNA encoding secreted or membrane-associated plant proteins. Anal. Biochem. 243: 127–132.

Logemann, J., Schell, J. and Willmitzer, L. 1987. Improved method for the isolation of RNA from plant tissues. Anal. Biochem. 163: 16–20.

Nakamura, T., Tashiro, K., Nazarea, M., Nakano, T., Sasayama, S. and Honjo, T. 1995. The murine lymphotoxin-β receptor cDNA: isolation by the signal sequence trap and chromosomal mapping. Genomics 30: 312–319.

Nielsen, H., Engelbrecht, J., Brunak, S. and von Heijne, G. 1997. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 10: 1–6.

Oliver, S. 1996. A network approach to the systematic analysis of yeast gene function. Trends Genet. 12: 241–242.

Rapoport, T.A., Jungnickel, B. and Kutay, U. 1996. Protein transport across the eukaryotic endoplasmic reticulum and bacterial inner membranes. Annu. Rev. Biochem. 65: 271–303.

Sambrook, J., Fritsch, E.F. and Maniatis, T. 1989. Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.

Santoni, V., Rouquie, D., Doumas, P., Mansion, M., Boutry, M., Degand, H., Dupree, P., Packman, L., Sherrier, J., Prime, T., Bauw, G., Posada, E., Rouze, P., Dehais, P., Sahnoun, I., Barlier, I. and Rossignol, M. 1998. Use of a proteome strategy for tagging proteins present at the plasma membrane. Plant J. 16: 633–641.

Shirozu, M., Tada, H., Tashiro, K., Nakamura, T., Lopez, N.D., Nazarea, M., Hamada, T., Sato, T., Nakano, T. and Honjo, T. 1996. Characterization of novel secreted and membrane proteins isolated by the signal sequence trap method. Genomics 37: 273–280.

Shoemaker, D.D., Lashkari, D.A., Morris, D., Mittmann, M. and Davis, R.W. 1996. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy. Nature Genet. 14: 450–456.

Smith, V., Botstein, D. and Brown, P.O. 1995. Genetic footprinting: a genomic strategy for determining a gene’s function given its sequence. Proc. Natl. Acad. Sci. USA 92: 6479–6483.

Tashiro, K., Tada, H., Heiker, R., Shirozu, M., Nakano, T. and Honjo, T. 1993. Signal sequence trap: a cloning strategy for secreted proteins and type I membrane proteins. Science 261: 600–603.

Tashiro, K., Nakano, T. and Honjo, T. 1996. Signal sequence trap: expression cloning method for secreted proteins and type I membrane proteins. In: I.G. Cowell and C.A. Austin (Eds.), Methods in Molecular Biology, Humana Press, Totowa, NJ, pp. 203–219.

von Heijne, G. 1985. Signal sequences. The limits of variation. J. Mol. Biol. 184: 99–105.

Walter, P. and Johnson, A.E. 1994. Signal sequence recognition and protein targeting to the endoplasmic reticulum membrane. Annu. Rev. Cell Biol. 10: 87–119.

Yoshie, O., Imai, T. and Nomiyama, H. 1997. Novel lymphocyte-specific CC chemokines and their receptors. J. Leukocyte Biol. 62: 634–644.