Yeast Sequencing Report
Analysis of 41 kb of the DNA sequence from the right arm of chromosome II of Schizosaccharomyces pombe
Manuel Sanchez¹, Jose L. Revuelta¹, Francisco del Rey¹, Rhian Gwilliam², Jason Skelton², Carol Churcher², Marie-Adele Rajandream², Valerie Wood², Bart Barrell², Rachel Lyne², Richard Reinhardt³, Katja Borzym³, Alfred Beck³, Sergio Moreno¹ and Angel Domínguez¹*
¹ Departamento de Microbiología y Genética, Instituto de Microbiología Bioquímica/CSIC, Universidad de Salamanca, 37071 Salamanca, Spain
² Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
³ Max Planck Institut für Molekulare Genetik, Ihnestrasse 73, D-14195 Berlin, Germany
*Correspondence to: A. Domínguez, Departamento de Microbiología y Genética, Instituto de Microbiología Bioquímica/CSIC, Universidad de Salamanca, 37071 Salamanca, Spain. E-mail: [email protected]
Received: 28 February 2001
Accepted: 24 April 2001
Abstract
We report the complete sequence of cosmid c18A7 (41 046 bp insert), located on the right
arm of chromosome II of the Schizosaccharomyces pombe genome. The sequence, which
partially overlaps with cosmids SPBC4F6 and SPBC336, contains 16 open reading frames
(ORFs) capable of coding for proteins of at least 100 amino acid residues in length (one
partial) and one small nucleolar RNA (snoRNA). Four known genes were found: swi10 (encoding a mating-type switching protein also involved in nucleotide excision repair); dim1 (encoding a dimethyladenosine transferase); arf1 (encoding ADP-ribosylation factor 1);
and a partial fragment of pol3 (cdc6), encoding the 125 kDa catalytic subunit of a type B DNA
polymerase. Six ORFs similar to known proteins were found. They include a
transporter of the major facilitator superfamily class, a vacuolar sorting protein, an
asparagine synthase, a nuclear protein, an endoplasmic reticulum oxidoreductin and a heat shock protein.
Each protein product of the other six ORFs has conserved domains and can be assigned a
molecular, but not a biological, function. The sequence has been submitted to the EMBL
database under Accession No. AL080287. Copyright © 2001 John Wiley & Sons, Ltd.
Keywords: genome sequencing; Schizosaccharomyces pombe; chromosome II; swi10;
arf1; dim1; pol3; enterobactin transporter; vacuolar sorting protein; asparagine synthase;
oxidoreductin 1-Lβ; LIM domain protein; heat shock protein; DNA helicases
Introduction
As participants in the European Schizosaccharomyces pombe Genome Sequencing Project, we have been involved in the sequencing of 120 kb contained in three different cosmids. We have previously presented the sequence and a computer analysis of one cosmid, SP32F12 (Sanchez et al., 1999). Here we describe the results obtained for SPBC18A7. This fragment (41 046 bp) corresponds to the entire insert of cosmid c18A7 and is located on the right arm of chromosome II (Hoheisel et al., 1993). Cosmid c18A7 overlaps at one end with cosmid c4F6 (Accession No. AL031534) and at the other with c336 (Accession No. AL121815). ORFs that lie in the regions shared with these two cosmids are annotated with the systematic gene names of SPBC4F6 and SPBC336.
Materials and methods
Cosmids, plasmids and strains
The DNA coordinator R. Gwilliam, The Sanger Centre, Hinxton, Cambridge, provided cosmid c18A7. This contains a 41 kb insert of chromosome II obtained by Sau3A partial digestion of Sz. pombe DNA (strain 972 h−) and cloned into the BamHI site of the cosmid vector Lawrist4 (Hoheisel et al., 1993). The insert of cosmid c18A7 partially overlaps the inserts of cosmids c4F6 (Hinxton) and c336 (Berlin). The pBluescript KS+ (Stratagene) phagemid was used as the vector for all subsequent subcloning and sequencing steps. The Escherichia coli strain used as host for transformation and amplification of plasmids was DH5α [supE44 ΔlacU169 (φ80 lacZΔM15) hsdR17 recA1 endA1 gyrA96 thi-1 relA1] (Hanahan, 1983). Transformants were selected on LB medium supplemented with 100 mg/l ampicillin.
Yeast 2001; 18: 1111–1116. DOI: 10.1002/yea.760
Manipulation of nucleic acids
Routine DNA manipulations, cosmid preparation, subcloning, Southern blotting, restriction enzyme digestions, agarose gel electrophoresis, ligation of DNA fragments and E. coli transformation were performed according to standard techniques (Sambrook et al., 1989). Plasmid preparations were carried out using Wizard miniprep columns (Promega).
Sequencing strategy
The DNA sequence was determined using a random approach. A shotgun library of short fragments of the 41 kb insert of c18A7 was obtained as follows. Purified cosmid DNA (10 µg) was sonicated in an Eppendorf tube using an MSE Soniprep 150 sonicator cell disruptor. After sonication for 5 s, fragments in the size range 100–5000 bp were obtained. Sonication products were end-repaired using T4 DNA polymerase and electrophoresed on a 1% agarose gel. Fragments in the 1–5 kb size range were extracted from the agarose by electroelution and inserted into the EcoRV site of the pBluescript KS+ vector. The recombinant plasmids obtained were used to transform the E. coli DH5α strain. A total of 312 clones were selected and stored at −30°C in 96-well plates. The actual size of the inserts ranged from 2 kb to 4 kb, with a mean size of 3.5 kb. Random sequencing reactions were performed using universal and reverse primers. Gap-filling sequencing reactions were performed using custom-synthesized primers. Sequencing was performed on an ABI 377 sequencer (Applied Biosystems Inc.) using the Taq DyeDeoxy™ Terminator Cycle Sequencing Kit as supplied by the manufacturer. The kit uses dITP as a standard substitute for dGTP, which effectively eliminates compressions formed during polyacrylamide gel electrophoresis. In total, 401 sequencing reactions were performed: 321 random reads (167 direct and 154 reverse) and 80 custom primer-directed reads. Altogether, raw data from 230 704 bases were aligned to assemble the final contig; the average reading number per base pair was 5.6, and each base pair was sequenced on both strands and at least twice (upper and lower strand together). The quality of the final sequence was checked by visual inspection of the sequencing profiles at each position on each DNA strand. The sequence was considered final only when an unambiguous reading of each nucleotide on each strand was achieved.
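The read counts and the average coverage quoted above can be cross-checked with simple arithmetic. The following is a minimal Python sketch; the variable and function names are illustrative and not part of the original analysis. It assumes that the total of 401 reads is the sum of the direct, reverse and custom primer-directed reads, and the mean aligned read length it prints is derived here, not reported in the text.

```python
# Figures quoted in the Sequencing strategy section.
INSERT_LENGTH_BP = 41_046      # length of the c18A7 insert
RAW_BASES_ALIGNED = 230_704    # raw bases aligned into the final contig
DIRECT_READS = 167
REVERSE_READS = 154
CUSTOM_PRIMER_READS = 80


def mean_coverage(raw_bases: int, contig_length: int) -> float:
    """Average number of times each base pair of the contig was read."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    return raw_bases / contig_length


# Assumption: the 401 reads are the sum of all three read classes.
total_reads = DIRECT_READS + REVERSE_READS + CUSTOM_PRIMER_READS
coverage = mean_coverage(RAW_BASES_ALIGNED, INSERT_LENGTH_BP)
# Derived value, not stated in the text.
mean_read_length = RAW_BASES_ALIGNED / total_reads

print(f"total reads: {total_reads}")
print(f"mean coverage: {coverage:.1f}x")
print(f"mean aligned read length: {mean_read_length:.0f} bases")
```

Under these assumptions the computed coverage rounds to the 5.6 reads per base pair given in the text.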
Computer-assisted sequence analysis
Assembly of the sequences was done with the SeqMan program of the DNASTAR program package (DNASTAR Ltd). Gene prediction was performed as previously described (Xiang et al., 1999). DNA sequence was compared with EMBL, EMBLNEW and the Sz. pombe sequencing project finished and unfinished data, using BLAST 2.0 (Altschul et al., 1997). ORFs were named according to the working nomenclature of the EU Sz. pombe Genome Sequencing Project (Table 1). The letters SP stand for Sz. pombe and the letter B for the chromosome number (B = chromosome II); the following alphanumeric symbols indicate the cosmid name (C18A7) and the last two digits refer to consecutive ORFs in the cosmid. An additional letter 'c' indicates the complementary strand. Homology searches were performed against EMBL, EMBLNEW and a non-redundant protein database (SWISSPROT + TREMBL + TREMBLNEW), as described in Xiang et al. (1999). Sequences of predicted proteins were also compared against the non-redundant protein sequence database using FASTA3 (Pearson and Lipman, 1988) and scanned for Pfam motifs (Sonnhammer et al., 1998).
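The naming convention described above can be illustrated by splitting a systematic ORF name into its parts. This is a hedged Python sketch, not a tool used in the project: the regular expression, the `OrfName` type and the `parse_orf_name` function are hypothetical, and the mapping of the letters A and C to chromosomes I and III is an assumption made by analogy, since the text only states that B denotes chromosome II.

```python
import re
from typing import NamedTuple

# Only B = chromosome II is stated in the text; A and C are assumed by analogy.
CHROMOSOME_BY_LETTER = {"A": "I", "B": "II", "C": "III"}

# SP + chromosome letter + cosmid name + "." + two-digit ORF number + optional "c".
ORF_NAME_PATTERN = re.compile(r"SP([ABC])(C[0-9A-Z]+)\.(\d{2})(c?)")


class OrfName(NamedTuple):
    chromosome: str
    cosmid: str
    number: int
    complementary_strand: bool


def parse_orf_name(name: str) -> OrfName:
    """Split a systematic name such as 'SPBC18A7.02c' into its components."""
    match = ORF_NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a recognised systematic ORF name: {name!r}")
    letter, cosmid, number, strand = match.groups()
    return OrfName(CHROMOSOME_BY_LETTER[letter], cosmid, int(number), strand == "c")


print(parse_orf_name("SPBC18A7.02c"))
print(parse_orf_name("SPBC4F6.09"))
```

Names that do not follow the pattern, such as the snoRNA entry in Table 1, are rejected with a `ValueError` instead of being guessed at.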
Results and discussion
Sequence analysis
Cosmid c18A7 contained an insert of 41 kb from the right arm of chromosome II of Sz. pombe. The sequenced region was analysed as described in Materials and methods. Two large portions of the insert of cosmid c18A7 overlap with cosmid c4F6 (24 618 bp at one end) and with cosmid c336 (13 883 bp at the other). The sequence has a GC content of 36.72% and contains 16 putative ORFs (one partial), giving a density of one gene per 2.5 kb. Both values are similar to those described for other Sz. pombe cosmids (Sanchez et al., 1999; Lucas et al., 2000; Xiang et al., 2000). However, the 16 coding sequences cover 67.5% of the total sequence, a value higher than those reported for other cosmids located on chromosome II (54.1% for cosmid c32F12, Sanchez et al., 1999; and 59% for cosmids c16H5, c12D12, c24C6 and c19G7, Xiang et al., 2000). Seven of the ORFs have introns. We found four known genes (one partial) and 12 open reading frames longer than 300 bp. All of the 12 ORFs show homology with proteins of known functions (FASTA scores >200) from other species.
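The summary statistics in this section follow from a few of the quoted figures. The Python sketch below recomputes the gene density, the approximate number of coding base pairs, and the length of the insert that is unique to c18A7. The names are illustrative; the unique-region length assumes that the two overlaps sit at opposite ends of the insert and do not overlap each other, and `gc_content` is a generic helper, not the program the authors used.

```python
INSERT_LENGTH_BP = 41_046
ORF_COUNT = 16
CODING_FRACTION = 0.675     # 67.5% of the sequence is coding
OVERLAP_C4F6_BP = 24_618    # shared with cosmid c4F6 at one end
OVERLAP_C336_BP = 13_883    # shared with cosmid c336 at the other end


def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a DNA sequence (case-insensitive)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


bp_per_gene = INSERT_LENGTH_BP / ORF_COUNT
coding_bp = CODING_FRACTION * INSERT_LENGTH_BP
# Derived value: region covered by neither neighbouring cosmid.
unique_bp = INSERT_LENGTH_BP - OVERLAP_C4F6_BP - OVERLAP_C336_BP

print(f"one gene per {bp_per_gene / 1000:.2f} kb")
print(f"approximate coding sequence: {coding_bp:.0f} bp")
print(f"unique to c18A7: {unique_bp} bp")
print(f"GC content of a toy sequence: {gc_content('ATGCATTA'):.1f}%")
```

On these figures only about 2.5 kb of the insert is covered by c18A7 alone, which is consistent with just two ORFs carrying the SPBC18A7 prefix in Table 1.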
Small nucleolar RNA (snoRNA)
The snoRNA contains 86 nucleotides and is located between the swi10 (SPBC4F6.15c) and SPBC4F6.16c genes (positions 15 282–15 367). It belongs to the box C/D family (Balakin et al., 1996) and displays homology to the S. cerevisiae U18 snoRNA.
ORFs corresponding to known genes
SPBC4F6.15c corresponds to the swi10 gene, coding for a nuclear protein involved in mating-type switching and in nucleotide excision repair (SWISSPROT Q06182). Our sequence matches exactly the one previously described (Roedel et al., 1992).
SPBC4F6.18c is the arf1 gene (SWISSPROT P36579), encoding an ADP-ribosylation factor that belongs to the ARF family of GTP-binding proteins. Our sequence exactly matches the reported cDNA sequence (Erickson et al., 1993).
SPBC336.02 corresponds to the dim1 gene (Swall Q9USU2), coding for a dimethyladenosine transferase (EC 2.1.1.-), which specifically dimethylates two adjacent adenosines in the loop of a conserved hairpin near the 3′-end of 18S rRNA in the 40S particle (submitted by Housen et al., 1995). We have observed differences with the previously reported sequence in eight amino acids (positions 144, 170, 290–291, 298, 304–305 and 307). Also, our protein appeared to be nine amino acids shorter than the one previously described and exactly matches the sequence reported for cosmid c336.
SPBC336.04 is truncated at the 3′ end and is located at one end of the genomic fragment cloned in c18A7 (Figure 1). SPBC336.04 corresponds to positions 1–839 of pol3 (cdc6), encoding the 125 kDa catalytic subunit of DNA polymerase delta (SWISSPROT P30316; Pignede et al., 1991; Park et al., 1993). In our sequence we detected changes in amino acids 102, 419, 545 and 777–784. This DNA polymerase III is located in the nucleus, belongs to the DNA polymerase type B family, and displays two enzymatic activities: DNA synthesis and an exonucleolytic activity that degrades single-stranded DNA in the 3′–5′ direction.

Table 1. Characteristics of open reading frames (ORFs) identified in cosmid c18A7

ORF name      Coordinates   Length  Introns (bp)      Homologies                          FastA scores
                            (aa)                                                          initn  init1  opt
SPBC4F6.09    156–1994      612                       S. cerevisiae Enb1p                 784    420    576
SPBC4F6.10    2709–4427     537                       S. cerevisiae Vps9p                 697    220    782
SPBC4F6.11c   4482–6128     548                       Human FLJ20752                      860    294    569
SPBC4F6.12    7145–8517     438     1(7175–7230)      Human leupaxin                      312    285    434
SPBC4F6.13c   9058–11268    736                       D. melanogaster 52C10.1             1674   813    1319
SPBC4F6.14    11499–13523   674                       C. elegans R05H10.2                 304    118    358
SPBC4F6.15c   13784–14772   252     1(14292–14351)    Sz. pombe swi10
                                    2(14438–14480)
                                    3(14499–14625)
snoRNA        15282–15367
SPBC4F6.16c   15628–17031   467                       Human ERO1-Lβ                       565    239    564
SPBC4F6.17c   17869–20280   803                       S. cerevisiae Hsp78p                2823   1812   2930
SPBC4F6.18c   21936–22546   180     1(22428–22496)    Sz. pombe arf1
SPBC18A7.01   23892–25247   452                       Pyrococcus horikoshii dipeptidase   478    193    546
SPBC18A7.02c  25370–26846   458     1(26596–26646)    Sz. pombe C26H5.07c                 235    133    453
                                    2(26726–26778)
SPBC336.01    27930–30606   829     1(28541–28687)    Ureaplasma parvum DNA helicase II   198    84     205
                                    2(29740–29779)
SPBC336.02    31597–32578   307     1(31731–31790)    Sz. pombe dim1
SPBC336.03    34836–37799   898                       D. discoideum aimless               267    149    501
SPBC336.04    38477–?       ?       1(38706–38757)    Sz. pombe pol3
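The coordinates, intron positions and protein lengths listed in Table 1 are related by simple arithmetic, which the Python sketch below illustrates for three rows and for the snoRNA. It assumes 1-based inclusive coordinates and that each ORF interval includes the stop codon; both conventions are assumptions made for this example, and they are not guaranteed to hold for every row of the table as printed here. All function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def interval_length(interval: Interval) -> int:
    """Number of bases covered by an inclusive interval."""
    start, end = interval
    return end - start + 1


def predicted_protein_length(orf: Interval, introns: List[Interval]) -> int:
    """Amino acid count after splicing, assuming the ORF includes its stop codon."""
    coding_bp = interval_length(orf) - sum(interval_length(i) for i in introns)
    if coding_bp % 3 != 0:
        raise ValueError("spliced coding length is not a multiple of three")
    return coding_bp // 3 - 1  # subtract the stop codon


# (ORF coordinates, introns, length in aa as listed in Table 1)
ROWS = {
    "SPBC4F6.09": ((156, 1994), [], 612),
    "SPBC4F6.12": ((7145, 8517), [(7175, 7230)], 438),
    "SPBC4F6.15c": (
        (13784, 14772),
        [(14292, 14351), (14438, 14480), (14499, 14625)],
        252,
    ),
}

for name, (orf, introns, listed_aa) in ROWS.items():
    computed = predicted_protein_length(orf, introns)
    print(f"{name}: computed {computed} aa, listed {listed_aa} aa")

print(f"snoRNA length: {interval_length((15282, 15367))} nt")
```

For these three rows the computed lengths agree with the listed values, and the snoRNA interval gives the 86 nucleotides stated in the text.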
Novel ORFs with putative functions
SPBC4F6.09 shows the best similarity (by FASTA analysis) to the S. cerevisiae ENB1 gene, which encodes a transporter for enterobactin (Heymann et al., 2000). Comparison of the Sz. pombe protein by BLAST analysis at MIPS (www-mips.biochem.mpg.de/cgi-bin/blast) indicated higher similarity with the S. cerevisiae subtelomeric proteins YKR106w and YCL073c. Since the three S. cerevisiae proteins belong to the major facilitator superfamily class (multidrug permease homologies family 2; Nelissen et al., 1997), the assignment of the Sz. pombe protein to this family seems pertinent.
SPBC4F6.10 shows high similarity with Vps9p, a vacuolar sorting protein of S. cerevisiae (Stepp et al., 1997).
SPBC4F6.11c encodes a putative protein which shows similarity to human FLJ20752 (coding for a putative protein of 643 amino acids), to the Caenorhabditis elegans m18.3 protein and to S. cerevisiae genes ASN1, ASN2 and YML096w, which encode asparagine synthases. Thus, we propose that this gene encodes a Sz. pombe asparagine synthase.
SPBC4F6.14 shows similarity with human cDNA FLJ10377 (Swall Q9NW13), with the C. elegans R05H10.2 protein (SWISSPROT O62325; Wilson et al., 1994) and with S. cerevisiae Nop4p (Sun and Woolford, 1994, 1997). The latter is a nuclear protein whose gene disruption is lethal and whose absence causes a failure in the maturation of the 27S rRNA precursor to mature 25S rRNA and a deficit of 60S ribosomal subunits (Berges et al., 1994); it contains three RNA recognition motifs (RRMs).
SPBC4F6.16c shows similarity (36.1% identity in 423 amino acids) with human endoplasmic reticulum oxidoreductin 1-Lβ (ERO1-Lβ), a protein of 467 amino acids (Swall Q9NR62; Cabibbo et al., 2000), and with S. cerevisiae ERO1 (SWISSPROT O03103), possibly required for protein disulphide bond formation in the cell.
SPBC4F6.17c shows a strong degree of similarity with S. cerevisiae Hsp78p, a mitochondrial heat-shock protein in the Clp family of ATP-dependent proteases (Leonhardt et al., 1993), and contains a chaperonin ClpA/B domain.
ORFs with no assigned functions
SPBC4F6.12 encodes a putative protein with 25.8% identity over 321 amino acids to the C-terminal part of leupaxin, a human LIM domain protein (386 amino acids; SWISSPROT O60711) that forms a complex with PYK2 (Lipsky et al., 1998). The Sz. pombe protein contains LIM domains and also shows weak homology with S. cerevisiae Lrg1p, a GTPase-activating protein of the rho/rac family.
SPBC4F6.13c shows similarity (38.5% identity over 761 amino acids) with the 52C10.1 protein of Drosophila melanogaster (SWISSPROT O96841) and 40.2% identity over 820 amino acids with S. cerevisiae YMR049c, a member of the β-transducin family.

Figure 1. Computer analysis of the SPBC18A7 insert. Arrows indicate the location and direction of the ORFs. The gene name is indicated above the arrow. Black arrows correspond to known genes (also labelled with their names); white ones to genes coding for proteins with putative functions; and hatched ones to genes coding for proteins of unknown function.

SPBC18A7.01 shows similarity with dipeptidases from Archaea (Kawarabayasi et al., 1998) and Bacteria, both Gram-negative (Stover et al., 2000) and Gram-positive (Kunst et al., 1997), and weak similarity with S. cerevisiae YER078c; it has an M24 peptidase domain.
SPBC18A7.02c shares similarity with the Sz. pombe hypothetical 56.9 kDa protein C26H5.07c, located on chromosome I, and with S. cerevisiae Ptm1p (23.2% identity in 460 amino acids), a member of the major facilitator superfamily (Nelissen et al., 1995).
SPBC336.01 could encode a protein 829 amino acids long exhibiting weak similarity (27.0% identity in 400 amino acids) with a DNA helicase II of Ureaplasma parvum (Ureaplasma urealyticum biotype 1; Glass et al., 2000). A central block of 82 amino acids (428–510) shows a higher degree of similarity with other DNA helicases, including the S. cerevisiae Hmi1p mitochondrial DNA helicase.
SPBC336.03 presents a low degree of similarity with Dictyostelium discoideum aimless RasGEF (Insall et al., 1996) and with the S. cerevisiae GDP/GTP exchange factors Cdc25p and Sdc25p. The Sz. pombe protein contains the N-terminal motif of the guanine nucleotide exchange factor for Ras-like GTPases (Cdc25-like domain) and, in the C-terminal part, the domain (250 amino acid region) found in the guanine nucleotide dissociation stimulators of the CDC25 family. It contains a RasGEF (guanine nucleotide exchange factor) domain.
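The similarities in these sections are reported as a percent identity over an aligned length. As a rough illustration of what such figures mean, the Python sketch below computes percent identity from a gapped pairwise alignment and converts a reported percentage back into an approximate count of identical residues. The functions are hypothetical helpers, and the denominator used here (all alignment columns, including gaps) is an assumption; FASTA may define the overlap differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment must not be empty")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


def implied_identical_residues(percent: float, overlap_length: int) -> int:
    """Approximate number of identical positions behind a reported percentage."""
    return round(percent * overlap_length / 100.0)


# Toy alignment, not taken from the paper: 4 identical columns out of 6.
print(f"{percent_identity('MKT-LV', 'MRTALV'):.1f}% identity")

# Figures quoted in the text: 36.1% in 423 aa (SPBC4F6.16c vs human ERO1-Lβ)
# and 27.0% in 400 aa (SPBC336.01 vs U. parvum DNA helicase II).
print(implied_identical_residues(36.1, 423))
print(implied_identical_residues(27.0, 400))
```

Read this way, the weak similarity reported for SPBC336.01 corresponds to roughly a hundred identical positions spread over 400 aligned residues.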
Acknowledgements
This work was supported by the European Commission in the framework of the European Schizosaccharomyces Genome Sequencing Project (BIO4-CT960159) and by the Comisión Interministerial de Ciencia y Tecnología (BIO97-1535-C01-02-03-04-CE).
References
Altschul SF, Madden TL, Schaffer AA, et al. 1997. Gapped
BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic Acids Res 25: 3389–3402.
Balakin AG, Smith I, Fournier MJ. 1996. The RNA world of the
nucleus: two major families of small RNAs defined by different
box elements with related functions. Cell 86: 823–834.
Berges T, Petfalski E, Tollervey D, Hurt EC. 1994. Synthetic
lethality with fibrillarin identifies Nop77p, a nucleolar protein
required for pre-rRNA processing and modification. EMBO J
13: 3136–3148.
Cabibbo A, Pagani M, Fabbri M, et al. 2000. ERO1-L, a human
protein that favors disulfide bond formation in the endoplasmic
reticulum. J Biol Chem 275: 4827–4833.
Erickson FL, Hannig EM, Krasinskas A, Kahn RA. 1993.
Cloning and sequence of ADP-ribosylation factor 1 (ARF1)
from Schizosaccharomyces pombe. Yeast 9: 923–927.
Glass JI, Lefkowitz EJ, Glass JS, et al. 2000. The complete
sequence of the mucosal pathogen Ureaplasma urealyticum.
Nature 407: 757–762.
Hanahan D. 1983. Studies on transformation of Escherichia coli
with plasmids. J Mol Biol 166: 557–580.
Heymann P, Ernst JF, Winkelmann G. 2000. A gene of the
major facilitator superfamily encodes a transporter for
enterobactin (Enb1p) in Saccharomyces cerevisiae. Biometals
13: 65–72.
Hoheisel JD, Maier E, Mott R, et al. 1993. High resolution
cosmid and P1 maps spanning the 14 Mb genome of the fission
yeast S. pombe. Cell 73: 109–120.
Insall RH, Borleis J, Devreotes PN. 1996. The aimless RasGEF
is required for processing of chemotactic signals through
G-protein-coupled receptors in Dictyostelium. Curr Biol 6:
719–729.
Kawarabayasi Y, Sawada M, et al. 1998. Complete sequence and
gene organization of the genome of a hyperthermophilic
archaebacterium, Pyrococcus horikoshii OT3. DNA Res 5:
147–155.
Kunst F, Ogasawara N, et al. 1997. The complete genome
sequence of the Gram-positive bacterium Bacillus subtilis.
Nature 390: 249–256.
Leonhardt SA, Fearon K, Danese PN, Mason TL. 1993. HSP78
encodes a yeast mitochondrial heat shock protein in the Clp
family of ATP-dependent proteases. Mol Cell Biol 13:
6304–6313.
Lipsky BP, Beals CR, Staunton DE. 1998. Leupaxin is a novel
LIM domain protein that forms a complex with PYK2. J Biol
Chem 273: 11709–11713.
Lucas M, Gwilliam R, Lepingle M, et al. 2000. Sequence analysis
of two cosmids from Schizosaccharomyces pombe chromosome
III. Yeast 16: 1519–1526.
Nelissen B, Mordant P, Jonniaux JL, De Wachter R, Goffeau A.
1995. Phylogenetic classification of the major superfamily of
membrane transport facilitators, as deduced from yeast
genome sequencing. FEBS Lett 377: 232–236.
Nelissen B, De Wachter R, Goffeau A. 1997. Classification of all
putative permeases and other membrane plurispanners of the
major facilitator superfamily encoded by the complete genome
of Saccharomyces cerevisiae. FEMS Microbiol Rev 21:
113–134.
Park H, Francesconi S, Wang TSF. 1993. Cell cycle expression of
two replicative DNA polymerases, alpha and delta, from
Schizosaccharomyces pombe. Mol Biol Cell 4: 145–157.
Pearson WR, Lipman DJ. 1988. Improved tools for biological
sequence comparison. Proc Natl Acad Sci U S A 85:
2444–2448.
Pignede G, Bouvier D, de Recondo AM, Baldacci G. 1991.
Characterization of the pol3 gene product from Schizosaccharomyces
pombe indicates inter-species conservation of the
catalytic subunit of DNA polymerase delta. J Mol Biol 222:
209–218.
Roedel C, Kirchhoff S, Schmidt H. 1992. The protein sequence
and some intron positions are conserved between the switching
gene swi10 of Schizosaccharomyces pombe and the human
excision repair gene ERCC1. Nucleic Acids Res 20: 6347–6353.
Sambrook J, Fritsch E, Maniatis T (eds). 1989. Molecular
Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory
Press: New York.
Sanchez M, del Rey F, Domínguez A, Moreno S, Revuelta JL.
1999. DNA sequencing and analysis of a 40 kb region from the
right arm of chromosome II from Schizosaccharomyces pombe.
Yeast 15: 419–426.
Sonnhammer EL, Eddy SR, Birney E, Bateman A, Durbin R.
1998. Pfam: multiple sequence alignments and HMM-profiles
of protein domains. Nucleic Acids Res 26: 320–322.
Stepp JD, Huang K, Lemmon SK. 1997. The yeast adaptor
protein complex, AP-3, is essential for the efficient delivery of
alkaline phosphatase by the alternate pathway to the vacuole.
J Cell Biol 139: 1761–1774.
Stover CK, Pham X, et al. 2000. Complete genome sequence of
Pseudomonas aeruginosa PAO1, an opportunistic pathogen.
Nature 406: 959–964.
Sun C, Woolford JL Jr. 1994. The yeast NOP4 gene product is
an essential nucleolar protein required for pre-rRNA processing
and accumulation of 60S ribosomal subunits. EMBO J 13:
3127–3135.
Sun C, Woolford JL Jr. 1997. The yeast nucleolar protein Nop4p
contains four RNA recognition motifs necessary for ribosome
biogenesis. J Biol Chem 272: 25345–25352.
Wilson R, Ainscough R, et al. 1994. 2.2 Mb of contiguous
nucleotide sequence from chromosome III of C. elegans.
Nature 368: 32–38.
Xiang Z, Lyne MH, Wood V, et al. 1999. DNA sequencing and
analysis of a 67.4 kb region from the right arm of
Schizosaccharomyces pombe chromosome II reveals 28 open reading
frames including the genes his5, pol5, ppa2, rip1, rpb8 and skb1.
Yeast 15: 893–901.
Xiang Z, Moore K, Wood V, et al. 2000. Analysis of 114 kb of
DNA sequence from fission yeast chromosome 2 immediately
centromere-distal to his5. Yeast 16: 1405–1411.