Mining the schistosome
DNA sequence
database
The basic understanding of organism
biology that can be gained through genome
analysis has prompted numerous research
communities to initiate projects to
sequence either whole genomes or specific
components such as expressed genes (the
transcriptome). In many networks, the
initial priority is the generation of
expressed sequence tags (ESTs); these are
randomly selected, short, single-pass
cDNA sequences that reflect the
transcriptional activity of organisms or
tissues [1]. For parasitologists, such
approaches promise the discovery of new
drug targets and new candidate vaccine
antigens, as well as revealing the
molecular mechanisms underlying, for
example, parasite biochemistry,
development, pathogenicity and diversity [2].
Schistosoma possesses a large, highly
repetitious genome; therefore, EST analysis
has provided the Schistosome Genome
Network (SGN) with a rapid and cost-
effective way to produce gene catalogues for
both Schistosoma mansoni (the main focus)
and Schistosoma japonicum [3]. With the
support of various agencies, the SGN has
generated, annotated and deposited more
than 16 000 sequences in dbEST (the EST
division of GenBank) [4,5] (Box 1a). EST
analysis continues for both species; major
new initiatives totaling 200 000 S. mansoni
ESTs (potentially a 1.6–2.5× transcriptome
coverage) have recently been announced in
Brazil (Box 1, b–e) and large-scale projects
for S. japonicum are under development in
China. An overview of the SGN’s activities
can be obtained from its website (Box 1d).
To date, exploitation of public EST data
has chiefly been by keyword search of
sequence annotations or by homology
search. The size of the Schistosoma EST
dataset now permits its exploration in
additional creative and informative ways.
Some analyses require access to advanced
computing and programming support, but
there are increasing possibilities for
individual research analysis. This article
provides an overview of methods being
used to mine Schistosoma EST data; an
accompanying website provides more
comprehensive information (Box 1e).
Data mining through database searching
Text searches and sequence comparisons
are the most common ways to query a
database (Box 1) and allow the putative
assignment of sequence–function
relationships. Many database queries can
be carried out via the World Wide Web and
several schistosome-specific search
resources are available. These include:
BLAST analysis against non-redundant
DNA sequence sets and their six-frame
amino acid translations, a parasite protein
motif search (both are available through
the Parasite Genome Web server) and the
S. mansoni gene index [at The Institute
for Genomic Research (TIGR)]. These tools
and links to more general bioinformatics
services are listed in Box 1.
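For readers who want to reproduce a six-frame translation locally before submitting a sequence to one of these servers, the following is a minimal sketch. It assumes the standard nuclear genetic code (NCBI translation table 1) and reports any codon containing an ambiguous base as X; the function names are illustrative and are not taken from any of the servers listed in Box 1.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(seq):
    """Return the three forward and three reverse-strand translations."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame("ATGGCCTAA").items():
        print(name, protein)
```

Stop codons are shown as asterisks, so an uninterrupted stretch in one frame of an EST is a first hint of the coding frame.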
Computational technology also allows
more imaginative and complicated
database searches that analyze gene
expression in a wider context and
generate testable hypotheses.
(1) Microsatellite polymorphisms provide
essential markers for genome sequencing,
positional cloning, physical mapping and
population analysis [6,7]. RepeatMasker Web
servers (Box 1) allow large numbers of
sequences to be scanned rapidly for
microsatellite-like simple repeats. For
example, examination of 310 S. japonicum
cluster consensus sequences revealed
31 tri-, eight tetra- and two
pentanucleotide repeat regions. Primers
designed to regions flanking these
sequences can then be used to test for
polymorphisms. (2) Parasite biochemistry
and genomics can be integrated through
in silico metabolomics [8] – the mapping of
identified genes onto metabolic pathways.
With a view to identifying new drug
targets, it should be possible to identify
standard pathways that are present
exclusively in the parasite. Moreover,
where the parasite’s gene catalog does
not appear to contain expected enzymes,
this could suggest that alternate or novel
pathways are in operation. Metabolism-
related ESTs can also be separated by life-
cycle stage and statistical comparisons
made. A comprehensive list of enzyme
descriptions has been used to search
Schistosoma sequence annotations and
hits classified by life-cycle stage.
Fructose-bisphosphate aldolase appears to be
expressed at a higher-than-expected level
in cercariae, whereas expression of
ubiquinol, cytochrome-c-oxidase and
glycogen synthase appears to be biased
towards adult worms. Such observations
can be correlated with current knowledge
of parasite metabolism and used to create
testable hypotheses.
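The repeat scan described in (1) can be approximated offline for a quick first pass. The sketch below is not RepeatMasker: it is a plain regular-expression search for perfect tri-, tetra- and pentanucleotide tandem repeats, and the threshold of four consecutive copies is an assumption chosen only for illustration.

```python
import re


def find_microsatellites(seq, unit_sizes=(3, 4, 5), min_repeats=4):
    """Return (start, motif, copies) for perfect tandem repeats.

    Motifs that are themselves built from a shorter unit (for example
    AAA or ATAT) are skipped so that each hit has a true unit length.
    """
    seq = seq.upper()
    hits = []
    for k in unit_sizes:
        # One motif of length k followed by at least min_repeats - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            is_shorter_unit = any(
                motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0
            )
            if is_shorter_unit:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical consensus fragment, not a real Schistosoma sequence.
    example = "GGCC" + "CAG" * 5 + "TTGC" + "GATA" * 4 + "CC"
    for start, motif, copies in find_microsatellites(example):
        print(start, motif, copies)
```

The reported start positions (0-based) mark where flanking primers would need to be designed; imperfect repeats are missed by this approach and still require a dedicated tool.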
Data mining through cluster analysis
Cluster analysis groups together
homologous sequences, identifying the
non-redundant sequence set [9]. Several
such analyses are available for
Schistosoma EST data (Box 1). Once
cluster data are available, a wide variety of
secondary analyses can be performed.
(1) In general, the utility of the primary
databases (GenBank and European
Molecular Biology Laboratory; EMBL)
is determined by the accuracy of
sequence annotation. This can be
assessed by comparing the annotation
of sequences within individual
clusters [10]. Our analysis of 2763
Schistosoma ESTs reveals only 0.65%
discrepancy in annotation. Thus, with
a quality database available, data-
mining processes involving computer
analysis and interpretation of data can
be reliably undertaken [11].
(2) Cluster consensus sequences are
frequently longer than the individual
sequences within them, facilitating
identification through homology
searching, and BLAST analysis of
consensus sequences is often
performed as part of the clustering
process (Box 1). More than 40% of
previously unidentified Schistosoma
ESTs could be classified in this way.
(3) Transcript families and alternate
splicing events can be revealed by
comparing clusters that return similar
database homology results. For
example, two S. mansoni clusters
grouped with different actin cDNA
sequences present in GenBank, and
S. mansoni glucose-3-phosphate
dehydrogenase (G3PDH) might be
alternatively spliced because one of two
G3PDH clusters lacks a stretch of
codons that is present in the other.
(4) Single nucleotide polymorphisms
(SNPs) [12] can be identified by inspection
of sequence alignments for individual
clusters. For example, rRNA and
cytochrome oxidase subunit II clusters
reveal two major polymorphic groups.
Schistosome SNPs could be relevant to
pharmacogenomics (how individual
TRENDS in Parasitology Vol.17 No.10 October 2001
http://parasites.trends.com 1471-4922/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S1471-4922(01)02019-0
parasites or isolates are affected by
chemotherapy, or recognized by natural
or induced immune responses [13]) as well
as to population studies.
(5) Some Schistosoma EST clusters are
derived exclusively from a single
developmental stage (e.g. calcium
binding protein, glutathione
S-transferase), whereas others reveal
transcription across two or more stages
(e.g. actin, cytochrome oxidase subunit I).
Statistical analysis of expression with
respect to life cycle can, with some
caveats, reflect transcriptional activity
and might be useful for those interested
in studying a specific developmental
stage or the regulation of gene
expression throughout development. In
addition, abundantly expressed
sequences that cannot be identified by
simple homology analysis are worthy of
more detailed study as they could
represent parasite-specific genes [14].
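One simple way to put a number on the stage bias discussed in (5) is a one-sided Fisher's exact test on EST counts. This is a sketch of one possible statistic, not necessarily the test used for the analyses described here, and the counts in the example are hypothetical.

```python
from math import comb


def fisher_one_sided(in_stage, in_other, total_stage, total_other):
    """Probability of seeing at least `in_stage` cluster ESTs in the stage
    library if cluster membership were independent of life-cycle stage.

    in_stage    : ESTs from the cluster found in the stage of interest
    in_other    : ESTs from the cluster found in all other stages
    total_stage : all ESTs sequenced from the stage of interest
    total_other : all ESTs sequenced from the other stages
    """
    population = total_stage + total_other
    cluster_size = in_stage + in_other
    upper = min(cluster_size, total_stage)
    favourable = sum(
        comb(cluster_size, k) * comb(population - cluster_size, total_stage - k)
        for k in range(in_stage, upper + 1)
    )
    return favourable / comb(population, total_stage)


if __name__ == "__main__":
    # Hypothetical counts: 12 of 1500 cercarial ESTs fall in the cluster,
    # against 8 of 8500 ESTs from all other stages.
    p = fisher_one_sided(12, 8, 1500, 8500)
    print(f"one-sided P = {p:.3g}")
```

A small P value suggests over-representation in that stage, subject to the caveats already noted: libraries differ in size, construction and normalization, so the result is a hypothesis to test rather than a measurement of transcript abundance.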
Data mining to support post genomics
Data mining also contributes to post-
genomic activities such as microarray and
proteomic analysis. Non-redundant clone
sets identified by clustering are being used
to prepare Schistosoma microarrays.
These will facilitate analysis of global
transcription profiles, contributing to our
understanding of parasite development,
sexual differentiation and responses to
environmental or experimental
perturbations (e.g. pharmacological
attack) [15]. For proteomics, ESTs and cluster
consensus sequences provide a database
for analysis of peptide mass fingerprints
and peptide sequence tags, linking gene
expression to gene products [16]. In
particular, cluster consensus sequences
provide more accurate open reading frame
predictions than individual ESTs.
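To illustrate how a translated consensus sequence can serve as a search database for peptide mass fingerprints, the sketch below performs an in silico tryptic digest and computes monoisotopic peptide masses. The assumptions are stated plainly: cleavage after K or R except before P, no missed cleavages, no modifications, and a protein string containing only the 20 standard residues. The function names and the matching tolerance are illustrative.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein.upper()) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_fingerprint(protein, observed_masses, tolerance=0.5):
    """Return (peptide, mass) pairs lying within `tolerance` Da of an observed mass."""
    matches = []
    for peptide in tryptic_peptides(protein):
        mass = peptide_mass(peptide)
        if any(abs(mass - obs) <= tolerance for obs in observed_masses):
            matches.append((peptide, round(mass, 4)))
    return matches


if __name__ == "__main__":
    # Hypothetical translated fragment and observed masses.
    print(tryptic_peptides("AKRPGKM"))
    print(match_fingerprint("AKRPGKM", [217.1, 455.3]))
```

Because a longer consensus sequence yields more complete predicted peptides than a single-pass EST, more of the observed masses can be matched to the same gene.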
Perspective
Although schistosome genome analysis is
far from complete, and almost all aspects
of functional genomics still remain
unexplored, mining of the accumulated
data already enables workers to
undertake important and exciting
research into basic schistosome biology.
Guilherme Oliveira
Centro de Pesquisas René Rachou, Fundação Oswaldo Cruz – FIOCRUZ, Belo Horizonte, Minas Gerais 30190-002, Brazil. e-mail: [email protected]
David A. Johnston
Dept of Zoology, The Natural History Museum, London, UK SW7 5BD.
References
1 Adams, M.D. et al. (1991) Complementary DNA
sequencing: expressed sequence tags and human
genome project. Science 252, 1651–1656
2 Johnston, D.A. et al. (1999) Genomics and the
biology of parasites. BioEssays 21, 131–147
3 Williams, S.A. et al. (1999) Helminth genome
analysis: the current status of the filarial and
schistosome genome projects. Parasitology
118 (Suppl.), S19–S38
Box 1. Websites of interest to the schistosome research community

(a) dbEST [expressed sequence tag database at the National Center for Biotechnology Information (NCBI)]: http://www.ncbi.nlm.nih.gov/dbEST/index.html
EST summary by organism: http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html
(b) http://verjo18.iq.usp.br/schisto/
(c) http://www.mct.gov.br/sobre/noticias/2001/25_04canexo.htm
(d) http://www.nhm.ac.uk/hosted_sites/schisto/index.html
(Contains network administration, current project descriptions, resources, cluster analysis, protocols and links of interest)
(e) www.nhm.ac.uk/hosted_sites/schisto/TIP2001/

Database annotation searches
Entrez server at NCBI: http://www.ncbi.nlm.nih.gov/Entrez/
Sequence Retrieval System (SRS) at European Bioinformatics Institute (EBI): http://srs.ebi.ac.uk/
Other SRS servers: http://www.lionbio.co.uk/publicsrs.html

Database homology searches
Blast at NCBI: http://www.ncbi.nlm.nih.gov/BLAST/
Blast at EBI: http://www.ebi.ac.uk/blast2/
Blast tutorials: http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
Parasite genome blast at EBI: http://www.ebi.ac.uk/blast2/parasites.html

World Health Organization parasite genome and proteome website
http://www.ebi.ac.uk/parasites/parasite-genome.html
Includes servers to search a range of parasite-specific sequence databases with your own sequence. Available searches include: parasite blast server; proteome keyword search; protein motif search; six-frame translation of DNA sequences motif search and parasite codon usage tables. (Standard databases are updated monthly.)

Schistosoma cluster analyses
S. mansoni and S. japonicum clusters at the Schistosome Genome Network website: http://www.nhm.ac.uk/hosted_sites/schisto/clusters/intro.html
S. mansoni gene index at The Institute for Genomic Research: http://www.TIGR.org/tdb/smgi/
S. mansoni clusters at University of Pennsylvania: http://www.cbil.upenn.edu/ParaDBs/Schistosoma_2/index.html

Newsgroups or mailing lists
Parasite genome newsgroup: http://www.jiscmail.ac.uk/lists/parasite-genome.html
Schistosoma newsgroup: http://www.bio.net/hypermail/schisto/

Repeat Masker Web server
http://ftp.genome.washington.edu:80/cgi-bin/RepeatMasker
Screens DNA sequences against libraries of simple and characterized repetitive elements (use the ‘Only mask simple repeats and low complexity DNA’ option to search for microsatellite sequences)

General web tools
See http://www.ebi.ac.uk/parasites/webtools.html for listing.
4 Oliveira, G.C. (2001) Schistosoma gene discovery
project, an update. Trends Parasitol. 17, 108–109
5 Franco, G.R. et al. (2000) The Schistosoma gene
discovery program: state of the art. Int. J.
Parasitol. 30, 453–463
6 Curtis, J. and Minchella, D.J. (2000) Schistosome
population genetics structure: when clumping
worms is not just splitting hairs. Parasitol. Today
16, 68–71
7 Durand, P. et al. (2000) Isolation of microsatellite
markers in the digenetic trematode Schistosoma
mansoni from Guadeloupe island. Mol. Ecol.
9, 997–998
8 Goto, S. et al. (1997) Organizing and computing
metabolic pathway data in terms of binary
relations. Pac. Symp. Biocomput. 175–186
9 Yee, D.P. and Conklin, D. (1998) Automated
clustering and assembly of large EST collections.
Proc. Int. Conf. Intell. Syst. Mol. Biol. 6, 203–211
10 Pennisi, E. (1999) Keeping genome databases
clean and up to date. Science 286, 447–450
11 Boguski, M.S. (1998) Bioinformatics – a new era.
Bioinformatics: A Trends Guide 5, 1–3
12 Picoult-Newberg, L. et al. (1999) Mining SNPs
from EST databases. Genome Res. 9, 167–174
13 Evans, W.E. and Relling, M.V. (1999)
Pharmacogenomics: translating functional
genomics into rational therapeutics. Science
286, 487–491
14 Meira, W.S. et al. (1998) Protein, nucleotide
characterization of an abundant Schistosoma
mansoni transcript with no homologs in the
databases. Mem. Inst. Oswaldo Cruz 93 (Suppl.)
1, 211–213
15 Marshall, E. (1999) Do-it-yourself gene watching.
Science 286, 444–447
16 Ashton, P.D. et al. (2001) Linking proteome and
genome: how to identify parasite proteins. Trends
Parasitol. 17, 198–202
ParaSite
Size matters on
the Web
ProMed Discussion List
(http://www.fas.org/promed/)
Leishmaniasis and foxhounds in the USA: how is it transmitted and why only to foxhounds?
These questions were raised on the ProMed
List and speculated about following a report
in a local newspaper in Virginia, USA. After
the death of ‘an avid fox hunter’, his pack of
15 foxhounds was distributed among his
friends. One by one, the hounds succumbed
to Leishmania infantum MON1, a strain
endemic in the Mediterranean area. Later,
their offspring also began to show symptoms.
Peter Schantz (Centers for Disease
Control and Prevention, Atlanta, GA, USA)
was quoted as saying that, although native
sandflies could have picked up the parasite, it was
now believed to be transmitted from dog to dog.
The moderator also thought this was likely
because other breeds of dog living nearby
were not infected and were liable to be
exposed to the same vector. His theory was
that some foxhounds could have been sent
to the Middle East, perhaps to a dog show,
where they became infected. Foxhounds live
in packs and are often exchanged between
hunts so: ‘There are perfect opportunities
for the spread of the disease…(if it is
transmitted) sexually and congenitally as
well as by sandfly bite.’ [see ParaSite (2000)
Parasitol. Today 16, 371–372 for more
information]. Bruce Akey, a vet from the
Virginia Department of Agriculture, added
that 14 000 registered foxhounds
nationwide have been tested by the Centers
for Disease Control and Prevention’s (CDC)
immunofluorescence assay. At least one dog
in 69 different kennels across 21 states and
two Canadian provinces was sero-positive
and all culture isolates were L. infantum
MON1. The dogs often sustain superficial
lacerations during hunting and from
fighting, which commonly occurs within a
pack to establish a pecking order. Perhaps
foxhounds, similar to some humans and
mice, are also genetically immunodeficient?
Mosquito Discussion Group
(mosquito-l@iastate.edu)
Funeral urns and the toxic properties of copper
Scott Campbell (Arthropod-Borne Disease
Laboratory, Suffolk County, NY, USA)
knew of a local cemetery pushing the use
of bronze flower vases because mosquitoes
will not breed in them. Was this claim
true? Rick Duhrkopf (Baylor University,
TX, USA) said he does not bother looking
for larvae in bronze containers because he
has never found any, and several others
explained that a small amount of copper is
toxic enough to prevent larvae surviving
beyond the first few instars [Romi, R. et al.
(2000) J. Med. Entomol. 37, 281–285].
Tom Iwanejko (Arthropod-Borne
Disease Laboratory, Suffolk County, NY,
USA) said that in the aquarium trade,
copper salts are used as an anti-parasite
treatment, with a warning to remove all
invertebrates. Dominick Ninivaggi
(Arthropod-Borne Disease Laboratory,
Suffolk County, NY, USA) warned that in
some states putting copper in a vase might
be considered as an unregistered pesticide.
However, Cam Lay (Clemson University,
SC, USA), speaking for the sovereign state
of South Carolina, hoped his fellow pesticide
regulators had more significant things to
worry about: ‘You can apply trimethyl mole
cricket death [an insecticide]…to your lawn
whether or not mole crickets are
present…and if it kills the chinch bugs
that’s OK, even if they’re not on the label.’
Mosquito abundance
Martina Schäfer (Uppsala University,
Sweden) wanted to compare numbers.
Last summer, the record in central
Sweden was ~56 500 females (mostly
Aedes sticticus) caught in one night in a
CDC light trap with dry ice. Larry
Hribar’s collections in Florida Keys
ranged from 0 to ~55 000 mosquitoes in a
single night. In 1999, the top of the list
was Paul Reiter (CDC, San Juan, PR,
USA), who trapped 267 600 Aedes
taeniorhynchus in Grand Cayman. In an
answer to another question, Rick
Duhrkopf (Baylor University, TX, USA)
replied that Texas has the highest number
of mosquito species (85) (a current list can
be found at http://www.texasmosquito.org/
Checklist.html), although Gary
McCallister (Mesa State, CO, USA)
suggested that the figure should be
corrected for square miles…and size!
Is the mosquito the official state bird of
Minnesota? ‘No,’ said Carlos Andrade
(Unicamp, SP, Brazil), ‘it’s the official state
bird of Michigan – the big Aedes vexans’.
Dennis Wallette (East Baton Rouge
Mosquito Abatement, LA, USA) was
scornful: ‘Aedes vexans big? … our
Psorophora ciliata or Psorophora howardii
… are often referred to as “gallinippers”
because they take a gallon of blood.’
Dominick Ninivaggi was told that in Texas,
mosquitoes cannot enter through windows
smaller than 18 inches square, but
Duhrkopf informed him that this is only
true in summer when adults are fully
grown and that, ‘the wintering adults are
small enough to get in which is why so
many Texans are armed’. Rob Anderson
(Simon Fraser University, Canada) agreed
that Ps. ciliata is the largest
haematophagous mosquito anywhere.
When doing media interviews he found it