In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus
(Nematoda): comparison of the automated ESTExplorer workflow platform with database
searches.
Shivashankar H. Nagaraj and Shoba Ranganathan
Professor and Chair, Bioinformatics, Biotechnology Research Institute and Dept. of Chemistry & Biomolecular Sciences, Macquarie University, Sydney, Australia ([email protected])
Adjunct Professor, Dept. of Biochemistry, National University of Singapore, Singapore ([email protected])
Expressed Sequence Tags (ESTs)
Unedited, short, single-pass sequences generated from the 5' or 3' ends of randomly selected clones from cDNA libraries of the desired cells, tissues or organs.
• Length: 200-700 bp (average 360 bp)
• Can be quickly generated at low cost (“poor-man’s genome”)
• EST data is highly fragmented
• EST annotations have very little biological information
• High-throughput in nature
EST Applications
• Gene Discovery
• Gene Structure Prediction
• Expression Maps
• Alternative Splicing
• Identification and characterization of SNPs
• Gene expression studies: tissue or disease specific, developmental stage
• Proteomics (for example peptide mass fingerprinting)
• Identification of drug and vaccine candidates
[Figure: from genomic DNA to mRNA, cDNA and 5' and 3' ESTs]
[Figure: An EST sequence, with vector, repeat and high quality sequence regions marked; position labels ~1-50 bp, ~50-500 bp and ~500-700 bp]
Properties of ESTs
EST data resources
• Available in plenty
• Several dedicated databases
• Fragmented
• Quality dubious
• Need cleaning, clustering, annotation!
EST data repositories
dbEST release 061507 (June, 2007) www.ncbi.nlm.nih.gov/dbEST/
43,396,096 ESTs from 659 different organisms
Homo sapiens (human): 8,119,106
Mus musculus (mouse): 4,850,243
Danio rerio (zebrafish): 1,350,105
Bos taurus (cattle): 1,318,208
Arabidopsis thaliana (thale cress): 1,276,692
Xenopus tropicalis: 1,271,375
Oryza sativa (rice): 1,211,418
Zea mays (maize): 1,161,241
Triticum aestivum (wheat): 1,050,267
Overview of EST sequence analysis
[Figure: flowchart with the following boxes]
Submit data: raw EST sequence data
Contamination check, vector clipping, poly-A removal, repeat masking
Clustering, assembly, consensus generation
Gene annotation, RNAi, gene mapping, alternative splicing, SNPs
Conceptual translation
Peptide annotation, protein interactors, Gene Ontologies, KEGG
Visualize results
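The poly-A removal and short-sequence filtering steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 10-base run length and the 100 bp cutoff are hypothetical and are not taken from the slides or from any of the tools named in them.

```python
import re

MIN_LENGTH = 100  # assumed cutoff in bases, not taken from the slides


def trim_poly_a(seq, min_run=10):
    """Remove a trailing poly-A run or a leading poly-T run of at least min_run bases."""
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_run, "", seq)  # 3' poly-A tail
    seq = re.sub(r"^T{%d,}" % min_run, "", seq)  # poly-T at the start of a 3' read
    return seq


def preprocess(ests, min_length=MIN_LENGTH):
    """Trim every EST and keep only those still at least min_length bases long."""
    cleaned = {}
    for name, seq in ests.items():
        trimmed = trim_poly_a(seq)
        if len(trimmed) >= min_length:
            cleaned[name] = trimmed
    return cleaned


# Hypothetical input: est1 keeps 120 bases after trimming, est2 only 20.
example = {
    "est1": "ACGT" * 30 + "A" * 15,
    "est2": "ACGT" * 5 + "A" * 20,
}
kept = preprocess(example)
```

Real pre-processing also has to deal with vector and contaminant sequence, which this sketch leaves out.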
Evolution of ESTExplorer
Comparison of current methods for EST analysis
Critical evaluation of contemporary tools and EST analysis pipelines
Benchmarking of tools using EST datasets
Lack of downstream functional annotation at DNA and protein levels
ESTExplorer
Description of ESTExplorer
ESTExplorer – features
Suite of programs to pre-process, assemble and functionally annotate ESTs
User-defined input and analysis – parameter control
Species-specific analysis
Input: ESTs or assembled contigs
Output: Assembled ESTs, Gene Ontologies, mapping to Domains/Motifs, Pathway mapping
ESTExplorer analysis and annotation workflow, showing Phase I (pre-processing and assembly), Phase II (nucleotide-level annotation) and Phase III (protein-level annotation).
[Figure: ESTExplorer workflow]
Input Option 1: EST sequences (quality values, .qual)
Input Option 2: assembled ESTs
Phase I (EST pre-processing): SeqClean, RepeatMasker, CAP3, giving assembled ESTs
Phase II (DNA level Annotation): BLASTX, BLAST2GO
Phase III (Protein level Annotation): ESTScan, InterProScan, KOBAS
Short sequences removed from the analysis
Final output: Annotation summary for assembled ESTs
Workflow
estexplorer.biolinfo.org
Annotation summary page
Trichostrongylus vitrinus (order Strongylida) is a parasitic nematode.
Principal causative nematode associated with parasitic diseases in sheep and cattle
Current treatment for the disease: chemotherapeutic agents (anthelmintics)
Disadvantages with current treatments:
a. Expensive and only partially effective
b. Anthelmintic drug resistance over the last decade
c. Residue problems in meat and milk
Possible alternative: the development of anti-parasite drugs and/or vaccines
Nisbet AJ, et al. Int J Parasitol, 2004
The worm in question
Phase I
Phase II
Phase III
Creation of cDNA libraries and EST generation from the parasite Trichostrongylus vitrinus
Subset of potential drug target genes
Isolation of full length genes
Functional Genomics via RNAi
Biochemical activity assays
Proteomics
Phase IV
Virtual and High-throughput screening
Pre-clinical and clinical evaluation
Comparative genomics with nematodes
Bioinformatics Analysis of the ESTs
Categorization of Differentially expressed ESTs
EST analysis schema
Phase I: EST pre-processing
Raw ESTs: male 910, female 866
EST pre-processing (SeqClean & RepeatMasker): male 902, female 857
EST clustering and assembly (CAP3): male 180 contigs and 251 singletons; female 143 contigs and 122 singletons
Phase II: DNA level annotation
Gene Ontologies (BLAST2GO): male 134, female 133
Locate RNAi phenotype from C. elegans (BLASTX against Wormpep)
Database similarity searches for locating mammalian homologues (BLASTX against NR)
Database similarity searches for locating parasitic nematode homologues (BLASTX)
Database similarity searches against NR and Wormpep (BLASTX) for updating Nisbet et al. results
Phase III: Protein level annotation
Conceptual translation (ESTScan): peptide sequences, male 400, female 240
Secretome analysis (SignalP, TMHMM, PSORT): male 28, female 12
Domain/Motif analysis (InterProScan): male 141, female 120
Pathway mapping (KOBAS): male 120, female 110
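The counts in the schema can be cross-checked with a little arithmetic. The sketch below adds contigs and singletons to get the number of unique sequences per sex, and sums the male and female counts for each annotation step. It assumes that the three Phase III count pairs correspond, in order, to secretome analysis, domain/motif analysis and pathway mapping; the variable names are illustrative only.

```python
# Counts copied from the schema; the layout of this dictionary is illustrative.
counts = {
    "male": {"raw": 910, "clean": 902, "contigs": 180, "singletons": 251},
    "female": {"raw": 866, "clean": 857, "contigs": 143, "singletons": 122},
}

# (male, female) counts per annotation step.
# Assumption: the last three pairs follow the order secretome, domains, pathways.
annotation = {
    "gene_ontology": (134, 133),
    "secreted": (28, 12),
    "domains": (141, 120),
    "pathways": (120, 110),
}


def unique_sequences(sex):
    """Unique sequences after assembly = contigs + singletons."""
    c = counts[sex]
    return c["contigs"] + c["singletons"]


total_raw = sum(c["raw"] for c in counts.values())
removed = {sex: c["raw"] - c["clean"] for sex, c in counts.items()}
unique = {sex: unique_sequences(sex) for sex in counts}
annotation_totals = {step: m + f for step, (m, f) in annotation.items()}

print(total_raw, removed, unique, annotation_totals)
```

Under these assumptions the sums agree with figures quoted elsewhere in the slides: 1776 ESTs analysed, 431 male and 265 female sequences in the SimiTri panels, 267 Gene Ontology assignments, 230 pathway associations and 40 putative secreted proteins.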
Results of overall EST analysis
Number of ESTs analysed: 1776 (male: 910, female: 866)
Caenorhabditis elegans homologues: 290 (41%)
Homologues in parasitic nematodes: 329 (42%)
Homologues in non-nematodes: 202 (28%)
No significant match to any sequence in the current databases: 218 (31%)
Gene Ontologies (GO) assigned: 267 (38%)
Pathway associations established: 230 (33%)
Of the C. elegans homologues, 90 entries had observed ‘non-wildtype’ RNAi phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth.
Results from BLAST vs. ESTExplorer

Manual annotation using BLAST
EST ID: TVm02_C07
E-value: 2.00E-37
BLAST results: PP1-gamma serine/threonine protein phosphatase
Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007 Mar 24;

Annotations obtained automatically from ESTExplorer
EST ID: TVm02_C07
BLAST results: protein phosphatase catalytic gamma isoform isoform 1
E-value: 1.00E-36
Gene Ontologies: chromatin modification, protein amino acid dephosphorylation, embryonic cleavage, cytokinesis, meiosis, oviposition, manganese ion binding, protein phosphatase type 1 activity, mitochondrial outer membrane, protein binding, mitosis, glycogen metabolic process, iron ion binding, nucleus
Metabolic Pathway Mapping: Long-term potentiation, Regulation of actin cytoskeleton, Focal adhesion, Insulin signaling pathway
Domain/Motif data: Metallophosphoesterase, Serine/threonine-specific protein phosphatase and bis(5-nucleosyl)-tetraphosphatase
Redefining parameters for possible drug/vaccine targets in parasitic nematodes
1. Secreted proteins
Parasites must secrete biologically active mediators to manipulate the host environment in order to survive immune attack
Inhibit host antigen-processing pathways
Examples:
• Aspartyl protease inhibitor (API-1)
• Cystatin (cysteine protease inhibitor)
• Acetylcholinesterase (AChE)
Harcus YM, et al. Genome Biol, 2004
Delaney A, et al. Int J Parasitol 2005
Vanholme B, et al. Gene 2004
2. Strong RNAi phenotypes in C. elegans
Embryonic lethality, larval lethality, sterile progeny, larval arrest, maternal sterility, slow growth
3. Absence of homologues in mammalian host (nematode specific genes)
Genes with specificity to nematodes may serve as excellent targets for drugs/vaccines with low toxicity to humans and other vertebrates. Better understanding of the unusual nematode biochemistry can also have industrial or therapeutic value.
T. vitrinus male EST data comparison
[Venn diagram: C. elegans 169 (39.21%), parasitic nematodes 191 (44.31%), non-nematodes 100 (23.20%); region counts shown: 6, 55, 89, 3, 45, 19, 2]
T. vitrinus female EST data comparison
[Venn diagram: C. elegans 121 (45.6%), parasitic nematodes 138 (52.1%), non-nematodes 102 (38.4%); region counts shown: 6, 24, 85, 8, 26, 6, 3]
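The flattened Venn diagrams no longer show which count belongs to which region, but the three set totals constrain the answer. The sketch below tries every assignment of the seven counts to the seven regions and keeps those that reproduce the totals. It assumes that the run-together digits read as 6, 55, 89 (male) and 6, 24, 85 (female), and that the seven numbers are exactly the seven regions of a three-set diagram; the region names are made up for this example.

```python
from itertools import permutations

# ce = C. elegans, pn = parasitic nematodes, nn = non-nematodes
REGIONS = ("ce_only", "pn_only", "nn_only", "ce_pn", "ce_nn", "pn_nn", "all_three")


def solve_venn(values, ce_total, pn_total, nn_total):
    """Return every distinct region assignment that reproduces the three set totals."""
    solutions = set()
    for perm in permutations(values):
        r = dict(zip(REGIONS, perm))
        ce = r["ce_only"] + r["ce_pn"] + r["ce_nn"] + r["all_three"]
        pn = r["pn_only"] + r["ce_pn"] + r["pn_nn"] + r["all_three"]
        nn = r["nn_only"] + r["ce_nn"] + r["pn_nn"] + r["all_three"]
        if (ce, pn, nn) == (ce_total, pn_total, nn_total):
            solutions.add(perm)  # a set removes duplicates caused by repeated values
    return [dict(zip(REGIONS, s)) for s in sorted(solutions)]


male = solve_venn([19, 3, 45, 55, 6, 2, 89], 169, 191, 100)
female = solve_venn([6, 24, 85, 8, 26, 6, 3], 121, 138, 102)
print(male)
print(female)
```

Under these assumptions each dataset has a single consistent assignment, with 89 (male) and 85 (female) sequences matching all three databases.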
SimiTri provides a two-dimensional display of relative similarity relationships among three different datasets.
SimiTri: visualizing similarity relationships for groups of sequences
[Figure: a query dataset (EST sequences in this study) is compared by BLAST against Database 1, Database 2 and Database 3, followed by visualization]
• Java/Perl-based application
• Display of relative similarity relationships
• Analysis of relative similarity relationships
• Based on raw bit score from BLAST output
Parkinson J, et al. Bioinformatics, 2003
Parkinson J, et al. Nat Genetics, 2004
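One simple way to turn three similarity scores into a position inside a triangle is a score-weighted average of the three corner coordinates (barycentric coordinates). The sketch below assumes that layout. The corner coordinates, the function name and the example scores are hypothetical, and SimiTri's own implementation may differ in detail.

```python
import math

# Corners of an equilateral triangle, one per database (assumed layout).
VERTICES = {
    "db1": (0.0, 0.0),
    "db2": (1.0, 0.0),
    "db3": (0.5, math.sqrt(3) / 2),
}


def triangle_position(scores):
    """Place a sequence by weighting each corner with its BLAST score.

    scores maps database name to a non-negative score; a missing database
    counts as 0. Returns None when the sequence has no hit at all.
    """
    weights = {db: scores.get(db, 0.0) for db in VERTICES}
    total = sum(weights.values())
    if total <= 0:
        return None
    x = sum(weights[db] * VERTICES[db][0] for db in VERTICES) / total
    y = sum(weights[db] * VERTICES[db][1] for db in VERTICES) / total
    return (x, y)


# Hypothetical scores: equal hits everywhere land at the centre,
# a hit in one database only lands on that corner.
centre = triangle_position({"db1": 100, "db2": 100, "db3": 100})
corner = triangle_position({"db1": 250})
edge = triangle_position({"db1": 80, "db2": 80})
```

With this scheme a sequence that matches only two databases falls on the edge between them, and sequences with no match are left off the plot.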
a. SimiTri: Male dataset
[Figure: SimiTri plot of 431 male ESTs against C. elegans (169, 39.21%), parasitic nematodes (191, 44.31%) and non-nematodes (100, 23.20%); tiles colored by maximal BLAST score; no match for 114 ESTs]
b. SimiTri: Female dataset
[Figure: SimiTri plot of 265 female ESTs against C. elegans (121, 45.6%), parasitic nematodes (138, 52.1%) and non-nematodes (102, 38.4%); tiles colored by maximal BLAST score; no match for 78 ESTs]
SimiTri results: T. vitrinus ESTs are closer to parasitic nematodes and C. elegans than to other non-nematode organisms.
BLAST vs. ESTExplorer
Analysis of individual ESTs using BLAST (1776 ESTs)
• Slow (took several weeks)
• BLAST results are the only evidence for functional assignment
• Peripheral annotation
Analysis using semi-automated approach via ESTExplorer (1776 ESTs)
• Fast (took a few minutes)
• Multiple lines of evidence for annotation, supported by GO, InterProScan and Pathway Mapping
• In depth annotation
ESTExplorer reliably and rapidly annotated 301 ESTs, with pathway and GO information, eliminating 60 low quality hits from database searches.
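A minimal sketch of how low quality hits can be screened out of database search results, assuming BLAST tabular output with the standard 12 columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score). The 1e-5 cutoff, the function name and the example rows are assumptions for illustration, not the thresholds or data used in this study.

```python
def best_hits(tabular_text, max_evalue=1e-5):
    """Keep, for each query, the highest-scoring hit whose E-value passes the cutoff."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # low quality hit, discard
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Made-up rows: est_1 has two acceptable hits, est_2 only a weak one.
example = "\n".join([
    "est_1\tsubject_A\t85.0\t120\t18\t0\t1\t360\t5\t124\t2e-37\t150.0",
    "est_1\tsubject_B\t60.0\t110\t44\t0\t1\t330\t9\t118\t1e-12\t70.5",
    "est_2\tsubject_C\t35.0\t40\t26\t0\t10\t129\t3\t42\t0.8\t28.1",
])
hits = best_hits(example)
```

Here `hits` keeps only the strongest hit for `est_1` and drops `est_2`, whose single hit is above the assumed E-value cutoff.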
Secreted protein analysis
Number of putative secreted proteins : 40
Proteases
Ion channels
Protease inhibitors
Signalling molecules
Immune-response related genes
Candidate target genes in Trichostrongylus vitrinus

EST contig/singleton: Tvmale_Contig9
Seq length (in aa): 113
Homology (Wormpep): Translation initiation factor 3, subunit f (eIF-3f)
RNAi phenotype (Wormbase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro)
Gene Ontology: GO:0003743: translation initiation factor activity
Mammalian homolog: NO
Secreted protein: YES

EST contig/singleton: Tvfemale_Contig105
Seq length (in aa): 115
Homology (Wormpep): pbs-2 (Proteasome Beta Subunit)
RNAi phenotype (Wormbase): embryonic lethal (Emb), locomotion abnormal, larval arrest (Lva), maternal sterile, larval lethal (Let)
Gene Ontology: GO:0005839: proteasome core; GO:0006511: ubiquitin-dependent protein catabolism; GO:0008233: peptidase activity; GO:0004175: endopeptidase activity
Mammalian homolog: YES (weakly similar)
Secreted protein: YES

EST contig/singleton: Tvmale 04_F02
Seq length (in aa): 96
Homology (Wormpep): asb-2 (ATP Synthase B homolog)
RNAi phenotype (Wormbase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro), maternal sterile
Gene Ontology: GO:0046933: ATP synthase activity
Mammalian homolog: YES (weakly similar)
Secreted protein: YES

EST contig/singleton: Tvmale 02_C01
Seq length (in aa): 136
Homology (Wormpep): RNA splicing factor Slu7p
RNAi phenotype (Wormbase): embryonic lethal (Emb), early emb (Emb), molt defect (Mlt), adult early lethal (Adl), larval arrest (Lva)
Gene Ontology: GO:0006375: nuclear mRNA splicing
Mammalian homolog: NO
Secreted protein: YES
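The three target criteria (secreted protein, strong RNAi phenotype in C. elegans, no mammalian homologue) can be applied to the four candidates as a simple filter. The sketch below is an illustration only: the record layout and function name are hypothetical, the phenotype lists are abbreviated to the codes given in the table, and treating "weakly similar" as a homologue under the strict setting is an assumption.

```python
# One record per candidate, transcribed from the table; field names are illustrative.
candidates = [
    {"id": "Tvmale_Contig9", "phenotypes": ["Emb", "Lva", "Stp", "Gro"],
     "mammalian_homolog": "NO", "secreted": True},
    {"id": "Tvfemale_Contig105", "phenotypes": ["Emb", "Lva", "Let"],
     "mammalian_homolog": "YES (weakly similar)", "secreted": True},
    {"id": "Tvmale 04_F02", "phenotypes": ["Emb", "Lva", "Stp", "Gro"],
     "mammalian_homolog": "YES (weakly similar)", "secreted": True},
    {"id": "Tvmale 02_C01", "phenotypes": ["Emb", "Mlt", "Adl", "Lva"],
     "mammalian_homolog": "NO", "secreted": True},
]


def select_targets(records, allow_weak_homolog=False):
    """Return the IDs of candidates that meet all three criteria."""
    selected = []
    for rec in records:
        has_phenotype = len(rec["phenotypes"]) > 0
        homolog = rec["mammalian_homolog"]
        # Strict mode accepts only "NO"; relaxed mode also accepts weak similarity.
        host_ok = homolog == "NO" or (allow_weak_homolog and "weakly similar" in homolog)
        if rec["secreted"] and has_phenotype and host_ok:
            selected.append(rec["id"])
    return selected


strict = select_targets(candidates)
relaxed = select_targets(candidates, allow_weak_homolog=True)
```

Under the strict setting only Tvmale_Contig9 and Tvmale 02_C01 pass; relaxing the host criterion to tolerate weak similarity keeps all four.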
ESTExplorer: applications so far
1. In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches. Nagaraj SH, Gasser RB, Ranganathan S.
2. A transcriptomic analysis of the adult stage of the bovine lungworm, Dictyocaulus viviparus. Ranganathan S, Nagaraj SH, Hu M, Strube C, Schnieder T, Gasser RB. BMC Genomics, 2007, accepted
3. Gender-enriched transcripts in adult Haemonchus contortus (Nematoda): predicted functions and genetic interactions based on comparative analyses with Caenorhabditis elegans. Campbell BE, Nagaraj SH, Hu M, Zhong W, Sternberg PW, Ong EK, Loukas A, Ranganathan S, Beveridge I, Gasser RB.
4. Transcriptional changes in the third-stage larva of Ancylostoma caninum (Nematoda) following in vitro serum stimulation, employing a suppressive-subtractive hybridisation-based microarray approach. Datu BJD, Gasser RB, Nagaraj SH, Ong EK, O’Donoghue P, McInnes R, Ranganathan S, Loukas A.
5. Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007, accepted
Ref papers
Acknowledgements
Prof. Robin Gasser (University of Melbourne)
Genetics Technologies Pty. Ltd.
Australian Research Council LINKAGE PROJECT (LP0667795)
Some more examples of secreted proteins
M41 family metalloprotease, mitochondrial membrane proteinase: Schistosoma
Pathogenesis related protein similar to helminth venom allergen homologues: Schistosoma