In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches.

Shivashankar H. Nagaraj and Shoba Ranganathan

Professor and Chair, Bioinformatics, Biotechnology Research Institute and Dept. of Chemistry & Biomolecular Sciences, Macquarie University, Sydney, Australia ([email protected])
Adjunct Professor, Dept. of Biochemistry, National University of Singapore, Singapore ([email protected])


Expressed Sequence Tags (ESTs)

Unedited, short, single-pass sequences generated from the 5' or 3' ends of randomly selected clones from cDNA libraries of the desired cells, tissues or organs.

• Length: 200-700 bp (average 360 bp)
• Can be generated quickly and at low cost (“poor man’s genome”)
• EST data is highly fragmented
• EST annotations carry very little biological information
• High-throughput in nature

EST Applications

• Gene discovery
• Gene structure prediction
• Expression maps
• Alternative splicing
• Identification and characterization of SNPs
• Gene expression studies: tissue or disease specific, developmental stage
• Proteomics (for example, peptide mass fingerprinting)
• Identification of drug and vaccine candidates

Properties of ESTs

[Figure: genomic DNA is transcribed to mRNA and copied to cDNA, from which 5’ ESTs and 3’ ESTs are sequenced. An EST sequence is drawn with regions labelled vector, repeats, high quality sequence and vector, with position markers at ~1-50 bp, ~50-500 bp and ~500-700 bp.]

EST data resources

• Available in plenty
• Several dedicated databases
• Fragmented
• Quality dubious

Need cleaning, clustering and annotation!

EST data repositories

dbEST release 061507 (June, 2007), www.ncbi.nlm.nih.gov/dbEST/

43,396,096 ESTs from 659 different organisms

Organism                              ESTs
Homo sapiens (human)                  8,119,106
Mus musculus (mouse)                  4,850,243
Danio rerio (zebrafish)               1,350,105
Bos taurus (cattle)                   1,318,208
Arabidopsis thaliana (thale cress)    1,276,692
Xenopus tropicalis                    1,271,375
Oryza sativa (rice)                   1,211,418
Zea mays (maize)                      1,161,241
Triticum aestivum (wheat)             1,050,267

Overview of EST sequence analysis

[Figure: flow diagram with the following boxes.]
• Submit data: raw EST sequence data
• Pre-processing: contamination check, vector clipping, poly-A removal, repeat masking
• Clustering, assembly, consensus generation
• Gene annotation: RNAi, gene mapping, alternative splicing, SNPs
• Conceptual translation
• Peptide annotation: protein interactors, Gene Ontologies, KEGG
• Visualize results
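Of the pre-processing steps named on this slide, poly-A removal is the easiest to illustrate in a few lines. The sketch below is only an illustration of the idea, not how SeqClean or any other tool in the workflow implements it; the function name and the minimum run length of 10 bases are assumptions, since the slides give no threshold.

```python
import re

# Hypothetical threshold: the slides do not state a minimum tail length.
MIN_TAIL = 10


def trim_poly_a(seq: str, min_tail: int = MIN_TAIL) -> str:
    """Strip a trailing poly-A run or a leading poly-T run of at least min_tail bases."""
    seq = seq.upper()
    # 3' reads of the sense strand end in a poly-A tail.
    seq = re.sub(r"A{%d,}$" % min_tail, "", seq)
    # Reads of the opposite strand start with the complementary poly-T run.
    seq = re.sub(r"^T{%d,}" % min_tail, "", seq)
    return seq
```

Runs shorter than the threshold are left in place, so a sequence that merely ends in a few A bases is not truncated.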


Evolution of ESTExplorer


Comparison of current methods for EST analysis

Critical evaluation of contemporary tools and EST analysis pipelines

Benchmarking of tools using EST datasets

Lack of downstream functional annotation at DNA and protein levels

ESTExplorer


Description of ESTExplorer


ESTExplorer – features

• Suite of programs to pre-process, assemble and functionally annotate ESTs
• User-defined input and analysis, with parameter control
• Species-specific analysis
• Input: ESTs or assembled contigs
• Output: assembled ESTs, Gene Ontologies, mapping to domains/motifs, pathway mapping

Workflow

ESTExplorer analysis and annotation workflow, showing Phase I (pre-processing and assembly), Phase II (nucleotide-level annotation) and Phase III (protein-level annotation).

[Figure: workflow diagram with the following boxes.]
• Input option 1: EST sequences, with quality values (.qual)
• Input option 2: assembled ESTs
• Phase I (EST pre-processing): SeqClean, RepeatMasker, CAP3, producing assembled ESTs
• Phase II (DNA level annotation): BLASTX, BLAST2GO
• Phase III (protein level annotation): ESTScan, InterProScan, KOBAS
• Short sequences removed from the analysis
• Final output: annotation summary for assembled ESTs
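The "short sequences removed from the analysis" step in the workflow amounts to a length filter over the sequence records. A minimal sketch follows, assuming FASTA input; the 100-residue cutoff and the function names are hypothetical, since the slides state neither the cutoff nor the exact stage at which the filter is applied.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA text into {identifier: sequence}, joining wrapped sequence lines."""
    records: Dict[str, str] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = parts[0] if parts else "seq%d" % (len(records) + 1)
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def drop_short(records: Dict[str, str], min_len: int = 100) -> Dict[str, str]:
    """Keep only records with at least min_len residues (hypothetical cutoff)."""
    return {n: s for n, s in records.items() if len(s) >= min_len}
```

A call such as `drop_short(parse_fasta(open("contigs.fasta")))` would return the records that pass the filter, where `contigs.fasta` is a placeholder file name.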


estexplorer.biolinfo.org


Annotation summary page

The worm in question

• Trichostrongylus vitrinus (order Strongylida) is a parasitic nematode.
• A principal nematode associated with parasitic disease in sheep and cattle
• Current treatment for the disease: chemotherapeutic agents (anthelmintics)
• Disadvantages of current treatments:
  a. Expensive and only partially effective
  b. Anthelmintic drug resistance has emerged over the last decade
  c. Residue problems in meat and milk
• Possible alternative: the development of anti-parasite drugs and/or vaccines

Nisbet AJ, et al. Int J Parasitol, 2004

[Figure: four-phase roadmap from EST generation to drug evaluation.]
• Phase I: creation of cDNA libraries and EST generation from the parasite Trichostrongylus vitrinus
• Phase II: bioinformatics analysis of the ESTs; categorization of differentially expressed ESTs; comparative genomics with nematodes; subset of potential drug target genes
• Phase III: isolation of full-length genes; functional genomics via RNAi; biochemical activity assays; proteomics
• Phase IV: virtual and high-throughput screening; pre-clinical and clinical evaluation

EST analysis schema

Phase I: EST pre-processing. Phase II: DNA level annotation.

Step (tool)                                        Male                           Female
Raw ESTs                                           910                            866
EST pre-processing (SeqClean & RepeatMasker)       902                            857
EST clustering and assembly (CAP3)                 180 contigs, 251 singletons    143 contigs, 122 singletons
Conceptual translation (ESTScan), peptides         400                            240
Gene Ontologies (BLAST2GO)                         134                            133

• Locate RNAi phenotypes from C. elegans (BLASTX against Wormpep)
• Database similarity searches for locating mammalian homologues (BLASTX against NR)
• Database similarity searches for locating parasitic nematode homologues (BLASTX)
• Database similarity searches against NR and Wormpep (BLASTX) for updating the Nisbet et al. results

EST analysis schema

Phase III: Protein level annotation

Analysis (tool)                                 Male    Female
Secretome analysis (SignalP, TMHMM, PSORT)      28      12
Domain/motif analysis (InterProScan)            141     120
Pathway mapping (KOBAS)                         120     110

Results of overall EST analysis

Number of ESTs analysed: 1776 (male: 910; female: 866)

Category                                                          Count (%)
Caenorhabditis elegans homologues                                 290 (41%)
Homologues in parasitic nematodes                                 329 (42%)
Homologues in non-nematodes                                       202 (28%)
No significant match to any sequence in the current databases     218 (31%)
Gene Ontologies (GO) assigned                                     267 (38%)
Pathway associations established                                  230 (33%)

Of the C. elegans homologues, 90 entries had observed ‘non-wildtype’ RNAi phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth.

Results from BLAST vs. ESTExplorer

EST ID: TVm02_C07

Manual annotation using BLAST
• BLAST result: PP1-gamma serine/threonine protein phosphatase
• E-value: 2.00E-37
• Reference: Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007 Mar 24;

Annotations obtained automatically from ESTExplorer
• BLAST result: protein phosphatase catalytic gamma isoform isoform 1
• E-value: 1.00E-36
• Gene Ontologies: chromatin modification, protein amino acid dephosphorylation, embryonic cleavage, cytokinesis, meiosis, oviposition, manganese ion binding, protein phosphatase type 1 activity, mitochondrial outer membrane, protein binding, mitosis, glycogen metabolic process, iron ion binding, nucleus
• Metabolic pathway mapping: Long-term potentiation, Regulation of actin cytoskeleton, Focal adhesion, Insulin signaling pathway
• Domain/motif data: Metallophosphoesterase, Serine/threonine-specific protein phosphatase and bis(5-nucleosyl)-tetraphosphatase

Redefining parameters for possible drug/vaccine targets in parasitic nematodes

Secreted proteins
• Parasites must secrete biologically active mediators to manipulate the host environment in order to survive immune attack
• Inhibit host antigen-processing pathways
• Examples:
  • Aspartyl protease inhibitor (API-1)
  • Cystatin (cysteine protease inhibitor)
  • Acetylcholinesterase (AChE)

Harcus YM, et al. Genome Biol, 2004; Delaney A, et al. Int J Parasitol, 2005; Vanholme B, et al. Gene, 2004

Strong RNAi phenotypes in C. elegans
• Embryonic lethality
• Larval lethality
• Sterile progeny
• Larval arrest
• Maternal sterility
• Slow growth

Absence of homologues in mammalian host (nematode-specific genes)
• Genes with specificity to nematodes may serve as excellent targets for drugs/vaccines with low toxicity to humans and other vertebrates.
• Better understanding of the unusual nematode biochemistry can also have industrial or therapeutic value.

Venn diagram: T. vitrinus male EST data comparison

Set                     Count (%)
C. elegans              169 (39.21%)
Parasitic nematodes     191 (44.31%)
Non-nematodes           100 (23.20%)

Region counts as extracted from the diagram: 6 5589, 3, 45, 19, 2

Venn diagram: T. vitrinus female EST data comparison

Set                     Count (%)
C. elegans              121 (45.6%)
Parasitic nematodes     138 (52.1%)
Non-nematodes           102 (38.4%)

Region counts as extracted from the diagram: 6 2485, 8, 26, 6, 3
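The percentages on the two diagrams are consistent with taking the number of assembled sequences per sex (contigs plus singletons from the CAP3 step) as the denominator: 431 for the male dataset and 265 for the female dataset. That denominator is an inference from the counts shown on the slides, not something the slides state. The short check below recomputes the shares under that assumption.

```python
# Counts are copied from the slides; the choice of denominator is an assumption.
ASSEMBLED = {
    "male": 180 + 251,    # contigs + singletons
    "female": 143 + 122,  # contigs + singletons
}

HOMOLOGUES = {
    "male": {"C. elegans": 169, "parasitic nematodes": 191, "non-nematodes": 100},
    "female": {"C. elegans": 121, "parasitic nematodes": 138, "non-nematodes": 102},
}


def shares(sex: str) -> dict:
    """Percentage of assembled sequences with a match in each comparison set."""
    total = ASSEMBLED[sex]
    return {name: 100.0 * n / total for name, n in HOMOLOGUES[sex].items()}


if __name__ == "__main__":
    for sex in ("male", "female"):
        for name, pct in shares(sex).items():
            print(f"{sex:6s} {name:20s} {pct:6.2f}%")
```

Under this assumption the recomputed values agree with the figures on the diagrams to within about 0.1 percentage points.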

SimiTri: visualizing similarity relationships for groups of sequences

SimiTri provides a two-dimensional display of relative similarity relationships among three different datasets.

• Java/Perl-based application
• Display of relative similarity relationships
• Analysis of relative similarity relationships
• Based on raw bit score from BLAST output

[Figure: the query dataset (EST sequences in this study) is compared by BLAST against Database 1, Database 2 and Database 3, followed by visualization.]

Parkinson J, et al. Bioinformatics, 2003; Parkinson J, et al. Nat Genetics, 2004
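One common way to place a sequence inside a triangle from three scores is to use the scores as barycentric weights on the three vertices, so that a sequence matching only one database sits on that database's vertex and a sequence matching all three equally sits at the centre. The sketch below illustrates that idea only; it is not SimiTri's own code, and the vertex layout and function name are assumptions.

```python
import math
from typing import Optional, Tuple

# One vertex per database; the orientation of the triangle is arbitrary here.
VERTICES = ((0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2))


def triangle_position(s1: float, s2: float, s3: float) -> Optional[Tuple[float, float]]:
    """Map three BLAST bit scores to an (x, y) point inside the triangle.

    Returns None when the sequence has no hit in any of the three databases.
    """
    total = s1 + s2 + s3
    if total <= 0:
        return None
    weights = (s1 / total, s2 / total, s3 / total)
    x = sum(w * vx for w, (vx, _) in zip(weights, VERTICES))
    y = sum(w * vy for w, (_, vy) in zip(weights, VERTICES))
    return (x, y)
```

Sequences for which the function returns None correspond to the "no match" group, which cannot be placed in the triangle.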

[Figure a. SimiTri: male dataset. 431 male ESTs plotted in a triangle with vertices C. elegans (169, 39.21%), parasitic nematodes (191, 44.31%) and non-nematodes (100, 23.20%). No match for 114 ESTs. Tiles are coloured by maximal BLAST score on a scale marked from 100 to 300.]

[Figure b. SimiTri: female dataset. 265 female ESTs plotted in a triangle with vertices C. elegans (121, 45.6%), parasitic nematodes (138, 52.1%) and non-nematodes (102, 38.4%). No match for 78 ESTs. Tiles are coloured by maximal BLAST score on a scale marked from 100 to 300.]

SimiTri results: T. vitrinus ESTs are closer to parasitic nematodes and C. elegans than to other non-nematode organisms.

BLAST vs. ESTExplorer

Analysis of individual ESTs using BLAST (1776 ESTs)
• Slow (took several weeks)
• BLAST results are the only evidence for functional assignment
• Peripheral annotation

Analysis using a semi-automated approach via ESTExplorer (1776 ESTs)
• Fast (took a few minutes)
• Multiple lines of evidence for annotation, supported by GO, InterProScan and pathway mapping
• In-depth annotation

ESTExplorer reliably and rapidly annotated 301 ESTs with pathway and GO information, eliminating 60 low-quality hits from database searches.


Secreted protein analysis

Number of putative secreted proteins : 40

Proteases

Ion channels

Protease inhibitors

Signalling molecules

Immune-response related genes

Candidate target genes in Trichostrongylus vitrinus

EST contig/singleton: Tvmale_Contig9
  Sequence length (aa): 113
  Homology (Wormpep): Translation initiation factor 3, subunit f (eIF-3f)
  RNAi phenotype (WormBase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro)
  Gene Ontology: GO:0003743: translation initiation factor activity
  Mammalian homolog: NO
  Secreted protein: YES

EST contig/singleton: Tvfemale_Contig105
  Sequence length (aa): 115
  Homology (Wormpep): pbs-2 (Proteasome Beta Subunit)
  RNAi phenotype (WormBase): embryonic lethal (Emb), locomotion abnormal, larval arrest (Lva), maternal sterile, larval lethal (Let)
  Gene Ontology: GO:0005839: proteasome core; GO:0006511: ubiquitin-dependent protein catabolism; GO:0008233: peptidase activity; GO:0004175: endopeptidase activity
  Mammalian homolog: YES (weakly similar)
  Secreted protein: YES

EST contig/singleton: Tvmale 04_F02
  Sequence length (aa): 96
  Homology (Wormpep): asb-2 (ATP Synthase B homolog)
  RNAi phenotype (WormBase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro), maternal sterile
  Gene Ontology: GO:0046933: ATP synthase activity
  Mammalian homolog: YES (weakly similar)
  Secreted protein: YES

EST contig/singleton: Tvmale 02_C01
  Sequence length (aa): 136
  Homology (Wormpep): RNA splicing factor Slu7p
  RNAi phenotype (WormBase): embryonic lethal (Emb), early emb (Emb), molt defect (Mlt), adult early lethal (Adl), larval arrest (Lva)
  Gene Ontology: GO:0006375: nuclear mRNA splicing
  Mammalian homolog: NO
  Secreted protein: YES
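The three target criteria named earlier in the talk (secreted protein, strong RNAi phenotype in C. elegans, no mammalian homologue) can be applied to these four candidates as a simple filter. The sketch below is illustrative only: the class and function names are hypothetical, and treating "YES (weakly similar)" as a mammalian homologue is an assumption about how strictly the third criterion is applied.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    est_id: str
    rnai_phenotypes: Tuple[str, ...]
    mammalian_homolog: bool  # "YES (weakly similar)" is counted as True (assumption)
    secreted: bool


# Values copied from the candidate table on the slide.
CANDIDATES = [
    Candidate("Tvmale_Contig9",
              ("embryonic lethal", "larval arrest", "sterile progeny", "slow growth"),
              mammalian_homolog=False, secreted=True),
    Candidate("Tvfemale_Contig105",
              ("embryonic lethal", "locomotion abnormal", "larval arrest",
               "maternal sterile", "larval lethal"),
              mammalian_homolog=True, secreted=True),
    Candidate("Tvmale 04_F02",
              ("embryonic lethal", "larval arrest", "sterile progeny",
               "slow growth", "maternal sterile"),
              mammalian_homolog=True, secreted=True),
    Candidate("Tvmale 02_C01",
              ("embryonic lethal", "early emb", "molt defect",
               "adult early lethal", "larval arrest"),
              mammalian_homolog=False, secreted=True),
]


def nematode_specific_targets(candidates: List[Candidate]) -> List[str]:
    """IDs of candidates that are secreted, have an RNAi phenotype and lack a mammalian homologue."""
    return [
        c.est_id
        for c in candidates
        if c.secreted and c.rnai_phenotypes and not c.mammalian_homolog
    ]
```

With the strict reading used here, `nematode_specific_targets(CANDIDATES)` keeps Tvmale_Contig9 and Tvmale 02_C01; relaxing the homologue criterion to admit weakly similar matches would keep all four.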


ESTExplorer: applications so far

1. In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches. Nagaraj SH, Gasser RB, Ranganathan S.

2. A transcriptomic analysis of the adult stage of the bovine lungworm, Dictyocaulus viviparus. Ranganathan S, Nagaraj SH, Hu M, Strube C, Schnieder T and Gasser RB. BMC Genomics, 2007, accepted

3. Gender-enriched transcripts in adult Haemonchus contortus (Nematoda): predicted functions and genetic interactions based on comparative analyses with Caenorhabditis elegans. Campbell BE, Nagaraj SH, Hu M, Zhong W, Sternberg PW, Ong EK, Loukas A, Ranganathan S, Beveridge I and Gasser RB.

4. Transcriptional changes in the third-stage larva of Ancylostoma caninum (Nematoda) following in vitro serum stimulation, employing a suppressive-subtractive hybridisation-based microarray approach. Datu BJD, Gasser RB, Nagaraj SH, Ong EK, O’Donoghue P, McInnes R, Ranganathan S and Loukas A

5. Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007, accepted


Ref papers


Acknowledgements

Prof. Robin Gasser (University of Melbourne)

Genetics Technologies Pty. Ltd.

Australian Research Council LINKAGE PROJECT (LP0667795)


• M41 family metalloprotease, mitochondrial membrane proteinase: Schistosoma
• Pathogenesis-related protein similar to helminth venom allergen homologues: Schistosoma

Some more examples of secreted proteins