In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches.

Shivashankar H. Nagaraj and Shoba Ranganathan

Professor and Chair, Bioinformatics, Biotechnology Research Institute, Dept. of Chemistry & Biomolecular Sciences, Macquarie University, Sydney, Australia (shoba.ranganathan@mq.edu.au)

and Adjunct Professor, Dept. of Biochemistry, National University of Singapore, Singapore (shoba@bic.nus.edu.sg)

Expressed Sequence Tags (ESTs)

Unedited, short, single-pass sequences generated from the 5' or 3' ends of randomly selected clones from cDNA libraries of the desired cells, tissues or organs.

• Length: 200-700 bp (average 360 bp)
• Can be generated quickly and at low cost (“poor-man’s genome”)
• EST data are highly fragmented
• EST annotations carry very little biological information
• High-throughput in nature

EST Applications

• Gene discovery
• Gene structure prediction
• Expression maps
• Alternative splicing
• Identification and characterization of SNPs
• Gene expression studies (tissue- or disease-specific; developmental stage)
• Proteomics (for example, peptide mass fingerprinting)
• Identification of drug and vaccine candidates

[Figure: generation of ESTs, from genomic DNA through mRNA and cDNA to 5' and 3' ESTs.]

[Figure: an EST sequence, with regions labelled vector, repeats, high-quality sequence and vector, and approximate positions ~1-50 bp, ~50-500 bp and ~500-700 bp.]

Properties of ESTs

EST data resources

• Available in plenty
• Several dedicated databases
• Fragmented
• Quality dubious
• Need cleaning, clustering and annotation!

EST data repositories

dbEST release 061507 (June 2007), www.ncbi.nlm.nih.gov/dbEST/

43,396,096 ESTs from 659 different organisms

• Homo sapiens (human): 8,119,106
• Mus musculus (mouse): 4,850,243
• Danio rerio (zebrafish): 1,350,105
• Bos taurus (cattle): 1,318,208
• Arabidopsis thaliana (thale cress): 1,276,692
• Xenopus tropicalis: 1,271,375
• Oryza sativa (rice): 1,211,418
• Zea mays (maize): 1,161,241
• Triticum aestivum (wheat): 1,050,267

Overview of EST sequence analysis

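[Flowchart: submit data (raw EST sequence data), then pre-processing (contamination check, vector clipping, poly-A removal, repeat masking), then clustering, assembly and consensus generation. The consensus sequences feed gene-level annotation (gene annotation, RNAi, gene mapping, alternative splicing, SNPs) and, after conceptual translation, peptide-level annotation (peptide annotation, protein interactors, Gene Ontologies, KEGG). The final step is to visualize results.]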
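Two of the pre-processing steps named in this overview, poly-A removal and discarding reads that end up too short, can be illustrated with a minimal Python sketch. It is not the implementation used by any of the tools on these slides; the function names, the minimum tail length of 10 and the minimum read length of 100 bp are illustrative assumptions.

```python
import re

# Hypothetical thresholds, chosen only for illustration.
MIN_TAIL = 10      # shortest run of A's (or T's) treated as a tail
MIN_LENGTH = 100   # reads shorter than this after trimming are dropped


def trim_poly_a(seq, min_tail=MIN_TAIL):
    """Remove a trailing poly-A run or a leading poly-T run.

    A leading poly-T run is the same tail read from the opposite strand.
    """
    seq = seq.upper()
    seq = re.sub(r"A{%d,}$" % min_tail, "", seq)
    seq = re.sub(r"^T{%d,}" % min_tail, "", seq)
    return seq


def preprocess(reads, min_length=MIN_LENGTH):
    """Trim every read and keep only those that are still long enough."""
    cleaned = {}
    for name, seq in reads.items():
        trimmed = trim_poly_a(seq)
        if len(trimmed) >= min_length:
            cleaned[name] = trimmed
    return cleaned


# Made-up reads: est_1 keeps 120 bp after trimming, est_2 only 40 bp.
reads = {
    "est_1": "ACGT" * 30 + "A" * 25,
    "est_2": "ACGT" * 10 + "A" * 25,
}
cleaned = preprocess(reads)
print({name: len(seq) for name, seq in cleaned.items()})
```

Real pipelines also handle vector and contaminant screening, which need reference sequence databases and are outside the scope of this sketch.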

Evolution of ESTExplorer

Comparison of current methods for EST analysis

Critical evaluation of contemporary tools and EST analysis pipelines

Benchmarking of tools using EST datasets

Lack of downstream functional annotation at DNA and protein levels

ESTExplorer

Description of ESTExplorer

ESTExplorer – features

Suite of programs to pre-process, assemble and functionally annotate ESTs

User-defined input and analysis – parameter control

Species-specific analysis

Input: ESTs or assembled contigs

Output: Assembled ESTs, Gene Ontologies, mapping to Domains/Motifs, Pathway mapping

ESTExplorer analysis and annotation workflow, showing Phase I (pre-processing and assembly), Phase II (nucleotide-level annotation) and Phase III (protein-level annotation).

Input Option 1: EST sequences, with quality values (.qual)

Phase I (EST pre-processing): SeqClean, RepeatMasker, CAP3, producing assembled ESTs

Input Option 2: assembled ESTs

Phase II (DNA level Annotation): BLASTX, BLAST2GO

Phase III (Protein level Annotation): ESTScan, InterProScan, KOBAS

Short sequences are removed from the analysis.

Final output: Annotation summary for assembled ESTs

Workflow

estexplorer.biolinfo.org

Annotation summary page

Trichostrongylus vitrinus (order Strongylida) is a parasitic nematode.

• Principal causative nematode associated with parasitic diseases in sheep and cattle
• Current treatment for the disease: chemotherapeutic agents (anthelmintics)
• Disadvantages of current treatments:
  a. Expensive and only partially effective
  b. Anthelmintic drug resistance over the last decade
  c. Residue problems in meat and milk
• Possible alternative: the development of anti-parasite drugs and/or vaccines (Nisbet AJ, et al. Int J Parasitol, 2004)

The worm in question

Phase I
• Creation of cDNA libraries and EST generation from the parasite Trichostrongylus vitrinus

Phase II
• Bioinformatics analysis of the ESTs
• Categorization of differentially expressed ESTs
• Comparative genomics with nematodes
• Subset of potential drug target genes

Phase III
• Isolation of full-length genes
• Functional genomics via RNAi
• Biochemical activity assays
• Proteomics

Phase IV
• Virtual and high-throughput screening
• Pre-clinical and clinical evaluation

EST analysis schema

Phase I: EST pre-processing
• Raw ESTs: male 910; female 866
• EST pre-processing (SeqClean & RepeatMasker): male 902; female 857
• EST clustering and assembly (CAP3): male 180 contigs and 251 singletons; female 143 contigs and 122 singletons

Phase II: DNA level annotation
• Gene Ontologies (BLAST2GO): male 134; female 133
• Locating RNAi phenotypes from C. elegans (BLASTX against Wormpep)
• Database similarity searches for locating mammalian homologues (BLASTX against NR)
• Database similarity searches for locating parasitic nematode homologues (BLASTX)
• Database similarity searches against NR and Wormpep (BLASTX) for updating the results of Nisbet et al.

Phase III: Protein level annotation
• Conceptual translation (ESTScan): peptide sequences, male 400; female 240
• Secretome analysis (SignalP, TMHMM, PSORT): male 28; female 12
• Domain/Motif analysis (InterProScan): male 141; female 120
• Pathway mapping (KOBAS): male 120; female 110
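The BLASTX similarity searches in this schema yield hit tables that have to be filtered before homologues are counted. The following is a minimal sketch, assuming the hits were saved in BLAST's standard 12-column tabular format. The E-value cutoff of 1e-5, the sample rows and all query and subject identifiers are illustrative assumptions, not values from this study.

```python
import csv
import io

# Column order of BLAST's standard tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return the highest-scoring hit per query among hits passing the cutoff."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # hit is too weak to count as a homologue
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[2]:
            best[hit["qseqid"]] = (hit["sseqid"], evalue, bitscore)
    return best


# Made-up rows: query_A has one strong and one weak hit, query_B only a weak one.
sample = io.StringIO(
    "query_A\tsubject_1\t85.0\t120\t18\t0\t1\t360\t10\t129\t2e-37\t155.0\n"
    "query_A\tsubject_2\t40.0\t60\t36\t0\t1\t180\t5\t64\t0.002\t38.0\n"
    "query_B\tsubject_3\t30.0\t50\t35\t0\t1\t150\t7\t56\t1.5\t25.0\n"
)
hits = best_hits(sample)
print(hits)
```

Queries that drop out entirely, such as query_B here, correspond to the "no significant match" category.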

Results of overall EST analysis

Number of ESTs analysed: 1776 (male: 910; female: 866)

• Caenorhabditis elegans homologues: 290 (41%)
• Homologues in parasitic nematodes: 329 (42%)
• Homologues in non-nematodes: 202 (28%)
• No significant match to any sequence in the current databases: 218 (31%)
• Gene Ontologies (GO) assigned: 267 (38%)
• Pathway associations established: 230 (33%)

Of the C. elegans homologues, 90 entries had observed ‘non-wildtype’ RNAi phenotypes, including embryonic lethality, maternal sterility, sterile progeny, larval arrest and slow growth.
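The percentages in this results summary can be recomputed from the counts. The sketch below assumes that the denominator is the number of representative sequences after assembly (contigs plus singletons, both sexes combined). The slides do not state the denominator, so the recomputed values need not match the figures shown.

```python
# Contig and singleton counts from the CAP3 assembly step.
assembly = {
    "male": {"contigs": 180, "singletons": 251},
    "female": {"contigs": 143, "singletons": 122},
}

# Counts from the results summary.
results = {
    "C. elegans homologues": 290,
    "Homologues in parasitic nematodes": 329,
    "Homologues in non-nematodes": 202,
    "No significant match": 218,
    "Gene Ontologies assigned": 267,
    "Pathway associations established": 230,
}

# Assumption: one representative sequence per contig and per singleton.
representatives = {
    sex: counts["contigs"] + counts["singletons"]
    for sex, counts in assembly.items()
}
total = sum(representatives.values())

# Share of representative sequences in each category, in percent.
shares = {label: 100.0 * count / total for label, count in results.items()}

print(representatives, total)
for label, share in shares.items():
    print(f"{label}: {results[label]} ({share:.1f}%)")
```

The per-sex totals obtained this way (431 male, 265 female) are the same dataset sizes quoted on the SimiTri panels.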

Results from BLAST vs. ESTExplorer

EST ID: TVm02_C07

Manual annotation using BLAST
• BLAST result: PP1-gamma serine/threonine protein phosphatase
• E-value: 2.00E-37
• Reference: Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007 Mar 24;

Annotations obtained automatically from ESTExplorer
• BLAST result: protein phosphatase catalytic gamma isoform isoform 1
• E-value: 1.00E-36
• Gene Ontologies: chromatin modification, protein amino acid dephosphorylation, embryonic cleavage, cytokinesis, meiosis, oviposition, manganese ion binding, protein phosphatase type 1 activity, mitochondrial outer membrane, protein binding, mitosis, glycogen metabolic process, iron ion binding, nucleus
• Metabolic pathway mapping: Long-term potentiation, Regulation of actin cytoskeleton, Focal adhesion, Insulin signaling pathway
• Domain/Motif data: Metallophosphoesterase, Serine/threonine-specific protein phosphatase and bis(5-nucleosyl)-tetraphosphatase

Redefining parameters for possible drug/vaccine targets in parasitic nematodes

Secreted proteins
• Parasites must secrete biologically active mediators to manipulate the host environment in order to survive immune attack
• Inhibit host antigen-processing pathways
• Examples: Aspartyl protease inhibitor (API-1); Cystatin (cysteine protease inhibitor); Acetylcholinesterase (AChE)
• Harcus YM, et al. Genome Biol, 2004; Delaney A, et al. Int J Parasitol, 2005; Vanholme B, et al. Gene, 2004

Strong RNAi phenotypes in C. elegans
• Embryonic lethality
• Larval lethality
• Sterile progeny
• Larval arrest
• Maternal sterility
• Slow growth

Absence of homologues in mammalian host (nematode-specific genes)
• Genes with specificity to nematodes may serve as excellent targets for drugs/vaccines with low toxicity to humans and other vertebrates.
• Better understanding of the unusual nematode biochemistry can also have industrial or therapeutic value.

Venn diagram: T. vitrinus male EST data comparison
• Set totals: C. elegans 169 (39.21%); parasitic nematodes 191 (44.31%); non-nematodes 100 (23.20%)
• Region counts: 6, 55, 89, 3, 45, 19, 2

Venn diagram: T. vitrinus female EST data comparison
• Set totals: C. elegans 121 (45.6%); parasitic nematodes 138 (52.1%); non-nematodes 102 (38.4%)
• Region counts: 6, 24, 85, 8, 26, 6, 3
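The region counts in the two Venn diagrams can be checked against the set totals by summing, for each set, the four regions that lie inside it. The assignment of counts to regions below is an assumption inferred from the extracted numbers, since the diagrams themselves are not available here. It is one assignment that reproduces the reported totals.

```python
# Keys name the sets a region belongs to:
# C = C. elegans, P = parasitic nematodes, N = non-nematodes.
# Assumed assignment of the extracted counts to the seven Venn regions.
regions = {
    "male":   {"C": 19, "P": 45, "N": 3, "CP": 55, "CN": 6, "PN": 2, "CPN": 89},
    "female": {"C": 6,  "P": 26, "N": 8, "CP": 24, "CN": 6, "PN": 3, "CPN": 85},
}

# Set totals as printed on the diagrams.
reported = {
    "male":   {"C": 169, "P": 191, "N": 100},
    "female": {"C": 121, "P": 138, "N": 102},
}


def set_totals(region_counts):
    """Sum every region whose key contains the letter of the set."""
    return {
        letter: sum(n for key, n in region_counts.items() if letter in key)
        for letter in ("C", "P", "N")
    }


for sex, counts in regions.items():
    totals = set_totals(counts)
    print(sex, totals, "matches reported totals:", totals == reported[sex])
```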

SimiTri: visualizing similarity relationships for groups of sequences

SimiTri provides a two-dimensional display of relative similarity relationships among three different datasets.

• Java/Perl-based application
• Display of relative similarity relationships
• Analysis of relative similarity relationships
• Based on raw bit score from BLAST output

[Figure: the query dataset (EST sequences in this study) is compared by BLAST against Database 1, Database 2 and Database 3, followed by visualization.]

Parkinson J, et al. Bioinformatics, 2003; Parkinson J, et al. Nat Genetics, 2004
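To illustrate how three BLAST bit scores can be turned into a single point inside a triangle, here is a minimal sketch using barycentric weighting: each corner stands for one database, and a sequence is pulled toward each corner in proportion to its score against that database. Whether SimiTri uses exactly this weighting is an assumption; the corner coordinates, database keys and function name are hypothetical.

```python
import math

# Corners of an equilateral triangle, one per database (hypothetical layout).
CORNERS = {
    "db1": (0.5, math.sqrt(3) / 2),  # top
    "db2": (0.0, 0.0),               # bottom left
    "db3": (1.0, 0.0),               # bottom right
}


def triangle_position(scores):
    """Map bit scores against three databases to an (x, y) point.

    Returns None when the sequence has no hit in any database.
    """
    total = sum(scores.get(db, 0.0) for db in CORNERS)
    if total == 0:
        return None
    x = sum(scores.get(db, 0.0) * CORNERS[db][0] for db in CORNERS) / total
    y = sum(scores.get(db, 0.0) * CORNERS[db][1] for db in CORNERS) / total
    return (x, y)


# Equal scores place the sequence at the centre of the triangle.
print(triangle_position({"db1": 200, "db2": 200, "db3": 200}))
# A hit in one database only places it on that corner.
print(triangle_position({"db1": 0, "db2": 300, "db3": 0}))
# Equal hits in two databases place it midway along their shared edge.
print(triangle_position({"db1": 0, "db2": 150, "db3": 150}))
```

Under this scheme, sequences with no hit in any database cannot be placed at all, which is consistent with the panels reporting a separate "no match" count.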

a. SimiTri: Male dataset (431 male ESTs)
• Color scale of maximal BLAST scores for tiles: 100 to 300
• No match for 114 ESTs
• C. elegans 169 (39.21%); parasitic nematodes 191 (44.31%); non-nematodes 100 (23.20%)
• Region counts: 19, 3, 45, 55, 6, 2, 89

b. SimiTri: Female dataset (265 female ESTs)
• Color scale of maximal BLAST scores for tiles: 100 to 300
• No match for 78 ESTs
• C. elegans 121 (45.6%); parasitic nematodes 138 (52.1%); non-nematodes 102 (38.4%)
• Region counts: 6, 8, 26, 24, 6, 3, 85

SimiTri results: T. vitrinus ESTs are closer to parasitic nematodes and C. elegans than to other non-nematode organisms.

BLAST vs. ESTExplorer

Analysis of individual ESTs using BLAST (1776 ESTs)
• Slow (took several weeks)
• BLAST results are the only evidence for functional assignment
• Peripheral annotation

Analysis using a semi-automated approach via ESTExplorer (1776 ESTs)
• Fast (took a few minutes)
• Multiple lines of evidence for annotation, supported by GO, InterProScan and pathway mapping
• In-depth annotation

ESTExplorer reliably and rapidly annotated 301 ESTs, with pathway and GO information, eliminating 60 low-quality hits from database searches.

Secreted protein analysis

Number of putative secreted proteins : 40

Proteases

Ion channels

Protease inhibitors

Signalling molecules

Immune-response related genes

Candidate target genes in Trichostrongylus vitrinus

Fields: EST contig/singleton; sequence length (in aa); homology (Wormpep); RNAi phenotype (WormBase); Gene Ontology; mammalian homolog; secreted protein.

Tvmale_Contig9
• Sequence length: 113 aa
• Homology (Wormpep): Translation initiation factor 3, subunit f (eIF-3f)
• RNAi phenotype (WormBase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro)
• Gene Ontology: GO:0003743: translation initiation factor activity
• Mammalian homolog: NO
• Secreted protein: YES

Tvfemale_Contig105
• Sequence length: 115 aa
• Homology (Wormpep): pbs-2 (Proteasome Beta Subunit)
• RNAi phenotype (WormBase): embryonic lethal (Emb), locomotion abnormal, larval arrest (Lva), maternal sterile, larval lethal (Let)
• Gene Ontology: GO:0005839: proteasome core; GO:0006511: ubiquitin-dependent protein catabolism; GO:0008233: peptidase activity; GO:0004175: endopeptidase activity
• Mammalian homolog: YES (weakly similar)
• Secreted protein: YES

Tvmale 04_F02
• Sequence length: 96 aa
• Homology (Wormpep): asb-2 (ATP Synthase B homolog)
• RNAi phenotype (WormBase): embryonic lethal (Emb), larval arrest (Lva), sterile progeny (Stp), slow growth (Gro), maternal sterile
• Gene Ontology: GO:0046933: ATP synthase activity
• Mammalian homolog: YES (weakly similar)
• Secreted protein: YES

Tvmale 02_C01
• Sequence length: 136 aa
• Homology (Wormpep): RNA splicing factor - Slu7p
• RNAi phenotype (WormBase): embryonic lethal (Emb), early emb (Emb), molt defect (Mlt), adult early lethal (Adl), larval arrest (Lva)
• Gene Ontology: GO:0006375: nuclear mRNA splicing
• Mammalian homolog: NO
• Secreted protein: YES
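The three selection criteria from the earlier slide (secreted protein, strong C. elegans RNAi phenotype, no mammalian homologue) can be applied to the four candidates above as a simple filter. This is a sketch, not the authors' procedure: the record layout, the function name and the decision to treat a "weakly similar" mammalian match as a homologue are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    rnai_phenotypes: tuple
    mammalian_homolog: bool  # True also for "weakly similar" (assumption)
    secreted: bool


# Phenotypes listed on the slides as strong RNAi phenotypes.
STRONG = {"embryonic lethal", "larval lethal", "larval arrest",
          "sterile progeny", "maternal sterile", "slow growth"}

candidates = [
    Candidate("Tvmale_Contig9",
              ("embryonic lethal", "larval arrest", "sterile progeny",
               "slow growth"),
              False, True),
    Candidate("Tvfemale_Contig105",
              ("embryonic lethal", "locomotion abnormal", "larval arrest",
               "maternal sterile", "larval lethal"),
              True, True),
    Candidate("Tvmale 04_F02",
              ("embryonic lethal", "larval arrest", "sterile progeny",
               "slow growth", "maternal sterile"),
              True, True),
    Candidate("Tvmale 02_C01",
              ("embryonic lethal", "early emb", "molt defect",
               "adult early lethal", "larval arrest"),
              False, True),
]


def is_priority_target(candidate):
    """Secreted, at least one strong phenotype, and no mammalian homologue."""
    has_strong = any(p in STRONG for p in candidate.rnai_phenotypes)
    return candidate.secreted and has_strong and not candidate.mammalian_homolog


priority = [c.name for c in candidates if is_priority_target(c)]
print(priority)
```

Treating "weakly similar" as a non-homologue instead would let all four candidates through, so the stringency of that one choice drives the result.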

ESTExplorer: applications so far

1. In silico analysis of expressed sequence tags (EST) from Trichostrongylus vitrinus (Nematoda): comparison of the automated ESTExplorer workflow platform with database searches. Nagaraj SH, Gasser RB, Ranganathan S.

2. A transcriptomic analysis of the adult stage of the bovine lungworm, Dictyocaulus viviparus. Ranganathan S, Nagaraj SH, Hu M, Strube C, Schnieder T, Gasser RB. BMC Genomics, 2007, accepted.

3. Gender-enriched transcripts in adult Haemonchus contortus (Nematoda) – predicted functions and genetic interactions based on comparative analyses with Caenorhabditis elegans. Campbell BE, Nagaraj SH, Hu M, Zhong W, Sternberg PW, Ong EK, Loukas A, Ranganathan S, Beveridge A, Gasser RB.

4. Transcriptional changes in the third-stage larva of Ancylostoma caninum (Nematoda) following in vitro serum stimulation, employing a suppressive-subtractive hybridisation-based microarray approach. Datu BJD, Gasser RB, Nagaraj SH, Ong EK, O’Donoghue P, McInnes R, Ranganathan S, Loukas A.

5. Trichostrongylus vitrinus (Nematoda: Strongylida): Molecular characterization and transcriptional analysis of Tv-stp-1, a serine/threonine phosphatase gene. Hu M, Abs El-Osta YG, Campbell BE, Boag PR, Nisbet AJ, Beveridge I, Gasser RB. Exp Parasitol. 2007, accepted.

Ref papers

Acknowledgements

Prof. Robin Gasser (University of Melbourne)

Genetics Technologies Pty. Ltd.

Australian Research Council LINKAGE PROJECT (LP0667795)

Some more examples of secreted proteins

• M41 family metalloprotease, mitochondrial membrane proteinase: Schistosoma
• Pathogenesis-related protein similar to helminth venom allergen homologues: Schistosoma