Introducing Database Mining to Molecular Genetics Students (Juniors & Seniors)
Karl Wilson



Objectives:

• Introduce students to online protein and nucleotide databases (via GenBank at the NCBI website).

• Specific operations:
  – Use of BLAST to find similar sequences (protein & nucleotide)
  – Downloading and saving sequences
  – Comparison of sequences and alignment with ClustalW
  – Interpretation of phylogenetic data


• The “test” protein sequence: AAA92063. cysteinyl endopep...[gi:1223922]

LOCUS       AAA92063                 362 aa            linear   PLN 22-AUG-2002
DEFINITION  cysteinyl endopeptidase [Vigna radiata].
ACCESSION   AAA92063
VERSION     AAA92063.1  GI:1223922
DBSOURCE    locus VRU49445 accession U49445.1
KEYWORDS    .
SOURCE      Vigna radiata
  ORGANISM  Vigna radiata
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Fabales; Fabaceae; Papilionoideae; Phaseoleae;
            Vigna.
REFERENCE   1  (residues 1 to 362)
  AUTHORS   Lee,K., Tan-Wilson,A.L. and Wilson,K.A.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-FEB-1996) K. Lee, Department of Biological Sciences,
            State University of New York at Binghamton, P.O. Box 6000,
            Binghamton, NY 13902-6000, USA
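The record above is a GenBank flat file: each top-level keyword (LOCUS, DEFINITION, ACCESSION, ...) starts a line, while indented lines are sub-keywords or continuations. A minimal sketch, assuming only the Python standard library, that pulls out the top-level fields and follows the DBSOURCE line back to the nucleotide record (U49445.1, locus VRU49445) that the students start from. The function names are hypothetical helpers, not part of any NCBI tool.

```python
def parse_genbank_header(text: str) -> dict[str, str]:
    """Map each top-level GenBank keyword to the rest of its first line."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Skip blank lines, indented sub-keywords (ORGANISM, AUTHORS, ...) and continuations.
        if not line or line[0].isspace():
            continue
        key, _, value = line.partition(" ")
        fields.setdefault(key, value.strip())
    return fields


def linked_nucleotide(fields: dict[str, str]) -> str:
    """Return the nucleotide accession named on the DBSOURCE line."""
    tokens = fields["DBSOURCE"].split()
    return tokens[tokens.index("accession") + 1]


record = """\
LOCUS       AAA92063                 362 aa            linear   PLN 22-AUG-2002
DEFINITION  cysteinyl endopeptidase [Vigna radiata].
ACCESSION   AAA92063
VERSION     AAA92063.1  GI:1223922
DBSOURCE    locus VRU49445 accession U49445.1
KEYWORDS    .
SOURCE      Vigna radiata
  ORGANISM  Vigna radiata
"""

fields = parse_genbank_header(record)
print(fields["LOCUS"].split()[:3])  # accession, length, unit
print(linked_nucleotide(fields))    # U49445.1
```

Only the first line of each keyword is kept, which is enough for LOCUS, ACCESSION, VERSION and DBSOURCE; multi-line fields such as the taxonomy would need the continuation lines appended.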


Student given VRU49445 sequence (only) via e-mail or Blackboard

Find sequence via Entrez, download in FASTA format

VRU49445 sequence

Submit to Protein-Protein BLAST (BLASTP)

BLASTP results – related sequences
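The Entrez search-and-download step can also be scripted through NCBI's E-utilities, whose efetch service returns a record as plain-text FASTA when called with rettype=fasta and retmode=text. The sketch below only builds the request URL and parses FASTA text, so it runs offline; the download itself (for example with urllib.request.urlopen) is left out, and the helper names are illustrative. NCBI asks scripted users to identify themselves and to limit their request rate, so check the E-utilities usage guidelines before automating a class exercise.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "protein") -> str:
    """Build an efetch URL asking for one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def parse_fasta(text: str) -> dict[str, str]:
    """Map each record's identifier (first word after '>') to its sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


print(efetch_fasta_url("U49445.1", db="nuccore"))  # nucleotide record (locus VRU49445)
print(efetch_fasta_url("AAA92063"))                # protein record
```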


                                                                Score   E
Sequences producing significant alignments:                    (bits) Value

gi|1223922|gb|AAA92063.1| cysteinyl endopeptidase [Vigna ra...   705    0.0
gi|118158|sp|P12412|CYSP_VIGMU Vignain precursor (Bean endo...   686    0.0
gi|445927|prf||1910332A Cys endopeptidase                        684    0.0
gi|7435774|pir||S22502 cysteine proteinase (EC 3.4.22.-) - ...   677    0.0
gi|544129|sp|P25803|CYSP_PHAVU Vignain precursor (Bean endo...   674    0.0
gi|1345573|emb|CAA40073.1| endopeptidase (EP-C1) [Phaseolus...   673    0.0
gi|31559530|dbj|BAC77523.1| cysteine proteinase [Glycine ma...   657    0.0
gi|31559526|dbj|BAC77521.1| cysteine proteinase [Glycine ma...   653    0.0
gi|7435817|pir||T08122 cysteine endopeptidase (EC 3.4.22.-)...   580  e-164
gi|600111|emb|CAA84378.1| cysteine proteinase [Vicia sativa]     540  e-152
gi|3688528|emb|CAA06243.1| pre-pro-TPE4A protein [Pisum sat...   539  e-152
gi|18423124|ref|NP_568722.1| cysteine proteinase [Arabidops...   521  e-147
gi|30141021|dbj|BAC75924.1| cysteine protease-2 [Helianthus...   516  e-145
gi|1076552|pir||S49166 cysteine proteinase (EC 3.4.22.-) pr...   510  e-143
gi|7435811|pir||T06708 cysteine proteinase (EC 3.4.22.-) T2...   490  e-137
gi|1169186|sp|P43156|CYSP_HEMSP Thiol protease SEN102 precu...   490  e-137
gi|25289998|pir||JC7787 carrot seed cysteine proteinase (EC...   485  e-136
gi|18408616|ref|NP_566901.1| cysteine proteinase, putative ...   483  e-135
gi|1173630|gb|AAB37233.1| cysteine proteinase                    470  e-131
gi|4731374|gb|AAD28477.1|AF133839_1 papain-like cysteine pr...   462  e-129
gi|22331686|ref|NP_680113.1| cysteine proteinase, putative ...   462  e-129
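A hit list like this can be turned into data for sorting or filtering. A minimal sketch, assuming the plain-text layout shown (identifier, description, bit score and E value on one line); Hit, parse_hits and the 1e-150 cut-off are illustrative choices, not BLAST features. Legacy BLAST text output abbreviates very small E values as, e.g., e-164 (meaning 1e-164) and prints 0.0 when the value is effectively zero, so those tokens are converted before comparing.

```python
import re
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str      # e.g. "gi|118158|sp|P12412|CYSP_VIGMU"
    description: str  # description as printed (may be truncated with "...")
    bits: float       # bit score
    evalue: float     # expectation value


# identifier, description, bit score, E value (one hit per line)
ROW = re.compile(r"^(\S+)\s+(.*?)\s+(\d+(?:\.\d+)?)\s+(\S+)$")


def parse_evalue(token: str) -> float:
    """Convert the 'e-164' shorthand to 1e-164; '0.0' stays 0.0."""
    return float("1" + token) if token.startswith("e") else float(token)


def parse_hits(lines: list[str]) -> list[Hit]:
    hits = []
    for line in lines:
        m = ROW.match(line.strip())
        if m:  # header and blank lines do not match and are skipped
            subject, desc, bits, ev = m.groups()
            hits.append(Hit(subject, desc, float(bits), parse_evalue(ev)))
    return hits


table = """\
gi|1223922|gb|AAA92063.1| cysteinyl endopeptidase [Vigna ra...   705    0.0
gi|7435817|pir||T08122 cysteine endopeptidase (EC 3.4.22.-)...   580  e-164
gi|600111|emb|CAA84378.1| cysteine proteinase [Vicia sativa]     540  e-152
gi|18423124|ref|NP_568722.1| cysteine proteinase [Arabidops...   521  e-147
"""
hits = parse_hits(table.splitlines())
strong = [h.subject for h in hits if h.evalue <= 1e-150]  # illustrative cut-off
print(strong)
```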


BLASTP results – related sequences

Copy the most similar cDNA sequences (in FASTA format)

cDNA sequences from P. vulgaris, V. mungo, G. max, V. sativa, etc.

Submit the sequences to ClustalW at the Biology Workbench website.
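Before submission, the copied sequences go into one multi-FASTA file. The row labels in the alignment that follows look like the FASTA identifiers with '|' and spaces replaced by '_' and cut to 28 characters (gi|1223922|gb|AAA92063.1| cysteinyl... became gi_1223922_gb_AAA92063.1__cy). The sketch mimics that so labels are predictable; the character rule and the 28-character limit are inferred from the slides, not documented Biology Workbench behaviour, and the two sequences are only the first 50 residues shown on the slide.

```python
import re

LABEL_LEN = 28  # assumption: inferred from the alignment labels on the slides


def clustal_label(header: str, width: int = LABEL_LEN) -> str:
    """Replace characters other than letters, digits, '.', '_' and '-' with '_', then truncate."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", header)[:width]


def to_multifasta(records: dict[str, str], line_len: int = 60) -> str:
    """records maps full FASTA header text (without '>') to a sequence."""
    out = []
    for header, seq in records.items():
        out.append(">" + clustal_label(header))
        # Wrap the sequence into fixed-width lines.
        out.extend(seq[i:i + line_len] for i in range(0, len(seq), line_len))
    return "\n".join(out) + "\n"


selected = {
    "gi|1223922|gb|AAA92063.1| cysteinyl endopeptidase [Vigna radiata]":
        "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS",
    "gi|600111|emb|CAA84378.1| cysteine proteinase [Vicia sativa]":
        "MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT",
}
print(to_multifasta(selected, line_len=25))
```

Wrapping at 60 residues per line is a common FASTA convention rather than a requirement; the short line length here just makes the wrapping visible.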


gi_118158_sp_P12412_CYSP_VIG   MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS
gi_1223922_gb_AAA92063.1__cy   MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS
gi_31559526_dbj_BAC77521.1__   MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS
gi_31559530_dbj_BAC77523.1__   MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS
gi_600111_emb_CAA84378.1__cy   MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT

gi_118158_sp_P12412_CYSP_VIG   RSLGEKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_1223922_gb_AAA92063.1__cy   RSLTEKHKRFNVFKENVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_31559526_dbj_BAC77521.1__   RSLGDKHKRFNVFKANVMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_31559530_dbj_BAC77523.1__   RSLGDKHKRFNVFKANMMHVHNTNKMDKPYKLKLNKFADMTNHEFRSTYA
gi_600111_emb_CAA84378.1__cy   RNLDEKHNRFNVFKANVMHVHNTNKLDKPYKLKLNKFGDMTNYEFRRIYA

gi_118158_sp_P12412_CYSP_VIG   GSKVNHHKMFRGSQHGSGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_1223922_gb_AAA92063.1__cy   GSKVNHHKMFRGTQHGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGQCG
gi_31559526_dbj_BAC77521.1__   GSKVNHHRMFQGTPRGNGTFMYEKVGSVPPSVDWRKNGAVTGVKDQGQCG
gi_31559530_dbj_BAC77523.1__   GSKVNHHRMFRDMPRGNGTFMYEKVGSVPASVDWRKKGAVTDVKDQGHCG
gi_600111_emb_CAA84378.1__cy   DSKISHHRMFRGMSHENGTFMYENAVDVPSSIDWRNKGAVTGVKDQGQCG

Alignment of the Cysteine Proteases from Vigna, Phaseolus, Glycine, and Vicia.
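Percent identity is a quick way to read an alignment like this before looking at the tree. A minimal sketch that counts identical residues over columns where neither sequence has a gap (one of several conventions in use). It uses only the first 50-column block from the slide, labelled by accession, so the values describe that stretch rather than the full proteins.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    same = sum(x == y for x, y in compared)
    return 100.0 * same / len(compared)


# First 50-column block of the alignment, labelled by accession.
aligned = {
    "P12412":   "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLESEESLWDLYERWRSHHTVS",
    "AAA92063": "MAMKKLLWVVLSLSLVLGVANSFDFHEKDLASEESLWDLYERWRSHHTVS",
    "BAC77521": "MAMKKLLWVVLSLSLVLGSANSFDFHDKDLASEESFWDLYERWRSHHTVS",
    "BAC77523": "MAMKKFLWVVLSLSLVLGVANSFDFHDKDLESEESLWDLYERWRSHHTVS",
    "CAA84378": "MEMKKLLFISLSLALIFTVANTFDFNEHDLESEKSLWNLYERWRSHHTVT",
}

for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
    print(f"{name1:<9} {name2:<9} {percent_identity(seq1, seq2):5.1f}%")
```

For example, P12412 and AAA92063 differ at a single column in this block (E versus A), giving 98.0%.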


Unrooted Phylogenetic Tree
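Unrooted trees like this one are commonly built from a distance matrix by neighbor-joining, one of the tree-building methods ClustalW offers. The sketch below is a plain implementation of the standard neighbor-joining steps (Q criterion, branch lengths, distance update). The five-taxon matrix is a small toy example, not data from these slides; with real sequences, distances could be taken as 1 - identity/100 (the p-distance), one simple choice among several. Internal nodes are given hypothetical names node1, node2, ...

```python
def neighbor_joining(names, matrix):
    """Neighbor-joining on a symmetric distance matrix.

    Returns (joins, final_edge). Each join is (new_node, child_a, len_a, child_b, len_b);
    final_edge is (node_x, node_y, length), the last edge of the unrooted tree.
    """
    d = {a: {b: float(matrix[i][j]) for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from each joined node to the new internal node.
        len_a = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        u = f"node{len(joins) + 1}"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        joins.append((u, a, len_a, b, len_b))
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    x, y = nodes
    return joins, (x, y, d[x][y])


taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbor_joining(taxa, dist)
for join in joins:
    print(join)
print(final_edge)
```

Ties in Q are broken by taking the first pair found, so a different taxon order can give a different but equally valid join sequence.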


Possible Additions:

• Add more sequences (e.g., from non-legumes) and see how the tree changes.

• Repeat all of the above, but this time with the nucleotide (cDNA) sequences of the same proteins. Compare the results with those from the protein sequences.
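Comparing protein and cDNA trees assumes each cDNA really encodes the protein it is paired with. A quick check is to translate the cDNA with the standard genetic code (NCBI translation table 1) and compare the result with the protein. A minimal sketch; the reading frame is supplied by the user, codons containing other letters become X, and the input ATGGCTATGAAGAAG is a made-up example chosen to encode MAMKK, the N-terminus seen in the protein alignment, not a sequence taken from U49445.

```python
BASES = "TCAG"
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cdna: str, frame: int = 0) -> str:
    """Translate a DNA/RNA string from the given frame; '*' marks a stop codon."""
    seq = cdna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(seq) - 2, 3):  # whole codons only
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


print(translate("ATGGCTATGAAGAAG"))  # MAMKK
```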


• Where available, compare the nucleotide sequences of cDNA/gene pairs: can exons and introns be identified?

ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA
ACGTGTGACGAATCAAAGGTG-----------------------------
ACCTGTGATGCATCAAAGGTGCATGTTCGGCCAAACTTTTTTTTTTTT--
ACCTGTGATGCATCAAAGGTG-----------------------------

AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC
--------------------------------------------------
-TTTAATGAAACCAATA--TAACTTGAGAAATCTAAAATTGCCAAAAATC
--------------------------------------------------

TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA
----------------AATGACCTAGCTGTGTCAATTGATGGTCATGAAA
TTGCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGCCATGAGA
----------------AATGACCTAGCTGTGTCAATTGATGGCCATGAGA
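In this alignment the second and fourth rows appear to be the cDNAs, since they carry the long gaps where the first and third rows (apparently the genes) have extra sequence. Runs of cDNA gaps opposite genomic bases are candidate introns. A minimal sketch; candidate_introns is a hypothetical helper, min_len is an illustrative threshold, and coordinates are 0-based, end-exclusive positions in the ungapped genomic sequence. The blocks are joined end to end first, because here the gap starts in the first block and continues into the third.

```python
def candidate_introns(genomic: str, cdna: str, min_len: int = 10) -> list[tuple[int, int]]:
    """Genomic (start, end) of runs where the cDNA row is gapped but the gene row has bases."""
    if len(genomic) != len(cdna):
        raise ValueError("rows must come from the same alignment")
    introns = []
    g_pos = 0         # genomic bases seen so far
    run_start = None  # genomic coordinate where the current cDNA gap run began
    for g, c in zip(genomic, cdna):
        if c == "-":
            if run_start is None:
                run_start = g_pos
        else:
            if run_start is not None and g_pos - run_start >= min_len:
                introns.append((run_start, g_pos))
            run_start = None
        if g != "-":
            g_pos += 1
    if run_start is not None and g_pos - run_start >= min_len:
        introns.append((run_start, g_pos))
    return introns


# First gene/cDNA pair, the three blocks joined end to end.
genomic = ("ACGTGTGACGAATCAAAGGTGCATGTTAGGCCAAACATATTTTCCAATGA"
           "AACCACTATAATTAATAGATAACTTGAGAAACT--AAAGTGCCAAAAATC"
           "TTTCATGTGGTAGGTGAATGACCTAGCTGTGTCAATTGATGGTCATGAAA")
cdna = "ACGTGTGACGAATCAAAGGTG" + "-" * (29 + 50 + 16) + "AATGACCTAGCTGTGTCAATTGATGGTCATGAAA"
print(candidate_introns(genomic, cdna))  # [(21, 114)]
```

Canonical introns begin with GT and end with AG, but an aligner may place a gap a few bases off when the flanking sequence repeats. Here the reported stretch begins with CA; because the three bases before it and the three bases at its end are both GTG, sliding it three bases upstream to (18, 111) leaves the cDNA unchanged and gives a GT...AG intron.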


• Examine the subcellular targeting of the cysteine protease, e.g., with TargetP or PSORT.

PSORT: http://psort.ims.u-tokyo.ac.jp/

With AAA92063 (Vigna radiata cysteine protease):

endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
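PSORT reports one candidate site per prediction with a certainty value. A minimal sketch that turns that text into (site, certainty, call) tuples and picks the highest-scoring site; the regular expression assumes exactly the layout shown above (it also copes with the predictions run together on one line) and is not an official PSORT output specification. Ranking only orders PSORT's own scores; it adds no independent evidence about localization.

```python
import re

# "site --- Certainty= score(call)", one prediction per match
PREDICTION = re.compile(
    r"(?P<site>[^\n]+?)\s+---\s+Certainty=\s*(?P<score>[0-9.]+)\((?P<call>\w+)\)"
)


def parse_psort(text: str) -> list[tuple[str, float, str]]:
    """Return (site, certainty, call) tuples in the order PSORT printed them."""
    return [
        (m["site"].strip(), float(m["score"]), m["call"])
        for m in PREDICTION.finditer(text)
    ]


output = """\
endoplasmic reticulum (lumen) --- Certainty= 0.910(Affirmative)
outside --- Certainty= 0.719(Affirmative)
lysosome (lumen) --- Certainty= 0.190(Affirmative)
endoplasmic reticulum (membrane) --- Certainty= 0.100(Affirmative)
"""
predictions = parse_psort(output)
site, certainty, _ = max(predictions, key=lambda p: p[1])
print(f"top prediction: {site} ({certainty:.3f})")
```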