EBI web resources III: Web-based tools in Europe (EBI, ExPASy, EMBOSS, DTU)

Yanbin Yin

Homework assignment 4

1. Download http://cys.bios.niu.edu/yyin/teach/PBB/purdue.cellwall.list.lignin.fa to your computer

2. Select a C3H protein and an F5H protein from the above file and calculate the sequence identity between them using the Water server at EBI.

3. Perform a multiple sequence alignment using MAFFT with all FASTA sequences in the file

4. Build a phylogeny with the alignment using the "A la Carte" mode at http://www.phylogeny.fr/

5. Build another phylogeny starting from the unaligned sequences using the "One Click" mode at http://www.phylogeny.fr/; if you encounter any error reports, try to figure out why and how to solve them (hint: skip the Gblocks step).

Write a report (in Word or PPT) that includes all the operations, screenshots and the final phylogenies from step 3 and 4.

Due on 10/11 (send by email)

Office hour: Tue, Thu and Fri 2-4pm, MO325A

Or email: yyin@niu.edu

Outline

• Hands-on exercises!

Pairwise alignment (including database search) tools

Faster / Slower

Fewer matches / More matches

BLAST, FASTA, SSEARCH

BLAT, BWA, Bowtie, PSI-BLAST

PSI-Search, HMMER3, RPS-BLAST

http://www.ebi.ac.uk/

Go to the bottom of the page

Click "By names (A-Z)"

This is a very long list of tools. Scroll down to find FASTA

Or press Ctrl+F and type fasta

We are going to try the FASTA tool

Click on FASTA [nucleotide]

Click Genomes. We're going to search the Arabidopsis genome

Click on this little arrow and choose Arabidopsis

Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa, copy the first seq (CesA) and paste it here

Change here to protein

tfastx: allows frameshifts between codons

tfasty: also allows frameshifts within codons

Tolerates sequence errors

Good for finding pseudogenes

http://www.ebi.ac.uk/Tools/sss/fasta/help/index-genomes.html

Should be finished very quickly

Raw output (plain text)

Graphical presentation of the output

Show EMBL format of the subject (hit)

Show alignment

We are at the raw output view

In the alignment, look for:

/ frameshift

\ frameshift

* stop codon

In order to align the query protein to the subject genomic DNA, the reading frame has to move 1 or 2 bases ahead (1-base insertion or 2-base insertion)

BLAST gives a shorter alignment because its alignment breaks where it sees frameshifts

Go back to the tool A-Z page: http://www.ebi.ac.uk/services/all

Press Ctrl+F and type ssearch

SSEARCH is a command in the FASTA package implementing the Smith-Waterman algorithm

SSEARCH can only do protein-protein or nucleotide-nucleotide searches

Slower but most accurate

Go back to the tool A-Z page: http://www.ebi.ac.uk/services/all

Press Ctrl+F and type emboss

EMBOSS: European Molecular Biology Open Software Suite

Rice, P., Longden, I. and Bleasby, A. (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends in Genetics 16(6), pp. 276-277

EMBOSS contains hundreds of computer programs for sequence analysis

Needleman-Wunsch algorithm (needle)

Smith-Waterman algorithm (water)

Equivalent to the bl2seq command of the BLAST package

Let's try needle first

Global vs local alignment:

• in a local alignment, you try to match your query with a substring (a portion) of your subject (reference)

• in a global alignment, you perform an end-to-end alignment with the subject
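
To make the global/local difference concrete, here is a minimal Python sketch that computes only the best alignment score, either end to end (Needleman-Wunsch style) or over the best-matching substrings (Smith-Waterman style). The function name and the match/mismatch/gap values are placeholders chosen for illustration; the needle and water servers use a substitution matrix and separate gap open/extend penalties, which this sketch does not reproduce.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b with a simple linear gap penalty.

    local=False: global, end-to-end alignment (Needleman-Wunsch style).
    local=True:  local alignment of the best substrings (Smith-Waterman style).
    The scoring values are illustrative assumptions, not server defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # Local alignment may restart anywhere, so scores never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


if __name__ == "__main__":
    query, subject = "GGGGACGT", "ACGTCCCC"
    print("global:", alignment_score(query, subject))
    print("local: ", alignment_score(query, subject, local=True))
```

With these toy sequences the local score rewards the shared ACGT block, while the global score is dragged down by the unrelated ends, which is the same effect seen when comparing a 539 aa protein with a 1089 aa protein.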

CslA: 539 aa

CesA: 1089 aa

Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa

Copy & paste CesA

Copy & paste CslA

This is needle output

Alignment symbols:

- gap

. negative score

: positive score

| identical

This is different from how BLAST shows the alignment

Not a database search, so no E-value is reported

This is water output

The best way to find the optimally aligned regions and calculate the similarity between two sequences

Select here to try blast2seq

Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa

Copy & paste CesA

Copy & paste CslA

Choose blastp

This is blast2seq output

Fragmented alignments

Multiple sequence alignment tools

Foundation for many further analyses: phylogeny, evolution, motif, protein family, etc.

http://www.ebi.ac.uk/Tools/msa/

The MSA page shows nine tools; we're going to try Clustal Omega, MAFFT and MUSCLE

This is the Clustal Omega page

You can always check the help page

Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste all 9 protein seqs here

Then submit

This is called the Clustal format of MSA

Color AA based on chemical properties, e.g. acidic AA in blue

Check the evolutionary relatedness

Get a text-format summary of the results

Text format that describes relatedness and can be visualized graphically as a tree

The matrix tells how similar each pair of seqs is

Both can be copy-pasted into Notepad and saved as plain text files
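
As a rough illustration of what such a pairwise matrix contains, the Python sketch below computes percent identity for every pair of rows in an already aligned set of sequences. The toy alignment, the sequence names and the choice to skip columns where either sequence has a gap are my assumptions; alignment servers may define identity differently (for example, counting gap columns in the denominator), so the numbers need not match their output.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def identity_matrix(aligned):
    """Map each pair of sequence names to its percent identity.

    `aligned` is a dict of name -> aligned sequence (all the same length).
    """
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(aligned, 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not taken from cesa-pr.fa
    toy = {"seqA": "MKT-LLV", "seqB": "MKTALLI", "seqC": "MRT-LIV"}
    for (n1, n2), pid in identity_matrix(toy).items():
        print(f"{n1} vs {n2}: {pid:.1f}%")
```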

This is the MAFFT page

You can always check the help page

Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste all 9 protein seqs here

Change here to ClustalW

IDs were truncated

The first M residue is forced to be aligned

This is the MUSCLE page

Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa and copy-paste all 9 protein seqs here

Change here to ClustalW

So which MSA tool should I use?

Accuracy, speed: MAFFT > Clustal Omega > MUSCLE >> ClustalW

Molecular Systems Biology 7:539, 2011

http://mafft.cbrc.jp/alignment/software/about.html

http://www.ebi.ac.uk/Tools/msa/

Visualize alignment

This is the MView page

You can always check the help page

Go to http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa.aln and copy-paste the MSA built above

MView reformats the results of a sequence database search (BLAST, FASTA, etc) or a multiple alignment (MSF, PIR, CLUSTAL, etc), adding optional HTML mark-up to control colouring and web page layout. MView is not a multiple alignment program, nor is it a general purpose alignment editor

Consensus letters are explained at http://bio-mview.sourceforge.net/manual/manual.html#ref-output-formats

Another MSA visualization tool: ESPript http://espript.ibcp.fr/ESPript/ESPript/

Click here to start

Copy http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa.aln, paste it into a txt file, save it on your desktop, and upload it

After the file is uploaded

A new window pops up; view in PDF

We just tried the very basic function. This web server has many more useful functions, such as displaying secondary structures along with the MSA. To learn more: http://espript.ibcp.fr/ESPript/ESPript/esp_tutorial.php

ExPASy: Expert Protein Analysis System at SIB

Collection of external/internal tools

http://expasy.org/

Click on genomics, then sequence alignment

This website collects and classifies web links to hundreds of bioinfo tools

This page lists tools for sequence alignment

We're going to try this one

http://weblogo.berkeley.edu/logo.cgi

Upload the file that we downloaded from http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa.aln

Toggle this to allow the logo to be shown on multiple lines

Click to increase

You can also copy-paste a segment of the alignment to WebLogo

No need to use the entire alignment

Paste the copied segment here

With an MSA you can build a phylogeny to describe the relatedness of seqs

Seqs → MSA → Phylogeny → Graph

We are going to try this website

http://phylogeny.lirmm.fr/phylo_cgi/index.cgi

Three modes of phylogeny reconstruction

Try the One Click mode

One Click mode uses these tools

Give this job a name

http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa

Gblocks is a program that automatically edits the alignment

EMBOSS: European Molecular Biology Open Software Suite

http://emboss.sourceforge.net/

EMBOSS contains hundreds of computer programs, written in C, for sequence analysis

The best way to use it is to install it on a Linux computer

Here we're going to try some public web servers that have the EMBOSS package installed

Many others are not accessible, but this one is

56

ThisiscalledEMBOSSexplorer,whichisawebinterfacetosupportrunning EMBOSSprograms throughweb

350+programsputintodifferentgroups

Wewilltryafewprograms inthispackage

57

The most basic one: translate a nucleotide seq to an amino acid seq (related to finding the open reading frames)

Find the program transeq in the nucleic translation group

Copy and paste the seq in http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa

It's an assembled transcript from EST data of some algal species

We do not know if it indeed encodes a protein and, if yes, where the ORF is

Remember that mRNA contains untranslated regions (UTRs)

Choose all six frames
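
For readers who want to see what a six-frame translation does, here is a minimal Python sketch using the standard genetic code. It is not how transeq is implemented; the function names are mine, unknown codons are shown as X, and the way I number the three reverse-strand frames (-1, -2, -3 as offsets into the reverse complement) is a simplifying assumption that may not match transeq's frame labels.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order; "*" marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return a dict of frame label -> protein for the three forward and three reverse frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

A frame with a long stretch free of `*` is a candidate for the real reading frame, which is the same visual check done on the transeq output.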

This is likely the right frame

Is this a correct result? You can take the nt seq and run BLAST at NCBI

Put the seq ID here because it is in GenBank already

Choose Swiss-Prot because it is smaller and of high quality

Click formatting options

Choose plain text view

Click reformat

This is the alignment of our query with the best hit; the frame is +2, same as the transeq result

This is the longest ORF

ATGCGCTA

TACGCGAT (complement)

TAGCGCAT (reverse complement)

http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa
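
The two steps shown on the slide can be sketched in a few lines of Python. The function names are mine, and the sketch only handles the four plain bases A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Swap each base for its pairing partner, keeping the original order."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence, then read it in the opposite direction."""
    return complement(seq)[::-1]


if __name__ == "__main__":
    seq = "ATGCGCTA"
    print(seq)
    print(complement(seq))
    print(reverse_complement(seq))
```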

ATGCGCTA

TGCGCT (region 2-7)

http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa

Region 57-400
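
Region coordinates on the slide are 1-based and include both ends, while Python slices are 0-based and exclude the end. A small helper like the one below (the name is hypothetical) makes the conversion explicit.

```python
def extract_region(seq, start, end):
    """Return the subsequence from start to end, 1-based and inclusive."""
    if start < 1 or end > len(seq) or start > end:
        raise ValueError("region must satisfy 1 <= start <= end <= len(seq)")
    return seq[start - 1:end]


if __name__ == "__main__":
    print(extract_region("ATGCGCTA", 2, 7))
```

For the example file, the same call with 57 and 400 would return a region of 344 bases, provided the sequence is at least 400 bases long.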

Calculate GC content

ATGCGCTA: GC% = 50%

Change to yes to get a pic
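
The GC calculation is easy to check by hand or in Python. The sketch below computes the overall GC percentage and, as one guess at what a GC plot shows, the GC percentage in overlapping windows along the sequence. The function names and the window size are my assumptions, not settings taken from the web server.

```python
def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window):
    """GC percentage of every overlapping window of the given size."""
    return [gc_percent(seq[i:i + window]) for i in range(len(seq) - window + 1)]


if __name__ == "__main__":
    seq = "ATGCGCTA"
    print(gc_percent(seq))      # 4 of the 8 bases are G or C
    print(gc_windows(seq, 4))   # one value per window position
```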

ATGCGCTA

16 possible dinuc, 64 possible trinuc, 256 possible tetranuc

Default is dinuc

Equal occurrence: 1/16

Application: scan a genome to look for regions with abnormal compositions
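
The word counts behind this slide can be sketched as follows: count every overlapping word of length k and compare its observed frequency with the frequency expected if all 4^k words were equally likely (1/16 for dinucleotides). The function names are mine, and the equal-frequency expectation is just the simple model named on the slide; composition programs may offer other expectation models.

```python
from collections import Counter


def kmer_counts(seq, k=2):
    """Count overlapping words of length k (k=2 gives dinucleotides)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def observed_vs_expected(seq, k=2):
    """Ratio of observed frequency to the equal-occurrence expectation 1/4**k."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    expected = 1 / 4 ** k
    return {word: (n / total) / expected for word, n in counts.items()}


if __name__ == "__main__":
    seq = "ATGCGCTA"
    print(kmer_counts(seq))
    for word, ratio in sorted(observed_vs_expected(seq).items()):
        print(word, round(ratio, 2))
```

Ratios far above or below 1 flag words that are over- or under-represented, which is the idea behind scanning a genome for regions with unusual composition.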

http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa, copy-paste the 1st seq

Popular tools developed at the Technical University of Denmark

Google: cbs dtu

http://cys.bios.niu.edu/yyin/teach/PBB/nt-example.fa

This lists all the ATGs in the seq; each was scored to indicate its likelihood of being a start codon
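
Listing the candidate ATGs is the easy half of this task and can be sketched in Python as below. The scoring shown on the server comes from a trained prediction model that this sketch does not attempt to reproduce; it only reports 1-based positions, and the function name is hypothetical.

```python
def atg_positions(seq):
    """1-based positions of every ATG in the sequence, in any frame."""
    seq = seq.upper()
    positions = []
    i = seq.find("ATG")
    while i != -1:
        positions.append(i + 1)
        i = seq.find("ATG", i + 1)
    return positions


if __name__ == "__main__":
    print(atg_positions("CCATGAAATGG"))
```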

http://cys.bios.niu.edu/yyin/teach/PBB/cesa-pr.fa, copy-paste the 1st seq

Next class: ClustalX and MEGA