Sequence Comparison: Pairwise Alignment Shifra Ben-Dor Irit Orr Bioinformatics Lecture 5 2019

Source: dors.weizmann.ac.il/course/introbioinfo/Lect5_pairwise.pdf · 2019-04-29




PAIRWISE ALIGNMENT

DATABASE SEARCHING

MULTIPLE ALIGNMENT


MULTIPLE ALIGNMENT

Phylogenetic Analysis

Homology Modeling

Advanced Database Searches, Patterns, Motifs, Promoters

The problems:
I have a DNA sequence: What does it do?
    •  possible coding region
    •  possible regulatory region
I have a protein sequence: What does it do?

Sequence Comparison
•  Generally, sequence determines structure and structure determines function
•  By studying sequence similarity, we hope to find correlations between our sequence and other sequences with known structure or function
•  This approach is often successful; however, many molecules have low sequence similarity, yet still share similar structure or function.

Sequence Comparison
•  Motifs/Domains - similarity over small stretches
•  Sequence families - similarity over longer sequences
•  Comparison can help us with:
    •  structure
    •  function
    •  evolution

Comparison Questions:
•  Are the sequences related (homology)?
•  Can we qualify their similarity?
•  Do they have similar segments?

Terminology:
•  Homology
•  Identity
•  Similarity

Homology
•  Common ancestry
•  Sequence (and usually structure) conservation
•  Homology is not a measurable quantity
•  Homology can be inferred, under suitable conditions

Identity
•  Objective and well defined
•  Can be quantified by several methods:
    •  Percent
    •  The number of identical matches divided by the length of the aligned region
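
The percent-identity definition above can be sketched in a few lines. This is a minimal illustration, not a standard tool: the function name `percent_identity` and the use of `-` as the gap character are assumptions made here, and the denominator (every aligned column, gap columns included) is only one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical matches divided by the length of the aligned region, in percent.

    Assumption: both strings are already aligned (same length, gaps as `gap`),
    and every column, including gap columns, counts toward the length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# 8 identical columns out of 11 aligned columns
print(round(percent_identity("VITKLGTCVGS", "VIT---TCVGS"), 1))
```

Dividing by the shorter sequence length or by the number of ungapped columns would give a different percentage for the same alignment, so it helps to state which denominator was used.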

Similarity
•  Most common method used
•  Not so well defined
•  Depends on the parameters used (alphabet, scoring matrix, etc.)

What are we comparing?
•  DNA or RNA
    •  Four nucleotides (basic set)
•  Protein
    •  Twenty amino acids (basic set)

Alignment
•  An alignment is an arrangement of two sequences opposite one another
•  It shows where they are different and where they are similar
•  We want to find the optimal alignment - the most similarity and the fewest differences

Alignment
•  Alignments have two aspects:
    •  Quantity: To what degree are the sequences similar (percentage, other scoring method)
    •  Quality: Regions of similarity in a given sequence

The optimal alignment of two sequences is one that finds the longest segment of high sequence similarity.

How is an alignment done?
•  When we compare sequences, we take two strings of letters (nucleotides or amino acids) and align them.
•  Where the characters are identical, we give them a positive score, and where they differ, a negative value.
•  We count the identical and non-identical characters, and give the alignment a score (usually called the quality)

Differences in the sequence can be caused by deletions or insertions in the DNA, or by point mutations. These changes can be seen at the protein level as well (changes in the translation of the protein).

This scheme works fine as long as you assume that all possible mutations occur at the same frequency. However, nature doesn't work this way. It has been found that in DNA, transitions occur more often than transversions.

Purines (A, G) are 2-ring bases
Pyrimidines (C, T) are 1-ring bases
Transition: purine to purine or pyrimidine to pyrimidine
Transversion: purine to pyrimidine or pyrimidine to purine
Transitions conserve ring number
Transversions change ring number
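
The purine/pyrimidine rule above is easy to express in code. The following is a small sketch under the assumption that sequences use the DNA letters A, C, G and T only; the function name `classify_substitution` is made up for this illustration.

```python
PURINES = {"A", "G"}        # 2-ring bases
PYRIMIDINES = {"C", "T"}    # 1-ring bases


def classify_substitution(base_a: str, base_b: str) -> str:
    """Return 'identity', 'transition' or 'transversion' for a pair of DNA bases."""
    a, b = base_a.upper(), base_b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a DNA base: {base!r}")
    if a == b:
        return "identity"
    # Same ring class on both sides means the ring number is conserved.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(pair, classify_substitution(*pair))
```

A nucleotide scoring scheme that wants to reflect the higher frequency of transitions could use this classification to penalize transversions more heavily than transitions.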

Figure taken from Molecular Cell Biology, Darnell, Lodish and Baltimore, 1990

For proteins, the situation is far more complex
•  Amino acids can be grouped by a number of classifications:
    •  Chemical: aromatic, aliphatic, sulfur-containing
    •  Functional: hydrophobic, hydrophilic, acidic, basic
    •  Charge: positive, negative, neutral
    •  Structural: internal, external

Scoring Matrices
•  Scoring matrices are used to assign a score to each comparison of a pair of characters
•  The scores in the matrix are integer values which assign a positive score to identical or similar character pairs, and a negative value to dissimilar pairs
•  The matrices were constructed by analyzing known families of proteins

An example: BLOSUM62 (Henikoff & Henikoff)

   A  B  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  X  Y  Z
A  4 -2  0 -2 -1 -2  0 -2 -1 -1 -1 -1 -2 -1 -1 -1  1  0  0 -3 -1 -2 -1
B -2  6 -3  6  2 -3 -1 -1 -3 -1 -4 -3  1 -1  0 -2  0 -1 -3 -4 -1 -3  2
C  0 -3  9 -3 -4 -2 -3 -3 -1 -3 -1 -1 -3 -3 -3 -3 -1 -1 -1 -2 -1 -2 -4
D -2  6 -3  6  2 -3 -1 -1 -3 -1 -4 -3  1 -1  0 -2  0 -1 -3 -4 -1 -3  2
E -1  2 -4  2  5 -3 -2  0 -3  1 -3 -2  0 -1  2  0  0 -1 -2 -3 -1 -2  5
F -2 -3 -2 -3 -3  6 -3 -1  0 -3  0  0 -3 -4 -3 -3 -2 -2 -1  1 -1  3 -3
G  0 -1 -3 -1 -2 -3  6 -2 -4 -2 -4 -3  0 -2 -2 -2  0 -2 -3 -2 -1 -3 -2
H -2 -1 -3 -1  0 -1 -2  8 -3 -1 -3 -2  1 -2  0  0 -1 -2 -3 -2 -1  2  0
I -1 -3 -1 -3 -3  0 -4 -3  4 -3  2  1 -3 -3 -3 -3 -2 -1  3 -3 -1 -1 -3
K -1 -1 -3 -1  1 -3 -2 -1 -3  5 -2 -1  0 -1  1  2  0 -1 -2 -3 -1 -2  1
L -1 -4 -1 -4 -3  0 -4 -3  2 -2  4  2 -3 -3 -2 -2 -2 -1  1 -2 -1 -1 -3
M -1 -3 -1 -3 -2  0 -3 -2  1 -1  2  5 -2 -2  0 -1 -1 -1  1 -1 -1 -1 -2
N -2  1 -3  1  0 -3  0  1 -3  0 -3 -2  6 -2  0  0  1  0 -3 -4 -1 -2  0
P -1 -1 -3 -1 -1 -4 -2 -2 -3 -1 -3 -2 -2  7 -1 -2 -1 -1 -2 -4 -1 -3 -1
Q -1  0 -3  0  2 -3 -2  0 -3  1 -2  0  0 -1  5  1  0 -1 -2 -2 -1 -1  2
R -1 -2 -3 -2  0 -3 -2  0 -3  2 -2 -1  0 -2  1  5 -1 -1 -3 -3 -1 -2  0
S  1  0 -1  0  0 -2  0 -1 -2  0 -2 -1  1 -1  0 -1  4  1 -2 -3 -1 -2  0
T  0 -1 -1 -1 -1 -2 -2 -2 -1 -1 -1 -1  0 -1 -1 -1  1  5  0 -2 -1 -2 -1
V  0 -3 -1 -3 -2 -1 -3 -3  3 -2  1  1 -3 -2 -2 -3 -2  0  4 -3 -1 -1 -2
W -3 -4 -2 -4 -3  1 -2 -2 -3 -3 -2 -1 -4 -4 -2 -3 -3 -2 -3 11 -1  2 -3
X -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Y -2 -3 -2 -3 -2  3 -3  2 -1 -2 -1 -1 -2 -3 -1 -2 -2 -2 -1  2 -1  7 -2
Z -1  2 -4  2  5 -3 -2  0 -3  1 -3 -2  0 -1  2  0  0 -1 -2 -3 -1 -2  5
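
To show how such a matrix is used, here is a small sketch that scores an ungapped pair of peptides. Only a handful of entries are copied from the BLOSUM62 values listed above (for I, L, V, D and E); the dictionary `PAIR_SCORES` and both function names are hypothetical helpers for this illustration, not part of any package.

```python
# Excerpt of the matrix above; pairs outside this excerpt raise KeyError.
PAIR_SCORES = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("D", "D"): 6, ("E", "E"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,   # similar hydrophobic residues
    ("D", "E"): 2,                                  # similar acidic residues
    ("I", "D"): -3, ("I", "E"): -3,
    ("L", "D"): -4, ("L", "E"): -3,
    ("V", "D"): -3, ("V", "E"): -2,
}


def pair_score(a: str, b: str) -> int:
    """Look up a pair score; the matrix is symmetric, so order does not matter."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]


def ungapped_score(seq_a: str, seq_b: str) -> int:
    """Sum the matrix scores column by column for two equal-length peptides."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq_a, seq_b))


print(ungapped_score("ILV", "VLI"))  # conservative substitutions: 3 + 4 + 3
print(ungapped_score("ILV", "DEE"))  # dissimilar residues: -3 + -3 + -2
```

The first pair shares only one identical residue yet scores well, which is the difference between identity and similarity: similar residues still earn positive scores.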

Alignment algorithms
•  Visual alignment
    •  allows integration of relevant data not available to computerized algorithms
    •  time consuming, feasible only for the shortest sequences
•  Fixed length algorithms
    •  do not consider insertions and deletions
    •  insertions and deletions are needed even for closely related sequences

Alignment Algorithms
•  The naïve approach:
    •  generate all possible alignments for 2 sequences (including gaps) and choose the alignment with the highest score
•  Too time consuming
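
To get a feeling for why the naïve approach is too time consuming, one can count the candidate alignments. The sketch below assumes that every alignment column is either a pair of residues, a gap in sequence 1, or a gap in sequence 2, and that differently ordered adjacent gaps count as different alignments; under that assumption the count follows the Delannoy recurrence. The function name `count_alignments` is made up for this example.

```python
def count_alignments(m: int, n: int) -> int:
    """Number of gapped global alignments of sequences of lengths m and n.

    Recurrence: D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1), with D(i, 0) = D(0, j) = 1.
    """
    prev = [1] * (n + 1)              # row for i = 0
    for _ in range(m):
        cur = [1]                     # column j = 0
        for j in range(1, n + 1):
            cur.append(prev[j] + cur[j - 1] + prev[j - 1])
        prev = cur
    return prev[n]


for length in (2, 3, 10, 100):
    print(length, count_alignments(length, length))
```

Two sequences of only 10 residues already allow more than eight million alignments, and the count grows exponentially with length, which is why enumeration is replaced by dynamic programming.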

Dynamic programming algorithms
•  Each character along both sequences is evaluated. At each position there are four possibilities:
    •  identity
    •  substitution
    •  deletion in sequence 1
    •  deletion in sequence 2

Dynamic programming
•  Identical characters (matches) or substitutions (mismatches) are scored according to a matrix.
•  Deletions in either of the sequences are called gaps.
•  Gaps are given a negative score, referred to as the gap penalty

The alignment is given a score, called the quality:

    Quality = matches - (mismatches + gap penalty)

The program will find the alignment with the highest quality. The choice between gaps and substitutions is made to give the higher quality of the two.

The Gap Penalty
Consider the two following alignments:

    Alignment 1:  V I T K L G T C V G S      Alignment 2:  V I T K L G T C V G S
                  V I T . . . T C V G S                    V . T K . G T C V . S

According to the algorithm these 2 cases will get the same gap penalty:

    Match = 3, Gap = -2
    Alignment 1:  8(3) + 3(-2) = 18
    Alignment 2:  8(3) + 3(-2) = 18

However, nature is different. In most cases insertions/deletions are longer than a single residue, even for very similar sequences.

To compensate for this, and to differentiate between cases like the one above, the gap penalty is made up of two factors:
•  The gap creation penalty - subtracted from the alignment quality whenever a gap is opened.
•  The gap extension penalty - subtracted from the alignment quality according to the length of the gap.

Thus we have:

    Quality = matches - (mismatches + gap penalty)
    Gap penalty = gap creation penalty + (gap extension penalty x gap length)

The Gap Penalty
So now we have:

    Alignment 1:  V I T K L G T C V G S      Alignment 2:  V I T K L G T C V G S
                  V I T . . . T C V G S                    V . T K . G T C V . S

    Match = 3, Gap open = -4, Gap extension = -1
    Alignment 1:  8(3) + [1(-4) + 3(-1)] = 17
    Alignment 2:  8(3) + [3(-4) + 3(-1)] = 9
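
The arithmetic for both gap schemes can be checked with a short function that scores an already gapped alignment. This is a sketch: the name `alignment_quality` is hypothetical, `.` marks a gap as in the example, and the mismatch score of 0 is an assumption, since the example contains no mismatches and no value is given for them. Setting the gap creation penalty to 0 reproduces the simple per-position gap penalty.

```python
def alignment_quality(top, bottom, match=3, mismatch=0,
                      gap_open=-4, gap_extend=-1, gap="."):
    """Score a gapped alignment: matches plus mismatches plus gap penalties.

    Each run of gap columns costs gap_open once plus gap_extend per column.
    Simplification: a run that switches from one sequence to the other is
    treated as a single gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    in_gap = False
    for x, y in zip(top, bottom):
        if x == gap or y == gap:
            if not in_gap:
                score += gap_open      # a new gap is opened
            score += gap_extend        # every gap column extends the gap
            in_gap = True
        else:
            score += match if x == y else mismatch
            in_gap = False
    return score


top = "VITKLGTCVGS"
one_long_gap = "VIT...TCVGS"
three_short_gaps = "V.TK.GTCV.S"

# Simple gap penalty of -2 per gap position: both alignments tie.
print(alignment_quality(top, one_long_gap, gap_open=0, gap_extend=-2),
      alignment_quality(top, three_short_gaps, gap_open=0, gap_extend=-2))
# Gap creation -4 and gap extension -1: the single long gap is preferred.
print(alignment_quality(top, one_long_gap),
      alignment_quality(top, three_short_gaps))
```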

Gap penalty parameters

Insertion of a gap must improve the quality of the alignment (raise the quality score).

If the gap creation and gap extension penalties are high, fewer gaps will be inserted into the alignment.

If the gap creation and gap extension penalties are low, more gaps will be inserted into the alignment.

So if you are interested in an alignment between two very similar sequences, the gap penalties should be raised, to reduce the chances of getting something random.

If you are interested in detecting homology (finding a weak similarity) between two distantly related sequences, the gap penalties should be lowered.

If you don't know what to expect, start off with the default parameters.

To summarize:
■  Alignment scores are dependent on what we choose for: matches, mismatches, substitutions and gaps.
■  Dynamic programming can be used for global or local alignment

Two types of alignment:
•  Global alignment
•  Local alignment

[Figure: global alignment and local alignment]

Global alignment

A global pairwise alignment assumes that the two sequences have diverged from a common ancestor. The program tries to stretch the two sequences, introducing gaps where necessary, to show the alignment over their whole length that best illustrates their similarities.

Global alignment
•  Compares sequences and gives best overall alignment
•  May fail to find the best local region of similarity (such as a shared motif) among distantly related sequences
•  Will (generally) return only the best matching segment for a given pair of sequences

Global alignment - End Gaps
•  Since a global alignment can only give one overall output, the question arises of how we deal with overhanging ends, also known as 'end gaps'
•  There is an optional penalty for end gaps in most global alignment programs, though it is not necessarily on by default

Global alignment
•  The classical algorithm for global alignment is the Needleman-Wunsch algorithm

A general method applicable to the search for similarities in the amino acid sequence of two proteins. Needleman SB, Wunsch CD. J Mol Biol 1970 Mar;48(3):443-53
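
As an illustration of global alignment by dynamic programming, here is a compact sketch in the spirit of Needleman-Wunsch. It is not the published formulation or the EMBOSS implementation: it uses a simple per-position gap penalty instead of separate creation and extension penalties, and a match/mismatch score instead of a matrix. Match = 3 and gap = -2 follow the earlier example; mismatch = -1 is an assumption.

```python
def needleman_wunsch(a, b, match=3, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty. Returns (score, top, bottom)."""
    n, m = len(a), len(b)
    # score[i][j] = best quality for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,   # identity or substitution
                              score[i - 1][j] + gap,        # gap in b
                              score[i][j - 1] + gap)        # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


quality, top, bottom = needleman_wunsch("VITKLGTCVGS", "VITTCVGS")
print(quality)
print(top)
print(bottom)
```

Each cell is filled from its three neighbours, so the work grows with the product of the two sequence lengths rather than with the number of possible alignments.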

Local Alignment
•  Searches for regions of local similarity between two sequences and need not include the entire length of the sequences.
•  Finds regions of (ungapped) sequence with a high degree of similarity
•  Better at finding motifs, especially for sequences that are different overall
•  Can return more than one matching segment for a given pair of sequences

Local Alignment
•  The classical algorithm for local alignment is the Smith-Waterman algorithm

Identification of common molecular subsequences. Smith TF, Waterman MS. J Mol Biol 1981 Mar 25;147(1):195-7
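
A local alignment can be sketched with the same kind of table, with two changes: a cell is never allowed to drop below zero, and the traceback starts at the highest-scoring cell and stops at the first zero. The code below is a minimal illustration in the spirit of Smith-Waterman, not the published algorithm or the EMBOSS `water` implementation. It returns only the single best segment, and the scores (match 3, mismatch -3, gap -2) and the two test peptides are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Best local alignment with a linear gap penalty. Returns (score, top, bottom)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                              # start a new local alignment
                              score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score falls to zero.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Two peptides that are different overall but share the segment TCVGS.
print(smith_waterman("MKLWTCVGSPQR", "DDTCVGSEE"))
```

The flanking residues are simply left out of the result, which is why a local method is better at picking up a shared motif in otherwise unrelated sequences.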

Sequence Comparison Programs
•  Global
    •  Needle (EMBOSS)
    •  Stretcher (EMBOSS) - modified to conserve memory, good for long sequences

Sequence Comparison Programs
•  Local
    •  Lalign (Fasta) - can return more than one segment
    •  Matcher (EMBOSS) - based on lalign, can return more than one segment
    •  Water (EMBOSS) - Smith-Waterman, only one hit
    •  Bl2Seq - Blast 2 sequences

Local pairwise alignment using BL2SEQ at NCBI
■  This tool produces the alignment of two given sequences using the BLAST algorithm for local alignment.
■  Reference: Tatiana A. Tatusova, Thomas L. Madden (1999), "Blast 2 sequences - a new tool for comparing protein and nucleotide sequences", FEMS Microbiol Lett. 174:247-250

Local pairwise alignment using BL2SEQ
■  This tool utilizes the BLAST engine for pairwise sequence comparison and is based on the same algorithm and statistics of local alignments that have been described in the BLAST paper.
■  The BLAST algorithm generates a gapped alignment by using dynamic programming to extend the central segment of aligned residues.
■  Because the parameters were based on database searching, some may have to be changed to find a match

Statistical Evaluation of Alignments

The problem with these programs is that no matter how dissimilar the sequences you compare, the programs will always align them.

Even a 5% identity will be displayed as a valid result.

So how can you tell if the alignment is statistically valid?

Statistics by randomization
■  A program will take the second sequence you input and shuffle it, to obtain a random sequence with the same character composition.
■  This random sequence will be compared to the first sequence, using either a global or local algorithm (the same that you used originally), and a quality score will be obtained.

Randomization
■  This process is repeated many times (the number of times is generally specified by the user) in order to obtain a population of sequences that can be used for statistical analysis.
■  The quality scores of these alignments are plotted as a distribution and compared to the original quality; the comparison can then be used to give a statistically meaningful answer about the alignment.
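
The shuffling procedure can be sketched as follows. This is an illustration of the idea only, not the PRSS program: the local scoring function, its parameters, the two test peptides, the default of 200 shuffles, and the summary statistics (a z-score and an empirical p-value) are all assumptions made for this example.

```python
import random
import statistics


def local_score(a, b, match=3, mismatch=-3, gap=-2):
    """Best local alignment score with a linear gap penalty (score only)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            pair = match if x == y else mismatch
            cur.append(max(0, prev[j - 1] + pair, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def shuffle_test(seq1, seq2, n_shuffles=200, seed=0):
    """Compare the real score with scores against shuffled copies of seq2."""
    rng = random.Random(seed)              # fixed seed keeps the sketch reproducible
    observed = local_score(seq1, seq2)
    letters = list(seq2)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)               # same composition, random order
        shuffled_scores.append(local_score(seq1, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    z = (observed - mean) / sd if sd > 0 else float("inf")
    # Empirical p-value: fraction of shuffles scoring at least as well as the real pair.
    p = (1 + sum(s >= observed for s in shuffled_scores)) / (n_shuffles + 1)
    return observed, z, p


seq1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
seq2 = "MKTAYIAKQRQLSFVKSHFSRQAEERLGLIEVQ"   # seq1 with two substitutions
print(shuffle_test(seq1, seq2))
```

A real score far above the shuffled distribution suggests that the similarity does not come from composition alone; a score inside that distribution suggests the alignment could have arisen by chance.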

In the FASTA package, the PRSS program can perform shuffling of sequences. It can be done uniformly throughout the sequence, or using windows (which is useful if there are non-random windows in a sequence, like a transmembrane domain, which will be skewed towards hydrophobic amino acids).

Dotplots

Dotplots are two-dimensional graphs, showing a comparison of two sequences. The two axes of the graph represent the two sequences being compared. Every region of the sequence is compared to every region of the other sequence.

Dotplots

Dotplotting is the best way to see all of the structures in common between two sequences. Dotplotting can also be used to view repeated structures or inverted repeats in a single sequence. This is accomplished by comparing a sequence to itself. Dotplotting helps recognize large regions of similarity. In most cases it is not sensitive enough to see small structures.

Comparison Criteria

The match criterion can be met in two different ways:
•  The window/stringency method.
•  The word method.

The window/stringency method

Searches for all the places where a given number of matches (stringency) occur within a given range (window). This method is more time-consuming, but more sensitive. Comparisons are done according to a scoring matrix.
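
A bare-bones version of the window/stringency criterion is sketched below. It counts identical characters inside each diagonal window, which is a simplification: the method described above scores the window with a scoring matrix. The function names `window_dotplot` and `render` are hypothetical, and the short test sequence is only there to keep the picture small.

```python
def window_dotplot(seq_a, seq_b, window=21, stringency=14):
    """Return the set of (i, j) window start positions that meet the stringency."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical characters along the diagonal window starting at (i, j).
            matches = sum(1 for k in range(window) if seq_a[i + k] == seq_b[j + k])
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(dots, n_rows, n_cols):
    """Draw the dots as text, one row per position in the first sequence."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(n_cols))
        for i in range(n_rows)
    )


seq = "ACGTACGT"                      # contains the repeat ACGT twice
dots = window_dotplot(seq, seq, window=4, stringency=4)
print(render(dots, 5, 5))
```

Comparing the sequence to itself gives the main diagonal plus two off-diagonal dots, the signature of a repeat. Every window pair is examined, which is what makes this method slow on long sequences.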

The word method

Must be specified on the command line (-wordsize=X, where X is the size you choose). Searches for short perfect matches of a set length (words). This method is about 1000 times faster than the window/stringency method, but is much less sensitive. If the sequences do not contain short perfect matches then this method will find nothing.
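
The speed of the word method comes from looking words up instead of comparing every pair of positions. The following is a sketch of that idea, not the code of any particular program; the function name `word_dotplot` and the dictionary index are choices made for this example.

```python
from collections import defaultdict


def word_dotplot(seq_a, seq_b, word_size=6):
    """Return (i, j) positions where seq_a and seq_b share an identical word."""
    # Index every word of seq_b by its start position.
    index = defaultdict(list)
    for j in range(len(seq_b) - word_size + 1):
        index[seq_b[j:j + word_size]].append(j)
    # Look up each word of seq_a; only perfect matches produce dots.
    dots = []
    for i in range(len(seq_a) - word_size + 1):
        for j in index.get(seq_a[i:i + word_size], []):
            dots.append((i, j))
    return dots


print(sorted(word_dotplot("ACGTACGT", "ACGTACGT", word_size=4)))
print(word_dotplot("AAAA", "CCCC", word_size=2))   # no shared word: nothing is found
```

The second call shows the loss of sensitivity: without a perfect word match, no dot is drawn.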

Hints

If you have long sequences, try a word comparison first. This is much faster, and will give you an idea of what the dotplot for the more sensitive window/stringency method will look like. When using the word method, start off with a word size of 6 for nucleic acid sequences of up to 1,000 bases, or 8 for sequences of up to 10,000.

Hints

For peptide sequences, start off with a word size of 2-3. When using the window/stringency method, start off with a window of 21 and a stringency of 14 for nucleic acids. For peptide sequences, start off with a window of 30 and a stringency of 11.

Programs for dotplots
■  FASTA
    –  PLALIGN
■  EMBOSS
    –  Dotmatcher - window/stringency
    –  Dottup - word plot
    –  Dotpath - non-overlapping word plot
    –  Polydot - all against all word plot

Alternative "dotplots"

Dotter is a graphical dotplot program for detailed comparison of two sequences. To make the score matrix more intelligible, the pairwise scores are averaged over a sliding window which runs diagonally. The averaged score matrix forms a three-dimensional landscape, with the two sequences in two dimensions and the height of the peaks in the third.

This landscape is projected onto two dimensions with the aid of grey scales: the darker the grey of a peak, the higher it is. Dotter provides a tool to explore the visual appearance of this landscape, as well as a tool to examine the sequence alignment it represents.