Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
CS3000:Algorithms&DataJonathanUllman
Lecture9:• DynamicProgramming:EditDistance,RNAFolding
Oct5,2018
OfficeHourssched2
IIIsoso.IE oos7To s oo 31 s
EditDistanceAlignments
DistanceBetweenStrings
• Autocorrectworksbyfindingsimilarstrings
• ocurrance andoccurrence seemsimilar,butonlyifwedefinesimilaritycarefully
ocurranceoccurrence
oc urranceoccurrenceit
EditDistance/Alignments
• Giventwostrings! ∈ Σ$, & ∈ Σ',theeditdistanceisthenumberofinsertions,deletions,andswapsrequiredtoturn! into&.
• Givenanalignment,thecostisthenumberofpositionswherethetwostringsdon’tagree
o c u r r a n c eo c c u r r e n c e
I
O OyAHou gaps but not
transpositions
cost of the alignment is the of columns where too
symbols dont agree
AsktheAudience
• Whatistheminimumcostalignmentofthestringssmitten andsitting
editdistance
Edit Dist 3
s m i t t e n
s i t t i n gL 2 3
smitten sitter sittin sitting
EditDistance/Alignments
• Input: Twostrings! ∈ Σ$, & ∈ Σ'• Output: Theminimumcostalignmentof! and&• EditDistance=costoftheminimumcostalignment
on goesin theAnand
DynamicProgramming
• Considertheoptimal alignmentof!, &• Threechoicesforthefinalcolumn• CaseI:onlyuse! (!$,− )• CaseII:onlyuse& (−, &' )• CaseIII:useonesymbolfromeach(!$, &' )
optimalalignment optimalalignment optimalalignment
Xl Xn 1 Xn X1 Xn Xi Xu 1 Xn
Y Jm Yi yn i ym Yi Ym i ym
To determine which case is best solve alignmenton a smaller X Y
DynamicProgramming
• Considertheoptimal alignmentof!, &• CaseI:onlyuse! (!$, − )• deletion+optimalalignmentof!):$+), &):'
• CaseII:onlyuse& (−, &' )• insertion+optimalalignmentof!):$, &):'+)
• CaseIII:useonesymbolfromeach(!$, &' )• If!$ = &':optimalalignmentof!):$+), &):'+)• If!$ ≠ &':mismatch+opt.alignmentof!):$+), &):'+)
DynamicProgramming
• ./0 1, 2 =costofopt.alignmentof!):3 and&):4• CaseI:onlyuse! (!3, − )• CaseII:onlyuse& (−, &4 )• CaseIII:useonesymbolfromeach(!3, &4 )
f nti ma subproblems
I 1 OPT i l jl t OPT i j 1
OPT i t j t t 1 if Xi y0PT i jope f j M if Xi yj
opt iI mm OPT i bj OPT i j t OPT i i j i if yjmm It OPT fit HOPT i j D OPTCi t j D x yj
DynamicProgramming
• ./0 1, 2 =costofopt.alignmentof!):3 and&):4• CaseI:onlyuse! (!3, − )• CaseII:onlyuse& (−, &4 )• CaseIII:useonesymbolfromeach(!3, &4 )
Recurrence:
OPT 8, 9 = : 1 + min OPT 8 − 1, 9 , OPT 8, 9 − 1 , OPT(8 − 1, 9 − 1)min{1 + DEF 8 − 1, 9 , 1 + OPT 8, 9 − 1 , OPT(8 − 1, 9 − 1)}
BaseCases:OPT 8, 0 = 8,OPT 0, 9 = 9
Examplex = perty = beast
- b e a s t-pert
Ogl 2 3 4
51I 2 3 4
522RI B 2 3 4
3 3 2 2 3q4
4 4 3 3theft
FindingtheAlignment
• ./0 1, 2 =costofopt.alignmentof!):3 and&):4• CaseI:onlyuse! (!3, − )• CaseII:onlyuse& (−, &4 )• CaseIII:useonesymbolfromeach(!3, &4 )
EditDistance(“Bottom-Up”)
// All inputs are global varsFindOPT(n,m):M[0,j]←j, M[i,0]←i
for (i= 1,…,n):for (j = 1,…,m):if (xi = yj):M[i,j] = min{1+M[i-1,j],1+M[i,j-1],M[i-1,j-1]
elseif (xi != yj):M[i,j] = 1+min{M[i-1,j],M[i,j-1],M[i-1,j-1]}
return M[n,m]
Ohm entries x Oli perentry O nm time
01hm space just for the table
AsktheAudience
• Supposeinserting/deletingcostsJ > L andswappingM ↔ O costsPM,O > L• Writearecurrenceforthemin-costalignment
EditDistanceSummary
• Computetheeditdistance,ormin-costalignmentbetweentwostringsintime/spaceD QR
• DynamicProgramming:• Decidethefinalpairofsymbolsinthealignment
• Spacecanbeprohibitiveinpractice• ComputeeditdistanceinspaceD min Q,R• CanalsofindalignmentinspaceD Q +R usingacleverdivide-and-conqueralgorithm!
RNAFolding
DNA
• DNAisastringoffourbases{A,C,G,T}• TwocomplementarystrandsofDNAsticktogetherandformadoublehelix• A—T andC—G arecomplementarypairs
RNAFolding
• RNAisastringoffourbases{A,C,G,U}• AsingleRNAstrandstickstoitselfandfoldsintocomplexstructures• A—U andC—G arecomplementarypairs
RNAFolding
• RNAstrandwilltrytominimizeenergy (formthemostbonds)subjecttoconstraints
c
crossing
i
Iotcomplementary
sharp turn
RNAFolding
• RNAisastringofbasesOS,… , OU ∈ V, W, X, Y• ThestructureisgivenbyasetofbondsZ consistingofpairs 8, 9 with8 < 9• (Complements)Only\ − ] or^ − _ canbepaired• (Matching)Nobase`3 isintwopairsinZ• (NoSharpTurns)If 8, 9 ∈ Z,then8 < 9 − 4• (Non-Crossing)If 8, 9 , b, ℓ ∈ Z thenitcannotbethecasethat8 < b < 9 < ℓf
f IT TET tuft 5 j
RNAFolding
• Input:RNAsequence`), … , `$ ∈ \, ^, _, ]• Output:AsetofpairsZ ⊆ 1,… , Q × 1,… , Q• Goal:maximizethesizeofZ• (Complements)Only\ − ] or^ − _ canbepaired• (Matching)Nobase`3 isintwopairsinZ• (NoSharpTurns)If 8, 9 ∈ Z,then8 < 9 − 4• (Non-Crossing)If 8, 9 , b, ℓ ∈ Z thenitcannotbethecasethat8 < b < 9 < ℓ
DynamicProgramming
• LetD betheoptimalsetofpairsfor`) ⋯`$• Case1: Q pairswithnothinginD
• Case2:Q pairswithsomeg < Q − 4 inD
Thes O is the opt set of pairs for b sbn
Then 0 is It n t opt set of pans for bee bopt set of pairs for b be
non crossing
Moon an l n
DynamicProgramming
• LetD3,4 betheoptimalsetofpairsfor`3 ⋯ 4̀• Case1: 9 pairswithnothinginD3,4
• Case2:9 pairswithsomeg < 9 − 4 inD3,4
DynamicProgramming
• LetOPT 8, 9 betheopt.number ofpairsfor`3 ⋯ 4̀• Case1: 9 pairswithnothinginD3,4
• Case2t:9 pairswithg < 9 − 4 inD3,4
OPT i j OPT i j l
OPT i j It OPT i t t OPT ttt j MConsider
any ts 1 tej 4 and bigbj are complements
1OPT I t l OPTH11 j l
DynamicProgramming
• LetOPT 8, 9 betheopt.number ofpairsfor`3 ⋯ 4̀• Case1: 9 pairswithnothinginD3,4• Case2t:9 pairswithg < 9 − 4 inD3,4Recurrence:OPT 8, 9= max OPT 8, 9 − 1 ,max OPT 8, g − 1 + OPT g + 1, 9 − 1
BaseCases:OPT 8, 9 = 0 if8 ≥ 9 − 4
Maximumoverallg suchthat• 8 ≤ g < 9 − 4• `l, 4̀ arecompatiblebases
I
FillingtheTable
Recurrence:OPT 8, 9 = max OPT 8, 9 − 1 , maxmnoopqrsl OPT 8, g − 1 + OPT g + 1, 9 − 1
6 7 8 j=9
4 0 0 0
3 0 0
2 0
i=1
Sequence:\^^__]\_]
r's
n 5 i entries
O n perentry
RNAFoldingSummary
• ComputetheoptimalRNAfolding intimeD QtandspaceD Qu
• DynamicProgramming:• Decideonanoptimalpair`l − `$• RemainingRNAistwonon-overlappingpieces• Addingvariables: onesubproblemforeachinterval
• Non-crossing andmatching arecritical• Thinkabouthowthedynamicprogrammingalgorithmchangesifweremoveeachoftheconditions