Algorithm Design and Analysiscseweb.ucsd.edu/classes/sp20/cse101-a/Slides/Week-08/Lec... · 2020....

Algorithm Design and AnalysisSanjoy Dasgupta

Russell ImpagliazzoRagesh Jaiswal

(Thanks Miles Jones)

Lecture 28: Dynamic Programming(String Algorithms)

CSE 101

DP = BT + memoizationMemoization = store and re-use, like the Fibonnacci algorithm (from first week

lectures)

Two simple ideas, but easy to get confused if you rush:▪ Where is the recursion? (It disappears into the memoization, like the Fib.

Example did). Have I made a decision? (only temporarily, like BT)

If you don’t rush, a surprisingly powerful and simple algorithm technique

One of the most useful ideas around

DYNAMIC PROGRAMMING

1. Come up with simple back-tracking algorithm2. Characterize sub-problems3. Define matrix to store answers to the above4. Simulate BT algorithm on sub-problem5. Replace recursive calls with matrix elements6. Invert "top-down" order of BT to get "bottom-up" order

THE LONG WAY

FINAL ALGORITHM

7. Assemble into DP algorithm:• Fill in base cases into matrix• In bottom-up order do: Use translated recurrence to fill in each matrix

element• Return “main problem” answer• (Trace-back to get corresponding solution)

1. Come up with simple back-tracking algorithm2. Characterize sub-problems3. Define matrix to store answers to the above4. Simulate BT algorithm on sub-problem5. Replace recursive calls with matrix elements6. Invert "top-down" order of BT to get "bottom-up" order

Define sub-problems and corresponding matrixGive recursion for sub-problemsFind bottom-up orderAssemble as in the long way:▪ Fill in base cases of the recursion▪ In bottom-up order do:

▪ Fill in each cell of the matrix according to recursion▪ Return main case▪ (Traceback to find corresponding solution)

THE EXPERT’S WAY

You MUST explain what each cell of the matrix means AS a solution to a sub-problem

You MUST explain what the recursion is in terms of a LOCAL, COMPLETE case analysis

Undocumented dynamic programing is indistinguishable from nonsense. Assumptions about optimal solution almost always wrong.

EITHER WAY, A MUST

General issue: Comparing strings

Applications: Comparing versions of documents to highlight recent edits (diff), copyright infringement, plagiarism detection, genomics (comparing strands of DNA)

Many variants for particular applications, but use same general idea.We’ll look at one of the simplest, longest common subsequence

LONGEST COMMON SUBSEQUENCE

Hamming distance: Line the two strings up and compare them character by character. Count the number of identical symbols (distance= number of different symbols).

Example:▪ ALOHA▪ HALLOA

No matches!!

Hamming distance is not robust under small shifts, spacing, insertions▪ ALOHA▪ HALLOA

3 matches

WHY HAMMING DISTANCE IS INADEQUATE

A subsequence of a string is a string that appears left to right within the word, but not necessarily consecutively

The longest common subsequence (LCS) of two words is the largest string that is a subsequence of both words

ALOHAHALLOA

ALOA is a subsequence of both.

LONGEST COMMON SUBSEQUENCE

ALOHAHALLOAFirst letter mismatch: Must drop first letter from one or the other word

ALOHA or LOHAALLOA HALLOA

First letter match: Can keep first letter, and find LCS in restALOHA LOHA OHAALLOA = A + LLOA = AL + LOA

RECURSION

LCS(𝑢! , … , 𝑢" ; 𝑣! , … , 𝑣#)IF 𝑛 = 0 or 𝑚 = 0 return 0IF 𝑢! = 𝑣! return 1+ LCS(𝑢$ , … , 𝑢" ; 𝑣$ , … , 𝑣#)ELSE return max(LCS(𝑢$ , … , 𝑢" ; 𝑣! , … , 𝑣#), LCS(𝑢! , … , 𝑢" ; 𝑣$ , … , 𝑣#))

BT ALGORITHM

EXAMPLE

LOHAHALLOA

ALOHAALLOA

LOHAALLOA

OHAHALLOA OHA

ALOHAHALLOA

2(AL)+

HAHALLOA

OHAALLOA

LOHALLOA

OHALOA

maxmax

Say we start with words 𝑢! … , 𝑢"𝑣! , … , 𝑣#

In recursive calls, we recursively compute the LCSbetween one word of the form:and another word of the form:

SUBPROBLEMS

Say we start with words 𝑢! , … , 𝑢"𝑣! , … 𝑣#

In recursive calls, we recurse on: 𝑢$ , … , 𝑢" , 𝐼 = 1 … 𝑛 + 1to: 𝑣% , … , 𝑣# , 𝐽 = 1 … 𝑚 + 1

(𝐼 = 𝑛 + 1: first word empty, 𝐽 = 𝑚 + 1: second word empty)

Use matrix L[𝐼 , 𝐽]:= LCS(𝑢$ , … , 𝑢" ; 𝑣% , … , 𝑣#)

SUBPROBLEMS

LCS(𝑢! , … , 𝑢" ; 𝑣! , … , 𝑣#) LCS(𝑢$ , … , 𝑢" ; 𝑣% , … , 𝑣#)

IF 𝑛 = 0 or 𝑚 = 0 return 0IF or return 0IF 𝑢! = 𝑣! return 1+ LCS(𝑢& , … , 𝑢" ; 𝑣& , … , 𝑣#)IF return 1+ LCS ( ; )

ELSE return max(LCS(𝑢& , … , 𝑢" ; 𝑣! , … , 𝑣#), LCS(𝑢! , … , 𝑢" ; 𝑣& , … , 𝑣#))

ELSE return max(LCS( ; )LCS( ; ))

BT ALGORITHM

LCS(𝑢$ , … , 𝑢" ; 𝑣% , … , 𝑣#)To fill in L[𝐼 , 𝐽]

IF 𝐼 = 𝑛 + 1 or 𝐽 = 𝑚 + 1 return 0Base cases: L[ , ] = L[ , ] = 0IF 𝑢$ = 𝑣% return 1+ LCS(𝑢$'! , … , 𝑢" ; 𝑣%'! , … , 𝑣#)IF 𝑢$ = 𝑣% THEN L[𝐼 , 𝐽]:= 1+

ELSE return max(LCS(𝑢$'! , … , 𝑢" ; 𝑣% , … , 𝑣#), LCS(𝑢$ , … , 𝑢" ; 𝑣%'! , … , 𝑣#)

ELSE L[𝐼 , 𝐽] : = max (L[ , ], L[ , ])

BT ALGORITHM

L[𝐼 , 𝐽] ≡ max length of common subsequence between 𝑢$ , … , 𝑢" , 𝑣% , … , 𝑣#

Base cases: L[𝑚 + 1, 𝐽] =0, L[𝐼 , 𝑛 + 1] =0

Recurrence: IF 𝑢$ = 𝑣% THEN L[𝐼 , 𝐽]:= 1+ L[𝐼 + 1, 𝐽 + 1]ELSE L[𝐼 , 𝐽]:= max (L[𝐼 + 1, 𝐽], L[𝐼 , 𝐽 + 1])

FINAL RECURRENCES

Top down: 𝐼 increases OR 𝐽 increases

Bottom up: Both 𝐼 and 𝐽 decrease

BOTTOM UP ORDER

DPLCS(𝑢! , … , 𝑢" ; 𝑣! , … , 𝑣#) Initialize L[1 … 𝑛 + 1, 1 … 𝑚 + 1]FOR 𝐼 = 1 to 𝑛 + 1 do: L[𝐼 , 𝑚 + 1]:=0FOR 𝐽 = 1 to 𝑚 do: L[𝑛 + 1, 𝐽]:=0 FOR 𝐼 = 𝑛 down to 1 do:

FOR 𝐽 = 𝑚 down to 1 do:IF 𝑢$ = 𝑣% THEN L[𝐼 , 𝐽] :=1+ L[𝐼 + 1, 𝐽 + 1]

ELSE L[𝐼 , 𝐽] := max(L[𝐼 + 1, 𝐽], L[𝐼 , 𝐽 + 1])Return L[1,1]

DP-VERSION

EXAMPLE

0 0 0 0 0 0 0

EXAMPLE

H A L L O A

A 4 4 3 3 2 1 0

L 3 3 3 3 2 1 0

O 2 2 2 2 2 1 0

H 2 1 1 1 1 1 0

A 1 1 1 1 1 1 0

0 0 0 0 0 0 0

[ABW, 2015]: “If a conjecture by Impagliazzo-Paturi about the worst-case complexity of SAT (famous NP-complete problem) is true, then there is no substantial improvement in this algorithm for LCS possible”

SURPRISING RELATIONSHIP

SECONDARY STRUCTURE

RNA folds back on itself, forming chemical bonds between amino acids in the sequence

If we view the protein as a string, the secondary bonds form a matching on the characters of the string with a restriction: bonded pairs are either entirely inside or entirely outside other bonded pairs

ACGTAAAGCATGCAAGCATTAAACCTGG

SECONDARY STRUCTURE IS “OUTER PLANAR”

Strength of a bond between 𝐼 and 𝐽 depends on the two amino acids, Strength(𝑤! , 𝑤") (given as a table with 10 numbers, for the 10 pairs possible)

Given 𝑤! … 𝑤" , find the maximum possible strength of a secondary structure meeting the constraints of no intersecting bonds.

Cases: ▪ 𝑤! not matched▪ 𝑤! bonded to 𝑤$

Combines DP with divide and conquer

MAX STRENGTH SECONDARY STRUCTURE

Either 𝑤! bonds to some 𝑤$ , 𝐼 > 1 or remains unbonded.If it bonds to 𝑤$ , can only bond within 2 … 𝐼 − 1 and 𝐼 + 1 … 𝑛

BTSS(𝑤! … 𝑤")IF 𝑛 = 0 or 𝑛 = 1 return 0Max:= BTSS[𝑤& , . . 𝑤" ] //(case when 𝑤! unbonded)FOR 𝐼 = 2 to 𝑛 do:

THISCASE:= strength(𝑤! , 𝑤$ ) + BTSS(𝑤& , … , 𝑤$,!) + BTSS(𝑤$'! , … , 𝑤")

IF THISCASE > Max THEN Max:=THISCASEReturn Max

BACKTRACKING VERSION

Subproblems all have the form 𝑤$ , … , 𝑤% , consecutive subsequences

As we recur, size = 𝐽 − 𝐼 + 1 gets smaller.

Bottom up: size gets larger

Size=1, 0 : no bonds possible (Use 𝐽 = 𝐼 − 1 for size 0)

MS[𝐼 , 𝐽] : = max strength of secondary structure for 𝑤$ , … , 𝑤%

SUBPROBLEMS

DPSS(𝑤! … 𝑤")Initialize MS[1 … 𝑛, 0 … 𝑛]For 𝐼 = 1 to 𝑛 do:

MS[𝐼 , 𝐼 − 1] =0; MS[𝐼 , 𝐼 ]=0For 𝐾 = 1 to 𝑛 − 1 do:

FOR 𝐼 = 1 to 𝑛 − 𝐾 do:MS[𝐼 , 𝐼 + 𝐾 ] : = MS[𝐼 + 1, 𝐼 + 𝐾 ]

FOR 𝐿 = 𝐼 + 1 to 𝐼 + 𝐾 do:MS[𝐼 , 𝐼 + 𝐾 ]:= max(MS[𝐼 , 𝐼 + 𝐾 ], Strength(𝑤! , 𝑤# ) + MS[𝐼 , 𝐿 − 1 ] + MS[𝐿 + 1, 𝐼 + 𝐾 ])

Return MS[1, 𝑛]

DP ALGORITHM

TIME ANALYSIS

DPSS(𝑤! … 𝑤")Initialize MS[1 … 𝑛, 0 … 𝑛]For 𝐼 = 1 to 𝑛 do:

MS[𝐼 , 𝐼 − 1] =0; MS[𝐼 , 𝐼 ]=0For 𝐾 = 1 to 𝑛 − 1 do:

FOR 𝐼 = 1 to 𝑛 − 𝐾 do:MS[𝐼 , 𝐼 + 𝐾 ] : = MS[𝐼 + 1, 𝐼 + 𝐾 ]

FOR 𝐿 = 𝐼 + 1 to 𝐼 + 𝐾 do:MS[𝐼 , 𝐼 + 𝐾 ]:= max(MS[𝐼 , 𝐼 + 𝐾 ], Strength(𝑤! , 𝑤# ) + MS[𝐼 , 𝐿 − 1 ] + MS[𝐿 + 1, 𝐼 + 𝐾 ])

Return MS[1, 𝑛]

Bringman, Grandoni, Saha, Vassilevska-Williams [FOCS, 2016]: O(𝑛&.34…) time algorithm for RNA secondary structure, using speeded-up min-plus product and improved matrix multiply algorithms

BEST ALGORITHM

Algorithm Design and Analysiscseweb.ucsd.edu/classes/sp20/cse101-a/Slides/Week-08/Lec... · 2020....

Documents

Extracting Randomness From Few Independent Sources Boaz Barak, IAS Russell Impagliazzo, UCSD Avi Wigderson, IAS

Security Seminar, Fall 2003 On the (Im)possibility of Obfuscating Programs Boaz Barak, Oded Goldreich, Russel Impagliazzo, Steven Rudich, Amit Sahai, Salil

REPLica: REPL Instrumentation for Coq Analysiscseweb.ucsd.edu/~lerner/papers/REPLica.pdf · REPLica is a tool that collects fine-grained proof develop-ment data. It is available on

A stable trimeric influenza hemagglutinin stem as a …...RESEARCH ARTICLE VACCINES A stable trimeric influenza hemagglutinin stem as a broadly protective immunogen Antonietta Impagliazzo,

SimPoint 3.0: Faster and More Flexible Program Phase Analysiscseweb.ucsd.edu/~calder/papers/JILP-05-SimPoint3.pdf · 2006-06-19 · SimPoint 3.0: Faster and More Flexible Program

The stack algorithm vs viterbi algorithm

Optimistic Algorithm and Concurrency Control Algorithm

UNIT V GRAPHS - Weebly...Dijkstra’s Algorithm ii. Bellman-Ford Algorithm 2. All pair shortest path algorithm a. Floyd–Warshall Algorithm Dijkstra’s Algorithm Dijkstra(G,s) 01

CSE/Beng/BIMM 182: Biological Data Analysiscseweb.ucsd.edu/classes/fa09/cse182/slides/L1.ppt.pdf · 2009. 9. 24. · Life begins with Cell ... Walter Goad, 1942-2000. Sequence data

Luca Aceto School of Computer Science, Reykjavik ...luca/EATCS/GA2013.pdf · Nerode Prize 2013:Chris Calabro, Russell Impagliazzo, Valentine Kabanets, ... Luca Aceto EATCS General

CSE101: Algorithm Design and Analysiscseweb.ucsd.edu/classes/sp20/cse101-a/Slides/Week-06/Lec-22.pdf · CSE101: Algorithm Design and Analysis Russell Impagliazzo Sanjoy Dasgupta RageshJaiswal

Siz&Depth Trade-offs for Threshold Circuitspaturi/myPapers/pubs/ImpagliazzoPaturiS… · Siz&Depth Trade-offs for Threshold Circuits Russell Impagliazzo~ Ramamohan Paturi”, Michael

Exponential Lower Bounds for the Pigeonhole …toni/Papers/p200-beame.pdfExponential Lower Bounds for the Pigeonhole Principle * Paul Beame t Russell Impagliazzo Jan Kraji6ek Toniann

Recent Title s in Thi s Series - American Mathematical · PDF fileRecent Title s in Thi s Series ... (University of New Hampshire, ... 50 James Glimm, John Impagliazzo, and Isadore

Algorithm Residential – Algorithm Residential

algorithm Chapter 2 Algorithm Analysis

Private Programs: Obfuscation, a survey Guy Rothblum Barak, Goldreich, Impagliazzo, Rudich, Sahai, Vadhan and Yang Lynn, Prabhakaran and Sahai Goldwasser

Contents Routing Protocol and Algorithm Classifications Link State Routing Algorithm Distance Vector Routing Algorithm LS Algorithm vs. DV Algorithm Hierarchical

Filter-Based Algorithm for Metering Applications Algorithm for Metering Applications, ... algorithm setup and optimization, ... Filter-Based Algorithm for Metering Applications,

Approximate List- Decoding and Hardness Amplification Valentine Kabanets (SFU) joint work with Russell Impagliazzo and Ragesh Jaiswal (UCSD)