Assignments + Exam
• Assignment 2
  • Due 23:59 today, grace period until tomorrow 13:00
  • Marks, planned release: 22 Oct
• Assignment 3
  • Planned release: Thu, 3 Oct
  • Due: 21 Oct 23:59, grace period 22 Oct 13:00
  • Marks, planned release: 12 Nov (after final exam)
• Final exam timetable is out
  • Final exam: 7 November 9.00am, 2 hours
  • You can bring 1 A4 page, hand-written on both sides
  • Split into 2 venues, 7-11 Barry Drive, please check timetable
COMP3600/6466 – Algorithms Dynamic Programming 1
[CLRS 15.4]
Hanna Kurniawati
https://cs.anu.edu.au/courses/comp3600/
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
• Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization
Today
• Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization
Longest Common Subsequence (LCS)
• The Problem: Given two strings X and Y, find a subsequence that appears in both X and Y and has the longest length
• Note: A subsequence does not need to be contiguous, but the order must be the same
• Example:
  • Suppose X = (A, B, C, B, D, A, B) and Y = (B, D, C, A, B, A). Then, LCS(X, Y) = (B, C, A, B) OR (B, D, A, B)
• Applications:
  • Computational biology, e.g., comparing DNA
  • diff
Brute Force
• Suppose X has length m and Y has length n, and suppose Y is shorter than X. Take all possible subsequences of the shorter sequence Y, of which there are 2^n, and check each of them to see whether it is also a subsequence of X.
• Time complexity?
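As a sketch of the brute-force approach, the following (illustrative, not from the slides) enumerates subsequences of the shorter string from longest to shortest and returns the first one that is also a subsequence of the longer string, making the 2^n blow-up explicit:

```python
from itertools import combinations

def is_subsequence(sub, seq):
    """Check whether `sub` appears in `seq` in order (not necessarily contiguously)."""
    it = iter(seq)
    return all(c in it for c in sub)

def lcs_brute_force(X, Y):
    """Enumerate subsequences of the shorter string (up to 2^n of them) and
    return the first longest one that is also a subsequence of the other
    string. Time: O(2^n * m) in the worst case."""
    shorter, longer = (X, Y) if len(X) <= len(Y) else (Y, X)
    # Try subsequences from longest to shortest so we can stop at the first hit.
    for k in range(len(shorter), 0, -1):
        for idxs in combinations(range(len(shorter)), k):
            cand = "".join(shorter[i] for i in idxs)
            if is_subsequence(cand, longer):
                return cand
    return ""

print(lcs_brute_force("ABCBDAB", "BDCABA"))  # → "BDAB", an LCS of length 4
```

On the slide's example this finds (B, D, A, B); which of the equally long LCSs comes out first simply depends on enumeration order.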
Optimal Substructure Properties of LCS
• Let X = (x1, x2, …, xm) and Y = (y1, y2, …, yn) be the input sequences, and let Z = (z1, z2, …, zk) be any LCS of X and Y. Then there are 3 cases:
  • If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1
  • If xm ≠ yn and zk ≠ xm, then Z is an LCS of Xm-1 and Y
  • If xm ≠ yn and zk ≠ yn, then Z is an LCS of X and Yn-1
• Xi, Yi, and Zi denote the prefixes of X, Y, and Z respectively, consisting of the elements from index 1 to index i
The Dynamic Programming Algorithm
• Save the lengths of the LCSs of the prefixes of X1…Xm and Y1…Yn in a 2D table, denoted C, where c[i, j] is the length of an LCS of the prefixes Xi and Yj
• Initialize all values c[0, *] and c[*, 0] to 0
• Use a bottom-up approach, starting from c[0, 0]
• Then, fill in the values of C from top to bottom and from left to right, following the optimal substructure property:
c[i, j] = 0                              if i = 0 OR j = 0
c[i, j] = c[i-1, j-1] + 1                if i, j > 0 and x_i = y_j
c[i, j] = max(c[i, j-1], c[i-1, j])      if i, j > 0 and x_i ≠ y_j
Example• Please find the LCS of X = (A, B, C, B, D, A, B)
and Y = (B, D, C, A, B, A)
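The recurrence above can be sketched directly in Python; as an illustration (not from the slides), the second function also recovers one LCS by walking back through the table:

```python
def lcs_length(X, Y):
    """Fill the (m+1) x (n+1) table c bottom-up; c[i][j] is the LCS length
    of the prefixes X[:i] and Y[:j]. Time O(mn), space O(mn)."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # c[0][*] = c[*][0] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c

def lcs(X, Y):
    """Recover one LCS by retracing which case of the recurrence fired."""
    c = lcs_length(X, Y)
    i, j, out = len(X), len(Y), []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # → "BCBA", one of several LCSs of length 4
```

Note that the example has several LCSs of length 4; the slide lists (B, C, A, B) and (B, D, A, B), and this tie-breaking happens to recover (B, C, B, A), which is equally valid.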
[CLRS] p. 395
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization
Two Requirements for DP
• Optimal substructure: The optimal solution to the problem is formed from optimal solutions to sub-problems
• Overlapping sub-problems
A note about optimal substructure
• The sub-problems need to be independent, in the sense that the solution of one sub-problem is not affected by the solution of another sub-problem
• Example:
  • Given an undirected graph, find the shortest simple path vs the longest simple path
  • Simple path: acyclic path
  • Note that for the shortest path, we can remove "simple"
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
✓ Requirements
• Dynamic Programming in Algorithm vs in Optimization
Dynamic Programming in Algorithm vs in Optimization
• Dynamic Programming is a well-known approach (in fact, one of two major approaches) in control and sequential decision-making
  • Sequential decision-making: The problem of deciding what a system should do now, so as to get good long-term performance
• Relies on the Bellman Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision
Dynamic Programming: in Algorithm vs in Optimization
• As we have seen here, it is also a well-known technique for algorithm design
• Somehow, the two are often seen/considered disconnected
• But they're actually not!
  • The Bellman principle of optimality is essentially the optimal substructure property we've been discussing
  • In fact, Dynamic Programming in Optimization is an example of the Dynamic Programming algorithm-design technique applied to an optimization problem
An Example: Solving a Markov Decision Process (MDP) Problem
• A framework to find the best sequence of actions to perform when the outcome of each action is non-deterministic.
• Examples:
  • Games: Tic Tac Toe, Chess, Go, etc.
  • Races: bicycle race, car race, etc.
  • Navigation
Markov Decision Processes
• The non-determinism must be 1st-order Markov.
• 1st-order Markov means that given the present state, the future states are independent of the past states:
  P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, …, s_1, a_1, s_0)
Defining an MDP Problem
• Formally defined as a 4-tuple (S, A, T, R):
  • S: State space
  • A: Action space
  • T: Transition function, T(s, a, s') = P(S_{t+1} = s' | S_t = s, A_t = a)
  • R: Reward function, R(s) or R(s, a) or R(s, a, s')
Solving an MDP problem
• Is finding an optimal policy, usually denoted as π*.
• Policy = strategy
  • A mapping from states to actions, π : S → A.
  • Meaning, for any state s in S, π(s) will tell us the best action the system should perform.
• Example: [Figure: grid world with terminal rewards +1 and -1]
Using a Policy
[Figure: the policy maps each observation (state) to an action, which is executed in the environment]
1. Start from the initial state.
2. Move according to the policy.
3. The system moves to a new state and receives a reward.
4. Repeat from 2 until a stopping criterion is satisfied (e.g., the goal is reached).
Some notes:
• The new state the system ends up in may be different in different runs.
• The goal of the system is to get the maximum possible total reward.
Solving an MDP is Solving an Optimization Problem
• Recall that an optimal policy maps each state to the best action. Best here means maximizing the expected total reward, captured by the Bellman equation:

  V*(s) = max_a [ R(s) + γ Σ_{s'} T(s, a, s') V*(s') ]

  The bracketed expression is often denoted Q(s, a).
• Theorem: There is a unique function V* satisfying the above equation.
Solving an MDP is Solving an Optimization Problem
• Optimal policy?
• If we know V*, the optimal policy can be generated easily:

  π*(s) = argmax_a [ R(s) + γ Σ_{s'} T(s, a, s') V*(s') ]
Value Iteration: A way to compute the optimal value function
• Iteratively calculate the optimal value of each state until convergence.
• Algorithm:
  Initialize V_0(s) = R(s) for all s.
  Loop
    For all s {
      V_{t+1}(s) = max_a [ R(s) + γ Σ_{s'} T(s, a, s') V_t(s') ]
    }
    t = t + 1
  Until V_{t+1}(s) = V_t(s) for all s (in implementation: max_s |V_{t+1}(s) - V_t(s)| < 1e-7)
• Essentially, bottom-up dynamic programming
• The update step is often called a value update, Bellman update, or Bellman backup.
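The loop above, plus the policy-extraction argmax from the previous slide, can be sketched as follows. The 2-state MDP at the bottom (states "low"/"high", actions "stay"/"jump", and all probabilities and rewards) is made up purely for illustration:

```python
def value_iteration(S, A, T, R, gamma=0.9, eps=1e-7):
    """Repeat Bellman backups until values stop changing:
    V_0(s) = R(s);  V_{t+1}(s) = max_a [ R(s) + gamma * sum_s' T(s,a,s') V_t(s') ]."""
    V = {s: R[s] for s in S}                       # V_0(s) = R(s)
    while True:
        V_next = {s: max(R[s] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S)
                         for a in A)
                  for s in S}
        if max(abs(V_next[s] - V[s]) for s in S) < eps:
            return V_next
        V = V_next

def greedy_policy(S, A, T, R, V, gamma=0.9):
    """Extract pi*(s) = argmax_a [ R(s) + gamma * sum_s' T(s,a,s') V(s') ]."""
    return {s: max(A, key=lambda a: R[s] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S))
            for s in S}

# A hypothetical 2-state MDP: "stay" is safe, "jump" may move to the rewarding state.
S = ["low", "high"]
A = ["stay", "jump"]
T = {"low":  {"stay": {"low": 1.0, "high": 0.0}, "jump": {"low": 0.4, "high": 0.6}},
     "high": {"stay": {"low": 0.1, "high": 0.9}, "jump": {"low": 0.5, "high": 0.5}}}
R = {"low": 0.0, "high": 1.0}

V = value_iteration(S, A, T, R)
pi = greedy_policy(S, A, T, R, V)
print(pi)  # → {'low': 'jump', 'high': 'stay'}
```

With γ < 1 the backup is a contraction, so the loop is guaranteed to converge; the 1e-7 threshold mirrors the implementation note on the slide.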
Example: Simple Navigation
• An agent moves in a 4×3 grid of cells.
• It can move to one of the four neighboring cells. The actions' accuracy is 70%: 30% of the time, the agent ends up at the left or right of its intended cell, or stays at its current cell, with equal probability. If there is no cell to the left or right of the intended cell, that probability mass is added to staying where it is.
• Collision with an obstacle or the boundary results in no movement.
• Two terminal states, with rewards +1 and -1. All other actions incur a cost of -0.04.
Let's first define the MDP
[Figure: the 4×3 grid, with start state S and terminal cells +1 and -1]
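One way to answer "let's first define the MDP" is to write out (S, A, T, R) in code. The slide's grid figure did not survive extraction, so the layout below is an assumption (the classic 4×3 textbook grid: +1 terminal at the top-right, -1 just below it, start S at the bottom-left, no interior obstacle); reading "left or right of its intended cell" as the two cells flanking the intended cell is likewise an interpretation of the slide's wording:

```python
# State space S and action space A for the (assumed) 4x3 grid.
COLS, ROWS = 4, 3
TERMINALS = {(3, 2): +1.0, (3, 1): -1.0}   # assumed terminal locations
STATES = [(c, r) for c in range(COLS) for r in range(ROWS)]
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def move(s, a):
    """Deterministic effect of action a from state s: collision with the
    boundary results in no movement."""
    c, r = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    return (c, r) if 0 <= c < COLS and 0 <= r < ROWS else s

def T(s, a, s2):
    """Transition function P(s' | s, a): 70% intended cell; the remaining 30%
    is split equally (10% each) between the two cells flanking the intended
    cell and staying put. Mass for a flanking cell that is off the grid is
    added to staying, per the slide's description."""
    if s in TERMINALS:                       # terminal states are absorbing
        return 1.0 if s2 == s else 0.0
    p = {}
    def add(t, pr):
        p[t] = p.get(t, 0.0) + pr
    intended = move(s, a)
    add(intended, 0.7)
    dc, dr = ACTIONS[a]
    for oc, orr in ((-dr, dc), (dr, -dc)):   # perpendicular offsets
        c, r = intended[0] + oc, intended[1] + orr
        if 0 <= c < COLS and 0 <= r < ROWS:
            add((c, r), 0.1)
        else:                                 # no such cell: mass goes to staying
            add(s, 0.1)
    add(s, 0.1)                               # 10%: agent does not move at all
    return p.get(s2, 0.0)

def R(s):
    """Reward function: +1 / -1 at the terminals, cost of -0.04 everywhere else."""
    return TERMINALS.get(s, -0.04)
```

A quick sanity check on such a definition is that T(s, a, ·) sums to 1 for every state-action pair; from here, the value-iteration sketch from the previous slide can be run unchanged on (STATES, ACTIONS, T, R).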
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
✓ Requirements
✓ Dynamic Programming in Algorithm vs in Optimization