Assignments + Exam
• Assignment 2
  • Due 23:59 today, grace period until tomorrow 13:00
  • Marks, planned release: 22 Oct
• Assignment 3
  • Planned release: Thu, 3 Oct
  • Due: 21 Oct 23:59, grace period 22 Oct 13:00
  • Marks, planned release: 12 Nov (after final exam)
• Final exam timetable is out
  • Final exam: 7 November 9.00am, 2 hours
  • You can bring 1 A4 page, hand-written on both sides
  • Split into 2 venues, 7-11 Barry Drive, please check timetable
COMP3600/6466 – Algorithms Dynamic Programming 1
[CLRS 15.4]
Hanna Kurniawati
https://cs.anu.edu.au/courses/comp3600/
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
• Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization
Today
• Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization
Longest Common Subsequence (LCS)
• The Problem: Given two strings X and Y, find a subsequence that appears in both X and Y and has the longest length
• Note: A subsequence does not need to be contiguous, but the order must be the same
• Example:
  • Suppose X = (A, B, C, B, D, A, B) and Y = (B, D, C, A, B, A). Then, LCS(X, Y) = (B, C, A, B) OR (B, D, A, B)
• Applications:
  • Computational biology, e.g., comparing DNA
  • diff
Brute Force
• Suppose X has length m and Y has length n, and suppose Y is shorter than X. Take all possible subsequences of the shorter sequence Y, of which there are 2^n, and check each of them to see whether it is also a subsequence of X.
• Time complexity?
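As a sketch of the brute-force approach, the following (illustrative, not from the slides) enumerates subsequences of the shorter string from longest to shortest and returns the first one that is also a subsequence of the longer string, making the 2^n blow-up explicit:

```python
from itertools import combinations

def is_subsequence(sub, seq):
    """Check whether `sub` appears in `seq` in order (not necessarily contiguously)."""
    it = iter(seq)
    return all(c in it for c in sub)

def lcs_brute_force(X, Y):
    """Enumerate subsequences of the shorter string (up to 2^n of them) and
    return the first longest one that is also a subsequence of the other
    string. Time: O(2^n * m) in the worst case."""
    shorter, longer = (X, Y) if len(X) <= len(Y) else (Y, X)
    # Try subsequences from longest to shortest so we can stop at the first hit.
    for k in range(len(shorter), 0, -1):
        for idxs in combinations(range(len(shorter)), k):
            cand = "".join(shorter[i] for i in idxs)
            if is_subsequence(cand, longer):
                return cand
    return ""

print(lcs_brute_force("ABCBDAB", "BDCABA"))  # → "BDAB", an LCS of length 4
```

On the slide's example this finds (B, D, A, B); which of the equally long LCSs comes out first simply depends on enumeration order.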
Optimal Substructure Properties of LCS
• Let X = (x1, x2, …, xm) and Y = (y1, y2, …, yn) be the input sequences, and let Z = (z1, z2, …, zk) be any LCS of X and Y. Then there are 3 cases:
  • If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1
  • If xm ≠ yn and zk ≠ xm, then Z is an LCS of Xm-1 and Y
  • If xm ≠ yn and zk ≠ yn, then Z is an LCS of X and Yn-1
• Xi, Yi, and Zi denote the prefixes of X, Y, and Z respectively, consisting of the elements from index 1 to index i
The Dynamic Programming Algorithm
• Save the lengths of the LCSs of the prefixes of X1…Xm and Y1…Yn in a 2D table, denoted C, where c[i, j] is the length of an LCS of the prefixes Xi and Yj
• Initialize all values c[0, *] and c[*, 0] to 0
• Use a bottom-up approach, starting from c[0, 0]
• Then, fill in the values of C from top to bottom and from left to right, following the optimal substructure property:
c[i, j] = 0                              if i = 0 OR j = 0
c[i, j] = c[i-1, j-1] + 1                if i, j > 0 and x_i = y_j
c[i, j] = max(c[i, j-1], c[i-1, j])      if i, j > 0 and x_i ≠ y_j
Example• Please find the LCS of X = (A, B, C, B, D, A, B)
and Y = (B, D, C, A, B, A)
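The recurrence above can be sketched directly in Python; as an illustration (not from the slides), the second function also recovers one LCS by walking back through the table:

```python
def lcs_length(X, Y):
    """Fill the (m+1) x (n+1) table c bottom-up; c[i][j] is the LCS length
    of the prefixes X[:i] and Y[:j]. Time O(mn), space O(mn)."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]  # c[0][*] = c[*][0] = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c

def lcs(X, Y):
    """Recover one LCS by retracing which case of the recurrence fired."""
    c = lcs_length(X, Y)
    i, j, out = len(X), len(Y), []
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ABCBDAB", "BDCABA"))  # → "BCBA", one of several LCSs of length 4
```

Note that the example has several LCSs of length 4; the slide lists (B, C, A, B) and (B, D, A, B), and this tie-breaking happens to recover (B, C, B, A), which is equally valid.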
[CLRS] p. 395
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization
Two Requirements for DP
• Optimal substructure: The optimal solution to the problem is formed from optimal solutions to sub-problems
• Overlapping sub-problems
A note about optimal substructure
• The sub-problems need to be independent, in the sense that the solution of one sub-problem is not affected by the solution of another sub-problem
• Example:
  • Given an undirected graph, find the shortest simple path vs the longest simple path
  • Simple path: acyclic path
  • Note that for the shortest path, we can remove "simple"
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
✓ Requirements
• Dynamic Programming in Algorithm vs in Optimization
Dynamic Programming in Algorithm vs in Optimization
• Dynamic Programming is a well-known approach (in fact, one of two major approaches) in control and sequential decision-making
  • Sequential decision-making: The problem of deciding what a system should do now, so as to get good long-term performance
• Relies on the Bellman Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision
Dynamic Programming: in Algorithm vs in Optimization
• As we have seen here, it is also a well-known technique for algorithm design
• Somehow, the two are often seen/considered disconnected
• But they're actually not!
  • The Bellman principle of optimality is essentially the optimal substructure property we've been discussing
  • In fact, Dynamic Programming in Optimization is an example of the Dynamic Programming algorithm-design technique applied to an optimization problem
An Example: Solving a Markov Decision Process (MDP) Problem
• A framework to find the best sequence of actions to perform when the outcome of each action is non-deterministic.
• Examples:
  • Games: Tic Tac Toe, Chess, Go, etc.
  • Races: bicycle race, car race, etc.
  • Navigation
Markov Decision Processes
• The non-determinism must be 1st-order Markov.
• 1st-order Markov means that given the present state, the future states are independent of the past states:
  P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, …, s_1, a_1, s_0)
Defining an MDP Problem
• Formally defined as a 4-tuple (S, A, T, R):
  • S: State space
  • A: Action space
  • T: Transition function, T(s, a, s') = P(S_{t+1} = s' | S_t = s, A_t = a)
  • R: Reward function, R(s) or R(s, a) or R(s, a, s')
Solving an MDP problem
• Is finding an optimal policy, usually denoted as π*.
• Policy = strategy
  • A mapping from states to actions, π : S → A.
  • Meaning, for any state s in S, π(s) will tell us the best action the system should perform.
• Example: [Figure: grid world with terminal rewards +1 and -1]
Using a Policy
[Figure: the policy maps each observation (state) to an action, which is executed in the environment]
1. Start from the initial state.
2. Move according to the policy.
3. The system moves to a new state and receives a reward.
4. Repeat from 2 until a stopping criterion is satisfied (e.g., the goal is reached).
Some notes:
• The new state the system ends up in may be different in different runs.
• The goal of the system is to get the maximum possible total reward.
Solving an MDP is Solving an Optimization Problem
• Recall that an optimal policy maps each state to the best action. Best here means maximizing the expected total reward, captured by the Bellman equation:

  V*(s) = max_a [ R(s) + γ Σ_{s'} T(s, a, s') V*(s') ]

  The bracketed expression is often denoted Q(s, a).
• Theorem: There is a unique function V* satisfying the above equation.
Solving an MDP is Solving an Optimization Problem
• Optimal policy?
• If we know V*, the optimal policy can be generated easily:

  π*(s) = argmax_a [ R(s) + γ Σ_{s'} T(s, a, s') V*(s') ]
Value Iteration: A way to compute the optimal value function
• Iteratively calculate the optimal value of each state until convergence.
• Algorithm:
  Initialize V_0(s) = R(s) for all s.
  Loop
    For all s {
      V_{t+1}(s) = max_a [ R(s) + γ Σ_{s'} T(s, a, s') V_t(s') ]
    }
    t = t + 1
  Until V_{t+1}(s) = V_t(s) for all s (in implementation: max_s |V_{t+1}(s) - V_t(s)| < 1e-7)
• Essentially, bottom-up dynamic programming
• The update step is often called a value update, Bellman update, or Bellman backup.
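The loop above, plus the policy-extraction argmax from the previous slide, can be sketched as follows. The 2-state MDP at the bottom (states "low"/"high", actions "stay"/"jump", and all probabilities and rewards) is made up purely for illustration:

```python
def value_iteration(S, A, T, R, gamma=0.9, eps=1e-7):
    """Repeat Bellman backups until values stop changing:
    V_0(s) = R(s);  V_{t+1}(s) = max_a [ R(s) + gamma * sum_s' T(s,a,s') V_t(s') ]."""
    V = {s: R[s] for s in S}                       # V_0(s) = R(s)
    while True:
        V_next = {s: max(R[s] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S)
                         for a in A)
                  for s in S}
        if max(abs(V_next[s] - V[s]) for s in S) < eps:
            return V_next
        V = V_next

def greedy_policy(S, A, T, R, V, gamma=0.9):
    """Extract pi*(s) = argmax_a [ R(s) + gamma * sum_s' T(s,a,s') V(s') ]."""
    return {s: max(A, key=lambda a: R[s] + gamma * sum(T[s][a][s2] * V[s2] for s2 in S))
            for s in S}

# A hypothetical 2-state MDP: "stay" is safe, "jump" may move to the rewarding state.
S = ["low", "high"]
A = ["stay", "jump"]
T = {"low":  {"stay": {"low": 1.0, "high": 0.0}, "jump": {"low": 0.4, "high": 0.6}},
     "high": {"stay": {"low": 0.1, "high": 0.9}, "jump": {"low": 0.5, "high": 0.5}}}
R = {"low": 0.0, "high": 1.0}

V = value_iteration(S, A, T, R)
pi = greedy_policy(S, A, T, R, V)
print(pi)  # → {'low': 'jump', 'high': 'stay'}
```

With γ < 1 the backup is a contraction, so the loop is guaranteed to converge; the 1e-7 threshold mirrors the implementation note on the slide.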
Example: Simple Navigation
• An agent moves in a 4×3 grid of cells.
• It can move to one of the four neighboring cells. The actions' accuracy is 70%: 30% of the time, the agent ends up at the left or right of its intended cell, or stays at its current cell, with equal probability. If there is no cell to the left or right of the intended cell, that probability mass is added to staying where it is.
• Collision with an obstacle or the boundary results in no movement.
• Two terminal states, with rewards +1 and -1. All other actions incur a cost of -0.04.
Let's first define the MDP
[Figure: the 4×3 grid, with start state S and terminal cells +1 and -1]
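One way to answer "let's first define the MDP" is to write out (S, A, T, R) in code. The slide's grid figure did not survive extraction, so the layout below is an assumption (the classic 4×3 textbook grid: +1 terminal at the top-right, -1 just below it, start S at the bottom-left, no interior obstacle); reading "left or right of its intended cell" as the two cells flanking the intended cell is likewise an interpretation of the slide's wording:

```python
# State space S and action space A for the (assumed) 4x3 grid.
COLS, ROWS = 4, 3
TERMINALS = {(3, 2): +1.0, (3, 1): -1.0}   # assumed terminal locations
STATES = [(c, r) for c in range(COLS) for r in range(ROWS)]
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def move(s, a):
    """Deterministic effect of action a from state s: collision with the
    boundary results in no movement."""
    c, r = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    return (c, r) if 0 <= c < COLS and 0 <= r < ROWS else s

def T(s, a, s2):
    """Transition function P(s' | s, a): 70% intended cell; the remaining 30%
    is split equally (10% each) between the two cells flanking the intended
    cell and staying put. Mass for a flanking cell that is off the grid is
    added to staying, per the slide's description."""
    if s in TERMINALS:                       # terminal states are absorbing
        return 1.0 if s2 == s else 0.0
    p = {}
    def add(t, pr):
        p[t] = p.get(t, 0.0) + pr
    intended = move(s, a)
    add(intended, 0.7)
    dc, dr = ACTIONS[a]
    for oc, orr in ((-dr, dc), (dr, -dc)):   # perpendicular offsets
        c, r = intended[0] + oc, intended[1] + orr
        if 0 <= c < COLS and 0 <= r < ROWS:
            add((c, r), 0.1)
        else:                                 # no such cell: mass goes to staying
            add(s, 0.1)
    add(s, 0.1)                               # 10%: agent does not move at all
    return p.get(s2, 0.0)

def R(s):
    """Reward function: +1 / -1 at the terminals, cost of -0.04 everywhere else."""
    return TERMINALS.get(s, -0.04)
```

A quick sanity check on such a definition is that T(s, a, ·) sums to 1 for every state-action pair; from here, the value-iteration sketch from the previous slide can be run unchanged on (STATES, ACTIONS, T, R).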
Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
✓ Requirements
✓ Dynamic Programming in Algorithm vs in Optimization