Assignments + Exam
• Assignment 2
  • Due 23:59 today, grace period until tomorrow 13:00
  • Marking, planned release: 22 Oct
• Assignment 3
  • Planned release: Thu, 3 Oct
  • Due: 21 Oct 23:59, grace period 22 Oct 13:00
  • Mark, planned release: 12 Nov (after final exam)
• Final exam timetable is out
  • Final exam: 7 November 9.00am, 2 hours
  • You can bring 1 A4 page, hand-written on both sides
  • Split into 2, 7-11 Barry Drive, please check timetable



COMP3600/6466 – Algorithms Dynamic Programming 1

[CLRS 15.4]

Hanna Kurniawati

https://cs.anu.edu.au/courses/comp3600/

Topics
✓ What is it?
✓ Example: Fibonacci Sequence
• Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization

Today
• Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization

Longest Common Subsequence (LCS)
• The Problem: Given two strings X and Y, find a longest sequence that is a subsequence of both X and Y
• Note: A subsequence does not need to be contiguous, but the order must be the same
• Example:
  • Suppose X = (A, B, C, B, D, A, B) and Y = (B, D, C, A, B, A). Then, LCS(X, Y) = (B, C, A, B) OR (B, D, A, B)
• Applications:
  • Computational biology, e.g., comparing DNA
  • diff

Brute Force
• Suppose X has length m and Y has length n, and suppose Y is shorter than X. Then, take all possible subsequences of the shorter sequence Y, of which there are 2^n, and check whether each of them is also a subsequence of X.
• Time complexity?
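The brute-force idea can be sketched in Python as follows. This is only an illustration, not code from the lecture: the function names `is_subsequence` and `lcs_brute_force` are made up here. It enumerates subsequences of the shorter input from longest to shortest and returns the first one that also occurs in the other input. With up to 2^n candidates and an O(m) check for each, the worst case is O(m · 2^n).

```python
from itertools import combinations


def is_subsequence(candidate, sequence):
    """Return True if candidate occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so the relative order is enforced.
    return all(item in it for item in candidate)


def lcs_brute_force(x, y):
    """Try every subsequence of the shorter input, longest first."""
    shorter, longer = (x, y) if len(x) <= len(y) else (y, x)
    for length in range(len(shorter), -1, -1):
        for indices in combinations(range(len(shorter)), length):
            candidate = [shorter[i] for i in indices]
            if is_subsequence(candidate, longer):
                return candidate
    return []


X = list("ABCBDAB")
Y = list("BDCABA")
print(lcs_brute_force(X, Y))
```

Which of the equally long answers is returned depends on the enumeration order, so only the length should be relied on.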

Optimal Substructure Properties of LCS
• Let X = (x1, x2, …, xm) and Y = (y1, y2, …, yn) be the input sequences and let Z = (z1, z2, …, zk) be any LCS of X and Y. Then there are 3 cases:
  • If xm = yn, then zk = xm = yn and Zk-1 is an LCS of Xm-1 and Yn-1
  • If xm ≠ yn and zk ≠ xm, then Z is an LCS of Xm-1 and Y
  • If xm ≠ yn and zk ≠ yn, then Z is an LCS of X and Yn-1
• Xi, Yi, and Zi are the prefixes of X, Y, and Z respectively, from index 1 to index i
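The three cases translate almost directly into a recursion on prefix lengths. The sketch below is an illustration rather than the lecture's code; the name `lcs_length_recursive` and the use of `functools.lru_cache` for memoization are choices made here.

```python
from functools import lru_cache


def lcs_length_recursive(x, y):
    """Length of an LCS of x and y, following the three substructure cases."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # solve(i, j) is the LCS length of the prefixes x[:i] and y[:j].
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            # Case 1: the last symbols match, so they end the LCS.
            return solve(i - 1, j - 1) + 1
        # Cases 2 and 3: drop the last symbol of x or of y.
        return max(solve(i - 1, j), solve(i, j - 1))

    return solve(len(x), len(y))


print(lcs_length_recursive("ABCBDAB", "BDCABA"))
```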

The Dynamic Programming Algorithm
• Save the lengths of the LCSs of the prefixes of X = (x1, …, xm) and Y = (y1, …, yn) in a 2D table, denoted as C, where c[i, j] is the length of an LCS of the prefixes Xi and Yj
• Initialize all entries c[0, *] and c[*, 0] with 0
• Use a bottom-up approach, starting from c[0, 0]
• Then, fill in the values of C from top to bottom and from left to right, following the optimal substructure property:

\[
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0 \\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j \\
\max(c[i, j-1],\ c[i-1, j]) & \text{if } i, j > 0 \text{ and } x_i \neq y_j
\end{cases}
\]
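A minimal bottom-up sketch of this table fill, assuming 0-indexed Python strings (so `x[i - 1]` plays the role of x_i). The function name `lcs_table` is hypothetical. Both time and space are Θ(mn).

```python
def lcs_table(x, y):
    """Return the (m+1) x (n+1) table c, where c[i][j] is the LCS length
    of the prefixes x[:i] and y[:j]."""
    m, n = len(x), len(y)
    # Row 0 and column 0 stay 0: an empty prefix has an empty LCS.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # top to bottom
        for j in range(1, n + 1):      # left to right
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c


table = lcs_table("ABCBDAB", "BDCABA")
print(table[-1][-1])  # length of an LCS of the full strings
```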

Example
• Please find the LCS of X = (A, B, C, B, D, A, B) and Y = (B, D, C, A, B, A)

[CLRS] p. 395
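To recover an actual subsequence rather than just its length, one can walk back through the filled table from c[m, n]. The sketch below is one way to do this under the same 0-indexing assumption; the name `lcs` and the tie-breaking rule (prefer moving up) are choices made here, so a different but equally long LCS may come out under another rule.

```python
def lcs(x, y):
    """Return one longest common subsequence of x and y as a list."""
    m, n = len(x), len(y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])

    # Walk back from the bottom-right corner, collecting matched symbols.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1
        else:
            j -= 1
    out.reverse()
    return out


print(lcs("ABCBDAB", "BDCABA"))
```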

Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
• Requirements
• Dynamic Programming in Algorithm vs in Optimization

Two Requirements for DP
• Optimal substructure: An optimal solution to the problem is formed by optimal solutions to sub-problems
• Overlapping sub-problems
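To see what overlapping sub-problems means in practice, the sketch below counts how often a plain (non-memoized) LCS recursion is called and compares that with the number of distinct sub-problems (i, j). The function name `count_naive_calls` and the test strings are made up for illustration. The two strings share no symbol, which is the worst case for the recursion.

```python
def count_naive_calls(x, y):
    """Count the calls made by the LCS recursion without memoization."""
    calls = 0

    def solve(i, j):
        nonlocal calls
        calls += 1
        if i == 0 or j == 0:
            return 0
        if x[i - 1] == y[j - 1]:
            return solve(i - 1, j - 1) + 1
        return max(solve(i - 1, j), solve(i, j - 1))

    solve(len(x), len(y))
    return calls


x, y = "A" * 8, "B" * 8
naive_calls = count_naive_calls(x, y)
distinct_subproblems = (len(x) + 1) * (len(y) + 1)
print(naive_calls, distinct_subproblems)
```

The gap between the two numbers is the repeated work that a DP table or memoization avoids.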

A note about optimal substructure
• The sub-problems need to be independent, in the sense that the solution of one sub-problem is not affected by the solution of another sub-problem
• Example:
  • Given an undirected graph, find the shortest simple path vs the longest simple path
  • Simple path: acyclic path
  • Note that for shortest path, we can remove “simple”

Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
✓ Requirements
• Dynamic Programming in Algorithm vs in Optimization

Dynamic Programming in Algorithm vs in Optimization
• Dynamic Programming is a well-known approach (in fact one of two major approaches) in control and sequential decision-making
  • Sequential decision-making: The problem of deciding what a system should do now, so as to get good long-term performance
• It relies on the Bellman Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision

Dynamic Programming: in Algorithm vs in Optimization
• As we have seen here, it is also a well-known technique for algorithm design
• Somehow, the two are often seen/considered disconnected
• But, they’re actually not!
  • The Bellman principle of optimality is essentially the optimal substructure we’ve been discussing
  • In fact, Dynamic Programming in Optimization is an example of the Dynamic Programming algorithm design technique for solving optimization problems

An Example: Solving a Markov Decision Process (MDP) Problem
• A framework to find the best sequence of actions to perform when the outcome of each action is non-deterministic.
• Example:
  • Games: Tic Tac Toe, Chess, Go, etc.
  • Races: bicycle race, car race, etc.
  • Navigation

Markov Decision Processes
• The non-determinism must be 1st order Markov.
• 1st order Markov means that, given the present state, the future states are independent of the past states.
  • P(s_{t+1} | s_t, a_t) = P(s_{t+1} | s_t, a_t, s_{t-1}, a_{t-1}, …, s_1, a_1, s_0)

Defining an MDP Problem
• Formally defined as a 4-tuple (S, A, T, R):
  • S: State space
  • A: Action space
  • T: Transition function, T(s, a, s’) = P(S_{t+1} = s’ | S_t = s, A_t = a)
  • R: Reward function, R(s) or R(s, a) or R(s, a, s’)
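One possible way to hold the 4-tuple (S, A, T, R) in code is sketched below. The class name `MDP`, its field names, and the two-state toy problem are all assumptions made for illustration, and the reward is taken in its simplest form R(s). The `check` method verifies that, for every state and action, the transition probabilities over next states sum to 1.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: list       # S
    actions: list      # A
    transition: dict   # (s, a) -> {s_next: probability}, i.e. T(s, a, s')
    reward: dict       # s -> R(s)

    def T(self, s, a, s_next):
        """Probability of landing in s_next after doing a in s."""
        return self.transition.get((s, a), {}).get(s_next, 0.0)

    def check(self):
        """Each T(s, a, .) must be a probability distribution."""
        for s in self.states:
            for a in self.actions:
                total = sum(self.T(s, a, s2) for s2 in self.states)
                if abs(total - 1.0) > 1e-9:
                    return False
        return True


# Hypothetical toy problem: "go" succeeds 70% of the time.
toy = MDP(
    states=["left", "right"],
    actions=["stay", "go"],
    transition={
        ("left", "stay"): {"left": 1.0},
        ("left", "go"): {"right": 0.7, "left": 0.3},
        ("right", "stay"): {"right": 1.0},
        ("right", "go"): {"left": 0.7, "right": 0.3},
    },
    reward={"left": 0.0, "right": 1.0},
)
print(toy.check(), toy.T("left", "go", "right"))
```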

Solving an MDP problem
• Is finding an optimal policy, usually denoted as π*.
• Policy = strategy
  • A mapping from states to actions, π : S → A.
  • Meaning that for any state s in S, π(s) will tell us the best action the system should perform.
• Example: [figure: grid world with terminal rewards +1 and -1]

Using a Policy

[Figure: the policy sends an action to the system and receives an observation (state) back]

1. Start from the initial state.
2. Move according to the policy.
3. The system moves to a new state and receives a reward.
4. Repeat from step 2 until the stopping criterion is satisfied (e.g., the goal is reached).

Some notes:
• The new state the system ends up in may be different in different runs.
• The goal of the system is to get the maximum possible total reward.
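The four steps can be sketched as a simulation loop. Everything concrete below is an assumption made for illustration: the function name `run_policy`, the one-dimensional corridor with cells 0 to 3, the 70% success probability for "right", and the rewards (+1 at the goal, -0.04 elsewhere).

```python
import random


def run_policy(policy, sample_next, reward, is_terminal, start, rng, max_steps=1000):
    """Follow a policy from `start`; return (total reward, number of steps)."""
    state = start                                  # 1. start from the initial state
    total_reward = 0.0
    steps = 0
    while not is_terminal(state) and steps < max_steps:
        action = policy[state]                     # 2. move according to the policy
        state = sample_next(state, action, rng)    # 3. the system moves to a new state
        total_reward += reward(state)              #    and receives a reward
        steps += 1                                 # 4. repeat until the goal is reached
    return total_reward, steps


GOAL = 3
policy = {0: "right", 1: "right", 2: "right"}


def sample_next(state, action, rng):
    # Non-deterministic outcome: "right" succeeds with probability 0.7.
    if action == "right" and rng.random() < 0.7:
        return state + 1
    return state


def reward(state):
    return 1.0 if state == GOAL else -0.04


def is_terminal(state):
    return state == GOAL


total, steps = run_policy(policy, sample_next, reward, is_terminal, 0, random.Random(0))
print(total, steps)
```

Running it with different seeds gives different step counts and totals, which is the point of the first note on the slide.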

Solving an MDP is Solving an Optimization Problem
• Recall that the optimal policy maps states to the best action. Best here means maximizing the following (the Bellman equation):

\[
V^*(s) = \max_a \Big[ R(s) + \gamma \sum_{s'} T(s, a, s')\, V^*(s') \Big]
\]

  The bracketed term is denoted Q(s, a).
• Theorem: There is a unique function V* satisfying the above equation.

Solving an MDP is Solving an Optimization Problem
• Optimal policy?
  • If we know V*, the optimal policy can be generated easily.

\[
\pi^*(s) = \arg\max_a \Big[ R(s) + \gamma \sum_{s'} T(s, a, s')\, V^*(s') \Big]
\]
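A small sketch of extracting the policy from a known value function. The function name `greedy_policy` and the two-state problem are hypothetical. In this toy problem the actions are deterministic, R(s0) = 0, R(s1) = 1 and γ = 0.5, so V* can be worked out by hand: staying in s1 forever gives V*(s1) = 1 / (1 - 0.5) = 2, and switching from s0 gives V*(s0) = 0.5 · 2 = 1.

```python
def greedy_policy(states, actions, T, R, V, gamma):
    """For each state pick the action maximizing R(s) + gamma * sum_s' T * V(s')."""
    policy = {}
    for s in states:
        def q(a):
            expected = sum(T[(s, a)].get(s2, 0.0) * V[s2] for s2 in states)
            return R[s] + gamma * expected
        policy[s] = max(actions, key=q)
    return policy


# Hypothetical toy problem with hand-computed optimal values.
states = ["s0", "s1"]
actions = ["stay", "switch"]
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "switch"): {"s1": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "switch"): {"s0": 1.0},
}
R = {"s0": 0.0, "s1": 1.0}
V_star = {"s0": 1.0, "s1": 2.0}

print(greedy_policy(states, actions, T, R, V_star, gamma=0.5))
```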

Value Iteration: A way to compute the optimal value function
• Iterate calculating the optimal value of a state until convergence.
• Algorithm:

    Initialize \( V_0(s) = R(s) \) for all s.
    Loop
        For all s {
            \( V_{t+1}(s) = \max_a \big[ R(s) + \gamma \sum_{s'} T(s, a, s')\, V_t(s') \big] \)
        }
        t = t + 1
    Until V_{t+1}(s) = V_t(s) for all s (impl: max_s |V_{t+1}(s) - V_t(s)| < 1e-7)

• The update inside the loop is often called value update or Bellman update or Bellman backup.
• Essentially, bottom-up dynamic programming
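A minimal sketch of value iteration, assuming the reward form R(s) and the stopping test from the slide. The function name `value_iteration` and the two-state toy problem are made up for illustration. For that problem, with γ = 0.5, the values can be checked by hand: V*(s1) = 2 and V*(s0) = 1.

```python
def value_iteration(states, actions, T, R, gamma, tol=1e-7):
    """Repeat the Bellman update until the largest change is below tol."""
    V = {s: R[s] for s in states}          # V_0(s) = R(s)
    while True:
        V_next = {
            s: max(
                R[s] + gamma * sum(T[(s, a)].get(s2, 0.0) * V[s2] for s2 in states)
                for a in actions
            )
            for s in states
        }
        delta = max(abs(V_next[s] - V[s]) for s in states)
        V = V_next
        if delta < tol:
            return V


# Hypothetical toy problem: deterministic "stay" and "switch" actions.
states = ["s0", "s1"]
actions = ["stay", "switch"]
T = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "switch"): {"s1": 1.0},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "switch"): {"s0": 1.0},
}
R = {"s0": 0.0, "s1": 1.0}

V = value_iteration(states, actions, T, R, gamma=0.5)
print(V)
```

Each pass reads only the previous table `V` and writes a fresh `V_next`, which mirrors the bottom-up table filling used for LCS.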

Example: Simple Navigation
• An agent moves in a 4×3 grid of cells.
• It can move to one of four neighboring cells. The actions’ accuracy is 70%. 30% of the time, the agent ends up at the left or right of its intended cell, or at the current cell, with equal probability. If there’s no cell to the left or right of its intended cell, the probability mass is added to staying where it is.
• Collision with an obstacle/boundary will result in no movement.
• Two terminal states, with reward +1 and -1. All other actions incur a reward of -0.04 (i.e., a cost of 0.04).

[Figure: grid with start cell S and terminal cells +1 and -1]

Let’s first define the MDP

[Figure: the same grid, with start cell S and terminal cells +1 and -1]


Topics
✓ What is it?
✓ Example: Fibonacci Sequence
✓ Example: Longest Common Subsequence
✓ Requirements
✓ Dynamic Programming in Algorithm vs in Optimization