4SC000 Q2 2017-2018 Optimal Control and Dynamic Programming Duarte Antunes


4SC000 Q2 2017-2018

Optimal Control and Dynamic Programming

Duarte Antunes


Outline

• Shortest paths in graphs

• Dynamic programming

• Dijkstra’s and A* algorithms

• Certainty equivalent control


Graph

1

Weighted Graph

• Nodes: $V := \{1, \dots, n\}$

• Edges: $E := \{(i_1, j_1), \dots, (i_r, j_r) \mid i_1, \dots, i_r, j_1, \dots, j_r \in V\}$

• Weights: $w_{ij} \ge 0$, $(i, j) \in E$

• Undirected if $w_{ij} = w_{ji}$

[Figure: a six-node weighted graph in its undirected and directed versions]

Example: $E = \{(3, 6), (2, 3), \dots\}$, $w_{36} = 7$, $w_{23} = 5$, $\dots$


Applications

2

Graphs model networks (road, social, transportation, etc.) and can be found in numerous applications


Shortest path problem

3

Find a path from an initial node to a destination node in a weighted graph, with minimum length (sum of the weights of its edges).

[Figure: the six-node weighted graph with the initial and final nodes marked; the highlighted path has minimum length 11]

Can we use the DP algorithm to find the shortest path?


Discussion

4

• Computing an optimal path in a transition diagram can be seen as computing the shortest path from the nodes at stage $0$ to the artificial node at stage $h+1$ of the following weighted graph:

[Figure: transition diagram with stages $0, 1, \dots, h-1, h$ and arrow costs $c^k_{ij}$ between stages $k$ and $k+1$; an artificial node at an artificial stage $h+1$ is reached from the states at stage $h$ with costs $c^h_1, \dots, c^h_{n_h}$]

• For graphs with this structure we already know how to use DP to compute shortest paths.

• Adjustments are needed for general graphs (e.g. cycles may occur) but DP can still be used to provide the shortest path, as we show next.


Dynamic programming formulation

5

Given a weighted graph, construct a transition diagram:

• $h = n-1$ stages, $n$ states at decision stages, and only the destination at the terminal stage.

• Make $c^k_{ij} = w_{ij}$, with $w_{ij} = \infty$ if there is no link from $i$ to $j$, and $w_{ii} = 0$.

[Figure: a four-node weighted graph with its initial and destination nodes, next to the transition diagram constructed from it]
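Written out, the DP recursion on this transition diagram can be sketched as follows. The symbols $J_k(i)$ (cost-to-go of state $i$ at stage $k$) and $t$ (destination node) are notation assumed here for illustration and do not appear on the slide.

```latex
\begin{align*}
J_{h-1}(i) &= w_{it}, & i &\in V, \\
J_k(i) &= \min_{j \in V} \bigl[ w_{ij} + J_{k+1}(j) \bigr], & k &= h-2, \dots, 0,
\end{align*}
\qquad \text{with } h = n - 1,\; w_{ii} = 0,\; w_{ij} = \infty \text{ if there is no link from } i \text{ to } j.
```

Because $w_{ii} = 0$, a state may wait in place at no cost, so $J_k(i)$ never increases as $k$ decreases.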


Dynamic programming solution

6

Apply the DP algorithm to this transition diagram

• Costs-to-go at stage $k$ are the costs of the shortest paths with $n-1-k$ hops. In particular, costs-to-go at the initial stage are the optimal costs for each initial condition.

• To find an optimal path, follow the policy for a given initial state.

• The cost-to-go at stage $0$ of a given state is infinite if there is no path from that initial state to the destination.

[Figure: transition diagram of the four-node example, with stage $k = 0, 1, 2, 3$ on the horizontal axis and state $x_k$ on the vertical axis, annotated with the costs-to-go]

The implementation can be made more efficient, and one does not need to first construct the transition diagram. Moreover, one can stop when the costs-to-go remain unchanged.
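A minimal Python sketch of this procedure is given below. It works directly on the weights without building the transition diagram and stops once the costs-to-go no longer change. The function name `dp_shortest_paths` and the six-node edge list are assumptions made for illustration; the edge list is reconstructed from the weights implied by the Dijkstra table of Example II, not taken from this slide.

```python
import math

INF = math.inf

def dp_shortest_paths(n, edges, dest):
    """Backward DP for shortest paths to dest on nodes 1..n.

    edges maps (i, j) to w_ij for every directed link.
    Returns (J, policy): cost-to-go and next node for each node.
    """
    # w_ii = 0 and w_ij = inf when there is no link from i to j
    w = {(i, j): (0.0 if i == j else INF)
         for i in range(1, n + 1) for j in range(1, n + 1)}
    w.update(edges)

    # Last decision stage: only the direct arrow to the destination counts
    J = {i: w[(i, dest)] for i in range(1, n + 1)}
    policy = {i: dest for i in range(1, n + 1) if J[i] < INF}

    for _ in range(n - 2):                      # stages h-2, ..., 0
        J_new, changed = {}, False
        for i in range(1, n + 1):
            best_j = min(range(1, n + 1), key=lambda j: w[(i, j)] + J[j])
            J_new[i] = w[(i, best_j)] + J[best_j]
            if J_new[i] < J[i]:                 # strictly better: update policy
                policy[i] = best_j
                changed = True
        J = J_new
        if not changed:                         # costs-to-go unchanged: stop
            break
    return J, policy

# Six-node undirected example (assumed edge list, see the note above)
UNDIRECTED = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
              (2, 4): 3, (4, 5): 3, (3, 6): 7, (5, 6): 4}
EDGES = {**UNDIRECTED, **{(j, i): w for (i, j), w in UNDIRECTED.items()}}

J, policy = dp_shortest_paths(6, EDGES, dest=6)
```

Following `policy` from node 1 visits 1, 2, 4, 5, 6, which has length 11 in this assumed graph.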


Example

7

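Another example for an undirected graph

[Figure: the six-node undirected weighted graph and its transition diagram, with stage $k = 0, \dots, 5$ on the horizontal axis and state $x_k = 1, \dots, 6$ on the vertical axis]

Costs-to-go (node 6 is the destination; the terminal stage $k = 5$ contains only the destination):

State   k=0   k=1   k=2   k=3   k=4
1       11    11    13    15    ∞
2       10    10    10    12    ∞
3       7     7     7     7     7
4       7     7     7     7     ∞
5       4     4     4     4     4
6       0     0     0     0     0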


8

Shortest paths in road networks

What is the shortest distance from Bucharest to Lugoj?

[Figure: road map of Romania, with the road distance in km marked on each link between neighboring cities]


9

Shortest paths in road networks

504 km (Route: Bucharest, Pitesti, Craiova, Dobreta, Mehadia, and Lugoj)


10

Robot path planning

[Figure: grid map with obstacles and the points A and B]

What is the shortest path for a robot to go from point A to B?


11

Assumptions

• It takes $1$ distance unit to move horizontally or vertically between adjacent nodes and $\sqrt{2}$ units to move diagonally.

• Distances to obstacle nodes are infinite.

• The distance between two diagonally adjacent nodes that are adjacent to the same obstacle node is infinite.

[Figure: grid neighborhoods with weight $1$ on horizontal and vertical links and weight $\sqrt{2}$ on diagonal links]
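These three assumptions can be turned into edge weights roughly as follows. This is a sketch under stated assumptions: the boolean occupancy grid `free`, the function name `grid_edges`, and the reading of the third assumption (a diagonal move is forbidden when either of the two cells sharing that corner is an obstacle) are choices made here for illustration.

```python
import math

def grid_edges(free):
    """Edge weights of an 8-connected grid graph.

    free[r][c] is True for a free cell and False for an obstacle.
    Returns {((r, c), (r2, c2)): cost}; a missing pair means infinite distance.
    """
    rows, cols = len(free), len(free[0])

    def is_free(r, c):
        return 0 <= r < rows and 0 <= c < cols and free[r][c]

    edges = {}
    for r in range(rows):
        for c in range(cols):
            if not free[r][c]:
                continue                          # no links out of an obstacle
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    r2, c2 = r + dr, c + dc
                    if not is_free(r2, c2):
                        continue                  # off the grid or an obstacle
                    if dr != 0 and dc != 0:
                        # diagonal move: forbidden next to an obstacle corner
                        if not (is_free(r, c2) and is_free(r2, c)):
                            continue
                        edges[(r, c), (r2, c2)] = math.sqrt(2)
                    else:
                        edges[(r, c), (r2, c2)] = 1.0
    return edges

corner = grid_edges([[True, True], [True, False]])   # obstacle at (1, 1)
open_grid = grid_edges([[True, True], [True, True]])
```

In `corner`, the diagonal between (0, 1) and (1, 0) is absent because both cells touch the obstacle at (1, 1).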


12

Robot path planning

[Figure: grid map with obstacles and the points A and B]

What is the shortest path for a robot to go from point A to B?


13

Robot path planning

[Figure: grid with obstacles; the costs-to-go of the free cells are listed below, row by row, with obstacle cells omitted]

7.83 6.83 5.83 4.83 3.83 3.41 3.00 3.41 3.83 4.83 5.83 6.83 7.83

7.41 6.41 5.41 4.41 3.41 2.41 2.00 2.41 3.41 4.41 5.41 6.41 7.41

7.00 6.00 5.00 4.00 3.00 2.00 1.00 1.41 5.41 8.41

7.41 6.41 6.00 0.00 1.00 6.41 9.41

7.83 7.41 7.00 1.00 1.41 5.41 8.41

8.83 8.41 8.00 2.00 2.41 3.41 4.41 5.41 6.41 7.41

9.83 9.41 9.00 3.00 6.41 6.83 7.83

10.00 9.00 8.00 7.00 6.00 5.00 4.00 5.00 6.00 7.41 7.83 8.24

10.41 9.41 9.00 6.00 6.41 7.41 8.41 8.83 9.24

10.83 10.41 10.00 11.00 10.00 9.00 8.00 7.00 7.41 7.83 8.83 9.83 10.24

Simpler example to show the costs-to-go

Side remark: the cost-to-go can be viewed as a Lyapunov function, and the policy can be obtained by following the direction of maximum decrease of this function.
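One way to read this remark in code: given the costs-to-go $J$, the next node is the neighbor $j$ minimizing $w_{ij} + J(j)$, and $J$ strictly decreases along the resulting path when the weights are positive. The sketch below assumes the six-node graph reconstructed from Example II and its costs-to-go; the function name `descend` is hypothetical.

```python
def descend(J, edges, start):
    """Follow argmin_j (w_ij + J[j]) from start until the cost-to-go is 0."""
    path = [start]
    while J[path[-1]] > 0:
        i = path[-1]
        nxt = min((j for (a, j) in edges if a == i),
                  key=lambda j: edges[(i, j)] + J[j])
        path.append(nxt)
    return path

# Assumed six-node undirected graph and its costs-to-go to node 6
UNDIRECTED = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
              (2, 4): 3, (4, 5): 3, (3, 6): 7, (5, 6): 4}
EDGES = {**UNDIRECTED, **{(j, i): w for (i, j), w in UNDIRECTED.items()}}
J = {1: 11, 2: 10, 3: 7, 4: 7, 5: 4, 6: 0}

path = descend(J, EDGES, start=1)
values = [J[i] for i in path]       # cost-to-go along the path
```

Here `values` decreases at every step, which is the Lyapunov-like behavior the remark describes.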


14

Time-varying graphs

How to design a shortest path from A to B when the obstacles are moving?

[Figure: snapshots of the grid at $t = 0$, $t = 1$, $t = 2$, and $t = T$, with the initial position and the final position marked]


15

Time-varying graphs

1. Consider the set of static graphs for each time step.

[Figure: static graphs at $t = 0$, $t = 1$, $t = 2$, and $t = T$]


16

Time-varying graphs

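2. Build a time-invariant graph in 3D.

3. Compute the shortest path for the 3D graph. Initial node: initial node at time $t = 0$. Final node: final node at time $t = T$.

[Figure: example of a 3D graph stacking the static graphs at $t = 0$, $t = 1$, and $t = 2$, with links of weight $1$ and $\sqrt{2}$ between consecutive time layers]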
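Steps 1 to 3 can be sketched as below: each node of the stacked graph is a pair (node, time), and a link valid at time $t$ connects layer $t$ to layer $t+1$. The callback `edges_at` and the function name are hypothetical. Waiting in place is only possible if `edges_at(t)` contains self-loops, whose cost the slides do not specify.

```python
def time_expanded_edges(edges_at, T):
    """Stack the static graphs for t = 0, ..., T-1 into one graph.

    edges_at(t) returns {(i, j): w_ij} for moves made between t and t + 1.
    Nodes of the result are pairs (i, t).
    """
    stacked = {}
    for t in range(T):
        for (i, j), w in edges_at(t).items():
            stacked[(i, t), (j, t + 1)] = w
    return stacked

# Tiny hypothetical example: the link a-b exists only at t = 0, b-c only at t = 1
def edges_at(t):
    return {("a", "b"): 1.0} if t == 0 else {("b", "c"): 2.0}

stacked = time_expanded_edges(edges_at, T=2)
```

Any shortest path routine can then be run on `stacked` from `(initial, 0)` to `(final, T)`.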


Outline

• Shortest paths in graphs

• Dynamic programming

• Dijkstra’s and A* algorithms

• Certainty equivalent control


17

Discussion

• DP can be quite inefficient when computing a single optimal path is enough.

• For shortest path problems in graphs, there are many alternative algorithms. We describe next Dijkstra’s and the A* algorithms.

[Figure: graph with nodes $1, 2, 3, 4, 5, \dots, n-1, n$, in which the initial node and the destination are joined by two links of weight 1 and every other link has weight $> 2$]

• Figure example: DP searches the full space, which is not necessary to compute the optimal path.


18

Dijkstra’s algorithm Main ideas

• Iteratively generate shorter paths from the origin to every node.

• Update a list of nodes (the wavefront) that can be explored next.

• New nodes are added to the wavefront based on the cost: they are the neighbors of the node with the smallest distance to the origin.

source: wikipedia


19

Dijkstra’s algorithm

Initialization

• $d_i = \infty$ for $i \in V \setminus \{p\}$, $d_p = 0$, and OPEN $= \{p\}$, where $p$ is the initial node and $t$ is the final node.

Steps

1. Remove a node $i$ from OPEN with the minimum estimate $d_i$. If $i = t$ stop, otherwise execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$.

2. If $d_i + w_{ij} < d_j$: set $d_j = d_i + w_{ij}$, set $\pi(j) = i$, and place $j$ in OPEN if it is not there already. Otherwise do not update $d_j$, $\pi(j)$.

3. After executing step 2 for all the nodes $j$ corresponding to out-neighbors of $i$, go to step 1.

Optimal path

• To keep track of the shortest paths it suffices to save, for every node $i$, the next node $\pi(i)$ along the optimal path (discovered so far) leading to the initial node.

• The optimal path is then given by $(i_0, i_1, \dots, i_L)$ with $i_L = t$, $i_{L-1} = \pi(t)$, $\dots$, $i_0 = \pi(i_1)$, or equivalently $i_{\ell-1} = \pi(i_\ell)$ for $\ell \in \{1, 2, \dots, L\}$, where $L$ is such that $i_0 = p$.

• If OPEN is empty at a given step of the algorithm, then there is no path to the destination.
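The steps above can be sketched in Python as follows. This is an illustrative implementation with a plain set for OPEN, not an optimized one; the names `dijkstra` and `pred` are chosen here, and the six-node edge list is an assumption reconstructed from the weights implied by the table of Example II.

```python
import math

def dijkstra(edges, p, t):
    """Dijkstra's algorithm with an OPEN set.

    edges maps (i, j) to w_ij >= 0. Returns (d_t, path from p to t),
    or (inf, None) when OPEN becomes empty before t is removed.
    """
    d = {p: 0}                  # a missing entry stands for d_i = infinity
    pred = {}                   # pred[j] = i, the node preceding j
    open_set = {p}
    while open_set:
        i = min(open_set, key=lambda k: d[k])       # step 1
        open_set.remove(i)
        if i == t:
            path = [t]
            while path[-1] != p:                    # walk back to p
                path.append(pred[path[-1]])
            return d[t], path[::-1]
        for (a, j), w in edges.items():             # step 2, out-neighbors of i
            if a == i and d[i] + w < d.get(j, math.inf):
                d[j] = d[i] + w
                pred[j] = i
                open_set.add(j)
    return math.inf, None       # no path to the destination

# Assumed six-node undirected graph (see the note above)
UNDIRECTED = {(1, 2): 1, (1, 3): 8, (1, 4): 6, (2, 3): 5,
              (2, 4): 3, (4, 5): 3, (3, 6): 7, (5, 6): 4}
EDGES = {**UNDIRECTED, **{(j, i): w for (i, j), w in UNDIRECTED.items()}}

cost, path = dijkstra(EDGES, p=1, t=6)
unreachable = dijkstra(EDGES, p=1, t=7)     # node 7 does not exist
```

Scanning all edges in step 2 keeps the sketch short; an adjacency list and a priority queue would be the usual choice for large graphs.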


20

Example I

[Figure: graph with nodes $1, \dots, n$, initial node 1, destination node 3, links of weight 1 along 1 → 2 → 3, and all other links of weight $> 2$]

Dijkstra’s algorithm requires only three iterations for this example.

Iteration   Pairs $(i, d_i)$, $i \in$ OPEN
0           (1, 0)
1           (2, 1) + other pairs pertaining to other neighbors of node 1; $\pi(2) = 1$
2           (3, 2) + other pairs pertaining to other neighbors of nodes 1 & 2; $\pi(3) = 2$
3           Destination/final node removed from OPEN: terminate


21

Example II

[Figure: the six-node undirected weighted graph]

Iteration   Pairs $(i, d_i)$, $i \in$ OPEN      Updates
0           (1, 0)
1           (2, 1), (3, 8), (4, 6)             $\pi(2) = 1$, $\pi(3) = 1$, $\pi(4) = 1$
2           (3, 6), (4, 4)                     $\pi(3) = 2$, $\pi(4) = 2$
3           (3, 6), (5, 7)                     $\pi(5) = 4$
4           (5, 7), (6, 13)                    $\pi(6) = 3$
5           (6, 11)                            $\pi(6) = 5$

Optimal path (from end to start): $(6, \pi(6), \pi(\pi(6)), \dots, 1) = (6, 5, 4, 2, 1)$


22

Shortest paths in road networks

What is the shortest distance from Bucharest to Lugoj?

[Figure: road map of Romania, with the road distance in km marked on each link between neighboring cities]


23

Example III

Shortest path from Bucharest to Lugoj

Iteration   Pairs $\{i, d_i\}$, $i \in$ OPEN
0           {Lugoj, 0}
1           {Mehadia, 70}, {Timisoara, 111}
2           {Timisoara, 111}, {Dobreta, 145}
3           {Dobreta, 145}, {Arad, 229}
4           {Arad, 229}, {Craiova, 265}
5           {Craiova, 265}, {Sibiu, 369}, {Zerind, 304}
6           {Sibiu, 369}, {Zerind, 304}, {Pitesti, 403}, {Rimnicu Vilcea, 411}
7           {Sibiu, 369}, {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}
8           {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Oradea, 375}, {Fagaras, 468}
9           {Pitesti, 403}, {Rimnicu Vilcea, 411}, {Fagaras, 468}
10          {Rimnicu Vilcea, 411}, {Fagaras, 468}, {Bucharest, 504}
11          {Fagaras, 468}, {Bucharest, 504}
12          {Bucharest, 504}

From this data we can obtain $\pi$ and compute the optimal path.


24

A*

• Similar to Dijkstra’s algorithm, but an estimate (heuristic) $h(i)$, $i \in V$, of the distance to the destination for each node is also taken into account when picking the node to be explored next. New nodes are added to the wavefront based on $d_i + h(i)$.

• If the heuristic is (i) smaller than the optimal cost from that node to the destination and (ii) such that $h(i) \le w_{ij} + h(j)$ for every $i, j$, then the optimal path is found. Otherwise there are no optimality guarantees.

• To run the A* algorithm under the two heuristic assumptions: 1. Change the weights to $\bar{w}_{ij} = w_{ij} + h(j) - h(i)$. 2. Run Dijkstra’s algorithm and get the optimal path. 3. Obtain the optimal cost in the original graph with weights $w_{ij}$.

• The general algorithm is given next, which works with or without these assumptions.
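A short derivation shows why the reweighting in steps 1 to 3 preserves the optimal path. Summing the modified weights along any path $(i_0, i_1, \dots, i_L)$ from $i_0 = p$ to $i_L = t$ (the path notation is the one used for Dijkstra’s algorithm), the heuristic terms telescope:

```latex
\sum_{\ell=1}^{L} \bar{w}_{i_{\ell-1} i_\ell}
  = \sum_{\ell=1}^{L} \bigl( w_{i_{\ell-1} i_\ell} + h(i_\ell) - h(i_{\ell-1}) \bigr)
  = \sum_{\ell=1}^{L} w_{i_{\ell-1} i_\ell} + h(t) - h(p).
```

Every path from $p$ to $t$ is therefore shifted by the same constant $h(t) - h(p)$, so the minimizing path is unchanged, and condition (ii) guarantees $\bar{w}_{ij} \ge 0$, as Dijkstra’s algorithm requires.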


25

A*

Initialization

• $d_i = \infty$ for $i \in V \setminus \{p\}$, $d_p = 0$, and OPEN $= \{p\}$, where $p$ is the initial node and $t$ is the final node.

Steps

1. Remove a node $i$ from OPEN with the minimum $d_i + h(i)$. If $i = t$ stop, otherwise execute step 2 for every node $j$ for which there is a path (arrow) from $i$ to $j$.

2. If $d_i + w_{ij} < d_j$: set $d_j = d_i + w_{ij}$, set $\pi(j) = i$, and place $j$ in OPEN if it is not there already. Otherwise do not update $d_j$, $\pi(j)$.

3. After executing step 2 for all the nodes $j$ corresponding to out-neighbors of $i$, go to step 1.

(This is the same algorithm as Dijkstra’s algorithm except for the use of $d_i + h(i)$ in step 1; the remarks on slide 19 on how to find the optimal path also apply.)
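A minimal Python sketch of these steps follows; it differs from a Dijkstra implementation only in the key used to pick the next node. The function name `a_star` is chosen here. The road distances are the ones implied by the Dijkstra table of Example III, the heuristic values are the straight-line distances to Bucharest from the A* example, and only part of the map is included, so the data set is an illustrative assumption.

```python
import math

def a_star(edges, h, p, t):
    """A* with an OPEN set; nodes are picked by the smallest d_i + h(i)."""
    d = {p: 0}
    pred = {}
    open_set = {p}
    while open_set:
        i = min(open_set, key=lambda k: d[k] + h[k])    # step 1
        open_set.remove(i)
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(pred[path[-1]])
            return d[t], path[::-1]
        for (a, j), w in edges.items():                 # step 2
            if a == i and d[i] + w < d.get(j, math.inf):
                d[j] = d[i] + w
                pred[j] = i
                open_set.add(j)
    return math.inf, None

# Partial road map (km) and straight-line distances to Bucharest
ROADS = {("Lugoj", "Mehadia"): 70, ("Lugoj", "Timisoara"): 111,
         ("Mehadia", "Dobreta"): 75, ("Timisoara", "Arad"): 118,
         ("Dobreta", "Craiova"): 120, ("Craiova", "Pitesti"): 138,
         ("Craiova", "Rimnicu Vilcea"): 146, ("Pitesti", "Bucharest"): 101}
EDGES = {**ROADS, **{(b, a): w for (a, b), w in ROADS.items()}}
H = {"Lugoj": 244, "Mehadia": 241, "Timisoara": 329, "Dobreta": 242,
     "Arad": 366, "Craiova": 160, "Pitesti": 98, "Rimnicu Vilcea": 193,
     "Bucharest": 0}

cost, route = a_star(EDGES, H, p="Lugoj", t="Bucharest")
```

On this data the result is 504 km along Lugoj, Mehadia, Dobreta, Craiova, Pitesti, Bucharest, the same route as in the road network example read in reverse.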


26

A* and Dijkstra’s algorithm

A* is typically much faster if we have a good heuristic (which might not be easy to find, especially if we require it to satisfy the two conditions discussed before).

[Figure: search comparison between Dijkstra’s algorithm and A*; source: Wikipedia]


27

Example

[Figure: road map of Romania, with the road distance in km marked on each link between neighboring cities]

Straight-line distance to Bucharest, $h(i)$:

Neamt            234
Iasi             226
Vaslui           199
Urziceni         80
Hirsova          151
Eforie           161
Bucharest        0
Giurgiu          77
Pitesti          98
Craiova          160
Fagaras          178
Sibiu            253
Rimnicu Vilcea   193
Lugoj            244
Mehadia          241
Dobreta          242
Timisoara        329
Arad             366
Zerind           374
Oradea           380


28

Example

1. Change the weights to $\bar{w}_{ij} = w_{ij} + h(j) - h(i)$. 2. Run Dijkstra’s algorithm and get the optimal path. 3. Obtain the optimal cost in the original graph with weights $w_{ij}$.

Iteration   Pairs $\{i, d_i\}$ in OPEN
0           {Lugoj, 0}
1           {Mehadia, 67}, {Timisoara, 196}
2           {Timisoara, 196}, {Dobreta, 143}
3           {Timisoara, 196}, {Craiova, 181}
4           {Timisoara, 196}, {Pitesti, 257}, {Rim. Vilcea, 360}
5           {Pitesti, 257}, {Rim. Vilcea, 360}, {Arad, 351}
6           {Rim. Vilcea, 360}, {Arad, 351}, {Bucharest, 260}

From this data we can obtain $\pi$ and compute the optimal path.
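The entries of this table can be checked by hand along the optimal route. The sketch below accumulates the modified weights $w_{ij} + h(j) - h(i)$ and then undoes the shift to recover the cost in the original graph. The road distances are those implied by the Dijkstra table of Example III; the variable names are chosen here for illustration.

```python
# Straight-line distances to Bucharest for the cities on the route
h = {"Lugoj": 244, "Mehadia": 241, "Dobreta": 242,
     "Craiova": 160, "Pitesti": 98, "Bucharest": 0}
route = ["Lugoj", "Mehadia", "Dobreta", "Craiova", "Pitesti", "Bucharest"]
w = [70, 75, 120, 138, 101]          # road distance of each leg, in km

d_bar, total = [], 0
for a, b, w_ab in zip(route, route[1:], w):
    total += w_ab + h[b] - h[a]      # modified weight of the leg a -> b
    d_bar.append(total)

# Step 3: remove the constant shift h(t) - h(p) to get the original cost
original_cost = d_bar[-1] - (h[route[-1]] - h[route[0]])
```

The running totals in `d_bar` are 67, 143, 181, 257, 260, matching the table, and `original_cost` is 504.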


29

Discussion

• For large graphs one cannot even store all the nodes and initialise $d_i$, but we can still run the algorithm if we keep track of a list of closed nodes (removed in step 1, see slide 19) so that they are not visited again (on slide 19 this is ensured by the test $d_i + w_{ij} < d_j$).

• If optimality is not needed, there are many more graph search algorithms, e.g., breadth-first search and depth-first search (see label correcting methods in Bertsekas’ book, Ch. 2).

• For robot motion planning, Dijkstra’s and A* are in general naive:

• construct nodes as we move along (so the graph is only implicit).

• random placement of nodes is in general better.

• A popular method that improves upon the previous methods based on these two remarks is the rapidly-exploring random tree (RRT) (see LaValle’s book).


30

Discussion

• Dijkstra’s algorithm and other search algorithms (e.g. A*) are typically computationally more efficient than DP for computing optimal paths.

• DP explores every node, providing the optimal paths from every node to the destination. This is inefficient when we are interested in only one optimal path.

• Why then dynamic programming? It provides a policy, which allows one to cope with disturbances (see lecture 2).

• We discuss next how to use Dijkstra’s algorithm to provide the optimal policy in real time (online).

• In Appendix A, Dijkstra’s algorithm is used to obtain the same optimal policy obtained in the first lecture with DP.

• Thus, again, why DP then? Stochastic DP! + other advantages.


31

Historical note

• Edsger W. Dijkstra was a professor at TU/Eindhoven from 1962 to 1984

“What's the shortest way to travel from Rotterdam to Groningen? It is the algorithm for the shortest path, which I designed in about 20 minutes. One morning I was shopping in Amsterdam with my young fiancée, and tired, we sat down on the café terrace to drink a cup of coffee and I was just thinking about whether I could do this, and I then designed the algorithm for the shortest path. As I said, it was a 20-minute invention. In fact, it was published in 1959, three years later. The publication is still quite nice. One of the reasons that it is so nice was that I designed it without pencil and paper. Without pencil and paper you are almost forced to avoid all avoidable complexities. Eventually that algorithm became, to my great amazement, one of the cornerstones of my fame.”

Edsger W. Dijkstra (1930-2002)


Outline

• Shortest paths in graphs

• Dynamic programming

• Dijkstra’s and A* algorithms

• Certainty equivalent control


32

Shortest paths in graphs and policies

• A transition diagram is just a weighted graph, and therefore we can compute optimal paths with methods to compute shortest paths in graphs (e.g. Dijkstra’s, A*).

[Figure: transition diagram from the initial stage $0$ to the final stage $h$]

• Doing this for every stage and every state, and taking the first decision of the optimal paths, we obtain the optimal policy (the function that for each state gives the first decision of the optimal path from that state to the last stage). See also Appendix A.

[Figure: example of an optimal policy]

• However, this is typically computationally less efficient than DP.


33

Certainty equivalent control

• Yet, we can implement the method just described (using, e.g., Dijkstra’s algorithm) online:

1. Compute the optimal path for the initial state and take the first decision.

2. If no disturbance occurred, use the next decision along the optimal path; otherwise recompute (online!) and apply the first decision.

[Figure: transition diagram from the initial stage $0$ to the final stage $h$, showing a path that is recomputed after a disturbance and again after another disturbance]
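The two steps can be sketched as a replanning loop. The helper `shortest_path` is a plain heap-based Dijkstra, and `disturb` is a hypothetical disturbance model: it returns the state actually reached at step `k` when the plan intended `x`. The small directed graph is made up for illustration.

```python
import heapq

def shortest_path(edges, p, t):
    """Heap-based Dijkstra; returns the node list from p to t, or None."""
    dist, pred, heap = {p: 0}, {}, [(0, p)]
    while heap:
        d_i, i = heapq.heappop(heap)
        if i == t:
            path = [t]
            while path[-1] != p:
                path.append(pred[path[-1]])
            return path[::-1]
        if d_i > dist[i]:
            continue                        # stale heap entry
        for (a, j), w in edges.items():
            if a == i and d_i + w < dist.get(j, float("inf")):
                dist[j], pred[j] = d_i + w, i
                heapq.heappush(heap, (d_i + w, j))
    return None

def certainty_equivalent_run(edges, start, goal, disturb):
    """Apply planned decisions and replan whenever the state is perturbed."""
    x, plan, visited, k = start, shortest_path(edges, start, goal), [start], 0
    while x != goal:
        intended = plan[1]                  # first decision of the current plan
        x = disturb(k, intended)            # state actually reached
        if x == intended:
            plan = plan[1:]                 # no disturbance: keep the plan
        else:
            plan = shortest_path(edges, x, goal)    # recompute online
        visited.append(x)
        k += 1
    return visited

# Hypothetical directed graph: the nominal plan is 0 -> 1 -> 3
EDGES = {(0, 1): 1, (0, 2): 4, (1, 3): 1, (2, 3): 1, (1, 2): 5}

nominal = certainty_equivalent_run(EDGES, 0, 3, lambda k, x: x)
perturbed = certainty_equivalent_run(
    EDGES, 0, 3, lambda k, x: 2 if k == 0 else x)   # pushed to node 2 at step 0
```

The sketch assumes a path to the goal still exists from every disturbed state; otherwise `shortest_path` returns `None` and the loop fails.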


34

Discussion

• Doing this, we end up with the same policy as DP, neglecting the disturbances considered in the previous lecture.

• The policy obtained with DP is explicit, whereas this new (equivalent) one is implicit and requires online computations!

• In the literature this (equivalent) policy is called certainty equivalent control, and it is closely related to model predictive control (to be addressed later).

• To summarize:

[Diagram with the labels: Optimal paths; DP; Dijkstra’s (more efficient); DP (might be computationally hard); Dijkstra’s offline (less efficient); Dijkstra’s online (requires online computations); Certainty equivalent control; Stochastic DP; Dijkstra]


35

Concluding remarks

Summary

• DP can be used to solve shortest path problems in graphs.

• Discussed alternative methods, Dijkstra’s and A*.

• Introduced certainty equivalent control.

• Main message: there are other methods to compute optimal paths and optimal policies (except for stochastic DP!); their (dis)advantages depend on the application (e.g. can we use online computations?).

After this lecture, you should be able to:

• Compute the shortest path in a graph with DP, Dijkstra’s and A*.

Appendix A: Solving a DP problem with Dijkstra’s algorithm

Example

Consider the same initial transition diagram considered in the first lecture and follow steps 1-3.

[Figure: initial transition diagram]

1. Add an artificial terminal node, with a cost to arrive at it from the final stage coinciding with the terminal cost.

[Figure: transition diagram with nodes labeled 1 to 15]

3. Compute the optimal paths (using Dijkstra’s algorithm) from each state to the artificial terminal node and keep track, for each initial state, of the first decision of the optimal path (this is the optimal policy).


Example

[Figure: transition diagram with nodes labeled 1 to 15; the arrows along the optimal path from the initial state are marked “Belongs to the optimal policy”]

Initial state: 1

Pairs $(i, d_i)$, $i \in$ OPEN, for iterations 0 to 11:

(1, 0)
(3, 2), (4, 1)
(3, 2), (7, 1), (8, 2)
(3, 2), (10, 2), (11, 4), (8, 2)
(6, 3), (10, 2), (11, 4), (8, 2)
(6, 3), (10, 2), (11, 4), (12, 7)
(6, 3), (14, 4), (13, 7), (11, 4), (12, 7)
(15, 4), (13, 7), (11, 4), (12, 7)
(14, 4), (13, 7), (11, 4), (12, 7)
(15, 8), (13, 7), (11, 4), (12, 7)
(15, 8), (13, 6), (12, 7)
(15, 6), (12, 7)

It is clear that if we consider the states 4, 7, 10 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).


Example

[Figure: transition diagram with nodes labeled 1 to 15; the arrows along the optimal path from the initial state are marked “Belongs to the optimal policy”]

Initial state: 2

Pairs $(i, d_i)$, $i \in$ OPEN, for iterations 0 to 10:

(2, 0)
(4, 4), (5, 1)
(4, 4), (9, 4), (8, 5)
(7, 4), (9, 4), (8, 5)
(10, 5), (11, 7), (9, 4), (8, 5)
(10, 5), (11, 7), (12, 5), (8, 5)
(10, 5), (11, 7), (14, 5), (8, 5), (13, 10)
(10, 5), (11, 7), (15, 9), (8, 5), (13, 10)
(10, 5), (11, 7), (15, 9), (13, 10)
(11, 7), (15, 9), (13, 10)
(15, 9), (13, 10)

It is clear that if we consider the states 5, 9, 12, 14 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).


Example

[Figure: transition diagram with nodes labeled 1 to 15; the arrows along the optimal path from the initial state are marked “Belongs to the optimal policy”]

Initial state: 3

Pairs $(i, d_i)$, $i \in$ OPEN, for iterations 0 to 10:

(3, 0)
(6, 1), (7, 3), (8, 1)
(10, 5), (11, 4), (7, 3), (8, 1)
(10, 5), (11, 3), (7, 3), (12, 6)
(10, 4), (11, 3), (12, 6)
(10, 4), (13, 7), (14, 6), (12, 6)
(13, 7), (14, 6), (12, 6)
(13, 7), (15, 10), (12, 6)
(15, 7), (12, 6)
(15, 7), (14, 6)
(15, 7)

It is clear that if we consider the states 8, 11, 13 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).


Example

[Figure: transition diagram with nodes labeled 1 to 15; the arrows along the optimal path from the initial state are marked “Belongs to the optimal policy”]

Initial state: 6

Pairs $(i, d_i)$, $i \in$ OPEN, for iterations 0 to 5:

(6, 0)
(10, 4), (11, 3)
(10, 4), (13, 7), (14, 5)
(13, 7), (14, 5)
(13, 7), (15, 9)
(13, 7)

It is clear that if we consider the states 8, 11, 13 as initial states, the transitions (arrows) along this path are also optimal first decisions for these initial states (they belong to the optimal policy).


Optimal policy

Example

Combining the first decisions leading to the end stage for each node we obtain the optimal policy (the same obtained with the DP algorithm in the first lecture)

Appendix B: DP with terminal constraints

B1

DP with terminal constraints

Suppose that we want to reach a given state at the final stage of a transition diagram, starting at a given initial state, with minimum cost (as opposed to simply reaching the final stage).

[Figure: transition diagram with the initial state and the desired terminal state marked]

Since a transition diagram is simply a weighted graph, we can apply graph search methods, and in particular repeat the trick just used to apply DP.


B2

DP with terminal constraints

1. Relabel nodes.

[Figure: weighted graph with final and terminal nodes, with nodes relabeled 1 to 7]

2. Transform the graph into a transition diagram.

[Figure: transition diagram with states 1 to 7 at each stage]

3. Apply DP.

[Figure: transition diagram annotated with the costs-to-go]


B3

DP with terminal constraints

[Figure: the part of the transition diagram that leads to the desired terminal state]

By inspection we can see that this is the only part that matters.

Conclusion: if there is a terminal constraint:

• Remove the arrows from nodes at the final decision stage that do not lead to the desired terminal state.

• For each state, choose the arrow with minimum cost and set the cost-to-go of that node to be the terminal cost of the desired terminal node plus the cost of that arrow. If the state has no arrows, set the cost-to-go to infinity.

• Apply DP.
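A compact way to sketch this conclusion in Python is to give every terminal state other than the desired one an infinite terminal cost and then run the usual backward recursion. That has the same effect as removing the arrows that do not lead to the desired terminal state. The data layout (`costs`, `terminal_cost`) and the two-stage example are assumptions made for illustration.

```python
import math

def dp_terminal_constraint(costs, terminal_cost, target):
    """Backward DP when the final state must equal target.

    costs[k] maps (i, j) to the cost of the arrow from state i at stage k
    to state j at stage k + 1; terminal_cost maps final states to costs.
    Returns a list J where J[k][i] is the cost-to-go of state i at stage k;
    a state missing from J[k] has infinite cost-to-go.
    """
    # Only the desired terminal state keeps a finite cost-to-go
    J_next = {x: (g if x == target else math.inf)
              for x, g in terminal_cost.items()}
    J = [J_next]
    for stage in reversed(costs):
        J_k = {}
        for (i, j), c in stage.items():
            J_k[i] = min(J_k.get(i, math.inf), c + J_next.get(j, math.inf))
        J.insert(0, J_k)
        J_next = J_k
    return J

# Hypothetical two-stage diagram: a -> {b, c} -> {x, y}
COSTS = [{("a", "b"): 1, ("a", "c"): 2},
         {("b", "x"): 5, ("b", "y"): 1, ("c", "x"): 1}]
TERMINAL = {"x": 0, "y": 0}

J = dp_terminal_constraint(COSTS, TERMINAL, target="x")
```

Without the constraint the cheapest path in this example is a, b, y with cost 2; requiring the final state x makes a, c, x optimal with cost 3.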