22
Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration: 150 minutes Number of Problems: 5 Permitted aids: One A4 sheet of paper. No calculators allowed.

Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam January 24th, 2015

Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea

Solutions

Exam Duration: 150 minutes

Number of Problems: 5

Permitted aids: One A4 sheet of paper.

No calculators allowed.

Page 2: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 2 Final Exam – Dynamic Programming & Optimal Control

Problem 1 15%

Convert the following problems into the basic problem formulation of Dynamic Programming.In particular, write down the state vector xk containing all relevant information at stage k,the system dynamics xk+1 = fk(xk, uk, wk) describing the evolution of the state vector, thestage cost gk(xk, uk, wk), the final cost gN (xN ), and the disturbance vector wk with itsprobability distribution Pk( · |xk, uk). Note that in all problems the variable uk denotes thedecision variable.

a) Problem 1: The system dynamics are given by

yk+1 = yk + vkuk + rkuk−1, k = 0, 1, . . . , N − 1,

with y0 = 0, u(−1) = 0, and uk ∈ R. The disturbances vk and rk are independent randomvariables both taking the value 0 and 1 with equal probability. The cost function is givenby

cost =N−1∑k=0

y2k + u2k.

b) Problem 2: The system dynamics are given by

yk+1 = yk + uk + vk, k = 0, 1, . . . , N − 1,

with y0 = 0, y(−1) = 0, and uk ∈ [−1, 1]. The disturbance vk takes the value 1− vk−1and 1 + vk−1 with equal probability (we assume that v(−1) = 0). The cost function isgiven by

cost =

N∑k=0

ykyk−1.

c) Problem 3: The system dynamics are given by

yk+1 = yk + uk + vk, k = 0, 1, . . . , N − 1,

with y0 = 0 and uk ∈ [−1, 1]. The disturbance vk is normally distributed withmean ak ∈ {−1, 0, 1} and variance 1. Before each stage k, an oracle tells us the valueof ak. A priori, all possible values of ak have equal probability. The cost function is givenby

cost = y2N .

Page 3: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 3

Solution 1

In the following, the problems 1-3 are described in the basic problem formulation. Note thatthe solution is not unique, i.e. there are other formulations that are also correct.

a) Problem 1:

State vector : xk =

(ykuk−1

)

Dynamics : fk(xk, uk, wk) =

(yk + vkuk + rkuk−1

uk

)

Cost : gk(xk, uk, wk) = y2k + u2k

gN (xN ) = 0

Disturbance : wk =

(vkrk

)Pk(vk = 0 | xk, uk) = 0.5

Pk(vk = 1 | xk, uk) = 0.5

Pk(rk = 0 | xk, uk) = 0.5

Pk(rk = 1 | xk, uk) = 0.5

b) Problem 2:

State vector : xk =

ykyk−1vk−1

Dynamics : fk(xk, uk, wk) =

yk + uk + 1 + rkvk−1yk

1 + rkvk−1

Cost : gk(xk, uk, wk) = ykyk−1

gN (xN ) = yNyN−1

Disturbance : wk = rk

Pk(rk = −1 | xk, uk) = 0.5

Pk(rk = 1 | xk, uk) = 0.5

Page 4: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 4 Final Exam – Dynamic Programming & Optimal Control

c) Problem 3:

State vector : xk =

(ykak

)

Dynamics : fk(xk, uk, wk) =

(yk + uk + vk

rk

)

Cost : gk(xk, uk, wk) = 0

gN (xN ) = y2N

Disturbance : wk =

(vkrk

)Pk(vk | xk, uk) = N (ak, 1)

Pk(rk = −1 | xk, uk) = 1/3

Pk(rk = 0 | xk, uk) = 1/3

Pk(rk = 1 | xk, uk) = 1/3

Page 5: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 5

Problem 2 20%

Consider the dynamic system

xk+1 = xk + uk, k = 0, 1,

with initial state x0 = 1 and a discrete control input uk ∈ {−1, 0}.

Note: All parts can be solved independently.

a) Assume that we want to minimize a quadratic cost function given by

cost = α(x22 + x21) + (1− α)(u21 + u20),

with some known parameter α ∈ [0, 1]. Apply the Dynamic Programming Algorithm tocompute the optimal cost J0(x0).

b) Assume that the costs cannot be combined in one single cost function. Hence, there aretwo cost functions to be minimized:

cost1 = x22 + x21,

cost2 = u21 + u20.

Apply the Dynamic Programming Algorithm to compute the set of non-inferior solu-tions F0(x0).

c) Assume you found a solution to b), i.e. you computed the set of non-inferior solu-tions F0(x0). Explain how you can use F0(x0) to compute the solution of a), i.e. tocompute the optimal cost J0(x0), without applying the Dynamic Programming Algorithmagain.

d) Assume you found a solution to b), i.e. you computed the set of non-inferior solu-tions F0(x0). Explain how you can use F0(x0) to compute the solution of a), i.e. tocompute the optimal cost J0(x0), if an additional constraint x22 + x21 < 2 has to be satis-fied.

Page 6: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 6 Final Exam – Dynamic Programming & Optimal Control

Solution 2

a) First, we write down the state space:

S0 = {1},S1 = {0, 1},S2 = {−1, 0, 1}.

Now, we apply the Dynamic Programming Algorithm:

• k = N = 2:

J2(x2) = αx2

• k = 1:

J1(x1) = minu1∈{−1,0}

{αx21 + (1− α)u21 + J2(x1 + u1)

}= min

u1∈{−1,0}

{αx21 + (1− α)u21 + α(x1 + u1)

2}

= minu1∈{−1,0}

{2αx21 + u21 + 2αx1u1

}J1(0) = min

u1∈{−1,0}

{u21}

= min{

1, 0}

⇒ J1(0) = 0

J1(1) = minu1∈{−1,0}

{2α+ u21 + 2αu1

}= min

{1, 2α

}⇒ J1(1) =

{2α if α ≤ 1

2

1 if α > 12

• k = 0:

J0(x0) = minu0∈{−1,0}

{(1− α)u20 + J1(x0 + u0)

}J0(1) = min

{1− α,min{1, 2α}

}= min

{1− α, 2α

}⇒ J0(1) =

{2α if α ≤ 1

3

1− α if α > 13

b) First, we write down the state space:

S0 = {1},S1 = {0, 1},S2 = {−1, 0, 1}.

Now, we apply the Dynamic Programming Algorithm for multiobjective problems:

Page 7: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 7

• k = N = 2:

F2(x2) = {(x22, 0)}

• k = 1:

F1(x1) = noninfu1∈{−1,0}

{(x21 + c1, u

21 + c2) | (c1, c2) ∈ F2(x1 + u1)

}= noninf

u1∈{−1,0}

{(x21 + x22, u

21)}

= noninfu1∈{−1,0}

{(2x21 + 2x1u1 + u21, u

21)}

F1(0) = noninfu1∈{−1,0}

{(u21, u

21)}

= noninf{

(1, 1), (0, 0)}

⇒ F1(0) ={

(0, 0)}

F1(1) = noninfu1∈{−1,0}

{(2 + 2u1 + u21, u

21)}

= noninf{

(1, 1), (2, 0)}

⇒ F1(1) ={

(1, 1), (2, 0)}

• k = 0:

F0(x0) = noninfu0∈{−1,0}

{(c1, u

20 + c2) | (c1, c2) ∈ F1(x0 + u0)

}F0(1) = noninf

{(0, 1), (1, 1), (2, 0)

}⇒ F0(1) =

{(0, 1), (2, 0)

}c) The cost of a) is a linear combination of the two costs in b), with the weights α and (1− α),

respectively. For each element of F0(x0), we can thus compute a scalar cost value corre-sponding to the cost of a). We pick the lowest one.

d) Reject all elements of F0(x0) that do not satisfy the constraint. From the remainingelements, choose the one with the lowest cost.

Page 8: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 8 Final Exam – Dynamic Programming & Optimal Control

Page 9: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 9

Problem 3 25%

The following questions are about shortest path problems. Please obey the following instruc-tions. If you are given any output of the Label Correcting Algorithm, you can assume it hasbeen generated according to these instructions.

Instructions: Recall that in the Label Correcting Algorithm only one instance of a node can bein the OPEN bin at any time. If a node already in the OPEN bin enters the OPEN bin again, treatthis node as if it entered the OPEN bin at the current iteration. If two nodes enter the OPEN binin the same iteration, add the one with the lowest node number first. New nodes enter the OPEN

bin from the right-hand side.Example: OPEN bin: 2, 3, 4; Node exiting OPEN 2; Nodes entering OPEN: 3, 5; new OPEN bin: 4,3, 5.

a) Your robot has successfully landed on Mars and just made its first picture. For reasons ofrobustness, the image was not stored in a single file, but in multiple chunks that overlap(see Table 1). You are really excited about the picture and want to download and publishit in tomorrow’s newspaper. Unfortunately, the communication protocol with your robotonly allows you to select and download one chunk at a time. Furthermore, you have a verylimited bandwidth b to download the image and there is a delay d when communicatingwith your robot on Mars. Each chunk that you download thus takes 2d+ l/b time, wherel is the chunk length. If you download all chunks, the picture won’t be ready in time fortomorrow’s issue. Which chunks do you download in order to receive the complete imagein minimum time?

i) Draw a graph of the shortest path problem described above and compute the arclengths. You don’t have to solve the problem!Hint: Let the state be the number of consecutive bytes starting from byte 0 thatyou successfully downloaded.

ii) Compute a lower bound for the time required to download the remaining chunks.

Problem data:

Image size s: 2000 bytesNumber of chunks N : 8Download bandwidth b: 10 bytes/minCommunication delay d: 10 min

start byte end byte

0 299200 999300 799400 699600 1999700 1699900 15991500 1999

Table 1: Overlappingimage data chunks.

Page 10: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 10 Final Exam – Dynamic Programming & Optimal Control

b) Consider the shortest path problem shown in Figure 1.

S

1

2

3

4

5

6

7

8

9

10 T

7

3

1

4

2

3

6

2

8

11

1

3

5 12

24

3 8

2 3

563

Figure 1

i) Continue table 2 for two iterations (i.e. iteration 6 and 7) using best-first search todetermine at each iteration which node to remove from the OPEN bin.

IterationNode exiting

OPENOPEN dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 dT

5 1 7, 8, 4 0 5 3 1 9 4 ∞ 12 15 ∞ ∞ ∞6 ...

Table 2

ii) Continue table 3 for two iterations (i.e. iteration 7 and 8) using depth-first search todetermine at each iteration which node to remove from the OPEN bin.

IterationNode exiting

OPENOPEN dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 dT

6 9 1, 2, 7 0 7 3 1 ∞ 4 ∞ 12 15 17 17 207 ...

Table 3

iii) Continue table 4 for two iterations (i.e. iteration 9 and 10) using breadth-first searchto determine at each iteration which node to remove from the OPEN bin.

IterationNode exiting

OPENOPEN dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 dT

8 6 4, 7, 8, 9 0 5 3 1 9 4 17 12 15 18 ∞ ∞9 ...

Table 4

iv) The shortest path problem from Figure 1 can be solved using Dynamic Program-ming. Perform the initialization and one iteration step of the Dynamic ProgrammingAlgorithm.

Page 11: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 11

c) Consider the result of the Label Correcting Algorithm shown in Table 5.Note: Table 5 does not correspond to the graph in Figure 1.

IterationNode exiting

OPENOPEN dS d1 d2 d3 d4 d5 d6 d7 d8 d9 dT

0 - S 0 ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞1 S 1, 2 0 3 2 ∞ ∞ ∞ ∞ ∞ ∞ ∞ ∞2 2 1, 3 0 3 2 7 ∞ ∞ ∞ ∞ ∞ ∞ ∞3 1 3 0 3 2 6 ∞ ∞ ∞ ∞ ∞ ∞ ∞4 3 4, 5, 6, 7 0 3 2 6 14 8 12 14 ∞ ∞ ∞5 5 4, 7, 6, 8 0 3 2 6 14 8 11 14 12 ∞ ∞6 6 4, 8, 7 0 3 2 6 14 8 11 12 12 ∞ 177 7 4, 8 0 3 2 6 14 8 11 12 12 ∞ 178 8 4 0 3 2 6 14 8 11 12 12 ∞ 149 4 - 0 3 2 6 14 8 11 12 12 ∞ 14

Table 5

Mark the correct answer for each statement.

Grading: Each correct answer is worth 0.50%. Answers left blank are worth 0%.Wrong answers are penalized with -0.25%. The minimum score of this subproblem 3.c) is0%.

i) Node 1 and node 2 are connected.� true � undetermined � false

ii) The shortest path from node 6 to node T has cost� 6. � 3. � undetermined.

iii) The cost to go from node 7 to 9 is� > 5. � ∞ (not connected). � ≥ 5.

iv) The shortest path from node S to node 5 has cost 8.� true � undetermined � false

v) The following method was applied in Table 5 to determine at each iteration whichnode exits the OPEN bin:� breadth-first search. � best-first search � depth-first search.

vi) The shortest path from node S to node 4 has cost 14.� true � undetermined � false

vii) Given a lower bound hj ≥ 2 to the smallest cost from j to T , j = 1, . . . , 9. Thenumber of iterations changes if you apply the A∗Algorithm to Table 5.� true � undetermined � false

viii) The shortest path from node 8 to node T has cost 3.� true � undetermined � false

ix) Node 2 and node 4 are connected.� true � undetermined � false

Page 12: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 12 Final Exam – Dynamic Programming & Optimal Control

Solution 3

a) i) If the state is defined as the number of consecutive bytes starting from byte 0 thatare already successfully downloaded, then one can move to nodes with a lower nodenumber at zero costs. This is equivalent to throwing the overlapping bytes away.Fig. 2 shows a graph of the shortest path problem.

400

10000

200

300 600 2000

700800

1600

900

1500

1700

50

70

0 100

0

50 120

160

90

0

0

70

000

0

0

Figure 2

ii) In the best case all remaining data is stored in a single file. Starting from byte x ittakes

s− xb

+ 2d

time to download all remaining bytes.

b) i) Best-first search:

IterationNode exiting

OPENOPEN dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 dT

5 1 7, 8, 4 0 5 3 1 9 4 ∞ 12 15 ∞ ∞ ∞6 4 8, 6, 7 0 5 3 1 9 4 15 11 15 ∞ ∞ ∞7 7 8, 6, 9 0 5 3 1 9 4 15 11 15 14 ∞ ∞

ii) Depth-first search:

IterationNode exiting

OPENOPEN dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 dT

6 9 1, 2, 7 0 7 3 1 ∞ 4 ∞ 12 15 17 17 207 7 1, 2, 9 0 7 3 1 ∞ 4 ∞ 12 15 15 17 208 9 1, 2 0 7 3 1 ∞ 4 ∞ 12 15 15 17 20

iii) Breadth-first search:

IterationNode exiting

OPENOPEN dS d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 dT

8 6 4, 7, 8, 9 0 5 3 1 9 4 17 12 15 18 ∞ ∞9 4 8, 9, 6, 7 0 5 3 1 9 4 15 11 15 18 ∞ ∞10 8 6, 7, 9, 10 0 5 3 1 9 4 15 11 15 17 17 ∞

Page 13: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 13

iv) There are N = 11 nodes (S, 1, 2, . . . , 10) and the terminal node T in the graph (seeFig. 1). We define the cost-to-go Jk(i) to be

Jk(i) := optimal cost of getting from i to t in N − k moves.

Using the Dynamic Programming Algorithm, the optimal cost-to-go is then computedas follows:

• k = 10: (Initialization)

i J10(i) ← aiT

S ∞ aST =∞1 ∞ a1T =∞2 ∞ a2T =∞3 ∞ a3T =∞4 ∞ a4T =∞5 ∞ a5T =∞6 ∞ a6T =∞7 ∞ a7T =∞8 ∞ a8T =∞9 5 a9T = 510 3 a10T = 3

• k = 9: (Recursion)

i J9(i) ← minj∈{S,1,...,10}

[aij + J10(j)]

S ∞ T cannot be reached in ≤ 2 steps1 ∞ T cannot be reached in ≤ 2 steps2 ∞ T cannot be reached in ≤ 2 steps3 ∞ T cannot be reached in ≤ 2 steps4 ∞ T cannot be reached in ≤ 2 steps5 ∞ T cannot be reached in ≤ 2 steps6 6 min {∞, 3 +∞, 1 + 5}7 8 min {∞, 8 +∞, 5 +∞, 3 + 5}8 5 min {∞, 2 + 5, 2 + 3}9 5 min {5, 1 +∞, 6 + 3}10 3 min {3}

c) i) Undetermined.

ii) Undetermined.

iii) ≥ 5.

iv) True.

v) Best-first search.

vi) True.

vii) False.

viii) False.

ix) False.

Page 14: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 14 Final Exam – Dynamic Programming & Optimal Control

Page 15: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 15

Problem 4 20%

Consider the following Stochastic Shortest Path Problem:

xk+1 = wk,

xk ∈ {0, 1, 2},uk ∈ {A,B}.

The transition probabilities pij(uk) := P (wk = j|xk = i, uk) between the states are given by

p00(A) = 1,p01(A) = 0,p02(A) = 0,p10(A) = 0,p11(A) = 0.2,p12(A) = 0.8,p20(A) = 0.5,p21(A) = 0.3,p22(A) = 0.2,

p00(B) = 1,p01(B) = 0,p02(B) = 0,p10(B) = 0.2,p11(B) = 0.3,p12(B) = 0.5,p20(B) = 0.2,p21(B) = 0.5,p22(B) = 0.3.

The cost function to be minimized is given by

limN→∞

E

{N−1∑k=0

g(uk)xk

},

withg(A) = 16, g(B) = 5.

Note: All parts can be solved independently.

a) Explain why the state xk = 0 can be considered to be the termination state of this Stochas-tic Shortest Path Problem.

b) Perform one iteration of the Value Iteration Algorithm, i.e. compute J1(1) and J1(2).Use J0(1) = 20 and J0(2) = 10 as initial guess.

c) Perform one iteration of the Policy Iteration Algorithm, i.e. compute µ1(1) and µ1(2).Use µ0(1) = A and µ0(2) = B as initial guess.

d) The optimal cost of the above Stochastic Shortest Path Problem can be solved by maxi-mizing J(1) + J(2) subject to a set of linear constraints on J(1) and J(2). Write downthese constraints. You don’t have to simplify the constraints.

Page 16: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 16 Final Exam – Dynamic Programming & Optimal Control

Solution 4

a) The termination state must be cost-free. For xk = 0 the stage cost is zero. Further, theremust exist a policy such that from any initial state we reach the termination state withnonzero probability after a finite number of stages, and once we are in the terminationstate, the probability of leaving it again must be zero for all inputs. With the giventransition probabilities, this is the case for xk = 0.

b) We initialize the value iteration algorithm and perform one iteration:

• Initial guess:

J0(1) = 20, J0(2) = 10

• Iteration 1:

J1(1) = minu∈{A,B}

[g(u) · 1 + p11(u)J0(1) + p12(u)J0(2)

]= min

[16 · 1 + 0.2 · 20 + 0.8 · 10, 5 · 1 + 0.3 · 20 + 0.5 · 10

]= min

[28, 16

]⇒ J1(1) = 16

J1(2) = minu∈{A,B}

[g(u) · 2 + p21(u)J0(1) + p22(u)J0(2)

]= min

[16 · 2 + 0.3 · 20 + 0.2 · 10, 5 · 2 + 0.5 · 20 + 0.3 · 10

]= min

[40, 23

]⇒ J1(2) = 23

c) We initialize the policy iteration algorithm and perform one iteration:

• Initial guess:

µ0(1) = A, µ0(2) = B

• Iteration 1:

- Policy evaluation:

Jµ0(1) = g(A) · 1 + p11(A)Jµ0(1) + p12(A)Jµ0(2)

= 16 · 1 + 0.2 · Jµ0(1) + 0.8 · Jµ0(2)

⇒ Jµ0(1) = 20 + Jµ0(2)

Jµ0(2) = g(B) · 2 + p21(B)Jµ0(1) + p22(B)Jµ0(2)

= 5 · 2 + 0.5 · Jµ0(1) + 0.3 · Jµ0(2)

= 10 + 0.5 · (20 + Jµ0(2)) + 0.3 · Jµ0(2)

= 20 + 0.8 · Jµ0(2)

⇒ Jµ0(2) = 100

⇒ Jµ0(1) = 120

Page 17: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 17

- Policy improvement:

µ1(1) = arg minu∈{A,B}

[g(u) · 1 + p11(u)Jµ0(1) + p12(u)Jµ0(2)

]= arg min

[16 · 1 + 0.2 · 120 + 0.8 · 100, 5 · 1 + 0.3 · 120 + 0.5 · 100

]= arg min

[120, 91

]⇒ µ1(1) = B

µ1(2) = arg minu∈{A,B}

[g(u) · 2 + p21(u)Jµ0(1) + p22(u)Jµ0(2)

]= arg min

[16 · 2 + 0.3 · 120 + 0.2 · 100, 5 · 2 + 0.5 · 120 + 0.3 · 100

]= arg min

[88, 100

]⇒ µ1(2) = A

d) For each admissible pair {x, u} we get one linear constraint on the optimal cost. Thisresults in four constraints:

J(1) ≤ g(A) · 1 + p11(A)J(1) + p12(A)J(2)

= 16 + 0.2 · J(1) + 0.8 · J(2),

J(1) ≤ g(B) · 1 + p11(B)J(1) + p12(B)J(2)

= 5 + 0.3 · J(1) + 0.5 · J(2),

J(2) ≤ g(A) · 2 + p21(A)J(1) + p22(A)J(2)

= 32 + 0.3 · J(1) + 0.2 · J(2),

J(2) ≤ g(B) · 2 + p21(B)J(1) + p22(B)J(2)

= 10 + 0.5 · J(1) + 0.3 · J(2).

Page 18: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 18 Final Exam – Dynamic Programming & Optimal Control

Page 19: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 19

Problem 5 20%

Consider the dynamic system

x(t) = x(t)u(t)

with the fixed initial state x(0) = 1 and the fixed terminal state x(T ) = 2. The control inputu(t) is constrained to u(t) ∈ [umin, umax], with umin < 0 < umax.

Use the Minimum Principle to find the input u∗(t) that minimizes the cost function∫ T

0(x(t)− 1)2 dt

for T > ln(2)/umax. Show that your solution is the only one that satisfies the Minimum Prin-ciple.

Page 20: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 20 Final Exam – Dynamic Programming & Optimal Control

Solution 5

The Hamiltonian function is

H(x(t), u(t), p(t)) = (x(t)− 1)2 + p(t)x(t)u(t).

Pontryagin’s necessary conditions for optimality can be written as

State equation: x(t) =∂H(x(t), u(t), p(t))

∂p, x(0) = x0, x(T ) = xT .

⇒ x(t) = x(t)u(t), x(0) = 1, x(T ) = 2.

Adjoint equation: p(t) = −∂H(x(t), u(t), p(t))

∂x.

⇒ p(t) = −p(t)u(t)− 2x(t) + 2.

Control input: u∗(t) = arg minu∈U

H(x(t), u, p(t)).

⇒ u∗(t) =

umax, p(t)x(t) < 0,

[umin, umax], p(t)x(t) = 0,

umin, p(t)x(t) > 0.

Hamiltonian: H(x(t), u(t), p(t)) = constant, ∀t ∈ [0, T ].

We prove by contradiction that we start on a singular arc:

• p(0) > 0:Because p(0)x(0) > 0, the control input u(t) = umin is applied for some time interval. Theevolution of the state x(t) is then

x(t) = x(t)umin,

⇒ x(t) = eumint.

Inserting this result into the adjoint equation yields

p(t) = −p(t)umin − 2eumint + 2.

A solution for p(t) can be obtained using the ansatz

p(t) = C1e−umint + C2e

umint + C3,

⇒ p(t) = C1e−umint − 1

umineumint +

2

umin.

To satisfy the initial condition p(0) > 0, C1 has to be > − 1umin

. Since both x(t) and p(t)remain positive for all future, a switching in the control input u(t) never appears and thefinal state x(T ) = 2 cannot be reached.

• p(0) < 0:Because p(0)x(0) < 0, the control input u(t) = umax is applied for some time interval.Following the same argument as for the case p(0) < 0, a switching in the control inputu(t) never appears and x(T ) = eumaxT . The final constraint x(T ) = 2 is only satisfied ifT = ln 2

umax. The only possibility left for T > ln 2

umaxis to start on a singular arc.

Page 21: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Final Exam – Dynamic Programming & Optimal Control Page 21

• p(0) = 0:A singular arc is only possible if p(t)x(t) = 0 for a non-trivial time interval [0, t∗]. Becauseof the initial state condition x(0) = 1, the adjoint has to be zero, p(t) = 0, ∀t ∈ [0, t∗], i.e.

p(t)!

= 0, ∀t ∈ [0, t∗],

− p(t)u(t)− 2x(t) + 2 = 0, ∀t ∈ [0, t∗],

⇒ x(t) = 1, ∀t ∈ [0, t∗],

and hence

x(t) = 0, ∀t ∈ [0, t∗],

⇒ u(t) = 0, ∀t ∈ [0, t∗].

Since it requires ln(2)umax

time to move from x(t∗) = 1 to x(T ) = 2, t∗ is given by

t∗ = T − ln(2)

umax.

Therefore the optimal solution is to remain on the singular arc until t = t∗ and then switch tou∗(t) = umax.

• 0 ≤ t < t∗:

u∗(t) = 0,

x∗(t) = 1,

p∗(t) = 0.

• t∗ ≤ t ≤ T :

u∗(t) = umax,

x∗(t) = eumax(t−t∗),

p∗(t) = − 1

umaxe−umax(t−t∗) − 1

umaxeumax(t−t∗) +

2

umax.

Page 22: Solutions - ETH Z · Final Exam January 24th, 2015 Dynamic Programming & Optimal Control (151-0563-01) Prof. R. D’Andrea Solutions Exam Duration:150 minutes Number of Problems:5

Page 22 Final Exam – Dynamic Programming & Optimal Control