ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm



Chapter 1: The DP Algorithm

To do:

sequential decision-making

state

random elements

discrete-time stochastic dynamic system

optimal control/decision problem

actions vs. strategy (information gathering, feedback)

These ideas are first illustrated via examples; the general model is described later.


Example: Inventory Control Problem

Consider the stock of a certain item, e.g. gas in a service station, oil in a refinery, cars in a dealership, spare parts in a maintenance facility, etc. The stock is checked at equally spaced points in time, e.g. every morning or at the end of each week. At each of those times, a decision must be made as to what quantity of the item to order, so that demand over the present period is “satisfactorily” met (a quantitative meaning is given below).

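[Figure: timeline of the periods 0, 1, 2, ..., k-1, k, k+1, ..., N-1, N, marking the kth period and the instants at which the stock is checked and an order is placed.]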


Stochastic difference equation:

$$x_{k+1} = x_k + u_k - w_k$$

$x_k$: stock at the beginning of the kth period

$u_k$: quantity ordered at the beginning of the kth period; assumed to be delivered during the kth period

$w_k$: demand during the kth period; $\{w_k\}$ is a stochastic process

All variables are assumed real-valued.
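The recursion $x_{k+1} = x_k + u_k - w_k$ can be stepped forward directly. The following is a minimal sketch; the helper name `simulate_stock` and the numbers are illustrative assumptions, not part of the slides. Negative stock stands for backlogged demand.

```python
from typing import List, Sequence


def simulate_stock(x0: float, orders: Sequence[float], demands: Sequence[float]) -> List[float]:
    """Return [x_0, x_1, ..., x_N] for x_{k+1} = x_k + u_k - w_k."""
    if len(orders) != len(demands):
        raise ValueError("orders and demands must have the same length")
    stock = [x0]
    for u_k, w_k in zip(orders, demands):
        # Stock at the start of the next period: current stock plus order minus demand.
        stock.append(stock[-1] + u_k - w_k)
    return stock


# Hypothetical data: three periods, initial stock of 2 units.
print(simulate_stock(x0=2.0, orders=[1.0, 0.0, 4.0], demands=[2.0, 3.0, 1.0]))
```

In this made-up run the stock dips below zero in the second period, which the model reads as a backlog to be filled later.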


Negative stock is interpreted as excess demand, which is backlogged and filled ASAP.

Cost of operation:

1. Purchasing cost: $c\,u_k$ ($c$ = cost per unit)

2. $H(x_{k+1})$: penalty for holding and storing excess stock ($x_{k+1} > 0$), or for shortage ($x_{k+1} < 0$)

$$\text{Cost for period } k = c\,u_k + H(x_k + u_k - w_k) = g(x_k, u_k, w_k),$$

where the argument of $H$ is $x_{k+1} = x_k + u_k - w_k$.


Let

$$H(x) = \begin{cases} h\,x, & x \ge 0 \\ p\,|x|, & x < 0 \end{cases} \qquad p \ge 0,\ h \ge 0$$

or, equivalently,

$$H(x) = p \max(0, -x) + h \max(0, x)$$


Objective:

to minimize, in some meaningful sense, the total cost of operation over a finite number of periods (finite “horizon”)

$$\text{total cost over } N \text{ periods} = \sum_{k=0}^{N-1} \left\{ c\,u_k + H(x_k + u_k - w_k) \right\}$$
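For a given order sequence and a given demand realization, the total cost $\sum_{k=0}^{N-1} \{c\,u_k + H(x_k + u_k - w_k)\}$ is a plain sum. The sketch below assumes the piecewise-linear penalty $H(x) = p\max(0,-x) + h\max(0,x)$; the function names and the numerical values of `c`, `h`, `p` are illustrative assumptions.

```python
from typing import Sequence


def holding_shortage_cost(x: float, h: float, p: float) -> float:
    """H(x) = p*max(0, -x) + h*max(0, x): shortage penalty plus holding penalty."""
    return p * max(0.0, -x) + h * max(0.0, x)


def total_cost(x0: float, orders: Sequence[float], demands: Sequence[float],
               c: float, h: float, p: float) -> float:
    """Sum over k of c*u_k + H(x_k + u_k - w_k) for one demand realization."""
    total = 0.0
    x = x0
    for u_k, w_k in zip(orders, demands):
        x_next = x + u_k - w_k          # stock at the start of the next period
        total += c * u_k + holding_shortage_cost(x_next, h, p)
        x = x_next
    return total


# Hypothetical data: unit purchase cost 1, holding rate 1, shortage rate 3.
print(total_cost(2.0, [1.0, 0.0, 4.0], [2.0, 3.0, 1.0], c=1.0, h=1.0, p=3.0))
```

In the stochastic case this quantity is random, and it is its expectation that gets minimized.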


Two distinct situations can arise:

Deterministic Case: $x_0$ is perfectly known, and the demands are known to the manager in advance.

1. At $k = 0$, all future demands $\{w_0, w_1, \ldots, w_{N-1}\}$ are known.

Select all orders at once, so as to meet the demand exactly:

$$x_1 = x_2 = \cdots = x_{N-1} = 0$$

$$0 = x_1 = x_0 + u_0 - w_0 \;\Rightarrow\; u_0 = w_0 - x_0$$

$$u_k = w_k, \quad 1 \le k \le N-1 \quad : \text{fixed order schedule}$$

Assume $x_0 \le w_0$, so that $u_0 \ge 0$.
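The fixed order schedule of the deterministic case can be precomputed from the known demands. A minimal sketch, assuming the hypothetical helper name `fixed_order_schedule` and made-up demand values:

```python
from typing import List, Sequence


def fixed_order_schedule(x0: float, demands: Sequence[float]) -> List[float]:
    """u_0 = w_0 - x_0 and u_k = w_k for k >= 1 (requires x_0 <= w_0)."""
    demands = list(demands)
    if not demands:
        return []
    if x0 > demands[0]:
        raise ValueError("requires x0 <= w0 so that the first order is nonnegative")
    return [demands[0] - x0] + demands[1:]


demands = [3.0, 2.0, 4.0]            # hypothetical, known in advance
orders = fixed_order_schedule(1.0, demands)

# Check that every end-of-period stock is exactly zero under this schedule.
x = 1.0
stocks = []
for u_k, w_k in zip(orders, demands):
    x = x + u_k - w_k
    stocks.append(x)
print(orders, stocks)
```

Because all demands are known at $k = 0$, the whole list of orders is a set of numbers fixed in advance; nothing observed later changes it.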


What we do here is select a set of fixed “actions” (numbers, i.e. a precomputed order schedule).

2. At the beginning of period $k$, $w_k$ becomes known (perfect forecast). Hence, we must gather information and make decisions sequentially.

“Strategy”: a rule for making decisions based on information as it becomes available:

$$u_0 = w_0 - x_0, \qquad u_k = w_k, \quad 1 \le k \le N-1,$$

where $w_k$ is the forecast available at the beginning of period $k$.


Stochastic Case: $x_0$ is perfectly known (this can be generalized to the case when only its distribution is known), but $\{w_k\}$ is a random process.

Assume that the $w_k$ are i.i.d., $\mathbb{R}$-valued r.v.'s with pdf $f_w$, i.e.

$$\text{Prob}(w_k \in [a, b]) = \int_a^b f_w(w)\,dw =: P_w([a, b]) \quad (\text{independent of } k)$$

$P_w$: probability distribution or measure, i.e. $P_w(B)$ is the probability that $w_k$ takes a value in the set $B \subset \mathbb{R}$.


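Note that the stock $x_{k+1} = x_k + u_k - w_k$ is now a r.v.

Alternatively, we can describe the evolution of the system in terms of a transition law:

$$\begin{aligned} P(B \mid x_k, u_k) &= \text{Prob}(x_{k+1} \in B \mid x_k, u_k) \\ &= \text{Prob}(x_k + u_k - w_k \in B \mid x_k, u_k) \\ &= \text{Prob}(w_k \in x_k + u_k - B \mid x_k, u_k) \\ &= P_w(x_k + u_k - B) \end{aligned}$$

where $x_k + u_k - B := \{x_k + u_k - b \mid b \in B\}$.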
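For an interval $B = [a, b]$, the transition law $P(B \mid x_k, u_k) = P_w(x_k + u_k - B)$ becomes $P_w([x_k + u_k - b,\ x_k + u_k - a])$, which only needs the demand CDF. The sketch below assumes, purely for illustration, a demand that is uniform on $[0, 10]$; the slides do not specify $f_w$, and the function names are hypothetical.

```python
def uniform_cdf(w: float, low: float, high: float) -> float:
    """CDF of a Uniform[low, high] demand (an illustrative choice of f_w)."""
    if w <= low:
        return 0.0
    if w >= high:
        return 1.0
    return (w - low) / (high - low)


def transition_prob(a: float, b: float, x: float, u: float,
                    low: float = 0.0, high: float = 10.0) -> float:
    """Prob(x_{k+1} in [a, b] | x_k = x, u_k = u) = P_w([x + u - b, x + u - a])."""
    if a > b:
        raise ValueError("need a <= b")
    # x + u - w lies in [a, b] exactly when w lies in [x + u - b, x + u - a].
    return uniform_cdf(x + u - a, low, high) - uniform_cdf(x + u - b, low, high)


# Stock 3, order 4: next stock lands in [2, 5] when demand falls in [2, 5].
print(transition_prob(2.0, 5.0, x=3.0, u=4.0))
```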


Also, the cost is a random quantity, so we minimize the expected cost

$$E\left\{ \sum_{k=0}^{N-1} c\,u_k + H(x_k + u_k - w_k) \right\}$$

Action: select all orders (numbers) at $k = 0$; most likely not “optimal” (reduces to a nonlinear programming problem)

VS

Strategy: select a sequence of functions $\mu_k : \mathbb{R} \to [0, \infty)$ s.t. $u_k = \mu_k(x_k)$, where $x_k$ is the information available at the kth period. This is a difficult problem! The optimization is over a function space.


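Let $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$: control / decision strategy, policy, law

$\Pi$: set of all admissible strategies (e.g. $\mu_k(x) \ge 0$)

Stochastic Dynamic Program: the stochastic DP problem is then

$$\text{minimize: } J_N(\pi, x_0) := E\left\{ \sum_{k=0}^{N-1} c\,\mu_k(x_k) + H(x_k + \mu_k(x_k) - w_k) \right\}$$

$$\text{s.t. } x_{k+1} = x_k + u_k - w_k, \quad u_k = \mu_k(x_k)$$

If the problem is feasible, then there exists an optimal strategy $\pi^*$, i.e.

$$J_N^*(x_0) := J_N(\pi^*, x_0) \le J_N(\pi, x_0) \quad \forall \pi \in \Pi$$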


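Summary of the Problem

1-Discrete-time stochastic system

$$x_{k+1} = x_k + u_k - w_k =: f_k(x_k, u_k, w_k) \quad : \text{system equation}$$

Note: no backlogging $\Rightarrow$ $x_{k+1} = \max(0,\ x_k + u_k - w_k)$

$$x_{k+1} \sim P(B \mid x_k, u_k) = P_w(x_k + u_k - B) \quad : \text{transition law}$$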


2-Stochastic element $\{w_k\}$, assumed i.i.d. for example; this will be generalized to $w_k$ depending on $x_k$ and $u_k$.

3-Control constraint: $u_k \ge 0$; if there is a maximum capacity $M$, then

$$0 \le u_k \le M - x_k$$

4-Additive cost

$$E\left\{ g_N(x_N) + \sum_{k=0}^{N-1} \underbrace{c\,u_k + H(x_k + u_k - w_k)}_{g_k(x_k,\, u_k,\, w_k)} \right\}$$


5-Optimization over admissible strategies

We will see later on that this problem has a neat closed-form solution:

$$\mu_k^*(x_k) = \begin{cases} T_k - x_k, & x_k < T_k \\ 0, & x_k \ge T_k \end{cases}$$

for some threshold levels $T_k$: base-stock policy
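A base-stock policy orders up to the threshold $T_k$ whenever the stock is below it, and orders nothing otherwise. The sketch below only applies such a rule; the thresholds are made-up numbers rather than levels computed by the DP algorithm, and the function names are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def base_stock_order(x: float, threshold: float) -> float:
    """mu_k(x) = T_k - x if x < T_k, and 0 otherwise."""
    return threshold - x if x < threshold else 0.0


def run_base_stock(x0: float, thresholds: Sequence[float],
                   demands: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Apply the feedback rule along one demand realization; return (orders, stocks)."""
    x = x0
    orders: List[float] = []
    stocks: List[float] = [x0]
    for T_k, w_k in zip(thresholds, demands):
        u_k = base_stock_order(x, T_k)   # decision uses the current stock only
        x = x + u_k - w_k
        orders.append(u_k)
        stocks.append(x)
    return orders, stocks


orders, stocks = run_base_stock(0.0, thresholds=[5.0, 5.0, 5.0], demands=[2.0, 7.0, 1.0])
print(orders, stocks)
```

Unlike a fixed order schedule, each order here is a function of the observed stock, which is what makes the rule a strategy rather than a set of actions.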


Role of Information: Actions Vs. Strategies

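Example: Let a two-stage problem be given as:

$$(*): \quad x_1 = x_0 + u_0 - w_0; \qquad x_2 = x_1 + u_1$$

where $w_0$ is a random variable that takes the values $\pm 1$ w.p. $\tfrac{1}{2}$ each, i.e.

$$\text{Prob}(w_0 = 1) = \tfrac{1}{2} = \text{Prob}(w_0 = -1),$$

and $x_0 = 0$, with real-valued $x_i$, $u_i$.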


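Problem A: Choose actions $(u_0, u_1)$ (open loop, control schedule) to minimize $E_{w_0}|x_2|$.

Equivalently, let $N = 2$ and

$$g_0(x_0, u_0, w_0) = 0, \qquad g_1(x_1, u_1, w_1) = 0, \qquad g_2(x_2) = |x_2|,$$

and

$$\underset{u_0,\,u_1}{\text{minimize}} \; E\left\{ g_2(x_2) + \sum_{k=0}^{1} g_k(x_k, u_k, w_k) \right\} \quad \text{s.t. } (*)$$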


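Solution A:

$$E_{w_0}|x_2| = E_{w_0}|u_0 + u_1 - w_0| = \tfrac{1}{2}|u_0 + u_1 - 1| + \tfrac{1}{2}|u_0 + u_1 + 1|$$

Case (i): $|u_0 + u_1| \le 1$, i.e. $-1 \le u_0 + u_1 \le 1$:

$$E_{w_0}|x_2| = \tfrac{1}{2}(1 - u_0 - u_1) + \tfrac{1}{2}(u_0 + u_1 + 1) = 1$$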


Case (ii): $|u_0 + u_1| > 1$.

If $u_0 + u_1 > 1$:

$$E_{w_0}|x_2| = \tfrac{1}{2}(u_0 + u_1 - 1) + \tfrac{1}{2}(u_0 + u_1 + 1) = u_0 + u_1 > 1$$


Case (ii), continued: if $u_0 + u_1 < -1$:

$$E_{w_0}|x_2| = -\tfrac{1}{2}(u_0 + u_1 - 1) - \tfrac{1}{2}(u_0 + u_1 + 1) = -(u_0 + u_1) > 1$$

Hence

$$\min_{u_0,\,u_1} E_{w_0}|x_2| = 1, \quad \text{attained for any } -1 \le u_0^* + u_1^* \le 1$$

$u_0^*$: can be anything; then choose $u_1^*$ appropriately.

No information gathering: we choose $u_0^*$ and $u_1^*$ at the start and do not take into consideration $x_1$ at the beginning of stage 1.
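The open-loop result can be checked numerically. The sketch below assumes the two-stage model used in the derivation, namely $x_0 = 0$, $x_2 = u_0 + u_1 - w_0$ and $w_0 = \pm 1$ with probability $\tfrac{1}{2}$ each; the function name and the search grid are illustrative assumptions.

```python
def open_loop_cost(u0: float, u1: float) -> float:
    """E|x_2| for fixed actions (u0, u1), with x_2 = u0 + u1 - w0 and w0 = +/-1 w.p. 1/2."""
    s = u0 + u1
    return 0.5 * abs(s - 1.0) + 0.5 * abs(s + 1.0)


# Coarse grid search over fixed action pairs (quarter steps in [-3, 3]).
grid = [k / 4 for k in range(-12, 13)]
best = min(open_loop_cost(u0, u1) for u0 in grid for u1 in grid)
print(best)
print(open_loop_cost(2.0, 1.0))   # a pair with u0 + u1 > 1
```

No pair of fixed numbers does better than expected cost 1 on this grid, in line with the case analysis.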


Problem B: Choose $u_0$ and $u_1$ sequentially, using the observed value of $x_1$.

Sequential decision-making, feedback control. Thus, to take decision $u_1$, we wait until the outcome $x_1$ becomes available, and act accordingly.

Solution B: from (*), we select

$$u_1 = -x_1 = -(u_0 - w_0) \;\Rightarrow\; x_2 = 0 \;\Rightarrow\; E_{w_0}|x_2| = 0$$
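The benefit of feedback can be seen by evaluating $E|x_2|$ when $u_1$ is a function of the observed $x_1$. The sketch assumes the same two-stage model as in the derivation ($x_0 = 0$, $x_1 = u_0 - w_0$, $x_2 = x_1 + u_1$, $w_0 = \pm 1$ with probability $\tfrac{1}{2}$ each); `closed_loop_cost` is a hypothetical helper name.

```python
from typing import Callable


def closed_loop_cost(u0: float, feedback: Callable[[float], float]) -> float:
    """E|x_2| when u_1 = feedback(x_1) is chosen after x_1 is observed."""
    cost = 0.0
    for w0 in (1.0, -1.0):           # both outcomes, each with probability 1/2
        x1 = u0 - w0                 # first stage, starting from x_0 = 0
        x2 = x1 + feedback(x1)       # second stage uses the observed x_1
        cost += 0.5 * abs(x2)
    return cost


print(closed_loop_cost(0.7, lambda x1: -x1))   # feedback rule u_1 = -x_1
print(closed_loop_cost(0.0, lambda x1: 0.0))   # rule that ignores x_1
```

The rule $u_1 = -x_1$ cancels the disturbance for any choice of $u_0$, whereas a rule that ignores $x_1$ behaves like an open-loop action.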


Note: information gathering doesn’t always help:

Let $w_0 = 0$ w.p. 1 (deterministic case). Then

$$x_2 = u_0 + u_1, \qquad u_1 = -u_0 \;\Rightarrow\; x_2 = 0$$

We do not gain anything by making decisions sequentially.


Discrete-Time Stochastic Dynamic System Model and Optimal Decision / Control Problem

1-Discrete-time stochastic dynamic system ($t$, $k$ can be time or events)

$$x_{k+1} = f_k(x_k, u_k, w_k), \quad k = 0, 1, \ldots, N-1$$

$x_k \in S_k$: state space at time $k$

$u_k \in C_k$: control space

$w_k \in D_k$: disturbance space (countable)

Also, depending on the state of the system, there are constraints on the actions that can be taken:

$$u_k \in U_k(x_k) \subset C_k \quad (\text{nonempty subset})$$


2-Stochastic disturbance $\{w_k\}$.

$$\text{Prob}(w_k \in B \mid x_k, u_k) = P_w^{(k)}(B \mid x_k, u_k), \quad B \subset D_k$$

$P_w^{(k)}$: probability measure (distribution); may depend explicitly on time, current state and action, but not on previous disturbances $w_{k-1}, \ldots, w_0$.


3-Admissible Control / Decision Laws (Strategies, Policies)

Define information patterns!

▪ Feasible policies
▪ Markov:
  - Deterministic
  - Randomized

$$\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}; \qquad \mu_k : S_k \to C_k, \quad k = 0, \ldots, N-1$$

and

$$\mu_k(x_k) \in U_k(x_k) \qquad (*)$$

$$\Pi := \{\pi : \pi \text{ is an admissible policy}\}, \text{ i.e. } (*) \text{ holds}$$


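4-Finite Horizon Optimal Control / Decision Problem

Given an initial state $x_0$, cost functions $g_k$, $k = 0, \ldots, N-1$, and a terminal cost $g_N$, find $\pi \in \Pi$ that minimizes the cost functional

$$J_N(\pi, x_0) := E_{w_k,\; k = 0, \ldots, N-1} \left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \right\}$$

subject to the system equation constraint

$$x_{k+1} = f_k(x_k, \mu_k(x_k), w_k), \quad k = 0, 1, \ldots, N-1$$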
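For a fixed policy, the cost functional $J_N(\pi, x_0)$ can be evaluated exactly when the disturbance takes finitely many values. The sketch below makes simplifying assumptions that are not in the slides: the disturbances are i.i.d. with a single pmf that does not depend on the state or control, and all function names are hypothetical. It evaluates a given policy by enumerating disturbance outcomes; it does not optimize over policies.

```python
from typing import Callable, Dict, Sequence


def policy_cost(x0: float,
                policy: Sequence[Callable[[float], float]],
                f: Callable[[int, float, float, float], float],
                g: Callable[[int, float, float, float], float],
                g_terminal: Callable[[float], float],
                pmf: Dict[float, float]) -> float:
    """J_N(pi, x0) = E{ g_N(x_N) + sum_k g_k(x_k, mu_k(x_k), w_k) } by enumeration."""
    horizon = len(policy)

    def cost_from(k: int, x: float) -> float:
        if k == horizon:
            return g_terminal(x)
        u = policy[k](x)                      # u_k = mu_k(x_k)
        return sum(prob * (g(k, x, u, w) + cost_from(k + 1, f(k, x, u, w)))
                   for w, prob in pmf.items())

    return cost_from(0, x0)


# Two-stage illustration: the disturbance only enters the first stage.
def f(k, x, u, w):
    return x + u - w if k == 0 else x + u


def g(k, x, u, w):
    return 0.0


pmf = {1.0: 0.5, -1.0: 0.5}
feedback = [lambda x: 0.0, lambda x: -x]
constant = [lambda x: 0.0, lambda x: 0.0]
print(policy_cost(0.0, feedback, f, g, abs, pmf))
print(policy_cost(0.0, constant, f, g, abs, pmf))
```

The enumeration grows exponentially with the horizon, so this is only practical for small illustrative problems.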


We say that $\pi^* \in \Pi$ is optimal for the initial state $x_0$ if

$$J_N(\pi^*, x_0) \le J_N(\pi, x_0) \quad \forall \pi \in \Pi$$

$$J_N^*(x_0) := J_N(\pi^*, x_0) = \min_{\pi \in \Pi} J_N(\pi, x_0) \quad : \text{optimal } N\text{-stage cost (or value) function}$$

Likewise, for $\epsilon > 0$ given, $\pi_\epsilon^*$ is said to be $\epsilon$-optimal if

$$J_N(\pi_\epsilon^*, x_0) \le J_N^*(x_0) + \epsilon$$


This stochastic optimal control problem is difficult: we are optimizing over strategies.

The Dynamic Programming Algorithm will give us necessary and sufficient conditions to decompose this problem into a sequence of coupled minimization problems over actions (optimization), from which we will obtain $J_N^*$.

DP is the only general approach for sequential decision making under uncertainty.


Alternative System Description

Given a dynamic description of a system via a system equation

$$x_{k+1} = f_k(x_k, u_k, w_k),$$

we can alternatively describe the system via a transition law.


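$$x_{k+1} = f_k(x_k, u_k, w_k); \qquad w_k \sim P_w^{(k)}(\cdot \mid x_k, u_k); \qquad x_0 \text{ given}$$

Given $x_k$ and $u_k$, $x_{k+1}$ has distribution:

$$\begin{aligned} P^{(k)}(B \mid x_k, u_k) &= \text{Prob}(x_{k+1} \in B \mid x_k, u_k) \\ &= \text{Prob}(f_k(x_k, u_k, w_k) \in B \mid x_k, u_k) \\ &= \text{Prob}(\{w_k \in D_k : f_k(x_k, u_k, w_k) \in B\} \mid x_k, u_k) \\ &= P_w^{(k)}(\{w_k \in D_k : f_k(x_k, u_k, w_k) \in B\} \mid x_k, u_k) \end{aligned}$$

$P^{(k)}(B \mid x_k, u_k)$: system (state) transition law

System equation $\Rightarrow$ system transition law $P^{(k)}(B \mid x_k, u_k)$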
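When the disturbance space is finite, the transition law can be tabulated from the system equation by pushing the disturbance pmf through $f_k$. The sketch below is illustrative only: the function names are hypothetical, the pmf is made up, and the example system is the inventory model without backlogging, $x_{k+1} = \max(0, x_k + u_k - w_k)$.

```python
from typing import Callable, Dict, Set


def transition_law(f: Callable[[int, int, int], int], x: int, u: int,
                   pmf: Dict[int, float]) -> Dict[int, float]:
    """Map each reachable next state to Prob(x_{k+1} = that state | x_k = x, u_k = u)."""
    law: Dict[int, float] = {}
    for w, prob in pmf.items():
        x_next = f(x, u, w)
        # Disturbances that lead to the same next state pool their probability.
        law[x_next] = law.get(x_next, 0.0) + prob
    return law


def prob_of_set(law: Dict[int, float], B: Set[int]) -> float:
    """P(B | x, u): total probability of next states that fall in B."""
    return sum(prob for x_next, prob in law.items() if x_next in B)


def no_backlog(x: int, u: int, w: int) -> int:
    return max(0, x + u - w)


demand_pmf = {0: 0.2, 1: 0.5, 2: 0.3}      # hypothetical demand distribution
law = transition_law(no_backlog, x=1, u=0, pmf=demand_pmf)
print(law, prob_of_set(law, {0}))
```

Both descriptions carry the same information here: the dictionary `law` is the transition law induced by `no_backlog` and `demand_pmf` at the given state and control.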