ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm

Chapter 1: The DP Algorithm

To do:

sequential decision-making

state

random elements

discrete-time stochastic dynamic system

optimal control/decision problem

actions vs. strategy (information gathering, feedback)

These ideas are first illustrated via examples; the general model is described later on.


Example: Inventory Control Problem

Consider the quantity of a certain item held in stock, e.g. gas in a service station, oil in a refinery, cars in a dealership, spare parts in a maintenance facility, etc. The stock is checked at equally spaced points in time, e.g. every morning, at the end of each week, etc. At those times, a decision must be made as to what quantity of the item to order, so that demand over the present period is “satisfactorily” met (we will give a quantitative meaning to this).

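[Figure: timeline with time points 0, 1, 2, ..., k-1, k, k+1, ..., N-1, N, marking the kth period and the instants at which the stock is checked and an order is placed.]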



Stochastic Difference Equation:

$$x_{k+1} = x_k + u_k - w_k$$

$x_k$ : stock at the beginning of the kth period

$u_k$ : quantity ordered at the beginning of the kth period, assumed to be delivered during the kth period

$w_k$ : demand during the kth period; $\{w_k\}$ is a stochastic process

All variables are assumed real-valued.



Negative stock is interpreted as excess demand, which is backlogged and filled ASAP.
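The stock recursion can be sketched in a few lines of Python. This is only an illustration: the function name `simulate_stock`, the constant order of 3 units, and the uniform demand model are assumptions made for the example, not part of the lecture.

```python
import random


def simulate_stock(x0, orders, demands):
    """Return [x_0, ..., x_N] under x_{k+1} = x_k + u_k - w_k.

    Negative entries represent backlogged (unfilled) demand.
    """
    xs = [x0]
    for u, w in zip(orders, demands):
        xs.append(xs[-1] + u - w)
    return xs


# Hypothetical data: constant orders and uniformly distributed demand.
rng = random.Random(0)
N = 5
orders = [3.0] * N
demands = [rng.uniform(0.0, 6.0) for _ in range(N)]
trajectory = simulate_stock(2.0, orders, demands)
```

The trajectory has N+1 entries; any negative entry is a period that ended with excess demand.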

Cost of operation:

1. purchasing cost: $c u_k$ ($c$ = cost per unit)

2. $H(x_{k+1})$: penalty for holding and storing excess stock ($x_{k+1} > 0$), or for shortage ($x_{k+1} < 0$)

Cost for period k $= c u_k + H(x_k + u_k - w_k) = g(x_k, u_k, w_k)$, where $x_k + u_k - w_k = x_{k+1}$.



Let

$$H(x) = \begin{cases} p\,|x|, & x < 0 \\ h\,x, & x \ge 0 \end{cases} \qquad p \ge 0,\ h \ge 0,$$

or, equivalently,

$$H(x) = p \max(0, -x) + h \max(0, x).$$



Objective:

to minimize, in some meaningful sense, the total cost of operation over a finite number of periods (finite “horizon”)

total cost over N periods $= \sum_{k=0}^{N-1} \{ c u_k + H(x_k + u_k - w_k) \}$
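As a rough sketch, the total cost of a given order sequence against a given demand sequence can be accumulated as follows. The numerical values of `c`, `h` and `p` are placeholders chosen for the example, and the function names are hypothetical.

```python
def holding_shortage_cost(x, h=1.0, p=2.0):
    """H(x) = p*max(0, -x) + h*max(0, x); h and p are illustrative values."""
    return p * max(0.0, -x) + h * max(0.0, x)


def total_cost(x0, orders, demands, c=1.0, h=1.0, p=2.0):
    """Sum over k of c*u_k + H(x_k + u_k - w_k), starting from stock x0."""
    x, total = x0, 0.0
    for u, w in zip(orders, demands):
        x_next = x + u - w                      # stock recursion
        total += c * u + holding_shortage_cost(x_next, h, p)
        x = x_next
    return total
```

For instance, `total_cost(0.0, [2.0, 1.0], [1.0, 3.0])` charges one unit of holding after the first period and one unit of shortage after the second.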



Two distinct situations can arise:

Deterministic Case: $x_0$ is perfectly known, and the demands are known in advance to the manager.

1. At k=0, all future demands $\{w_0, w_1, \ldots, w_{N-1}\}$ are known.

Select all orders at once, so as to meet the demand exactly:

$$x_1 = x_2 = \cdots = x_{N-1} = 0$$

$$0 = x_1 = x_0 + u_0 - w_0 \ \Rightarrow\ u_0 = w_0 - x_0$$

$$u_k = w_k, \quad 1 \le k \le N-1 \quad \text{: fixed order schedule}$$

(assume $x_0 \le w_0$, so that $u_0 \ge 0$).

Here we select a set of fixed “actions” (numbers, i.e. a precomputed order schedule).
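A minimal sketch of this precomputed schedule, assuming the demands are given as a Python list; the function name `fixed_order_schedule` is hypothetical.

```python
def fixed_order_schedule(x0, demands):
    """Orders that meet known demands exactly: u_0 = w_0 - x_0, u_k = w_k."""
    if x0 > demands[0]:
        raise ValueError("requires x0 <= w0 so that u0 >= 0")
    return [demands[0] - x0] + list(demands[1:])


demands = [4.0, 2.0, 5.0]
orders = fixed_order_schedule(1.0, demands)

# Check that the resulting stock is zero after every period.
stock = 1.0
stocks = []
for u, w in zip(orders, demands):
    stock = stock + u - w
    stocks.append(stock)
```

All orders are fixed numbers computed at k=0; nothing observed later changes them.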

2. At the beginning of period k, $w_k$ becomes known (perfect forecast). Hence, we must gather information and make decisions sequentially.

“Strategy”: a rule for making decisions based on information as it becomes available:

$$u_0 = w_0 - x_0, \qquad u_k = w_k, \quad 1 \le k \le N-1,$$

where each $w_k$ is the forecast available at the beginning of period k.


Stochastic Case: $x_0$ is perfectly known (this can be generalized to the case where only its distribution is known), but $\{w_k\}$ is a random process.

Assume that the $w_k$ are i.i.d., $\mathbb{R}$-valued r.v.'s with pdf $f_w$, i.e.

$$\mathrm{Prob}(w_k \in [a,b]) = \int_a^b f_w(w)\,dw =: P_w([a,b]), \quad \text{independent of } k.$$

$P_w$ : probability distribution or measure, i.e. $P_w(B)$ is the probability that $w_k$ takes a value in the set $B \subseteq \mathbb{R}$.


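Note that the stock $x_k$ is now a r.v.

Alternatively, we can describe the evolution of the system in terms of a transition law. Since $x_{k+1} = x_k + u_k - w_k$,

$$\begin{aligned} P(B \mid x_k, u_k) &:= \mathrm{Prob}(x_{k+1} \in B \mid x_k, u_k) \\ &= \mathrm{Prob}(x_k + u_k - w_k \in B \mid x_k, u_k) \\ &= \mathrm{Prob}(w_k \in x_k + u_k - B \mid x_k, u_k) \\ &= P_w(x_k + u_k - B), \end{aligned}$$

where $x_k + u_k - B := \{x_k + u_k - b \mid b \in B\}$.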
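The identity $P(B \mid x_k, u_k) = P_w(x_k + u_k - B)$ can be checked numerically for an interval $B = [a, b]$. The sketch below assumes, purely for illustration, that demand is Uniform(0, 1); the function names are hypothetical.

```python
import random


def transition_prob_uniform(x, u, a, b):
    """P_w(x + u - [a, b]) for w ~ Uniform(0, 1) (assumed demand model).

    x + u - [a, b] is the interval [x + u - b, x + u - a]; its probability
    is the length of its overlap with [0, 1].
    """
    lo, hi = x + u - b, x + u - a
    return max(0.0, min(hi, 1.0) - max(lo, 0.0))


def transition_prob_mc(x, u, a, b, n=100_000, seed=0):
    """Monte Carlo estimate of Prob(x + u - w in [a, b])."""
    rng = random.Random(seed)
    hits = sum(a <= x + u - rng.random() <= b for _ in range(n))
    return hits / n
```

With enough samples the two functions should agree up to sampling error, which is what the transition-law identity predicts.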


Also, the cost is a random quantity, so we minimize the expected value of the total cost $\sum_{k=0}^{N-1} \{ c u_k + H(x_k + u_k - w_k) \}$.

Action: select all orders (numbers) at k=0. Most likely not “optimal” (reduces to a nonlinear programming problem).

VS

Strategy: select a sequence of functions $\mu_k : \mathbb{R} \to [0, \infty)$ s.t. $u_k = \mu_k(x_k)$, where $x_k$ is the information available at the kth period. This is a difficult problem! The optimization is over a function space.


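Let $\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1})$ : control / decision strategy, policy, law

$\Pi$ : set of all admissible strategies (e.g. $\mu_k(x) \ge 0$)

$$J_N(\pi, x_0) := E\left\{ \sum_{k=0}^{N-1} \big( c\,\mu_k(x_k) + H(x_k + \mu_k(x_k) - w_k) \big) \right\}$$

Then, the stochastic DP problem is

minimize $J_N(\pi, x_0)$ over $\pi \in \Pi$, s.t. $x_{k+1} = x_k + u_k - w_k$ with $u_k = \mu_k(x_k)$.

Stochastic Dynamic Program:

If the problem is feasible, then there exists an optimal strategy $\pi^*$, i.e.

$$J_N^*(x_0) := J_N(\pi^*, x_0) \le J_N(\pi, x_0) \quad \forall \pi \in \Pi.$$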
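For a fixed strategy, the expected cost $J_N(\pi, x_0)$ can be estimated by simulation. The following is a sketch under stated assumptions: demand is taken to be Uniform(0, 2), the cost parameters are placeholders, and the two strategies compared (a constant order and an order-up-to rule with an arbitrary level 1.5) are illustrative choices, not the optimal policy.

```python
import random


def holding_shortage_cost(x, h=1.0, p=3.0):
    """H(x) with illustrative unit costs h (holding) and p (shortage)."""
    return p * max(0.0, -x) + h * max(0.0, x)


def estimate_cost(policy, x0, c=1.0, n_runs=20_000, seed=0):
    """Monte Carlo estimate of J_N(pi, x0); policy is a list [mu_0, ..., mu_{N-1}]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        x = x0
        for mu in policy:
            u = mu(x)                    # decision uses the current stock only
            w = rng.uniform(0.0, 2.0)    # assumed demand distribution
            x = x + u - w
            total += c * u + holding_shortage_cost(x)
    return total / n_runs


N = 5
constant_order = [lambda x: 1.0] * N                 # ignores the state
order_up_to = [lambda x: max(0.0, 1.5 - x)] * N      # feedback on the stock

cost_constant = estimate_cost(constant_order, 0.0)
cost_feedback = estimate_cost(order_up_to, 0.0)
```

Under these assumptions the feedback rule gives a noticeably lower estimated cost than the constant order, which illustrates why the optimization is posed over functions $\mu_k$ rather than over numbers.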


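Note: No backlogging

$$x_{k+1} = \max(0,\ x_k + u_k - w_k)$$

Summary of the Problem

1-Discrete time Stochastic System

$$x_{k+1} = x_k + u_k - w_k =: f_k(x_k, u_k, w_k) \quad \text{: system equation}$$

$$x_{k+1} \sim P(B \mid x_k, u_k) = P_w(x_k + u_k - B) \quad \text{: transition law}$$

2-Stochastic element $\{w_k\}$, assumed i.i.d. for example; we will generalize to $w_k$ depending on $x_k$ and $u_k$.

3-Control constraint: $u_k \ge 0$; if there is a maximum capacity $M$, then

$$0 \le u_k \le M - x_k$$

4-Additive cost

$$E\left\{ g_N(x_N) + \sum_{k=0}^{N-1} \big( c u_k + H(x_k + u_k - w_k) \big) \right\}, \qquad c u_k + H(x_k + u_k - w_k) = g_k(x_k, u_k, w_k)$$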


5-Optimization over admissible strategies

We will see later on that this problem has a neat closed-form solution:

$$\mu_k^*(x_k) = \begin{cases} T_k - x_k, & x_k < T_k \\ 0, & x_k \ge T_k \end{cases}$$

for some threshold levels $T_k$ : base-stock policy
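The structure of a base-stock policy is easy to express in code. In this sketch the threshold values are arbitrary placeholders; how the thresholds $T_k$ are actually computed is not covered here.

```python
def base_stock_policy(thresholds):
    """Return [mu_0, ..., mu_{N-1}] with mu_k(x) = T_k - x if x < T_k, else 0."""
    def make_rule(T):
        return lambda x: T - x if x < T else 0.0
    return [make_rule(T) for T in thresholds]


# Hypothetical thresholds for a two-period problem.
policy = base_stock_policy([5.0, 3.0])
order_when_low = policy[0](2.0)    # stock below T_0: order up to T_0
order_when_high = policy[0](7.0)   # stock at or above T_0: order nothing
```

Each rule orders just enough to bring the stock up to its threshold, and orders nothing when the stock is already at or above it.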


Role of Information: Actions Vs. Strategies

Example: Let a two-stage problem be given as:

$$(*): \quad x_1 = x_0 + u_0 + w_0, \qquad x_2 = x_1 + u_1, \qquad x_0 = 0,$$

where $w_0$ is a random variable that takes the values $\pm 1$ w.p. $\tfrac12$ each, i.e.

$$\mathrm{Prob}(w_0 = 1) = \tfrac12 = \mathrm{Prob}(w_0 = -1).$$


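Problem A: Choose actions $(u_0, u_1)$ (open loop, control schedule) to minimize $E_{w_0}|x_2|$.

Equivalently, let

$$g_0(x_0, u_0, w_0) = 0, \qquad g_1(x_1, u_1, w_1) = 0, \qquad g_2(x_2) = |x_2|, \qquad N = 2,$$

and

$$\min_{u_0, u_1} \ E\left\{ g_2(x_2) + \sum_{k=0}^{1} g_k(x_k, u_k, w_k) \right\} \quad \text{s.t. } (*).$$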


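Solution A:

$$E_{w_0}|x_2| = E_{w_0}|u_0 + u_1 + w_0| = \tfrac12 |u_0 + u_1 + 1| + \tfrac12 |u_0 + u_1 - 1|$$

Case (i): $|u_0 + u_1| \le 1$, i.e. $-1 \le u_0 + u_1 \le 1$:

$$E_{w_0}|x_2| = \tfrac12 (u_0 + u_1 + 1) - \tfrac12 (u_0 + u_1 - 1) = 1$$

Case (ii): $|u_0 + u_1| > 1$.

If $u_0 + u_1 > 1$:

$$E_{w_0}|x_2| = \tfrac12 (u_0 + u_1 + 1) + \tfrac12 (u_0 + u_1 - 1) = u_0 + u_1 > 1$$

If $u_0 + u_1 < -1$:

$$E_{w_0}|x_2| = -\tfrac12 (u_0 + u_1 + 1) - \tfrac12 (u_0 + u_1 - 1) = -(u_0 + u_1) > 1$$

Hence

$$\min_{u_0, u_1} E_{w_0}|x_2| = 1, \quad \text{attained whenever } |u_0^* + u_1^*| \le 1.$$

$u_0^*$ : can be anything; then choose $u_1^*$ appropriately.

No information gathering: we choose $u_0^*$ and $u_1^*$ at the start and do not take into consideration $x_1$ at the beginning of stage 1.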
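The open-loop minimum can be checked numerically. The sketch below assumes the two-stage dynamics $x_2 = u_0 + u_1 + w_0$ with $x_0 = 0$ and $w_0 = \pm 1$ equally likely; the grid of candidate actions is an arbitrary choice for the example.

```python
def open_loop_cost(u0, u1):
    """E|x_2| for fixed actions, with x_2 = u0 + u1 + w0 and w0 = +1 or -1 w.p. 1/2."""
    return 0.5 * abs(u0 + u1 + 1) + 0.5 * abs(u0 + u1 - 1)


# Search a coarse grid of action pairs (multiples of 0.25 in [-3, 3]).
grid = [i / 4 for i in range(-12, 13)]
best_open_loop = min(open_loop_cost(u0, u1) for u0 in grid for u1 in grid)
```

No pair of fixed actions on the grid does better than an expected cost of 1, in line with the case analysis.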


Problem B: Choose $u_0$ and $u_1$ sequentially, using the observed value of $x_1$.

Sequential decision-making, feedback control. Thus, to take decision $u_1$, we wait until the outcome $x_1$ becomes available, and act accordingly.

Solution B: from (*), we select

$$u_1 = -x_1 = -(u_0 + w_0) \ \Rightarrow\ x_2 = 0 \ \Rightarrow\ E_{w_0}|x_2| = 0.$$
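The effect of feedback can be sketched by enumerating the two outcomes of $w_0$. The code assumes the dynamics $x_1 = u_0 + w_0$ (with $x_0 = 0$) and $x_2 = x_1 + u_1$; the function name and the particular value of `u0` are hypothetical.

```python
def closed_loop_cost(u0, feedback):
    """E|x_2| when u_1 = feedback(x_1) is chosen after x_1 is observed."""
    total = 0.0
    for w0 in (-1.0, 1.0):          # each outcome has probability 1/2
        x1 = 0.0 + u0 + w0          # x_0 = 0
        u1 = feedback(x1)           # decision made with x_1 known
        x2 = x1 + u1
        total += 0.5 * abs(x2)
    return total


cost_with_feedback = closed_loop_cost(0.3, lambda x1: -x1)
cost_without_feedback = closed_loop_cost(0.0, lambda x1: 0.0)
```

Cancelling the observed $x_1$ drives the cost to zero for any `u0`, whereas a rule that ignores $x_1$ behaves like an open-loop action.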


Note: information gathering doesn’t always help:

Let $w_0 = 0$ w.p. 1 (deterministic case). Then

$$x_2 = u_0 + u_1, \qquad u_1 = -u_0 \ \Rightarrow\ x_2 = 0.$$

We do not gain anything by making decisions sequentially.



Discrete-Time Stochastic Dynamic System Model and Optimal Decision / Control Problem

1-Discrete time stochastic dynamic system (t, k can be time or events)

$$x_{k+1} = f_k(x_k, u_k, w_k), \quad k = 0, 1, \ldots, N-1$$

$x_k \in S_k$ : state space at time k

$u_k \in C_k$ : control space

$w_k \in D_k$ : disturbance space (countable)

Also, depending on the state of the system, there are constraints on the actions that can be taken:

$$u_k \in U_k(x_k) \subseteq C_k, \quad U_k(x_k) \text{ a nonempty subset}$$


2-Stochastic disturbance $\{w_k\}$.

$$\mathrm{Prob}(w_k \in B \mid x_k, u_k) = P_k^{(w)}(B \mid x_k, u_k), \quad B \subseteq D_k$$

$P_k^{(w)}$ : probability measure (distribution); it may depend explicitly on time, current state and action, but not on previous disturbances $w_{k-1}, \ldots, w_0$.


3-Admissible Control / Decision Laws (Strategies, Policies)

$$\pi = (\mu_0, \mu_1, \ldots, \mu_{N-1}); \qquad \mu_k : S_k \to C_k, \quad k = 0, \ldots, N-1$$

and

$$\mu_k(x_k) \in U_k(x_k) \qquad (*)$$

$\pi$ is an admissible policy if (*) holds; $\Pi$ denotes the set of admissible policies.

Define information patterns!

▪ Feasible policies

▪ Markov:

- Deterministic

- Randomized


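4-Finite Horizon Optimal Control / Decision Problem

Given an initial state $x_0$, cost functions $g_k$, $k = 0, \ldots, N-1$, and a terminal cost $g_N$, find $\pi \in \Pi$ that minimizes the cost functional

$$J_N(\pi, x_0) := E_{w_k,\ k = 0, \ldots, N-1} \left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, \mu_k(x_k), w_k) \right\}$$

subject to the system equation constraint

$$x_{k+1} = f_k(x_k, \mu_k(x_k), w_k), \quad k = 0, 1, \ldots, N-1.$$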
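When the disturbance takes finitely many values, the cost functional of a fixed policy can be computed exactly by enumerating outcomes stage by stage. The sketch below only evaluates a given policy; it does not optimize. It assumes, for simplicity, a disturbance distribution `pmf` that is the same at every stage and independent of state and control, and it reuses the inventory example with placeholder cost parameters.

```python
def policy_cost(x0, policy, f, g, g_terminal, pmf):
    """Exact J_N(pi, x0) for a finite disturbance distribution.

    policy: list [mu_0, ..., mu_{N-1}]; f(k, x, u, w) is the system equation;
    g(k, x, u, w) is the stage cost; pmf maps each disturbance value to its
    probability (assumed identical for every stage).
    """
    N = len(policy)

    def cost_to_go(k, x):
        if k == N:
            return g_terminal(x)
        u = policy[k](x)
        return sum(prob * (g(k, x, u, w) + cost_to_go(k + 1, f(k, x, u, w)))
                   for w, prob in pmf.items())

    return cost_to_go(0, x0)


# Inventory example with illustrative numbers: c = 1, h = 1, p = 2.
def f(k, x, u, w):
    return x + u - w

def g(k, x, u, w):
    x_next = x + u - w
    return 1.0 * u + 2.0 * max(0.0, -x_next) + 1.0 * max(0.0, x_next)

pmf = {0.0: 0.5, 2.0: 0.5}                     # hypothetical demand distribution
always_order_one = [lambda x: 1.0, lambda x: 1.0]
J = policy_cost(0.0, always_order_one, f, g, lambda x: 0.0, pmf)
```

The recursion visits every disturbance sequence, so its cost grows exponentially with N; it is meant only to make the definition of $J_N(\pi, x_0)$ concrete.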



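We say that $\pi^* \in \Pi$ is optimal for the initial state $x_0$ if

$$J_N(\pi^*, x_0) \le J_N(\pi, x_0) \quad \forall \pi \in \Pi,$$

$$J_N^*(x_0) := J_N(\pi^*, x_0) = \min_{\pi \in \Pi} J_N(\pi, x_0)$$

$J_N^*$ : optimal N-stage cost (or value) function

Likewise, for $\epsilon > 0$ given, $\pi_\epsilon^*$ is said to be $\epsilon$-optimal if

$$J_N(\pi_\epsilon^*, x_0) \le J_N^*(x_0) + \epsilon.$$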


This stochastic optimal control problem is difficult: we are optimizing over strategies.

The Dynamic Programming Algorithm will give us necessary and sufficient conditions to decompose this problem into a sequence of coupled minimization problems over actions, from which we will obtain $J_N^*$.

DP is the only general approach for sequential decision-making under uncertainty.


Alternative System Description

Given a dynamic description of a system via a system equation

$$x_{k+1} = f_k(x_k, u_k, w_k),$$

we can alternatively describe the system via a transition law.


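$$x_{k+1} = f_k(x_k, u_k, w_k); \qquad w_k \sim P_k^{(w)}(\cdot \mid x_k, u_k), \quad x_0 \text{ given}$$

Given $x_k$ and $u_k$, $x_{k+1}$ has distribution:

$$\begin{aligned} P_k(B \mid x_k, u_k) &:= \mathrm{Prob}(x_{k+1} \in B \mid x_k, u_k) \\ &= \mathrm{Prob}(\{w_k \in D_k : f_k(x_k, u_k, w_k) \in B\} \mid x_k, u_k) \\ &= P_k^{(w)}(\{w_k \in D_k : f_k(x_k, u_k, w_k) \in B\} \mid x_k, u_k) \end{aligned}$$

$P_k(B \mid x_k, u_k)$ : system (state) transition law

System equation $\Rightarrow$ system transition law.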
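For a disturbance with finitely many values, the passage from system equation to transition law amounts to collecting the probability of every disturbance value that leads to the same next state. The sketch below assumes a disturbance distribution `pmf` that does not depend on the state or control; the function name and the no-backlogging example data are illustrative.

```python
from collections import defaultdict


def transition_law(f, x, u, pmf):
    """Distribution of x_{k+1} = f(x, u, w) induced by the disturbance pmf.

    Returns a dict mapping each reachable next state to
    P({w : f(x, u, w) = next state}).
    """
    law = defaultdict(float)
    for w, prob in pmf.items():
        law[f(x, u, w)] += prob
    return dict(law)


# No-backlogging inventory dynamics with a hypothetical demand distribution.
def no_backlog(x, u, w):
    return max(0, x + u - w)

demand_pmf = {0: 0.25, 1: 0.5, 2: 0.25}
law = transition_law(no_backlog, 0, 1, demand_pmf)
```

Here demands of 1 and 2 both lead to an empty stock, so their probabilities are pooled into the same next state.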
