ECES 741: Stochastic Decision & Control Processes – Chapter 1: The DP Algorithm
Chapter 1: The DP Algorithm
To do:
sequential decision-making
state
random elements
discrete-time stochastic dynamic system
optimal control/decision problem
actions vs. strategy (information gathering, feedback)
These ideas are illustrated via examples; the general model is described later on.
Example: Inventory Control Problem
Consider the stock of a certain item, e.g. gas in a service station, oil in a refinery, cars in a dealership, spare parts in a maintenance facility, etc. The stock is checked at equally spaced times, e.g. every morning, or at the end of each week. At each such time, a decision must be made as to what quantity of the item to order, so that demand over the coming period is "satisfactorily" met (we will give this a quantitative meaning).
[Timeline: periods 0, 1, ..., N; at the start of the kth period the stock is checked and an order is placed.]
Stochastic Difference Equation:
xk+1 = xk + uk – wk
xk : stock at the beginning of the kth period
uk : quantity ordered at the beginning of the kth period (assumed delivered during the kth period)
wk : demand during the kth period; {wk} is a stochastic process
All variables are assumed real-valued.
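The stochastic difference equation can be simulated directly. The sketch below is illustrative only: the initial stock, the constant order schedule, and the Uniform[0, 2) demand are assumptions, not part of the model.

```python
import random

def simulate_inventory(x0, orders, demand_sampler, seed=0):
    """Simulate xk+1 = xk + uk - wk for a fixed order schedule."""
    random.seed(seed)
    x = x0
    path = [x]                        # path[k] = stock at the start of period k
    for u in orders:
        w = demand_sampler()          # demand wk during the kth period
        x = x + u - w                 # stock at the start of period k+1
        path.append(x)
    return path

# Illustrative run: x0 = 5, a constant order of 1 unit, Uniform[0, 2) demand
path = simulate_inventory(x0=5.0, orders=[1.0] * 4,
                          demand_sampler=lambda: random.uniform(0.0, 2.0))
```

Negative entries in `path` would represent backlogged demand, as discussed next.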
Negative stock is interpreted as excess demand, which is backlogged and filled ASAP.
Cost of operation:
1. purchasing cost: c uk (c = cost per unit)
2. H(xk+1) : penalty for holding and storage of excess stock (xk+1 > 0), or for shortage (xk+1 < 0)

Cost for period k = c uk + H(xk + uk − wk) =: g(xk, uk, wk),

where the argument of H is xk+1 = xk + uk − wk.
Let

H(x) = h x  if x ≥ 0,   H(x) = −p x  if x < 0,   with h > 0, p > 0,

or, equivalently,

H(x) = p max(0, −x) + h max(0, x).
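This penalty is a one-liner in code; the cost rates h = 1 and p = 3 below are illustrative assumptions (any positive values fit the definition).

```python
def H(x, h=1.0, p=3.0):
    """Holding/shortage penalty H(x) = p*max(0, -x) + h*max(0, x).

    h: holding cost per unit of excess stock (x > 0); illustrative value
    p: shortage penalty per unit of backlog (x < 0); illustrative value
    """
    return p * max(0.0, -x) + h * max(0.0, x)
```

For example, H(2.0) gives 2.0 (holding) and H(-2.0) gives 6.0 (shortage), matching the piecewise definition.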
Objective:
to minimize, in some meaningful sense, the total cost of operation over a finite number of periods (finite “horizon”)
total cost over N periods =
1
0
)}({N
kkkkk wuxHcu
Two distinct situations can arise:
Deterministic Case: x0 is perfectly known, and the demands are known in advance to the manager.

1. At k = 0, all future demands {w0, w1, ..., wN−1} are known. Select all orders at once, so as to exactly meet the demand:

x1 = x2 = ... = xN−1 = 0

0 = x1 = x0 + u0 − w0  ⇒  u0 = w0 − x0
uk = wk,  1 ≤ k ≤ N−1

This is a fixed order schedule. (Assume x0 ≤ w0, so that u0 ≥ 0.)
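A quick numeric check of this fixed order schedule; the demands and the initial stock are made-up numbers chosen so that x0 ≤ w0 holds.

```python
# Deterministic case: with u0 = w0 - x0 and uk = wk thereafter,
# the stock is driven to zero after the first period and stays there.
x0 = 2.0
w = [5.0, 3.0, 4.0, 1.0]        # known demands w0, ..., w3 (illustrative)
u = [w[0] - x0] + w[1:]         # fixed order schedule
assert min(u) >= 0.0            # feasible since x0 <= w0

x = x0
for uk, wk in zip(u, w):
    x = x + uk - wk             # xk+1 = xk + uk - wk
    assert x == 0.0             # x1 = x2 = ... = 0
```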
What we do here is select a set of fixed "actions" (numbers, i.e. a precomputed order schedule).

2. At the beginning of period k, wk becomes known (perfect forecast). Hence, we gather information and make decisions sequentially. A "strategy" is a rule for making decisions based on information as it becomes available:

u0 = w0 − x0  (using the forecast w0)
uk = wk,  1 ≤ k ≤ N−1
Stochastic Case

Stochastic Case: x0 is perfectly known (we can generalize to the case where only its distribution is known), but {wk} is a random process.

Assume the wk are i.i.d., real-valued random variables with pdf fw, i.e.

Prob{ wk ∈ [a, b] } = ∫_a^b fw(w) dw =: Pw([a, b]),  independent of k.

Pw : probability distribution (measure), i.e. Pw(B) is the probability that wk takes a value in the set B ⊆ ℝ.
Note that the stock xk is now a random variable.

Alternatively, we can describe the evolution of the system in terms of a transition law:

P(B | xk, uk) := Prob{ xk+1 ∈ B | xk, uk }
              = Prob{ xk + uk − wk ∈ B | xk, uk }
              = Prob{ wk ∈ xk + uk − B | xk, uk }
              = Pw(xk + uk − B),

where xk + uk − B := { xk + uk − b : b ∈ B }.
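The identity Prob{xk+1 ∈ B | xk, uk} = Pw(xk + uk − B) can be checked by Monte Carlo. Everything numeric below is an illustrative assumption: a Uniform[0, 1] demand and an interval B = [a, b], so that xk + uk − B is again an interval whose Pw-measure is just a length.

```python
import random

random.seed(1)
x, u = 1.0, 0.5                 # current stock and order (illustrative)
a, b = 0.8, 1.2                 # the event B = [a, b] for the next stock level

# Monte Carlo estimate of Prob{x + u - w in [a, b]} with w ~ Uniform[0, 1]
n = 200_000
hits = sum(a <= x + u - random.random() <= b for _ in range(n))
mc = hits / n

# Exact value: Pw(x + u - B) = length of [x+u-b, x+u-a] intersected with [0, 1]
lo, hi = x + u - b, x + u - a           # the interval x + u - B = [0.3, 0.7]
exact = max(0.0, min(hi, 1.0) - max(lo, 0.0))

assert abs(mc - exact) < 0.01           # the two descriptions agree
```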
Also, the cost is now a random quantity ⇒ minimize the expected cost

E[ Σ_{k=0}^{N−1} { c uk + H(xk + uk − wk) } ].

Actions: select all orders (numbers) at k = 0. Most likely not "optimal" (the problem reduces to a nonlinear programming problem).

vs.

Strategy: select a sequence of functions μk : ℝ → [0, ∞) s.t. uk = μk(xk), where xk is the information available at the kth period. This is a difficult problem: the optimization is over a function space!
Stochastic Dynamic Program:

Let π = (μ0, μ1, ..., μN−1) : control / decision strategy (policy, law)
Π : set of all admissible strategies (e.g. μk(x) ≥ 0)

Then the stochastic DP problem is:

minimize  JN(x0, π) := E[ Σ_{k=0}^{N−1} { c μk(xk) + H(xk + μk(xk) − wk) } ]
s.t.      xk+1 = xk + μk(xk) − wk

If the problem is feasible, then there exists an optimal strategy π*, i.e.

JN*(x0) := JN(x0, π*) ≤ JN(x0, π),  ∀ π ∈ Π.
Summary of the Problem

1-Discrete-time stochastic system:

xk+1 = xk + uk − wk =: fk(xk, uk, wk)  : system equation,

or equivalently the transition law xk+1 ~ P(B | xk, uk) = Pw(xk + uk − B).

Note: with no backlogging, we would instead have

xk+1 = xk + uk − wk  if xk + uk − wk ≥ 0,  and  xk+1 = 0  otherwise.
2-Stochastic element wk, assumed i.i.d. (for example); we will generalize to wk depending on xk and uk.

3-Control constraint: uk ≥ 0; if there is a maximum capacity M, then 0 ≤ uk ≤ M − xk.

4-Additive cost:

E{ gN(xN) + Σ_{k=0}^{N−1} gk(xk, uk, wk) },  where gk(xk, uk, wk) := c uk + H(xk + uk − wk)

and gN is a terminal cost.
5-Optimization over admissible strategies π ∈ Π.

We will see later on that this problem has a neat closed-form solution:

μk*(xk) = Tk − xk  if xk < Tk,   μk*(xk) = 0  if xk ≥ Tk,

for some threshold levels Tk : a base-stock policy.
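A base-stock policy is easy to simulate even before the optimal thresholds Tk are derived; the thresholds, cost rates, and demand law below are illustrative assumptions, not the optimal values.

```python
import random

def base_stock_cost(x0, T, c, h, p, demand_sampler, n_runs=20_000, seed=2):
    """Average N-period cost of the base-stock policy uk = max(0, Tk - xk)."""
    random.seed(seed)
    total = 0.0
    for _ in range(n_runs):
        x = x0
        for Tk in T:
            u = max(0.0, Tk - x)                  # order up to the threshold
            x = x + u - demand_sampler()          # xk+1 = xk + uk - wk
            total += c * u + p * max(0.0, -x) + h * max(0.0, x)
    return total / n_runs

# Illustrative instance: Tk = 1 in every period, Uniform[0, 2) demand
cost = base_stock_cost(x0=0.0, T=[1.0] * 5, c=0.5, h=1.0, p=3.0,
                       demand_sampler=lambda: random.uniform(0.0, 2.0))
```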
Role of Information: Actions Vs. Strategies
Example: Consider the two-stage problem

(*):  x1 = x0 + u0 − w0,   x2 = x1 + u1,   with x0 = 0, ui ∈ ℝ,

where w0 is a random variable taking the values ±1 with probability 1/2 each, i.e.

Prob{w0 = 1} = 1/2 = Prob{w0 = −1}.
Problem A: Choose actions (u0, u1) (open loop, a control schedule) to minimize Ew0 |x2|.

Equivalently, let

g0(x0, u0, w0) = 0,  g1(x1, u1, w1) = 0,  g2(x2) = |x2|,  N = 2,

and minimize over (u0, u1)

Ew0 { g2(x2) + Σ_{k=0}^{1} gk(xk, uk, wk) }  s.t. (*).
Solution A: Since x0 = 0, we have x2 = u0 + u1 − w0, so

Ew0 |x2| = (1/2) |u0 + u1 − 1| + (1/2) |u0 + u1 + 1|.

Case (i): |u0 + u1| ≤ 1, i.e. −1 ≤ u0 + u1 ≤ 1:

Ew0 |x2| = (1/2)(1 − (u0 + u1)) + (1/2)(1 + u0 + u1) = 1.
Case (ii): |u0 + u1| > 1. If u0 + u1 > 1:

Ew0 |x2| = (1/2)(u0 + u1 − 1) + (1/2)(1 + u0 + u1) = u0 + u1 > 1.
If u0 + u1 < −1:

Ew0 |x2| = (1/2)(1 − (u0 + u1)) + (1/2)(−(u0 + u1) − 1) = −(u0 + u1) > 1.

Therefore

min over (u0, u1) of Ew0 |x2| = 1,

achieved by any (u0*, u1*) with |u0* + u1*| ≤ 1: u0* can be anything; then choose u1* appropriately.

No information gathering: we choose u0* and u1* at the start and do not take into consideration x1 at the beginning of stage 1.
Problem B: Choose u0 and u1 sequentially, using the observed value of x1.

This is sequential decision-making (feedback control): to take the decision u1, we wait until the outcome x1 becomes available, and act accordingly.

Solution B: from (*), select

u1 = μ1(x1) = −x1 = w0 − u0 − x0,

so that x2 = x1 + u1 = 0 and

Ew0 |x2| = 0 < 1.
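The gap between the two problems can be verified by simulating (*) directly; the open-loop pair (u0, u1) = (0, 0) used below is one of the minimizers found in Solution A.

```python
import random

random.seed(3)
x0, u0, u1 = 0.0, 0.0, 0.0      # open-loop choice with |u0 + u1| <= 1
samples = [random.choice([-1.0, 1.0]) for _ in range(100_000)]   # w0 = +/-1

# Problem A (open loop): E|x2| = E|x0 + u0 - w0 + u1| = 1
open_loop = sum(abs(x0 + u0 - w0 + u1) for w0 in samples) / len(samples)

# Problem B (feedback): u1 = -x1 cancels the observed disturbance
feedback = 0.0
for w0 in samples:
    x1 = x0 + u0 - w0
    feedback += abs(x1 + (-x1))         # x2 = x1 + u1 = 0
feedback /= len(samples)

assert abs(open_loop - 1.0) < 1e-9      # every sample gives |x2| = 1
assert feedback == 0.0                  # feedback achieves E|x2| = 0 < 1
```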
Note: information gathering doesn't always help. Let

w0 = 0 w.p. 1  (deterministic case).

Then x2 = x1 + u1 = x0 + u0 + u1 = u0 + u1, so choosing u0 + u1 = 0 gives x2 = 0: we do not gain anything by making decisions sequentially.
Discrete-Time Stochastic Dynamic System Model and Optimal Decision / Control Problem
1-Discrete-time stochastic dynamic system (t, k can be time or events):

xk+1 = fk(xk, uk, wk),  k = 0, 1, ..., N−1,

xk ∈ Sk : state space at time k
uk ∈ Ck : control space
wk ∈ Dk : disturbance space (countable)

Also, depending on the state of the system, there are constraints on the actions that can be taken:

uk ∈ Uk(xk) ⊆ Ck  (a nonempty subset).
2-Stochastic disturbance {wk}:

Prob(wk ∈ B | xk, uk) = Pk(B | xk, uk),  B ⊆ Dk,

where Pk(· | xk, uk) is a probability measure (distribution) that may depend explicitly on time, the current state, and the action, but not on the previous disturbances wk−1, ..., w0.
3-Admissible Control / Decision Laws (Strategies, Policies):

π = (μ0, μ1, ..., μN−1),  μk : Sk → Ck,  k = 0, ..., N−1.

Define information patterns!
▪ Feasible policies
▪ Markov: deterministic / randomized

μk(xk) ∈ Uk(xk)  ∀ xk ∈ Sk  (*)

Π := { π : (*) holds };  π is an admissible policy if (*) holds.
4-Finite Horizon Optimal Control / Decision Problem

Given an initial state x0, cost functions gk, k = 0, ..., N−1, and a terminal cost gN, find π ∈ Π that minimizes the cost functional

JN(x0, π) := E_{wk, k=0,...,N−1} [ gN(xN) + Σ_{k=0}^{N−1} gk(xk, μk(xk), wk) ]

subject to the system equation constraint

xk+1 = fk(xk, μk(xk), wk),  k = 0, 1, ..., N−1.
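The cost functional JN(x0, π) of any fixed admissible policy can be estimated by straightforward Monte Carlo simulation. The evaluator below is a generic sketch; the instance at the bottom reuses the inventory example with illustrative parameters (c = 0.5, h = 1, p = 3, an order-up-to-1 policy, zero terminal cost), all of which are assumptions.

```python
import random

def evaluate_policy(x0, N, f, g, gN, policy, sample_w, n_runs=10_000, seed=4):
    """Monte Carlo estimate of JN(x0, pi) = E[gN(xN) + sum_k gk(xk, uk, wk)]."""
    random.seed(seed)
    total = 0.0
    for _ in range(n_runs):
        x = x0
        for k in range(N):
            u = policy(k, x)            # uk = mu_k(xk)
            w = sample_w(k, x, u)       # wk ~ Pk(. | xk, uk)
            total += g(k, x, u, w)      # stage cost gk(xk, uk, wk)
            x = f(k, x, u, w)           # xk+1 = fk(xk, uk, wk)
        total += gN(x)                  # terminal cost gN(xN)
    return total / n_runs

# Illustrative instance: the inventory model with an order-up-to-1 policy
J = evaluate_policy(
    x0=0.0, N=3,
    f=lambda k, x, u, w: x + u - w,
    g=lambda k, x, u, w: 0.5 * u + 3.0 * max(0.0, -(x + u - w))
                                 + 1.0 * max(0.0, x + u - w),
    gN=lambda x: 0.0,
    policy=lambda k, x: max(0.0, 1.0 - x),
    sample_w=lambda k, x, u: random.uniform(0.0, 2.0),
)
```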
We say that π* ∈ Π is optimal for the initial state x0 if

JN(x0, π*) ≤ JN(x0, π),  ∀ π ∈ Π,

and we write

JN*(x0) := JN(x0, π*) = min over π ∈ Π of JN(x0, π) : the optimal N-stage cost (or value) function.

Likewise, for ε > 0 given, π* ∈ Π is said to be ε-optimal if

JN(x0, π*) ≤ JN*(x0) + ε.
This stochastic optimal control problem is difficult: we are optimizing over strategies!

The Dynamic Programming algorithm will give us necessary and sufficient conditions to decompose this problem into a sequence of coupled minimization problems over actions, from which we will obtain JN* and π*.

DP is the only general approach to sequential decision-making under uncertainty.
Alternative System Description

Given a dynamic description of a system via a system equation

xk+1 = fk(xk, uk, wk),

we can alternatively describe the system via a transition law.
With wk ~ Pk(· | xk, uk) and x0 given, for given xk and uk the next state xk+1 = fk(xk, uk, wk) has distribution

P(B | xk, uk) := Prob{ xk+1 ∈ B | xk, uk }
              = Prob{ wk ∈ Dk : fk(xk, uk, wk) ∈ B }
              = Pk({ w ∈ Dk : fk(xk, uk, w) ∈ B } | xk, uk)

: the (state) transition law of the system. In short:

system equation + Pk(·)  ⇒  system transition law P(B | xk, uk).