© 2008 Warren B. Powell Slide 1
Approximate Dynamic Programming: Solving the Curses of Dimensionality
INFORMS Computing Society Tutorial
October, 2008
Warren Powell
CASTLE Laboratory, Princeton University
http://www.castlelab.princeton.edu
© 2008 Warren B. Powell, Princeton University
The fractional jet ownership industry
NetJets Inc.
Schneider National
Planning for a risky world
Weather
•Robust design of emergency response networks.
•Design of financial instruments to hedge against weather emergencies to protect individuals, companies and municipalities.
•Design of sensor networks and communication systems to manage responses to major weather events.
Disease
•Models of disease propagation for response planning.
•Management of medical personnel, equipment and vaccines to respond to a disease outbreak.
•Robust design of supply chains to mitigate the disruption of transportation systems.
Energy management

Energy resource allocation
• What is the right mix of energy technologies?
• How should the use of different energy resources be coordinated over space and time?
• What should my energy R&D portfolio look like?
• Should I invest in nuclear energy?
• What is the impact of a carbon tax?

Energy markets
• How should I hedge energy commodities?
• How do I price energy assets?
• What is the right price for energy futures?
High value spare parts
Electric Power Grid
•PJM oversees an aging investment in high-voltage transformers.
•Replacement strategy needs to anticipate a bulge in retirements and failures.
•1-2 year lag times on orders. Typical cost of a replacement ~ $5 million.
•Failures vary widely in terms of economic impact on the network.
Spare parts for business jets
•ADP is used to determine purchasing and allocation strategies for over 400 high value spare parts.
•Inventory strategy has to determine what to buy, when and where to store it. Many parts are very low volume (e.g. 7 spares spread across 15 service centers).
•Inventories have to meet global targets on level of service and inventory costs.
Challenges

Real-time control
» Scheduling aircraft, pilots, generators, tankers
» Pricing stocks, options
» Electricity resource allocation

Near-term tactical planning
» Can I accept a customer request?
» Should I lease equipment?
» How much energy can I commit with my windmills?

Strategic planning
» What is the right equipment mix?
» What is the value of this contract?
» What is the value of more reliable aircraft?
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
A resource allocation model
Attribute vectors:

$$a = \begin{bmatrix}\text{Location}\\ \text{ETA}\\ \text{A/C type}\\ \text{Fuel level}\\ \text{Home shop}\\ \text{Crew}\\ \text{Eqpt1}\\ \vdots\\ \text{Eqpt100}\end{bmatrix}
\qquad
\begin{bmatrix}\text{Location}\\ \text{ETA}\\ \text{Home}\\ \text{Experience}\\ \text{Driving hours}\end{bmatrix}
\qquad
\begin{bmatrix}\text{Type}\\ \text{Location}\\ \text{Age}\end{bmatrix}
\qquad
\begin{bmatrix}\text{Asset class}\\ \text{Time invested}\end{bmatrix}$$
A resource allocation model
Modeling resources:
» The attributes of a single resource: $a \in \mathcal{A}$, where $a$ is the attribute vector of a single resource and $\mathcal{A}$ is the attribute space.
» The resource state vector: $R_{ta}$ = the number of resources with attribute $a$, and $R_t = (R_{ta})_{a \in \mathcal{A}}$.
» The information process: $\hat{R}_{ta}$ = the change in the number of resources with attribute $a$.
A resource allocation model
Modeling demands:
» The attributes of a single demand: $b \in \mathcal{B}$, where $b$ is the attribute vector of a demand to be served and $\mathcal{B}$ is the attribute space.
» The demand state vector: $D_{tb}$ = the number of demands with attribute $b$, and $D_t = (D_{tb})_{b \in \mathcal{B}}$.
» The information process: $\hat{D}_{tb}$ = the change in the number of demands with attribute $b$.
Energy resource modeling
The system state:

$S_t = (R_t, D_t, \rho_t)$ = system state, where:
» $R_t$ = resource state (how much capacity, reserves)
» $D_t$ = market demands
» $\rho_t$ = "system parameters":
• state of the technology (costs, performance)
• climate, weather (temperature, rainfall, wind)
• government policies (tax rebates on solar panels)
• market prices (oil, coal)
Energy resource modeling
The decision variable:

$x_t$ = (new capacity, retired capacity), for each:
» Type
» Location
» Technology
Energy resource modeling
Exogenous information:

New information = $W_t = (\hat{R}_t, \hat{D}_t, \hat{\rho}_t)$, where:
» $\hat{R}_t$ = exogenous changes in capacity, reserves
» $\hat{D}_t$ = new demands for energy from each source
» $\hat{\rho}_t$ = exogenous changes in parameters
Energy resource modeling
The transition function:

$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$
A resource allocation model
[Figure: a network of resources assigned to demands]
A resource allocation model
[Figure: the resource network unfolding over times t, t+1, t+2]
A resource allocation model
[Figure: the network over times t, t+1, t+2]

Optimizing at a point in time vs. optimizing over time.
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
Introduction to ADP
We just solved Bellman’s equation:
» We found the value of being in each state by stepping backward through the tree.
$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + E\{V_{t+1}(S_{t+1}) \mid S_t\} \right)$$
Introduction to ADP
The challenge of dynamic programming:
Problem: Curse of dimensionality
$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + E\{V_{t+1}(S_{t+1}) \mid S_t\} \right)$$
Three curses:
» State space
» Outcome space
» Action space (feasible region)
Introduction to ADP
The computational challenge:

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + E\{V_{t+1}(S_{t+1}) \mid S_t\} \right)$$

» How do we find $V_{t+1}(S_{t+1})$?
» How do we compute the expectation?
» How do we find the optimal solution?
Introduction to ADP
Classical ADP
» Most applications of ADP focus on the challenge of handling multidimensional state variables.
» Start with

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + E\{V_{t+1}(S_{t+1}) \mid S_t\} \right)$$

» Now replace the value function with some sort of approximation:

$$V_t(S_t) \approx \bar{V}_t(S_t)$$
Introduction to ADP
Approximating the value function:
» We have to exploit the structure of the value function (e.g. concavity).
» We might approximate the value function using a simple polynomial:

$$\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2$$

» ... or a complicated one:

$$\bar{V}_t(S_t \mid \theta) = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t)$$

» Sometimes, they get really messy:
[The slide shows a lengthy approximation $\bar{V}_t(\cdot \mid \theta)$ built from coefficient blocks $\theta^{(0)}, \theta^{(1)}, \dots, \theta^{(5)}$ multiplying sums, products and ratios of the resource variables; the full expression is not recoverable from the extraction.]
Introduction to ADP
We can write a model of the observed value of being in a state as:

$$\hat{v} = \theta_0 + \theta_1 S_t + \theta_2 S_t^2 + \theta_3 \ln S_t + \theta_4 \sin(S_t) + \varepsilon$$

This is often written as a generic regression model:

$$Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \theta_4 X_4$$

The ADP community refers to the independent variables as basis functions:

$$Y(S) = \theta_0 \varphi_0(S) + \theta_1 \varphi_1(S) + \theta_2 \varphi_2(S) + \theta_3 \varphi_3(S) + \theta_4 \varphi_4(S) = \sum_{f \in \mathcal{F}} \theta_f \varphi_f(S)$$

The $\varphi_f(S)$ are also known as features.
Introduction to ADP
Methods for estimating $\theta$:
» Generate observations $\hat{v}^1, \hat{v}^2, \ldots, \hat{v}^N$, and use traditional regression methods to fit $\theta$.
» Stochastic gradient for updating $\theta^n$:

$$\theta^n = \theta^{n-1} - \alpha_{n-1} \left( \bar{V}(S^n \mid \theta^{n-1}) - \hat{v}^n \right) \nabla_\theta \bar{V}(S^n \mid \theta^{n-1})
= \theta^{n-1} - \alpha_{n-1} \underbrace{\left( \bar{V}(S^n \mid \theta^{n-1}) - \hat{v}^n \right)}_{\text{Error}} \underbrace{\begin{pmatrix} \varphi_1(S^n) \\ \varphi_2(S^n) \\ \vdots \\ \varphi_F(S^n) \end{pmatrix}}_{\text{Basis functions}}$$
Introduction to ADP
Methods for estimating $\theta$:
» Recursive statistics:

$$\theta^n = \theta^{n-1} - H^n x^n \varepsilon^n \qquad (\varepsilon^n \text{ is the error})$$

• Iterative equations avoid the matrix inverse. With $B^n = [(X^n)^T X^n]^{-1}$ (an $F \times F$ matrix):

$$H^n = \frac{1}{\gamma^n} B^{n-1}$$

$$B^n = B^{n-1} - \frac{1}{\gamma^n} B^{n-1} x^n (x^n)^T B^{n-1}$$

$$\gamma^n = 1 + (x^n)^T B^{n-1} x^n$$
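A pure-Python sketch of these recursive equations for a small number of basis functions: $B$ plays the role of $[(X)^T X]^{-1}$, and the Sherman-Morrison-style update avoids any explicit matrix inversion. The diffuse initialization $B^0 = 100\,I$ is an assumption (it acts like a very weak prior).

```python
def rls_update(theta, B, x, y):
    """One recursive least squares step for observation (x, y):
    gamma^n = 1 + x^T B^{n-1} x
    theta^n = theta^{n-1} - (1/gamma^n) B^{n-1} x * error
    B^n     = B^{n-1} - (1/gamma^n) (B^{n-1} x)(B^{n-1} x)^T
    """
    F = len(theta)
    Bx = [sum(B[i][j] * x[j] for j in range(F)) for i in range(F)]
    gamma = 1.0 + sum(x[i] * Bx[i] for i in range(F))
    error = sum(theta[i] * x[i] for i in range(F)) - y
    theta = [theta[i] - (Bx[i] / gamma) * error for i in range(F)]
    B = [[B[i][j] - Bx[i] * Bx[j] / gamma for j in range(F)] for i in range(F)]
    return theta, B

# Fit y = 2 + 3*S from four noise-free observations with basis (1, S).
theta = [0.0, 0.0]
B = [[100.0, 0.0], [0.0, 100.0]]    # diffuse prior; shrinks as data arrives
for S, y in [(0.0, 2.0), (1.0, 5.0), (2.0, 8.0), (3.0, 11.0)]:
    theta, B = rls_update(theta, B, [1.0, S], y)

print([round(t, 2) for t in theta])  # close to [2, 3]
```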
Introduction to ADP
Other statistical methods:
» Regression trees
• Combine regression with techniques for discrete variables.
» Data mining
• Good for categorical data.
» Neural networks
• Engineers like these for low-dimensional continuous problems.
Introduction to ADP
Notes:
» When approximating value functions, we are basically drawing on the entire field of statistics.
» Choosing an approximation is primarily an art.
» In special cases, the resulting algorithm can produce optimal solutions.
» Most of the time, we are hoping for "good" solutions.
» In some cases, it can work terribly.
» As a general rule, you have to use problem structure. Value function approximations have to capture the right structure; blind use of polynomials will rarely be successful.
Introduction to ADP
What you will struggle with:
» Stepsizes
• Can't live with 'em, can't live without 'em.
• Too small, and you think you have converged when you have really just stalled ("apparent convergence").
• Too large, and the system is unstable.
» Stability
• There are two sources of randomness: the traditional exogenous randomness, and an evolving policy.
» Exploration vs. exploitation
• You sometimes have to choose to visit a state just to collect information about the value of being in that state.
Introduction to ADP
But we are not out of the woods...
» Assume we have an approximate value function.
» We still have to solve a problem that looks like:

$$V_t(S_t) = \max_{x \in \mathcal{X}} \left( C_t(S_t, x) + E \sum_{f \in \mathcal{F}} \theta_f \phi_f(S_{t+1}) \right)$$

» This means we still have to deal with a maximization problem (which might be a linear, nonlinear or integer program) with an expectation.
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
A decision tree example: deciding whether to schedule or cancel a game, with the option of first using a weather report.

First decision: use the weather report, or do not use it.
» Without the report, the prior probabilities are: Rain .2, Clouds .3, Sun .5.
» With the report, the forecast is Sunny (.6), Cloudy (.3) or Rain (.1), with conditional probabilities:
• Forecast rain: Rain .8, Clouds .2, Sun .0
• Forecast cloudy: Rain .1, Clouds .5, Sun .4
• Forecast sunny: Rain .1, Clouds .2, Sun .7

Second decision: schedule the game or cancel it. If we schedule: Rain -$2000, Clouds $1000, Sun $5000. Canceling pays -$200 regardless of the weather.

In the tree, squares are decision nodes and circles are outcome nodes; the process alternates information, action, information, action, and the state captures what we know at each node.

Rolling back the tree, the expected value of scheduling at each outcome node is:
» Forecast rain: .8(-2000) + .2(1000) + .0(5000) = -$1400 (vs. -$200 to cancel)
» Forecast cloudy: .1(-2000) + .5(1000) + .4(5000) = $2300
» Forecast sunny: .1(-2000) + .2(1000) + .7(5000) = $3500
» No report: .2(-2000) + .3(1000) + .5(5000) = $2400

Taking the best decision at each decision node (-$200, $2300, $3500, $2400) and then the expectation over forecasts:
» Use the weather report: .6($3500) + .3($2300) + .1(-$200) = $2770
» Do not use the weather report: $2400
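The rollback can be reproduced in a few lines of Python; the probabilities and payoffs are taken from the tree above.

```python
PAYOFF = {"rain": -2000, "clouds": 1000, "sun": 5000}   # payoff if we schedule
CANCEL = -200                                            # payoff if we cancel

def best_decision(p):
    """Value at a decision node: schedule (expected payoff) or cancel."""
    schedule = sum(p[w] * PAYOFF[w] for w in p)
    return max(schedule, CANCEL)

forecast = {                       # P(weather | forecast)
    "sunny":  {"rain": .1, "clouds": .2, "sun": .7},
    "cloudy": {"rain": .1, "clouds": .5, "sun": .4},
    "rain":   {"rain": .8, "clouds": .2, "sun": .0},
}
p_forecast = {"sunny": .6, "cloudy": .3, "rain": .1}
prior = {"rain": .2, "clouds": .3, "sun": .5}

use_report = sum(p_forecast[f] * best_decision(forecast[f]) for f in forecast)
no_report = best_decision(prior)
print(round(use_report), round(no_report))   # 2770 2400
```

The difference, $370, is the value of the information in the weather report.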
The post-decision state
The post-decision state
Managing blood inventories over time:

$$S_0 \xrightarrow{\,x_0\,} S_0^x \xrightarrow{\,\hat{R}_1, \hat{D}_1\,} S_1 \xrightarrow{\,x_1\,} S_1^x \xrightarrow{\,\hat{R}_2, \hat{D}_2\,} S_2 \xrightarrow{\,x_2\,} S_2^x \xrightarrow{\,\hat{R}_3, \hat{D}_3\,} S_3 \xrightarrow{\,x_3\,} S_3^x \cdots$$

with decisions $x_t$ made each week ($t = 0$ is week 0, $t = 1$ is week 1, and so on).
The post-decision state
What happens if we use Bellman's equation?
» The state variable is:
• The supply of each type of blood (8 blood types, 6 ages)
• The demand for each type of blood (8 blood types)
• A 56-dimensional state vector
» The decision variable is how much of 8 blood types to supply to 8 demand types:
• A 162-dimensional decision vector
» Random information:
• Blood donations by week (8 types)
• New demands for blood (8 types)
• A 16-dimensional information vector
[The decision tree is shown again, annotated with Bellman's equation

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + E\{V_{t+1}(S_{t+1}) \mid S_t\} \right)$$

where $S_t$ is the current decision node and $S_{t+1}$ the next one. Decision nodes correspond to pre-decision states; outcome nodes correspond to post-decision states.]
The post-decision state
New concept:
» The "pre-decision" state variable:
• $S_t$ = the information required to make a decision $x_t$.
• Same as a "decision node" in a decision tree.
» The "post-decision" state variable:
• $S_t^x$ = the state of what we know immediately after we make a decision.
• Same as an "outcome node" in a decision tree.
The post-decision state
An inventory problem:
» Our basic inventory equation:

$$R_{t+1} = \max\{0, R_t + x_t - \hat{D}_{t+1}\}$$

where $R_t$ = inventory at time $t$, $x_t$ = the amount we ordered, and $\hat{D}_{t+1}$ = demand in the next time period.

» Using pre- and post-decision states:

$$R_t^x = R_t + x_t \qquad \text{(post-decision state)}$$
$$R_{t+1} = \max\{0, R_t^x - \hat{D}_{t+1}\} \qquad \text{(pre-decision state)}$$
The post-decision state
Pre-decision state, state-action pair, and post-decision state:

Pre-decision state → action → post-decision state: 93 states, 93 × 9 state-action pairs, 93 states.
The post-decision state
Pre-decision state: resources and demands.

$$S_t = (R_t, D_t)$$
$$S_t^x = S^{M,x}(S_t, x_t)$$
The post-decision state
The post-decision state
$$W_{t+1} = (\hat{R}_{t+1}, \hat{D}_{t+1}) \qquad S_{t+1} = S^{M,W}(S_t^x, W_{t+1})$$
The post-decision state
The next pre-decision state: $S_{t+1}$.
The post-decision state
Classical form of Bellman's equation:

$$V_t(S_t) = \max_{x_t \in \mathcal{X}_t} \left( C_t(S_t, x_t) + E\{V_{t+1}(S_{t+1}) \mid S_t\} \right)$$

Bellman's equations around pre- and post-decision states:
» Optimization problem (making the decision):

$$V_t(S_t) = \max_{x_t} \left( C_t(S_t, x_t) + V_t^x\big(S^{M,x}(S_t, x_t)\big) \right)$$

• Note: this problem is deterministic!
» Simulation problem (the effect of exogenous information):

$$V_t^x(S_t^x) = E\left\{ V_{t+1}\big(S^{M,W}(S_t^x, W_{t+1})\big) \mid S_t^x \right\}$$
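The two equations can be made concrete on a scalar inventory model. The price, order cost and demand distribution below are illustrative assumptions, with revenue booked when demand is revealed: the optimization step involves no expectation, while the simulation step does.

```python
P_D = {0: 0.3, 1: 0.4, 2: 0.3}          # demand distribution (assumed)
PRICE, COST, MAX_INV = 5.0, 3.0, 6

def simulation_step(V_next):
    """V^x(R^x) = E[ PRICE*min(R^x, D) + V_{t+1}(max(0, R^x - D)) ]:
    the expectation over the exogenous demand."""
    return [sum(p * (PRICE * min(rx, d) + V_next[max(0, rx - d)])
                for d, p in P_D.items())
            for rx in range(MAX_INV + 1)]

def optimization_step(Vx):
    """V(R) = max_x ( -COST*x + V^x(R + x) ): deterministic given V^x."""
    return [max(-COST * x + Vx[r + x] for x in range(MAX_INV - r + 1))
            for r in range(MAX_INV + 1)]

V_next = [0.0] * (MAX_INV + 1)           # terminal values
Vx = simulation_step(V_next)             # post-decision values
V = optimization_step(Vx)                # pre-decision values
print([round(v, 2) for v in V])          # [0.5, 3.5, 5.0, 5.0, 5.0, 5.0, 5.0]
```

Stepping these two maps backward over a horizon is exact dynamic programming; ADP replaces the expectation in `simulation_step` with sampled updates.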
The post-decision state
Challenges:
» For most practical problems, we are not going to be able to compute $V_t^x(S_t^x)$ in

$$V_t(S_t) = \max_x \left( C_t(S_t, x) + V_t^x(S_t^x) \right)$$

» Concept: replace it with an approximation $\bar{V}_t(S_t^x)$ and solve

$$V_t(S_t) = \max_x \left( C_t(S_t, x) + \bar{V}_t(S_t^x) \right)$$

» So now we face:
• What should the approximation look like?
• How do we estimate it?
The post-decision state
Value function approximations:
» Linear (in the resource state):

$$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{v}_{ta} R_{ta}^x$$

» Piecewise linear, separable:

$$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}(R_{ta}^x)$$

» Indexed piecewise linear, separable:

$$\bar{V}_t(R_t^x) = \sum_{a \in \mathcal{A}} \bar{V}_{ta}\big(R_{ta}^x \mid \text{features}(S_t)\big)$$
The post-decision state
Value function approximations:
» Ridge regression (Klabjan and Adelman):

$$\bar{V}_t(R_t) = \sum_{f \in \mathcal{F}} \bar{V}_{tf}(R_{tf}), \qquad R_{tf} = \sum_{a \in \mathcal{A}} f_a R_{ta}$$

» Benders cuts. [Figure: cuts approximating $V_t(R_t)$ as a function of the decisions $x_0$, $x_1$]
The post-decision state
Comparison to other methods:
» Classical MDP (value iteration):

$$V^n(S) = \max_x \left( C(S, x) + \gamma\, E\, V^{n-1}(S') \right)$$

» Classical ADP (pre-decision state):

$$\hat{v}_t^n = \max_x \left( C(S_t^n, x) + \sum_{s'} p(s' \mid S_t^n, x)\, \bar{V}_{t+1}^{n-1}(s') \right)$$
$$\bar{V}_t^n(S_t^n) = (1 - \alpha_{n-1}) \bar{V}_t^{n-1}(S_t^n) + \alpha_{n-1} \hat{v}_t^n$$

The expectation must still be computed explicitly, and $\hat{v}_t^n$ updates $\bar{V}_t(S_t^n)$ around the pre-decision state.

» Our method (update around the post-decision state):

$$\hat{v}_t^n = \max_x \left( C(S_t^n, x) + \bar{V}_t^{x,n-1}\big(S^{M,x}(S_t^n, x)\big) \right)$$
$$\bar{V}_{t-1}^{x,n}(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{x,n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$$

Here $\hat{v}_t^n$ updates $\bar{V}_{t-1}^x(S_{t-1}^{x,n})$ around the previous post-decision state, and no expectation appears inside the max.
The post-decision state
Step 1: Start with a pre-decision state $S_t^n$.
Step 2: Solve the deterministic optimization using an approximate value function:

$$\hat{v}_t^n = \max_x \left( C(S_t^n, x) + \bar{V}^{n-1}\big(S^{M,x}(S_t^n, x)\big) \right)$$

to obtain $x_t^n$. (Deterministic optimization.)
Step 3: Update the value function approximation (recursive statistics):

$$\bar{V}_{t-1}^{x,n}(S_{t-1}^{x,n}) = (1 - \alpha_{n-1}) \bar{V}_{t-1}^{x,n-1}(S_{t-1}^{x,n}) + \alpha_{n-1} \hat{v}_t^n$$

Step 4: Obtain a Monte Carlo sample of $W_t(\omega^n)$ (simulation) and compute the next pre-decision state:

$$S_{t+1}^n = S^M(S_t^n, x_t^n, W_{t+1}(\omega^n))$$

Step 5: Return to Step 1.
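The five steps above can be sketched for a scalar inventory problem with a lookup-table value function over the post-decision state. Every number here (price, cost, demand distribution, horizon) is an illustrative assumption; the stepsize is $1/n$ in the number of visits.

```python
import random

random.seed(0)
T, N = 20, 2000
PRICE, COST, MAX_ORDER, MAX_INV = 6.0, 4.0, 5, 15
V = [[0.0] * (MAX_INV + 1) for _ in range(T + 1)]   # V[t][Rx]: post-decision values
count = [[0] * (MAX_INV + 1) for _ in range(T + 1)]

for n in range(N):
    R, D = 3, random.randint(0, 4)      # Step 1: pre-decision state (R_t, D_t)
    prev = None                         # previous post-decision state (t-1, Rx)
    for t in range(T):
        sales = min(R, D)
        leftover = R - sales
        # Step 2: deterministic optimization using the current approximation.
        def q(x):
            Rx = min(leftover + x, MAX_INV)
            return PRICE * sales - COST * x + V[t][Rx]
        x = max(range(MAX_ORDER + 1), key=q)
        v_hat = q(x)
        # Step 3: smooth v_hat into the *previous* post-decision state.
        if prev is not None:
            tp, Rxp = prev
            count[tp][Rxp] += 1
            a = 1.0 / count[tp][Rxp]    # stepsize 1/n visits
            V[tp][Rxp] = (1 - a) * V[tp][Rxp] + a * v_hat
        # Step 4: Monte Carlo sample of demand -> next pre-decision state.
        Rx = min(leftover + x, MAX_INV)
        prev = (t, Rx)
        R, D = Rx, random.randint(0, 4)
        # Step 5: continue with the next time period / iteration.

print(round(max(V[0]), 1))   # value of the best post-decision inventory at t=0
```

Note that the expectation never appears: randomness enters only through the sampled demand, which is exactly the point of updating around the post-decision state.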
Approximate dynamic programming

Approximate dynamic programming combines simulation and optimization (e.g. Cplex) in a rigorous yet flexible framework.

Optimization
» Strengths: produces optimal decisions; mature technology.
» Weaknesses: cannot handle uncertainty; cannot handle high levels of complexity.

Simulation
» Strengths: extremely flexible; high level of detail; easily handles uncertainty.
» Weaknesses: models decisions using user-specified rules; low solution quality.
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
Blood management
Managing blood inventories
Blood management
Managing blood inventories over time:

$$S_0 \xrightarrow{\,x_0\,} S_0^x \xrightarrow{\,\hat{R}_1, \hat{D}_1\,} S_1 \xrightarrow{\,x_1\,} S_1^x \xrightarrow{\,\hat{R}_2, \hat{D}_2\,} S_2 \xrightarrow{\,x_2\,} S_2^x \xrightarrow{\,\hat{R}_3, \hat{D}_3\,} S_3 \xrightarrow{\,x_3\,} S_3^x \cdots$$
[Figure sequence: the blood assignment network over one week.
» The pre-decision state is $S_t = (R_t, \hat{D}_t)$: supplies $R_{t,(a,0)}, R_{t,(a,1)}, R_{t,(a,2)}$ for each blood type $a \in \{$AB+, AB-, A+, A-, B+, B-, O+, O-$\}$ and age, and demands $\hat{D}_{t,a}$ by type.
» Each unit of blood either satisfies a demand or is held, producing the post-decision inventories $R_t^x$ with ages incremented (e.g. AB+,0 through AB+,3; O-,0 through O-,3).
» New donations $\hat{R}_{t+1}$ then arrive to form the next pre-decision supplies $R_{t+1}$.
» The one-period assignment problem $F_t(R_t)$ is solved as a linear program.
» The dual variables $\hat{\nu}_{t,(a,\text{age})}$ give the value of an additional unit of blood of each type and age.]
Updating the value function approximation
Estimate the gradient at $R_{t,(AB+,2)}^n$: the dual variable $\hat{\nu}_{t,(AB+,2)}^n$ gives the slope of $F_t(R)$ at the current resource level $R_t^n$.
Updating the value function approximation

Update the value function at the previous post-decision state $R_{t-1}^{x,n}$: the dual estimate $\hat{\nu}_{t,(AB+,2)}^n$ is smoothed into the slopes of $\bar{V}_{t-1}^{n-1}(R_{t-1}^x)$ to produce the updated approximation $\bar{V}_{t-1}^n(R_{t-1}^x)$. [Figures: the piecewise linear approximation before and after the update.]
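A sketch of this update for a piecewise linear, concave approximation in one resource dimension, in the spirit of the SPAR algorithm: the function is stored as slopes (the marginal value of each unit), a dual estimate is smoothed into one slope, and a pool-adjacent-violators projection restores concavity (nonincreasing slopes). The numbers are illustrative.

```python
def project_concave(v):
    """Pool adjacent violators: least-squares projection of the slope vector
    onto nonincreasing sequences, i.e. onto concave piecewise linear functions."""
    blocks = []                                   # [sum, count] per merged block
    for x in v:
        blocks.append([x, 1])
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()                   # merge violating neighbors
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

def update_slopes(v, R, nu_hat, alpha):
    """Smooth the dual estimate nu_hat into the slope at R, then re-project."""
    v = v[:]
    v[R] = (1 - alpha) * v[R] + alpha * nu_hat
    return project_concave(v)

v = [10.0, 8.0, 6.0, 4.0, 2.0]                    # marginal values, decreasing
v = update_slopes(v, R=3, nu_hat=9.0, alpha=0.5)
print(v)  # [10.0, 8.0, 6.25, 6.25, 2.0]
```

The smoothed slope at $R=3$ rises to 6.5, breaking concavity against its left neighbor; the projection levels the two slopes at 6.25.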
Iterative learning

[Figures: forward simulations through the network, repeated over iterations to learn the value functions.]
Approximate dynamic programming

With luck, the objective function will improve steadily... [Chart: objective function vs. iteration, rising smoothly from about 97 to 100]

... but performance can be jagged. [Chart: objective function over iterations 300-1000, fluctuating between roughly 1.70 and 1.90 million]
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
Hierarchical aggregation
Attribute vectors:

$$a = \begin{bmatrix}\text{Location}\\ \text{ETA}\\ \text{A/C type}\\ \text{Fuel level}\\ \text{Home shop}\\ \text{Crew}\\ \text{Eqpt1}\\ \vdots\\ \text{Eqpt100}\end{bmatrix}
\qquad
\begin{bmatrix}\text{Location}\\ \text{ETA}\\ \text{Bus. segment}\\ \text{Single/team}\\ \text{Domicile}\\ \text{Drive hours}\\ \text{Days from home}\end{bmatrix}
\qquad
\begin{bmatrix}\text{Blood type}\\ \text{Age}\\ \text{Frozen?}\end{bmatrix}
\qquad
\begin{bmatrix}\text{Asset class}\\ \text{Time invested}\end{bmatrix}$$
Estimating value functions

Drivers may have very detailed attributes, while the value function approximation may have fewer attributes than the driver:

$$\bar{v}^n \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\end{bmatrix} = (1 - \alpha)\, \bar{v}^{n-1} \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\end{bmatrix} + \alpha\, \hat{v} \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\\ \text{DOThrs}\\ \text{DaysFromHome}\end{bmatrix}$$

For example: $\$2050 = (1 - 0.10) \times \$2000 + (0.10) \times \$2500$.
Hierarchical aggregation

Estimating value functions at three levels of aggregation:

» Most disaggregate level:

$$\bar{v}^n \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\end{bmatrix} = (1 - \alpha)\, \bar{v}^{n-1} \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\end{bmatrix} + \alpha\, \hat{v} \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\\ \text{DOThrs}\\ \text{DaysFromHome}\end{bmatrix}$$

» Middle level of aggregation:

$$\bar{v}^n \begin{bmatrix}\text{Location}\\ \text{Fleet}\end{bmatrix} = (1 - \alpha)\, \bar{v}^{n-1} \begin{bmatrix}\text{Location}\\ \text{Fleet}\end{bmatrix} + \alpha\, \hat{v} \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\\ \text{DOThrs}\\ \text{DaysFromHome}\end{bmatrix}$$

» Most aggregate level:

$$\bar{v}^n \big[\text{Location}\big] = (1 - \alpha)\, \bar{v}^{n-1} \big[\text{Location}\big] + \alpha\, \hat{v} \begin{bmatrix}\text{Location}\\ \text{Fleet}\\ \text{Domicile}\\ \text{DOThrs}\\ \text{DaysFromHome}\end{bmatrix}$$
State-dependent weighted aggregation:
» Now we have a linear regression for each attribute:

$$\bar{v}_a = \sum_g w_a^{(g)} \bar{v}_a^{(g)}$$

» We cannot solve hundreds of thousands of linear regressions. Instead, use weights inversely proportional to the estimated variance plus the squared estimated bias:

$$w_a^{(g)} \propto \left( \text{Var}\big(\bar{v}_a^{(g)}\big) + \big(\beta_a^{(g)}\big)^2 \right)^{-1}, \qquad \sum_g w_a^{(g)} = 1$$
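The weighting formula can be sketched directly. The variance and bias values below are illustrative placeholders for what would be estimated statistically from the observations at each aggregation level.

```python
def weighted_estimate(v, var, beta):
    """Combine estimates across aggregation levels g with weights proportional
    to 1 / (variance + bias^2), normalized to sum to one."""
    inv = [1.0 / (s + b * b) for s, b in zip(var, beta)]
    total = sum(inv)
    w = [i / total for i in inv]
    return sum(wi * vi for wi, vi in zip(w, v)), w

# Three levels, from disaggregate (low bias, high variance) to
# aggregate (low variance, high bias). All numbers are illustrative.
v = [2500.0, 2300.0, 2000.0]        # estimates of the same value at each level
var = [400.0, 100.0, 25.0]          # estimated variances
beta = [0.0, 10.0, 25.0]            # estimated biases
est, w = weighted_estimate(v, var, beta)
print(round(est, 1), [round(x, 3) for x in w])
```

The middle level wins the largest weight here because it balances the two error sources; as more observations arrive the variances shrink and the weights shift toward the disaggregate level.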
Hierarchical aggregation
[Chart: objective function (1.40-1.90 million) over 1000 iterations, comparing value functions estimated at the aggregate and disaggregate levels]
Hierarchical aggregation

[Chart: the same comparison with the weighted combination added; objective function over 1000 iterations for the Weighted Combination, Aggregate and Disaggregate estimates]
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
Stepsizes
Stepsizes:
» Fundamental to ADP is an updating equation that looks like:

$$\underbrace{\bar{V}_{t-1}^{n}(S_{t-1}^{x,n})}_{\text{Updated estimate}} = (1 - \underbrace{\alpha_{n-1}}_{\text{Stepsize}})\, \underbrace{\bar{V}_{t-1}^{n-1}(S_{t-1}^{x,n})}_{\text{Old estimate}} + \alpha_{n-1}\, \underbrace{\hat{v}_t^n}_{\text{New observation}}$$

The stepsize is also known as a "learning rate" or "smoothing factor".
Stepsizes
[Charts: two regimes over roughly 100 iterations.
» High-noise data: the best stepsize declines toward $\alpha_n = 1/n$.
» Low-noise, changing signal: the best stepsize stays near $\alpha_n \approx 1$.]
Stepsizes
Theorem 1 (general knowledge):
» If the data is stationary, then 1/n is the best possible stepsize.

Theorem 2 (e.g. Tsitsiklis 1994):
» For general problems, 1/n is provably convergent.

Theorem 3 (Frazier and Powell):
» 1/n works so badly it should (almost) never be used.
Stepsizes
Lower bound on the value function after n iterations with a 1/n stepsize. [Chart: the bound converges extremely slowly, on a logarithmic iteration scale]
Stepsizes
The bias-adjusted Kalman filter (BAKF):

$$\alpha_n = 1 - \frac{\sigma^2}{(1 + \lambda^{n-1})\sigma^2 + (\beta^n)^2}$$

where:

$$\lambda^n = (1 - \alpha_n)^2 \lambda^{n-1} + \alpha_n^2$$

Here $\sigma^2$ is the estimate of the variance (the noise) and $\beta^n$ is the estimate of the bias.
» As the noise $\sigma^2$ increases, the stepsize decreases toward $1/n$.
» As the bias $\beta^n$ increases, the stepsize increases toward 1.
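A sketch of the BAKF recursion. In practice $\sigma^2$ and $\beta^n$ must themselves be estimated from the data; here they are fixed inputs so the limiting behavior is visible: with zero bias the rule reduces exactly to $1/n$, and with bias large relative to the noise it stays near 1.

```python
def bakf_stepsizes(sigma2, beta, N):
    """BAKF stepsizes for fixed noise variance sigma2 and bias beta:
    alpha_n  = 1 - sigma2 / ((1 + lambda^{n-1}) * sigma2 + beta^2)
    lambda^n = (1 - alpha_n)^2 * lambda^{n-1} + alpha_n^2
    """
    alphas, lam = [], 0.0
    for n in range(1, N + 1):
        a = 1.0 if n == 1 else 1.0 - sigma2 / ((1.0 + lam) * sigma2 + beta * beta)
        lam = (1.0 - a) ** 2 * lam + a * a
        alphas.append(a)
    return alphas

# Zero bias: collapses to 1/n (1, 1/2, 1/3, ...).
print([round(a, 3) for a in bakf_stepsizes(sigma2=1.0, beta=0.0, N=5)])
# Bias dominating the noise: stepsizes stay near 1.
print([round(a, 3) for a in bakf_stepsizes(sigma2=0.01, beta=1.0, N=5)])
```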
Stepsizes
The bias-adjusted Kalman filter:

[Chart: observed values over ~100 iterations, with the estimates produced by the BAKF stepsize rule and by the 1/n stepsize rule]

[Chart: a second series of observed values tracked by the BAKF stepsize rule]
Stepsizes

The BAKF formula can equivalently be written:

$$\alpha_n = \frac{\lambda^{n-1}\sigma^2 + (\beta^n)^2}{(1 + \lambda^{n-1})\sigma^2 + (\beta^n)^2}$$

Our newest formula:
» [The full expression, which involves a discount-related parameter in place of the bias term, is not recoverable from the extraction.]
» It is the first stepsize formula designed specifically for dynamic programming (note: no need to estimate the bias $\beta^n$).
» It captures the natural growth of a value function through iterative learning.
Stepsizes
I recommend:
» Start with a constant stepsize.
• Vary it, and get a sense of what works best.
» Next try a deterministic stepsize such as a/(a+n).
• Choose a so that the stepsize declines to roughly your best constant stepsize at an iteration comparable to where the constant-stepsize run seems to stabilize (might be 50 iterations, might be 5000).
» Then try an (optimal) adaptive stepsize rule.
• These can work very well if there is not too much noise.
• Adaptive rules work well when there is a need to keep the stepsize from declining too quickly (but you do not know how quickly).
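The a/(a+n) suggestion can be turned into a small helper: pick a so the stepsize passes through a target value at a target iteration. The numbers are illustrative.

```python
def harmonic(a, n):
    """Deterministic stepsize alpha_n = a / (a + n)."""
    return a / (a + n)

def choose_a(target_alpha, target_n):
    """Solve a/(a+n) = alpha for a:  a = alpha*n / (1 - alpha)."""
    return target_alpha * target_n / (1.0 - target_alpha)

# e.g. if a constant stepsize of 0.1 seemed to stabilize around iteration 50:
a = choose_a(0.1, 50)              # = 50/9, about 5.56
print(round(harmonic(a, 50), 3))   # 0.1 by construction
```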
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
Two-stage optimization

Piecewise linear, separable value function approximations:

$$\bar{V}_t(R_t) = \sum_{l \in \mathcal{L}} \bar{V}_{tl}(R_{tl})$$
Two-stage optimization

Benders decomposition: [Figure: cuts under $V_t(R_t)$ as a function of $x_0$, $x_1$]

Multidimensional cuts produce a provably convergent, nonseparable value function approximation.
The competition

Exact solutions using Benders:
» "L-shaped" decomposition (Van Slyke and Wets)
» Stochastic decomposition (Higle and Sen)
» CUPPS (Chen and Powell)
The competition

[Chart: percent over optimal after 100 iterations for SD, L-shaped and CUPPS (Benders-based) and SPAR (separable), on problems with 10, 25, 50 and 100 locations; the percent error of the Benders methods grows, up to roughly 40%, as the problem size increases]
Multistage problems

Deterministic, single commodity flow
» ADP solution as a percent of the optimal solution (from Cplex).
Multistage problems

Stochastic, single commodity flow

[Chart: percent of posterior optimal (100 = optimal posterior bound) for a rolling horizon procedure vs. ADP, on instances (20,100), (20,200), (20,400), (40,100), (40,200), (40,400), (80,100), (80,200), (80,400)]
Multistage problems

Deterministic, (integer) multicommodity flow

[Chart: percent of optimal (100 = optimal continuous relaxation) across datasets Base, T_30, T_90, I_10, I_40, C_II, C_III, C_IV, R_1, R_5, R_100, R_400, C_1, C_8; values range roughly from 60 to 105]
Multistage problems

Stochastic, (integer) multicommodity flow

[Chart: percent of posterior optimal for a rolling horizon procedure vs. ADP across datasets Base, I_10, I_40, C_II, C_III, C_IV, R_1, R_5, R_100, R_400, C_1, C_8]
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
Schneider National
[Charts: model calibration. Revenue per WU and utilization by capacity category (US_SOLO, US_IC, US_TEAM), comparing the calibrated model's simulation against the historical minimum and maximum.]
Schneider National
Princeton team: Warren Powell, Belgacem Bouzaiene-Ayari
NS team: Clark Cheng, Ricardo Fiorillo, Junxia Chang, Sourav Das
Convergence
Train coverage
Model calibration
Train delay curve – October 2007. [Chart: delay hours/day vs. fleet size]
Model calibration

Train delay curve – March 2008. [Chart: delay hours/day vs. fleet size]
Outline

A resource allocation model
An introduction to ADP
ADP and the post-decision state variable
A blood management example
Hierarchical aggregation
Stepsizes
How well does it work?
Some applications
» Transportation
» Energy
Energy resource modeling
Energy resource modeling
Hourly model
» Decisions at time t impact t+1 through the amount of water held in the reservoir. [Figure: hour t linked to hour t+1]
Energy resource modeling
Hourly model
» Decisions at time t impact t+1 through the amount of water held in the reservoir.
» The value of holding water in the reservoir for future time periods. [Figure: hour t, with a value function standing in for the future hours]
Energy resource modeling

[Figure: the hourly timeline — hours 1, 2, 3, 4, ..., 8760 of 2008, rolling into hours 1, 2, ... of 2009]
Annual energy model

Optimal: [Chart: reservoir level and demand over 1000 time periods; series include Precipitation, Hydro, Usage, Remaining, Wastage, Demand, Coal, Wind]
Annual energy model

Approximate dynamic programming: [Chart: reservoir level and demand over 1000 time periods; series include Precipitation, Usage, Remaining, Wastage, Demand, Coal, Wind]
Annual energy model

ADP vs. optimal reservoir levels for stochastic rainfall: [Chart: reservoir level over 800 time periods; the ADP solution at the last iteration tracks the optimal solutions computed for individual scenarios]