136
© 2008 Warren B. Powell Slide 1 Approximate Dynamic Programming: Solving the curses of dimensionality Informs Computing Society Tutorial October, 2008 Warren Powell CASTLE Laboratory Princeton University http://www.castlelab.princeton.edu © 2008 Warren B. Powell, Princeton University

Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 1

Approximate Dynamic Programming:Solving the curses of dimensionality

Informs Computing Society Tutorial

October, 2008

Warren PowellCASTLE LaboratoryPrinceton University

http://www.castlelab.princeton.edu

© 2008 Warren B. Powell, Princeton University

Page 2: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 2

Page 3: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 3

Page 4: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 4

The fractional jet ownership industry

Page 5: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 5NetJets Inc.

Page 6: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 6

Schneider National

Page 7: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 7

Schneider National

Page 8: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 8

Planning for a risky world

Weather

•Robust design of emergency response networks.

•Design of financial instruments to hedge against weather emergencies to protect individuals, companies and municipalities.

•Design of sensor networks and communication systems to manage responses to major weather events.

Disease

•Models of disease propagation for response planning.

•Management of medical personnel, equipment and vaccines to respond to a disease outbreak.

•Robust design of supply chains to mitigate the disruption of transportation systems.

Page 9: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 9

Energy management

Energy resource allocation• What is the right mix of energy technologies?

• How should the use of different energy resources be coordinated over space and time?

• What should my energy R&D portfolio look like?

• Should I invest in nuclear energy?

• What is the impact of a carbon tax?

Energy markets• How should I hedge energy commodities?

• How do I price energy assets?

• What is the right price for energy futures?

Page 10: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 10

High value spare parts

Electric Power Grid

•PJM oversees an aging investment in high-voltage transformers.

•Replacement strategy needs to anticipate a bulge in retirements and failures

•1-2 year lag times on orders. Typical cost of a replacement ~ $5 million.

•Failures vary widely in terms of economic impact on the network.

Spare parts for business jets

•ADP is used to determine purchasing and allocation strategies for over 400 high value spare parts.

•Inventory strategy has to determine what to buy, when and where to store it. Many parts are very low volume (e.g. 7 spares spread across 15 service centers).

•Inventories have to meet global targets on level of service and inventory costs.

Page 11: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 11

Challenges

Real-time control» Scheduling aircraft, pilots, generators, tankers» Pricing stocks, options» Electricity resource allocation

Near-term tactical planning» Can I accept a customer request?» Should I lease equipment?» How much energy can I commit with my windmills?

Strategic planning» What is the right equipment mix?» What is the value of this contract?» What is the value of more reliable aircraft?

Page 12: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 13: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 14: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 14

A resource allocation model

Attribute vectors:

a =Location

ETAA/C typeFuel level

Home shopCrewEqpt1

Eqpt100

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

LocationETAHome

ExperienceDriving hours

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

TypeLocation

Age

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

Asset classTime invested⎡ ⎤⎢ ⎥⎣ ⎦

Page 15: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 15

A resource allocation model

Modeling resources:» The attributes of a single resource:

» The resource state vector:

» The information process:

The attributes of a single resource The attribute space

aa=∈A

ˆ The change in the number of resources with attribute .

taRa

=

( )The number of resources with attribute

The resource state vectorta

t ta a

R a

R R∈

=

=A

Page 16: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 16

A resource allocation model

Modeling demands:» The attributes of a single demand:

» The demand state vector:

» The information process:ˆ The change in the number of demands with

attribute .tbD

b=

The attributes of a demand to be served. The attribute space

bb=∈B

( )The number of demands with attribute

The demand state vectortb

t tb b

D b

D D∈

=

=B

Page 17: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 17

Energy resource modeling

The system state:

( ), , System state, where:

Resource state (how much capacity, reserves) Market demands "system parameters" State of the technology (costs, pe

t t t t

t

t

t

S R D

RD

ρ

ρ

= =

=

==

rformance) Climate, weather (temperature, rainfall, wind) Government policies (tax rebates on solar panels) Market prices (oil, coal)

⎫⎪⎪⎪⎬⎪⎪⎪⎭

Page 18: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 18

Energy resource modeling

The decision variable:

New capacityRetired capacity

:Type

LocationTechnology

t

for eachx

⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟

= ⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠

⎫⎪⎪⎪⎬⎪⎪⎪⎭

Page 19: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 19

Energy resource modeling

Exogenous information:

⎫⎪⎪⎪⎬⎪⎪⎪⎭

( )ˆ ˆ ˆNew information = , ,t t t tW R D ρ=

ˆ Exogenous changes in capacity, reservesˆ New demands for energy from each sourceˆ Exogenous changes in parameters.

t

t

t

R

=

=

=

Page 20: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 20

Energy resource modeling

The transition function

⎫⎪⎪⎪⎬⎪⎪⎪⎭

1 1( , , )Mt t t tS S S x W+ +=

Page 21: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 21

A resource allocation model

DemandsResources

Page 22: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 22

A resource allocation model

t t+1 t+2

Page 23: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 23

A resource allocation model

t t+1 t+2

Optimizing at a point in time

Optimizing over time

Page 24: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 25: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 25

Introduction to ADP

We just solved Bellman’s equation:

» We found the value of being in each state by stepping backward through the tree.

{ }1 1( ) max ( , ) ( ) |t t t t t t t txV S C S x E V S S+ +∈

= +X

Page 26: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 26

Introduction to ADP

The challenge of dynamic programming:

Problem: Curse of dimensionality

{ }( )1 1( ) max ( , ) ( ) |t t t t t t t txV S C S x E V S S+ +∈

= +X

Page 27: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 27

Introduction to ADP

The challenge of dynamic programming:

Problem: Curse of dimensionality

{ }( )1 1( ) max ( , ) ( ) |t t t t t t t txV S C S x E V S S+ +∈

= +X

Three curses

State spaceOutcome spaceAction space (feasible region)

Page 28: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 28

Introduction to ADP

The computational challenge:

How do we find ? 1 1( )t tV S+ +

How do we compute the expectation?

How do we find the optimal solution?

{ }( )1 1( ) max ( , ) ( ) |t t t t t t t txV S C S x E V S S+ +∈

= +X

Page 29: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

Classical ADP» Most applications of ADP focus on the challenge of

handling multidimensional state variables» Start with

» Now replace the value function with some sort of approximation

{ }( )1 1( ) max ( , ) ( ) |t t t t t t t txV S C S x E V S S+ +∈

= +X

( ) ( )t t t tV S V S≈

Page 30: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

Approximating the value function:» We have to exploit the structure of the value function

(e.g. concavity).» We might approximate the value function using a

simple polynomial

» .. or a complicated one:

» Sometimes, they get really messy:

20 1 2( | )t t t tV S S Sθ θ θ θ= + +

( )20 1 2 3 4( | ) ln sin( )t t t t t tV S S S S Sθ θ θ θ θ θ= + + + +

Page 31: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

2 3(0) (1) (1)

, ' , ' , ' , '' '

(2) 2 (2) 2, ' , ' , ' , '

' '2

(3), , '

'

21 1(4)

, ' , ' '' ' '

( | )

1

12

t t

t t st t st t wt t wts t t w t t

t wt t wt t st t stw t s t

ts t st t s ts s

t t

ts t st t s ts t t s t t

V R R R

R R

R RS

R RS

θ θ θ θ

θ θ

θ

θ

+ +

= =

+ +

= =

= + +

+ +

⎛ ⎞+ −⎜ ⎟

⎝ ⎠

⎛ ⎞⎛ ⎞+ −⎜ ⎟⎜ ⎟

⎝ ⎠⎝ ⎠

+

∑∑ ∑∑

∑∑ ∑∑

∑ ∑

∑ ∑ ∑∑22 2

(5), ' , ' '

' ' '

( ,1), , ,

2( ,2), , , '

'

3 2( ,3), , ' , '

' '

( ,4), , '

'

13

t t

ts t st t s ts t t s t t

wst ws t wt t st

w s

tws

t ws t wt t stw s t t

t tws

t ws t wt t stw s t t t t

wst ws t wt

t

R RS

R R

R R

R R

R

θ

θ

θ

θ

θ

+ +

= =

+

=

+ +

= =

⎛ ⎞⎛ ⎞ −⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠

+

⎛ ⎞+ ⎜ ⎟⎝ ⎠

⎛ ⎞⎛ ⎞+ ⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠

+

∑ ∑ ∑∑

∑∑

∑∑ ∑

∑∑ ∑ ∑2 2

, ''

( ,5), , , 2 , , , 2

t t

t stw s t t t

wst sw t s t t wt t w t

w s

R

R R Rθ

+ +

= =

+ +

⎛ ⎞⎛ ⎞⎜ ⎟⎜ ⎟⎝ ⎠⎝ ⎠

+

∑∑ ∑ ∑

∑∑

1,2,3 4,5,6,7

8-14

15

16

17

18

19

20

21

22

Page 32: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

We can write a model of the observed value of being in a state as:

This is often written as a generic regression model:

The ADP community refers to the independent variables as basis functions:

( )20 1 2 3 4ˆ ln sin( )t t t tv S S S Sθ θ θ θ θ ε= + + + + +

0 1 1 2 2 3 3 4 4 Y X X X Xθ θ θ θ θ= + + + +

0 0 1 1 2 2 3 3 4 4( ) ( ) ( ) ( ) ( )

= ( )f ff

Y S S S S S

S

θ ϕ θ ϕ θ ϕ θ ϕ θ ϕ

θ ϕ∈

= + + + +

∑F

( )f Rϕ are also known as features.

Page 33: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

Methods for estimating»

»

θ1 2ˆ ˆ ˆGenerate observations , ,..., , and use traditional regression

methods to fit .

Nv v vθ

Stochastic gradient for updating :nθ

( )

( )

1 1 1 1 11

1

21 1 11

ˆ( | ) ( | )

( )( )

ˆ ( | )

( )

n n n n n n n n nn

n n n n nn

F

V S v V S

SS

V S v

S

θ θ α θ θ

ϕϕ

θ α θ

ϕ

− − − − −−

− − −−

= − − ∇

⎛ ⎞⎜ ⎟⎜ ⎟= − −⎜ ⎟⎜ ⎟⎝ ⎠

Error Basis functions

Page 34: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

Methods for estimating» Recursive statistics

• Iterative equations avoid matrix inverse:

θ

( )( )( )

1

11

1 1 1

1

Error1 is matrix=

1

1

n n n n n n

n n Tn

Tn n n n n nn

Tn n n n

H x

H B B F F XX

B B B x x B

x B x

θ θ ε ε

γ

γ

γ

−−

− − −

= − =

⎡ ⎤= × ⎣ ⎦

= −

= +

Page 35: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

Other statistical methods

» Regression trees• Combines regression with techniques for discrete variables.

» Data mining• Good for categorical data

» Neural networks• Engineers like this for low-dimensional continuous problems

Page 36: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

Notes:» When approximating value functions, we are basically

drawing on the entire field of statistics.» Choosing an approximation is primarily an art.» In special cases, the resulting algorithm can produce

optimal solutions.» Most of the time, we are hoping for “good” solutions.» In some cases, it can work terribly.» As a general rule – you have to use problem structure.

Value function approximations have to capture the right structure. Blind use of polynomials will rarely be successful.

Page 37: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

What you will struggle with:» Stepsizes

• Can’t live with ‘em, can’t live without ‘em.• Too small, you think you have converged but you have really

just stalled (“apparent convergence”)• Too large, and the system is unstable.

» Stability• There are two sources of randomness:

– The traditional exogenous randomness– An evolving policy

» Exploration vs. exploitation• You sometimes have to choose to visit a state just to collect

information about the value of being in a state.

Page 38: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Introduction to ADP

But we are not out of the woods…» Assume we have an approximate value function.» We still have to solve a problem that looks like

» This means we still have to deal with a maximization problem (might be a linear, nonlinear or integer program) with an expectation.

1( ) max ( , ) ( )t t t t t f f tx fV S C S x E Sθ φ +∈ ∈

⎛ ⎞= +⎜ ⎟

⎝ ⎠∑

X F

Page 39: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 40: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Do not use

weather report

Use w

eath

er re

port

Forecast sunny .6

Rain .8 -$2000Clouds .2 $1000Sun .0 $5000Rain .8 -$200Clouds .2 -$200Sun .0 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .5 $1000Sun .4 $5000Rain .1 -$200Clouds .5 -$200Sun .4 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .2 $1000Sun .7 $5000Rain .1 -$200Clouds .2 -$200Sun .7 -$200

Schedule game

Cancel game

Rain .2 -$2000Clouds .3 $1000Sun .5 $5000Rain .2 -$200Clouds .3 -$200Sun .5 -$200

Schedule game

Cancel game

Forecast cloudy .3

Forecast rain .1

- Decision nodes

- Outcome nodes

Information

ActionInformation

Action

State

State

Page 41: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Do not use

weather report

Use w

eath

er re

port

Forecast sunny .6

Schedule game

Cancel game

Schedule game

Cancel game

Schedule game

Cancel game

Schedule game

Cancel game

Forecast cloudy .3

Forecast rain .1

-$1400

-$200

$2300

-$200

$3500

-$200

$2400

-$200

Page 42: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Do not use

weather report

Use w

eath

er re

port

Forecast sunny .6Schedule game

Cancel game

Forecast cloudy .3

Forecast rain .1 -$200

$2300

$3500

$2400

-$200

Page 43: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Do not use

weather report

Use w

eath

er re

port

$2770

$2400

Page 44: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 44

The post-decision state

Page 45: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 45

The post-decision state

Managing blood inventories over time

t=0

0S1 1

ˆ ˆ,R D1S

Week 1

1x

2 2ˆ ˆ,R D

2S

Week 2

2x2xS

3 3ˆ ˆ,R D

3S3x

Week 3

3xS

t=1 t=2 t=3

Week 0

0x

Page 46: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 46

The post-decision state

What happens if use Bellman’s equation?» State variable is:

• The supply of each type of blood– 8 blood types– 6 ages

• The demand for each type of blood– 8 blood types

• 56 dimensional state vector» Decision variable is how much of 8 blood

types to supply to 8 demand types.• 162- dimensional decision vector

» Random information• Blood donations by week (8 types)• New demands for blood (8 types)• 16-dimensional information vector

Page 47: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Do not

weath

Use w

eath

er re

port

Forecast sunny .6

Rain .8 -$2000Clouds .2 $1000Sun .0 $5000Rain .8 -$200Clouds .2 -$200Sun .0 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .5 $1000Sun .4 $5000Rain .1 -$200Clouds .5 -$200Sun .4 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .2 $1000Sun .7 $5000Rain .1 -$200Clouds .2 -$200S n 7 $200

Schedule game

Cancel game

Rain 2 -$2000

Forecast cloudy .3

Forecast rain .1

Do not

weath

Use w

eath

er re

port

Forecast sunny .6

Rain .8 -$2000Clouds .2 $1000Sun .0 $5000Rain .8 -$200Clouds .2 -$200Sun .0 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .5 $1000Sun .4 $5000Rain .1 -$200Clouds .5 -$200Sun .4 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .2 $1000Sun .7 $5000Rain .1 -$200Clouds .2 -$200S n 7 $200

Schedule game

Cancel game

Rain 2 -$2000

Forecast cloudy .3

Forecast rain .1

{ }( )1 1( ) max ( , ) ( ) |t t t t t t t txV S C S x E V S S+ +∈

= +X

tS

1tS +

Page 48: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Do not use

weather report

Use w

eath

er re

port

Forecast sunny .6

Rain .8 -$2000Clouds .2 $1000Sun .0 $5000Rain .8 -$200Clouds .2 -$200Sun .0 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .5 $1000Sun .4 $5000Rain .1 -$200Clouds .5 -$200Sun .4 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .2 $1000Sun .7 $5000Rain .1 -$200Clouds .2 -$200Sun .7 -$200

Schedule game

Cancel game

Rain .2 -$2000Clouds .3 $1000Sun .5 $5000Rain .2 -$200Clouds .3 -$200Sun .5 -$200

Schedule game

Cancel game

Forecast cloudy .3

Forecast rain .1

- Decision nodes

- Outcome nodes

Do not use

weather report

Use w

eath

er re

port

Forecast sunny .6

Rain .8 -$2000Clouds .2 $1000Sun .0 $5000Rain .8 -$200Clouds .2 -$200Sun .0 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .5 $1000Sun .4 $5000Rain .1 -$200Clouds .5 -$200Sun .4 -$200

Schedule game

Cancel game

Rain .1 -$2000Clouds .2 $1000Sun .7 $5000Rain .1 -$200Clouds .2 -$200Sun .7 -$200

Schedule game

Cancel game

Rain .2 -$2000Clouds .3 $1000Sun .5 $5000Rain .2 -$200Clouds .3 -$200Sun .5 -$200

Schedule game

Cancel game

Forecast cloudy .3

Forecast rain .1

- Decision nodes

- Outcome nodes

Page 49: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 49

The post-decision state

New concept:» The “pre-decision” state variable:

• Same as a “decision node” in a decision tree.

» The “post-decision” state variable:

• Same as an “outcome node” in a decision tree.

The information required to make a decision t tS x=

The state of what we know immediately after we make a decision.

xtS =

Page 50: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 50

The post-decision state

An inventory problem:» Our basic inventory equation:

» Using pre- and post-decision states:

Rt+1 = max{0, Rt + xt − D̂t+1}where

Rt = Inventory at time t.

xt = Amount we ordered.

D̂t+1 = Demand in next time period.

Rxt = Rt + xt Post-decision state

Rt+1 = max{0, Rxt − D̂t+1} Pre-decision state

Page 51: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 51

⎛⎜⎜⎜⎝

⎞⎟⎟⎟⎠

The post-decision state

Pre-decision, state-action, and post-decision

Pre-decision state State Action Post-decision state

93 states 93 9 state-action pairs× 93 states

Page 52: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 52

The post-decision state

( , )t t tS R D=

Pre-decision: resources and demands

Page 53: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 53

, ( , )x M xt t tS S S x=

The post-decision state

Page 54: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 54

The post-decision state

1 1 1ˆ ˆ( , )t t tW R D+ + +=

xtS ,

1 1( , )M W xt t tS S S W+ +=

Page 55: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 55

The post-decision state

1tS +

Page 56: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 56

The post-decision state

Classical form of Bellman’s equation:

Bellman’s equations around pre- and post-decision states:» Optimization problem (making the decision):

• Note: this problem is deterministic!» Simulation problem (the effect of exogenous

information):

( )( ),( ) max ( , ) ( , ) x M xt t x t t t t t t tV S C S x V S S x= +

{ },1 1( ) ( ( , )) |x x M W x x

t t t t t tV S E V S S W S+ +=

{ }( )1 1( ) max ( , ) ( ) |t t t t t t t txV S C S x E V S S+ +∈

= +X

Page 57: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 57

The post-decision state

Challenges» For most practical problems, we are not going to be

able to compute .

» Concept: replace it with an approximation and solve

» So now we face:• What should the approximation look like?• How do we estimate it?

( )( ) max ( , ) ( ) x xt t x t t t t tV S C S x V S= +

( )x xt tV S

( )xt tV S

( )( ) max ( , ) ( ) xt t x t t t t tV S C S x V S= +

Page 58: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 58

The post-decision state

Value function approximations:» Linear (in the resource state):

» Piecewise linear, separable:

» Indexed PWL separable:

( ) ( )x xt t ta ta

aV R V R

= ∑A

( )x xt t ta ta

aV R v R

= ⋅∑A

( ) ( ) | ( )x xt t ta ta t

aV R V R features

= ∑A

Page 59: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 59

The post-decision state

Value function approximations:» Ridge regression (Klabjan and Adelman)

» Benders cuts

0x

( )t tV R

1x

( ) ( ) f

xt t tf tf tf fa ta

f aV R V R R Rθ

∈ ∈

= =∑ ∑F A

Page 60: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 60

The post-decision state

Comparison to other methods:» Classical MDP (value iteration)

» Classical ADP (pre-decision state):

» Our method (update around post-decision state):

( ), 1 ,

, 1 ,1 1 1 1 1 1

ˆ max ( , ) ( ( , ))

ˆ( ) (1 ) ( )

n n x n M x nt x t t t t t t

n x n n x n nt t n t t n t

v C S x V S S x

V S V S vα α

−− − − − − −

= +

= − +

( )11( ) max ( , ) ( )n n

x tV S C S x EV Sγ −+= +

( )11

'

11 1

ˆ max ( , ) ( ' | , ) '

ˆ( ) (1 ) ( )

n n n nt x t t t t t t

s

n n n n nt t n t t n t

v C S x p s S x V s

V S V S vα α

−+

−− −

⎛ ⎞= +⎜ ⎟⎝ ⎠

= − +

∑ˆ updates ( )t t tv V S

1 1ˆ updates ( )xt t tv V S− −

, 1x ntV −

Expectation

Page 61: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 61

The post-decision state

Step 1: Start with a pre-decision state Step 2: Solve the deterministic optimization using

an approximate value function:

to obtain . Step 3: Update the value function approximation

Step 4: Obtain Monte Carlo sample of andcompute the next pre-decision state:

Step 5: Return to step 1.

, 1 ,1 1 1 1 1 1 ˆ( ) (1 ) ( )n x n n x n n

t t n t t n tV S V S vα α−− − − − − −= − +

( )1 ,ˆ max ( , ) ( ( , ) )n n n M x nt x t t t t t tv C S x V S S x−= +

ntx

ntS

( )ntW ω

1 1( , , ( ))n M n n nt t t tS S S x W ω+ +=

Simulation

Deterministicoptimization

Recursivestatistics

Page 62: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Approximate dynamic programming

Optimization Simulation

CplexApproximate dynamic programming combines simulation and optimization in a rigorous yet flexible framework.

OptimizationStrengths

Produces optimal decisions.Mature technology

WeaknessesCannot handle uncertainty.Cannot handle high levels

of complexity

SimulationStrengths

Extremely flexibleHigh level of detailEasily handles uncertainty

WeaknessesModels decisions using

user-specified rules.Low solution quality.

Page 63: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 64: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 64

Blood management

Managing blood inventories

Page 65: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 65

Blood management

Managing blood inventories over time

t=0

0S1 1

ˆ ˆ,R D1S

Week 1

1x1xS

2 2ˆ ˆ,R D

2S

Week 2

2x2xS

3 3ˆ ˆ,R D

3S3x

Week 3

3xS

t=1 t=2 t=3

Week 0

0x0xS

Page 66: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

O-,1

O-,2

O-,3

AB+,2

AB+,3

O-,0

t ABD +

AB+,0

AB+,1

AB+,2

O-,0

O-,1

O-,2

,( ,0)t ABR +

,( ,1)t ABR +

,( ,2)t ABR +

,( ,0)t OR −

,( ,1)t OR −

,( ,2)t OR −

t ABD −

t AD +

t ABD +

t ABD +

t ABD +

t ABD +

AB+

AB-

A+

A-

B+

B-

O+

O-

xtR

AB+,0

AB+,1

t ABD +

Satisfy a demand Hold

tS = ( )ˆ , t tR D

Page 67: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

AB+,0

AB+,1

AB+,2

tR

O-,0

O-,1

O-,2

xtR

AB+,0

AB+,1

AB+,2

AB+,3

O-,0

O-,1

O-,2

O-,3

AB+,0

AB+,1

AB+,2

AB+,3

O-,0

O-,1

O-,2

O-,3

1,ˆ

t ABR + +

1tR +

1,ˆ

t OR + −

ˆtD

,( ,0)t ABR +

,( ,1)t ABR +

,( ,2)t ABR +

,( ,0)t OR −

,( ,1)t OR −

,( ,2)t OR −

Page 68: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

AB+,0

AB+,1

AB+,2

tR xtR

O-,0

O-,1

O-,2

AB+,0

AB+,1

AB+,2

AB+,3

O-,0

O-,1

O-,2

O-,3

ˆtD

,( ,0)t ABR +

,( ,1)t ABR +

,( ,2)t ABR +

,( ,0)t OR −

,( ,1)t OR −

,( ,2)t OR −

Page 69: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

( )tF R

AB+,0

AB+,1

AB+,2

tR xtR

O-,0

O-,1

O-,2

AB+,0

AB+,1

AB+,2

AB+,3

O-,0

O-,1

O-,2

O-,3

ˆtD

,( ,0)t ABR +

,( ,1)t ABR +

,( ,2)t ABR +

,( ,0)t OR −

,( ,1)t OR −

,( ,2)t OR −

Solve this as a linear program.

Page 70: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

( )tF R

AB+,0

AB+,1

AB+,2

tR xtR

O-,0

O-,1

O-,2

AB+,0

AB+,1

AB+,2

AB+,3

O-,0

O-,1

O-,2

O-,3

ˆtD

Dual variables give value additional unit of blood..

Duals

,( ,0)t̂ ABν +

,( ,1)t̂ ABν +

,( ,2)t̂ ABν +

,( ,0)t̂ ABν −

,( ,1)t̂ ABν −

,( ,2)t̂ ABν −

Page 71: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 71

Updating the value function approximation

Estimate the gradient at

,( ,2)nt ABR +

,( ,2)ˆnt ABν +

ntR

( )tF R

Page 72: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 72

Updating the value function approximation

Update the value function at

,1

x ntR −

11 1( )n x

t tV R−− −

,1

x ntR −

,( ,2)ˆnt ABν +

( )tF R

,( ,2)nt ABR +

Page 73: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 73

Updating the value function approximation

Update the value function at ,1

x ntR −

,( ,2)ˆnt ABν +

,1

x ntR −

11 1( )n x

t tV R−− −

Page 74: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 74

Updating the value function approximation

Update the value function at ,1

x ntR −

,1

x ntR −

11 1( )n x

t tV R−− −

1 1( )n xt tV R− −

Page 75: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 75

Iterative learning

t

Page 76: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 76

Iterative learning

Page 77: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 77

Iterative learning

Page 78: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 78

Iterative learning

Page 79: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Approximate dynamic programmingWith luck, the objective function will improve steadily

Objec tive func tion

97

97.5

98

98.5

99

99.5

100

1 11 21 31 41 51 61 71 81

Itera tion

Page 80: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Chart Title

1700000

1720000

1740000

1760000

1780000

1800000

1820000

1840000

1860000

1880000

1900000

300 400 500 600 700 800 900 1000

Iterations

Obj

ectiv

e fu

nctio

n

Approximate dynamic programming

… but performance can be jagged.

Page 81: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 82: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Hierarchical aggregation

Attribute vectors:

a =Location

ETAA/C typeFuel level

Home shopCrewEqpt1

Eqpt100

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

LocationETA

Bus. segmentSingle/team

DomicileDrive hours

Days from home

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

Blood typeAge

Frozen?

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

Asset classTime invested⎡ ⎤⎢ ⎥⎣ ⎦

Page 83: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Estimating value functions

1 ˆ( ) (1 ) ( ) ( )n n

LocationLocation Location Fleet

v Fleet v Fleet v DomicileDomicile Domicile DOThrs

DaysFromHome

α α−

⎡ ⎤⎢ ⎥⎡ ⎤ ⎡ ⎤ ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥= − +⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦ ⎢ ⎥⎢ ⎥⎣ ⎦

$2050 = $2000 $2500(1 0.10)− × + (0.10)

Value function Approximation may have fewer attributes than driver.

Drivers may have very detailed attributes

Aggregation

Page 84: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Estimating value functions» Most disaggregate level

Hierarchical aggregation

1 ˆ( ) (1 ) ( ) ( )n n

LocationLocation Location Fleet

v Fleet v Fleet v DomicileDomicile Domicile DOThrs

DaysFromHome

α α−

⎡ ⎤⎢ ⎥⎡ ⎤ ⎡ ⎤ ⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥= − +⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦ ⎢ ⎥⎢ ⎥⎣ ⎦

Page 85: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Estimating value functions» Middle level of aggregation

Hierarchical aggregation

1 ˆ( ) (1 ) ( ) ( )n n

LocationFleet

Location Locationv v v Domicile

Fleet FleetDOThrs

DaysFromHome

α α−

⎡ ⎤⎢ ⎥⎢ ⎥⎡ ⎤ ⎡ ⎤⎢ ⎥= − +⎢ ⎥ ⎢ ⎥⎢ ⎥⎣ ⎦ ⎣ ⎦⎢ ⎥⎢ ⎥⎣ ⎦

Page 86: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Estimating value functions» Most aggregate level

Hierarchical aggregation

[ ] [ ]1 ˆ( ) (1 ) ( ) ( )n n

LocationFleet

v Location v Location v DomicileDOThrs

DaysFromHome

α α−

⎡ ⎤⎢ ⎥⎢ ⎥⎢ ⎥= − +⎢ ⎥⎢ ⎥⎢ ⎥⎣ ⎦

Page 87: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

State-dependent weighted aggregation:» Now we have a linear regression for each attribute

» We cannot solve hundreds of thousands of linear regressions. Instead use

( ) ( ) g ga a a

g

v w v=∑

Estimate of variance Estimate of bias

( ) ( )( ) 12( ) ( ) ( ) ( ) 1g g g ga a a a

g

w Var v wβ−

∝ + =∑

Hierarchical aggregation

Page 88: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

1400000

1450000

1500000

1550000

1600000

1650000

1700000

1750000

1800000

1850000

1900000

0 100 200 300 400 500 600 700 800 900 1000

Iterations

Obj

ectiv

e fu

nctio

n

Aggregate

Disaggregate

Hierarchical aggregation

Page 89: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

1400000

1450000

1500000

1550000

1600000

1650000

1700000

1750000

1800000

1850000

1900000

0 100 200 300 400 500 600 700 800 900 1000

Iterations

Obj

ectiv

e fu

nctio

n

Weighted Combination

Aggregate

Disaggregate

Hierarchical aggregation

Page 90: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 91: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

Stepsizes:» Fundamental to ADP is an updating equation that looks

like:

11 1 1 1 1 1 ˆ( ) (1 ) ( )n x n x n

t t n t t n tV S V S vα α−− − − − − −= − +

Old estimate New observationUpdated estimate

The stepsize“Learning rate”

“Smoothing factor”

Page 92: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

High noise data:

Low noise, changing signal

0

0.5

1

1.5

2

2.5

3

1 21 41 61 81

Best stepsize

0

0.2

0.4

0.6

0.8

1

1.2

1 21 41 61 81

1/n nα =

1nα ≈

Page 93: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

Theorem 1 (general knowledge):» If data is stationary, then 1/n is the best possible

stepsize.

Theorem 2 (e.g. Tsitsiklis 1994)» For general problems, 1/n is provably convergent

Theorem 3 (Frazier and Powell):» 1/n works so badly it should (almost) never be used.

Page 94: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

Lower bound on the value function after n iterations:

210 410 610 810 1010 1210

Page 95: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

( ) ( )

( ) ( )

2

21 2

2 21

2

11

where:

1

As increases, stepsize decreases toward 1/As increases, stepsize increases toward 1.

n n n

n nn n

n

n

σαλ σ β

λ α λ α

σ

β

= −+ +

= − +

Estimate of the variance

Estimate of the bias

Bias

Noise

Bias-adjusted Kalman filter

Page 96: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

The bias-adjusted Kalman filter

0

0.2

0.4

0.6

0.8

1

1.2

1 21 41 61 81

Observed values

BAKF stepsize rule

1/n stepsize rule

Page 97: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

The bias-adjusted Kalman filter

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1 21 41 61 81

Observed values

BAKF stepsize rule

Page 98: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

BAKF formula

Our newest formula» where

» First stepsize formula designed specifically for dynamic programming (note: no need to estimate bias )

» Captures the natural growth of a value function through iterative learning.

( )( )( )( )

21 2

22 1 2

n n

nn n

λ σ βα

σ λ σ β

+=

+ +

( )( )( )( ) ( )( )( )

21 2 1 2

22 1 2 1 2 1 2

1 1

1 1 1

n n

nn n n

c

c

λ σ γ δα

γ λ σ λ σ γ δ

− −

− − −

+ − −=

+ + + − −

Page 99: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Stepsizes

I recommend:» Start with a constant stepsize

• Vary it, get a sense of what works best

» Next try a deterministic stepsize such as a/(a+n)• Choose a so that it declines to roughly the best deterministic

stepsize at an iteration comparable to where your constant stepsize seems to stabilize

– Might be 50 iterations– Might be 5000

» Try an (optimal) adaptive stepsize rule• Can work very well if there is not too much noise• Adaptive rules work well when there is a need to keep the

stepsize from declining too quickly (but you do not know how quickly)

Page 100: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 101: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Two-stage optimizationPiecewise linear, separable value function approximations:

Piecewise linear, separable:

( ) ( )t t tl tll

V R V R∈

=∑L

Page 102: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Two-stage optimizationBenders decomposition:

0x

( )t tV R

1x

⎫⎪⎪⎬⎪⎪⎭

Multidimensional cuts produce provably convergent, nonseparablevalue function approximation.

Page 103: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

The competition

Exact solutions using Benders:

0x

0V“L-Shaped” decomposition

(Van Slyke and Wets)

0x

0VStochastic decomposition

(Higle and Sen)

0x

0VCUPPS

(Chen and Powell)

Page 104: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

The competitionPercent from optimal 100 iterations

0

5

10

15

20

25

30

35

40

45

SD L-shaped CUPPS SPAR

10 locations25 locations50 locations100 locations

10

20

30

40

0

Percent over optimal after 100 iterations

Benders

Perc

ent e

rror

Increasing problem size

Separable

Page 105: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

The competitionPercent from optimal 100 iterations

0

5

10

15

20

25

30

35

40

45

SD L-shaped CUPPS SPAR

10 locations25 locations50 locations100 locations

10

20

30

40

0

Percent over optimal after 100 iterations

Increasing problem size

Benders Separable

Perc

ent e

rror

Page 106: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Multistage problems

Deterministic, single commodity flow

Page 107: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Multistage problems

Deterministic, single commodity flow» ADP solution as a percent of the optimal solution (from

Cplex)

Page 108: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Multistage problems

Stochastic, single commodity flow

0

20

40

60

80

100

120

(20,10

0)(20

,200)

(20,40

0)(40

,100)

(40,20

0)(40

,400)

(80,10

0)(80

,200)

(80,40

0)

Perc

ent o

f pos

terio

r opt

imal

Rolling horizonADP

100 = optimal posterior bound

Page 109: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Multistage problems

Deterministic, (integer) multicommodity flow

Page 110: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Multistage problems

Deterministic, (integer) multicommodity flow

60

65

70

75

80

85

90

95

100

105

Base

T_30

T_90

I_10

I_40

C_IIC_IIIC_IV R_1

R_5R_10

0R_40

0C_1C_8

Perc

ent o

f opt

imal

100 = optimal continuous relaxation

Page 111: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Multistage problems

Stochastic, (integer) multicommodity flow

0

20

40

60

80

100

120

Base

I_10

I_40

C_II C_III

C_IV R_1 R_5R_1

00R_4

00 C_1 C_8

Perc

ent o

f pos

terio

r opt

imal

Rolling horizonADP

Page 112: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 113: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 113

Schneider National

Page 114: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 114

Page 115: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 115

0

200

400

600

800

1000

1200

1400

US_SOLO US_IC US_TEAM

Capacity category

Rev

enue

per

WU

Historical maximum

Simulation

Historical minimum

0

200

400

600

800

1000

1200

US_SOLO US_IC US_TEAM

Capacity category

Util

izat

ion Historical maximum

Simulation

Historical minimumRevenue per WU

Utilization

Historical min and maxCalibrated model

Schneider National

Page 116: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Princeton team:Warren PowellBelgacem Bouzaiene-Ayari

NS team:Clark ChengRicardo FiorilloJunxia ChangSourav Das

Page 117: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 117

Page 118: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 118

Page 119: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 119

Page 120: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 120

Page 121: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Convergence

Train coverage

Page 122: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Model calibration

Train delay curve – October 2007D

elay

hou

rs/d

ay

Fleet size

Page 123: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Model calibration

Train delay curve – March 2008D

elay

hou

rs/d

ay

Fleet size

Page 124: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

Outline

A resource allocation modelAn introduction to ADP ADP and the post-decision state variableA blood management exampleHierarchical aggregationStepsizesHow well does it work?Some applications» Transportation» Energy

Page 125: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 125

Energy resource modeling

Page 126: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 126

Energy resource modeling

Hourly model» Decisions at time t impact t+1 through the amount of water held in

the reservoir.Hour t Hour t+1

Page 127: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 127

Energy resource modeling

Hourly model» Decisions at time t impact t+1 through the amount of water held in

the reservoir.

Value of holding water in the reservoir for future time periods.

Hour t

Page 128: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 128

Energy resource modeling

Page 129: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 129

Energy resource modeling2008

Hour 1 2 3 4 87602009

1 2

Page 130: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 130

Energy resource modeling2008

Hour 1 2 3 4 87602009

1 2

Page 131: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 131

Annual energy model

Optimal

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

0 100 200 300 400 500 600 700 800 900 1000

PrecipitationHydroUsageRemainingWastageDemandCoalWind

Reservoir level

Demand

Page 132: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 132

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

0 100 200 300 400 500 600 700 800 900 1000

PrecipitationUsageRemainingWastageDemandCoalWind

Annual energy model

Approximate dynamic programming

Reservoir level

Demand

Page 133: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 133

Annual energy model

Optimal

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

0 100 200 300 400 500 600 700 800 900 1000

PrecipitationHydroUsageRemainingWastageDemandCoalWind

Reservoir level

Demand

Page 134: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 134

Annual energy model

Approximate dynamic programming

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

0 100 200 300 400 500 600 700 800 900 1000

PrecipitationUsageRemainingWastageDemandCoalWind

Reservoir level

Demand

Page 135: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 135

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

0 100 200 300 400 500 600 700 800

Res

ervo

ir le

vel

Time period

ADP at last iterationOptimal for individual

scenarios

Annual energy model

ADP vs optimal reservoir levels for stochastic rainfall

Page 136: Approximate Dynamic Programming: Solving the curses of ... · Planning for a risky world Weather •Robust design of emergency response networks. ... Government policies (tax rebates

© 2008 Warren B. Powell Slide 136